Question

I am working on a data set that has multiple traffic speed measurements per day. The data is from the city of Chicago and was recorded every minute for about six months. I wanted to consolidate it into one value per day, so this is what I did:

traffic <- read.csv("path.csv",header=TRUE)
traffic2 <- aggregate(SPEED~DATE, data=traffic, FUN=mean)

This worked well: it took all of my data and averaged it by date. For example, my original data looked something like this:

DATE        SPEED  
12/31/2012   22
12/31/2012   25
12/31/2012   23
...

and the final looked like this:

DATE        SPEED 
10/1/2012    22
10/2/2012    23
10/3/2012    22
...

The only problem is that my data is supposed to start at 9/1/2012. When I plotted the result, the data ran from 10/1/2012 to 12/31/2012 and then from 9/1/2012 to 9/30/2012.

What in the world is going on here?

Solution

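I am going to agree with @user1683454's comment. After importing, your DATE column is of class character or factor (depending on your setting for stringsAsFactors), not Date. aggregate() therefore orders the groups as text, and as text "10/1/2012" sorts before "9/1/2012" because the character "1" comes before "9". That is why October through December appear ahead of September. You can solve this in at least two ways: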

1) Convert the data to the correct type during import. To do this, use the following options of read.csv(): stringsAsFactors (or as.is) and colClasses. Out of the box, colClasses accepts "Date" and "POSIXct", but these expect dates in a standard format such as 2012-09-01. If you need a non-standard format, as with 9/1/2012, you have two options. First, if you have a single Date column, you can have as.Date.character() apply the desired format when colClasses requests a Date. Second, if you have multiple Date columns, you can write a conversion function, register it with setAs(), and name the resulting class in colClasses. Both options are discussed here: https://stackoverflow.com/questions/13022299/specify-date-format-for-colclasses-argument-in-read-table-read-csv.

2) Convert the data to the correct type after import. After calling read.csv(), run either dateColumn <- as.Date(dateColumn, "%m/%d/%Y") or dateColumn <- strptime(dateColumn, "%m/%d/%Y"), where dateColumn stands for your date column (here, traffic$DATE). Adjust the format string to match your dates. Note that as.Date() returns a Date, while strptime() returns a POSIXlt date-time.
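
To make the post-import conversion concrete, here is a minimal sketch. It assumes the CSV has exactly the two columns DATE and SPEED, with dates written as month/day/year; the inline csv_text is made-up sample data standing in for the real file. It aggregates once on the raw strings and once after converting with as.Date().

```r
# Made-up sample rows standing in for the real CSV file (assumption).
csv_text <- "DATE,SPEED
9/1/2012,22
9/1/2012,24
10/1/2012,25
10/1/2012,27
12/31/2012,23"

# Read the dates as plain character strings.
traffic <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = FALSE)

# Grouping on the raw strings: the groups come back in text order,
# so the October and December rows land ahead of September.
by_text <- aggregate(SPEED ~ DATE, data = traffic, FUN = mean)
print(by_text)

# Convert the column to Date first, then aggregate again:
# the groups now come back in calendar order.
traffic$DATE <- as.Date(traffic$DATE, format = "%m/%d/%Y")
by_date <- aggregate(SPEED ~ DATE, data = traffic, FUN = mean)
print(by_date)
```

With a real file, only the read.csv() call changes; the as.Date() line and the aggregate() call stay the same.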
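
For the import-time route with setAs(), a rough sketch could look like the following. The class name mdyDate is hypothetical and chosen only for this illustration, and the inline csv_text is again made-up sample data; the pattern follows the approach described in the linked Stack Overflow question.

```r
library(methods)  # provides setClass() and setAs()

# Hypothetical class name, used only as a label for colClasses.
setClass("mdyDate")

# Register how a character vector is coerced to "mdyDate":
# parse month/day/year strings into Date values.
setAs("character", "mdyDate",
      function(from) as.Date(from, format = "%m/%d/%Y"))

# Made-up sample rows standing in for the real CSV file (assumption).
csv_text <- "DATE,SPEED
9/1/2012,22
9/1/2012,24
10/1/2012,25
12/31/2012,23"

# colClasses is given per column: DATE first, then SPEED.
traffic <- read.csv(text = csv_text, header = TRUE,
                    colClasses = c("mdyDate", "numeric"))

# The DATE column is now a Date, so aggregate() groups in calendar order.
traffic2 <- aggregate(SPEED ~ DATE, data = traffic, FUN = mean)
print(traffic2)
```

This keeps the conversion in one place, which can be convenient when several columns share the same date format.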

Licensed under: CC-BY-SA with attribution