Question

I am working on "Localization Data for Person Activity Data Set" dataset from UCI and in this data set there is a column of date and time(both in one column) with following format:

27.05.2009 14:03:25:777
27.05.2009 14:03:25:183
27.05.2009 14:03:25:210
27.05.2009 14:03:25:237
...

I am wondering if there is anyway to convert this column to timestamp using R.

Was it helpful?

Solution

First of all, we need to substitute the colon separating the milliseconds from the seconds to a dot, otherwise the final step won't work (thanks to Dirk Eddelbuettel for this one). Since in the end R will use the separators it wants, to be quicker, I'll just go ahead and substitute all the colons for dots:

x <- "27.05.2009 14:03:25:777"  # this is a simplified version of your data
y <- gsub(":", ".", x)          # this is your vector with the aforementioned substitution

By the way, this is how your vector should look after gsub:

> y
[1] "27.05.2009 14.03.25.777"

Now, in order to have it show the milliseconds, you first need to adjust an R option and then use a function called strptime, which will convert your date vector to POSIXlt (an R-friendly) format. Just do the following:

> options(digits.secs = 3)           # this tells R you want it to consider 3 digits for seconds.
> strptime(y, "%d.%m.%Y %H:%M:%OS")  # this finally formats your vector
[1] "2009-05-27 14:03:25.777"

I've learned this nice trick here. This other answer also says you can skip the options setting and use, for example, strptime(y, "%d.%m.%Y %H:%M:%OS3"), but it doesn't work for me. Henrik noted that the function's help page, ?strptime states that the %OS3 bit is OS-dependent. I'm using an updated Ubuntu 13.04 and using %OS3 yields NA.

When using strptime (or other POSIX-related functions such as as.Date), keep in mind some of the most common conversions used (edited for brevity, as suggested by DWin. Complete list at strptime):

  • %a Abbreviated weekday name in the current locale.
  • %A Full weekday name in the current locale.
  • %b Abbreviated month name in the current locale.
  • %B Full month name in the current locale.
  • %d Day of the month as decimal number (01–31).
  • %H Hours as decimal number (00–23). Times such as 24:00:00 are accepted for input.
  • %I Hours as decimal number (01–12).
  • %j Day of year as decimal number (001–366).
  • %m Month as decimal number (01–12).
  • %M Minute as decimal number (00–59).
  • %p AM/PM indicator in the locale. Used in conjunction with %I and not with %H.
  • `%S Second as decimal number (00–61), allowing for up to two leap-seconds (but POSIX-compliant implementations will ignore leap seconds).
  • %U Week of the year as decimal number (00–53) using Sunday as the first day 1 of the week (and typically with the first Sunday of the year as day 1 of week 1). The US convention.
  • %w Weekday as decimal number (0–6, Sunday is 0).
  • %W Week of the year as decimal number (00–53) using Monday as the first day of week (and typically with the first Monday of the year as day 1 of week 1). The UK convention.
  • %y Year without century (00–99). On input, values 00 to 68 are prefixed by 20 and 69 to 99 by 19
  • %Y Year with century. Note that whereas there was no zero in the original Gregorian calendar, ISO 8601:2004 defines it to be valid (interpreted as 1BC)
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top