Question

I'm trying to get file creation dates from R and I understand that this information might not be possible to retrieve at all on some operating systems that just don't store it anywhere. However, I'm unsure how to retrieve it generically when it is (at least, theoretically) retrievable.

On Windows, this is straightforward because the ctime column returned by file.info provides this information. For reference, this is the relevant excerpt from ?file.info:

What is meant by the three file times depends on the OS and file system. On Windows native file systems ctime is the file creation time (something which is not recorded on most Unix-alike file systems).
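For illustration, the Windows case is only a column lookup. The sketch below runs it on a throwaway temporary file (the file name and contents are arbitrary). The same code also runs on Unix-alikes, but there, per the excerpt above, ctime is the status-change time rather than the creation time.

```r
# Minimal sketch: read the "ctime" column from file.info().
# On Windows native file systems this is the creation time;
# on most Unix-alikes it is the inode status-change time instead.
tmp <- tempfile(fileext = ".txt")
writeLines("example", tmp)

info <- file.info(tmp)
created_or_changed <- info$ctime   # POSIXct; meaning depends on the OS
print(created_or_changed)

unlink(tmp)
```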

However, although most Unix systems don't record this information (as the help page points out), some Unix-based systems such as OS X do store it. On OS X, for example, the mdls command ("metadata ls") prints a file's metadata and lists kMDItemContentCreationDate (the actual creation date of the file) as one of the attributes.
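A minimal sketch of calling mdls from R, assuming OS X with Spotlight metadata available for the file. The helper names are hypothetical. It relies on mdls's `-name` and `-raw` options and assumes the raw value looks like `2013-02-19 00:07:21 +0000`, with `(null)` printed when the attribute is missing. The parsing step is kept separate so that it can be checked without a Mac.

```r
# Hypothetical helper: ask Spotlight metadata (mdls) for the creation date.
# Assumes OS X, the mdls command, and that the file has been indexed.
mdls_creation_date <- function(path) {
  out <- system2("mdls",
                 c("-name", "kMDItemContentCreationDate", "-raw", shQuote(path)),
                 stdout = TRUE)
  parse_mdls_date(out)
}

# Convert a raw mdls value such as "2013-02-19 00:07:21 +0000" to POSIXct.
# Assumption: mdls prints "(null)" when the attribute is missing; map it to NA.
parse_mdls_date <- function(x) {
  x <- trimws(x)
  x[x == "(null)"] <- NA_character_
  # %z reads the "+0000" offset, so the result is correct in UTC
  as.POSIXct(x, format = "%Y-%m-%d %H:%M:%S %z", tz = "UTC")
}
```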

My question is: what advice do people have for getting at file creation dates from file metadata, when they are available at all? (For example, specifically in the case of OS X, where there's a system command but no direct R call.)

UPDATE:

Thanks to information from the comments and details on SO and SE here and here, I've come up with a way to solve this in R on OS X-type Unix platforms that track creation date and have the BSD-style stat command. However, I still couldn't figure out how to do this in R on Linux systems that track creation date but don't have this version of stat. In this answer on Unix SE, it is suggested that this information could be retrieved with debugfs + stat even when stat itself does not report it (provided the file system records the birth date), but I couldn't get that solution to work (the only Linux machine I could test on didn't have debugfs). Anyway, here's how far I got:

get_birthdate <- function(filepath) {
  switch(Sys.info()[['sysname']],
         Windows = {
           # Windows: ctime is the creation time on native file systems
           file.info(filepath)$ctime
         },
         Darwin = {
           # OS X: use the BSD stat command; shQuote() protects paths with spaces
           cmd <- paste('stat -f "%DB"', shQuote(filepath))
           ctime_sec <- as.numeric(system(cmd, intern = TRUE)) # birth time in seconds since the epoch (%DB)
           as.POSIXct(ctime_sec, origin = "1970-01-01", tz = "") # convert to POSIXct
         },
         Linux = {
           # Linux
           stop("not sure how to do this")
         },
         # any other OS: fail loudly instead of silently returning NULL
         stop("unsupported OS: ", Sys.info()[['sysname']]))
}

Solution

Following others' pointers, this should work reasonably well. Unfortunately, it needs root privileges (due to debugfs) and it's not very efficient yet (the regular expressions in particular are a bit quick'n'dirty, but it's 1:00 in the morning here :) ).

Briefly: we set the pager to cat (so that debugfs prints to standard output), find which device the file is stored on so that debugfs can be pointed at it, and finally get the stats and post-process them a bit.
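The "which device holds this file?" step can be sketched on its own, assuming a POSIX-style df. The helper names are hypothetical. The `-P` flag keeps each file system on one line, so long device names such as the /dev/disk/by-uuid/... path in the session below do not wrap. The parser only takes the first field of the first data row, so it can be tested against a captured df output.

```r
# Hypothetical helper: find the device that holds `file`.
# Assumes POSIX df; -P keeps each file system on a single output line.
device_of <- function(file) {
  out <- system2("df", c("-P", shQuote(file)), stdout = TRUE)
  parse_df_device(out)
}

# Line 1 is the header; the first field of line 2 is the device name.
parse_df_device <- function(lines) {
  fields <- strsplit(trimws(lines[2]), "[[:space:]]+")[[1]]
  fields[1]
}
```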

In general, on Unix, once you have a shell command, you can read its output in R by opening a pipe() connection (read mode is the default) and calling readLines() on it.
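A minimal sketch of that pattern as a reusable function (`read_command` is a hypothetical name). `on.exit()` makes sure the connection is closed even if reading fails. For a one-off call, `system(cmd, intern = TRUE)` also returns the command's output as a character vector.

```r
# Run a shell command and return its standard output, one element per line.
read_command <- function(cmd) {
  con <- pipe(cmd)        # not opened yet; readLines() opens it for reading
  on.exit(close(con))     # release the connection afterwards, even on error
  readLines(con)
}

read_command("echo hello")
```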

Tested on Debian GNU/Linux.

np350v5c:/home/l# R
> my.file <- "/etc/network/interfaces"
> 
> setup_pager <- function() Sys.setenv(PAGER = "cat")  # system("export PAGER=cat") would only affect a short-lived subshell
> 
> where_is <- function(file) {
      con <- pipe(sprintf("df -P %s", shQuote(file)))  # -P: one line per file system
      res <- strsplit(readLines(con)[2], " ")[[1]][1]
      close(con)
      res
  }
> 
> where_is(my.file) # could be /dev/sda1 as well, depending on /etc/fstab
[1] "/dev/disk/by-uuid/9ce40c2b-60d8-40b1-890f-1e5da4199c88"
> 
> my.command <- sprintf("debugfs -R 'stat %s' %s",
                        my.file,
                        where_is(my.file))
> 
> ## root privileges are needed here (debugfs reads the raw device)
> setup_pager()
> con <- pipe(my.command)
> debugfs <- readLines(con)
debugfs 1.42.9 (4-Feb-2014)
> close(con)
> 
> my.date <- gsub("^crtime:.+-- ", "", grep("^crtime", debugfs, value = TRUE))
> my.date
[1] "Tue Feb 19 00:07:21 2013"
> strptime(tolower(substr(my.date, 5, nchar(my.date))),
           format = "%b %d %H:%M:%S %Y")
[1] "2013-02-19 00:07:21 CET"
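The last two steps of the session can be folded into one function. This is a sketch that assumes the usual debugfs line shape `crtime: 0x...:... -- Tue Feb 19 00:07:21 2013` and an English or C time locale, because `%a` and `%b` are locale-dependent. `parse_debugfs_crtime` is a hypothetical name. It returns `NA` when there is no crtime line, which happens for example on file systems that do not record it.

```r
# Turn the lines printed by `debugfs -R 'stat <file>' <device>` into POSIXct.
parse_debugfs_crtime <- function(lines, tz = "") {
  line <- grep("^ *crtime:", lines, value = TRUE)
  if (length(line) == 0) return(.POSIXct(NA_real_, tz = tz))  # no birth time recorded
  stamp <- sub("^.*-- *", "", line[1])   # keep only e.g. "Tue Feb 19 00:07:21 2013"
  as.POSIXct(stamp, format = "%a %b %d %H:%M:%S %Y", tz = tz)
}
```

With the `debugfs` vector from the session above, `parse_debugfs_crtime(debugfs)` replaces the gsub()/strptime() pair.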

HTH, Luca

OTHER TIPS

I know I'm a little late to the game here, but here is a fairly easy solution for Mac OS X (or any system with BSD stat; GNU stat on Linux interprets -f differently):

file.name <- "~/dir/file.extension"
df$file_created_dt <- system(paste0("stat -f %SB ", shQuote(path.expand(file.name))), intern = TRUE)

And then you can format it however you like:

df$file_created_dt <- as.POSIXct(df$file_created_dt, format = "%b %d %H:%M:%S %Y", tz = "") # "" = local time zone, which is what stat prints in
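Wrapped as functions, the same approach might look like the sketch below. It assumes BSD stat (OS X) and that `%SB` prints timestamps in its default `Feb 19 00:07:21 2013` style. The function names are hypothetical. `path.expand()` runs before `shQuote()` because quoting would otherwise stop the shell from expanding `~`.

```r
# BSD stat only: birth time of each file, as POSIXct in the local time zone.
bsd_birth_time <- function(paths) {
  paths <- path.expand(paths)   # expand "~" before quoting
  out <- system2("stat", c("-f", "%SB", shQuote(paths)), stdout = TRUE)
  parse_bsd_stat_time(out)
}

# Parse stat's default timestamp text, e.g. "Feb 19 00:07:21 2013".
parse_bsd_stat_time <- function(x, tz = "") {
  as.POSIXct(x, format = "%b %d %H:%M:%S %Y", tz = tz)
}
```

The `%DB` format used in the question's `get_birthdate()` prints seconds since the epoch instead, which avoids date-string parsing and its dependence on the time locale.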
Licensed under: CC-BY-SA with attribution