Question

I need to write a function that takes two arguments: directory and id. directory is essentially the working directory of R. id is the name of the file. All files have the extension csv and a name from 001 to 332.

The function should return a data frame with two columns: id and nrow. id is the name of the file, and nrow is the number of rows in that file.

I started with the following code, but it only works if a single id is passed to the function:

directory = 'specdata'
id = 1 # this should be able to take a list of numbers i.e. 1:332
id1 = if (nchar(id) == 1) {
  paste("00", id, sep = "")
} else if (nchar(id) == 2) {
  paste("0", id, sep = "")
} else {
  as.character(id)  # three-digit ids need no padding
}
file = paste(directory,"/",as.character(id1),".csv", sep="")
data = read.csv(file)
casenum = nrow(data)
output = c(id1, casenum)

How can I modify the code so that the function repeats itself if more than one id is passed, for example id = c(1,2,3,5,6)? I am thinking of using lapply or sapply but don’t know where to start. Thanks.


Solution

  • If you just want the number of lines in each file, then I'd actually recommend the command line tool wc: it will be much faster.
    wc -l *.csv prints a plain-text table with the number of lines in the first column and the file name after it.
    Keep in mind that wc -l counts every line, including the header, so for a file with a header row the result is typically one higher than nrow (read.csv (f)).
    (wc for the Windows command line is available e.g. as part of the GNU core utilities.)

  • If it is about doing something for each .csv file in the directory, use
    file = Sys.glob (paste0 (directory, "/*.csv"))
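
As a rough sketch of both suggestions, the following runs over every .csv file that Sys.glob finds and counts raw lines (what wc -l would report) next to the rows that read.csv sees. The demo directory, file names, and data are made up for illustration and written to a temporary directory.

```r
# Throwaway demo files (made-up data) so the sketch runs anywhere
demo_dir <- file.path(tempdir(), "glob_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(x = 1:4), file.path(demo_dir, "001.csv"), row.names = FALSE)
write.csv(data.frame(x = 1:7), file.path(demo_dir, "002.csv"), row.names = FALSE)

# All .csv files in the directory
files <- Sys.glob(paste0(demo_dir, "/*.csv"))

# Raw lines per file, comparable to wc -l (the header line is included)
lines <- sapply(files, function(f) length(readLines(f)))
# Data rows per file, as read.csv parses them
rows <- sapply(files, function(f) nrow(read.csv(f)))

# Label by file name instead of the full path
names(lines) <- names(rows) <- basename(files)
print(rbind(lines, rows))
```

For files with a header row, each entry of lines should be one larger than the matching entry of rows.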

Anyways, here's what you asked for more specifically:

directory = 'specdata'
id = 1:17

file = sprintf ("%s/%03i.csv", directory, id) # now a vector with file names

casenum = sapply (file, function (f) nrow (read.csv (f)))

cbind (id, casenum) 

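# or, if you prefer a data.frame with the column names from the question
# (row.names = NULL drops the file names that sapply attaches)

data.frame (id = id, nrow = casenum, row.names = NULL)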
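
The pieces above can be wrapped into the function the question asks for. This is only a minimal sketch: the name count_rows and the demo files are hypothetical, and it assumes that every requested file exists and has a header row.

```r
# Hypothetical wrapper; the name count_rows is not from the original answer
count_rows <- function(directory, id = 1:332) {
  # Build all file names at once, e.g. "specdata/001.csv"
  files <- sprintf("%s/%03d.csv", directory, as.integer(id))
  # Count the data rows of each file; vapply guarantees an integer vector
  n <- vapply(files, function(f) nrow(read.csv(f)), integer(1),
              USE.NAMES = FALSE)
  data.frame(id = id, nrow = n)
}

# Demo on throwaway files in a temporary directory (made-up data)
demo_dir <- file.path(tempdir(), "specdata_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (i in 1:3) {
  write.csv(data.frame(x = seq_len(i * 2)),
            sprintf("%s/%03d.csv", demo_dir, i), row.names = FALSE)
}

res <- count_rows(demo_dir, 1:3)
print(res)
```

With the real data this would be called as count_rows('specdata', c(1, 2, 3, 5, 6)).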

OTHER TIPS

While not necessary, it's easier to read if you wrap up the operations in a named function. That can be within another function:

countall <- function(directory, ids) {
  countlines <- function(id) {
    ## Your code, copied from the question
    id1 = if(nchar(id) == 1) {paste("00",id,sep="")}
      else if (nchar(id)== 2) {paste("0",id,sep="")}
      else {as.character(id)}  # three-digit ids need no padding
    file = paste(directory,"/",as.character(id1),".csv", sep="")
    data = read.csv(file)
    casenum = nrow(data)
    ## No need to attach the id here, as you can use the names
    return(casenum)
  }

  retval <- lapply(ids, countlines)  # or sapply, to return a vector instead of a list
  names(retval) <- ids

  return(retval)
}

Run with:

countall('specdata', 1:10)
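
countall returns a list named by id rather than the two-column data frame asked for in the question. A minimal sketch of the conversion follows; the counts list is a made-up stand-in with the same shape as the return value, so the numbers are illustrative only.

```r
# Stand-in for countall('specdata', 1:3); the counts are invented for illustration
counts <- list(`1` = 12L, `2` = 30L, `3` = 7L)

# The names carry the ids, the list elements carry the row counts
result <- data.frame(
  id   = as.integer(names(counts)),
  nrow = unlist(counts, use.names = FALSE)
)
print(result)
```
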
Licensed under: CC-BY-SA with attribution