Question

I'm trying to load a huge (~5GB) .csv file into R using read.csv.ffdf. The command goes:

npi <- read.csv.ffdf(file="C:/Users/DSA/Dropbox/Team Shared Files/People/Ross/NPI_Parse/Zips/npi_full.csv", VERBOSE=TRUE, first.rows=10000,next.rows=100000,colClasses=NA)

The command runs for a while and then throws the following error: no applicable method for 'recodeLevels' applied to an object of class "c('double', 'numeric')". Some searching tells me I need to use the transFUN option, but I have no idea how to apply it. The data contains both text and numbers, and I think that may be causing the issue. I can upload a screenshot of the CSV if it helps, but it takes ages to open in LibreOffice.

Anyone know any tricks?


Solution

From the documentation of read.csv.ffdf:

transFUN: NULL or a function that is called on each data.frame chunk after reading with FUN and before further processing (for filtering, transformations etc.)

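The file is read in chunks, and the column types are guessed separately for each chunk. If one of your columns is read as a factor in one chunk and as numeric in another (or vice versa), use transFUN to make sure it is a factor in every chunk: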

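library(ff)

npi <- read.csv.ffdf(
  file = "C:/Users/DSA/Dropbox/Team Shared Files/People/Ross/NPI_Parse/Zips/npi_full.csv",
  VERBOSE = TRUE, first.rows = 10000, next.rows = 100000,
  # transFUN is called on every chunk (a data.frame) right after it is read
  transFUN = function(x) {
    # replace yourcolumnwiththeerror with the name of the offending column
    x$yourcolumnwiththeerror <- factor(as.character(x$yourcolumnwiththeerror))
    x  # return the modified chunk
  })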
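
To see why the conversion helps, here is a minimal base-R sketch that does not need ff or the real file. The two toy data frames stand in for successive chunks of the same CSV; the column name code and the helper force_factor are made up for illustration and are not part of ff.

```r
# Hypothetical helper: coerce the named columns of one chunk to factor.
force_factor <- function(chunk, cols) {
  for (col in cols) {
    chunk[[col]] <- factor(as.character(chunk[[col]]))
  }
  chunk
}

# Toy chunks: the first contains only numbers in "code", so it is numeric;
# the second contains text as well, so it comes back as a factor.
chunk1 <- data.frame(id = c(1, 2), code = c(101, 102))
chunk2 <- data.frame(id = c(3, 4), code = c("A7", "103"),
                     stringsAsFactors = TRUE)

class(chunk1$code)  # "numeric"
class(chunk2$code)  # "factor"

# After the conversion both chunks carry "code" as a factor.
fixed1 <- force_factor(chunk1, "code")
fixed2 <- force_factor(chunk2, "code")
class(fixed1$code)  # "factor"
class(fixed2$code)  # "factor"
```

Under the same assumptions, the helper could be passed to the call above as transFUN = function(x) force_factor(x, "code"), which is handy when more than one column needs the treatment.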