Question

I'm trying to write a data frame to a gzip file but having problems.

Here's my code example:

df1 <- data.frame(id = seq(1,10,1), var1 = runif(10), var2 = runif(10))

gz1 <- gzfile("df1.gz","w" )
writeLines(df1)

Error in writeLines(df1) : invalid 'text' argument

Any suggestions?

EDIT: an example line of the character vector I'm trying to write is:

0 | var1:1.5 var2:.55 var7:1250

The class label / y-variable is separated from the x-vars by " | ", variable names are separated from values by " : ", and variables are separated from each other by spaces.

EDIT2: I apologize for the wording / format of the question, but here are the results.

Old method:

system.time(write(out1, file="out1.txt"))
#    user  system elapsed 
#   9.772  17.205  86.860 

New Method:

writeGzFile <- function(){
  gz1 = gzfile("df1.gz","w");
  write(out1, gz1);
  close(gz1) 
}

system.time( writeGzFile())
#    user  system elapsed 
#   2.312   0.000   2.478 

Thank you all very much for helping me figure this out.


Solution

writeLines expects a character vector, not a data frame, which is why the call fails. The simplest way to write this to a gzip file would be:

df1 <- data.frame(id = seq(1,10,1), var1 = runif(10), var2 = runif(10))
gz1 <- gzfile("df1.gz", "w")
write.csv(df1, gz1)
close(gz1)

This will write it as a gzipped csv. Also see write.table and write.csv2 for alternate ways of writing the file out.
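To check the result, the gzipped CSV can be read back through the same kind of connection. This is only a minimal sketch: the temporary path and row.names = FALSE are my own assumptions (the latter keeps an extra index column out of the file), not part of the answer above.

```r
# Write a data frame as gzipped CSV, then read it back to confirm the round trip.
df1 <- data.frame(id = seq(1, 10, 1), var1 = runif(10), var2 = runif(10))

# Hypothetical location; any writable path works
path <- file.path(tempdir(), "df1.gz")

gz_out <- gzfile(path, "w")
write.csv(df1, gz_out, row.names = FALSE)  # row.names = FALSE is an assumption
close(gz_out)

# read.csv opens the unopened connection itself, decompresses while reading,
# and closes it again when done
df1_back <- read.csv(gzfile(path))

# Compare the numeric columns within all.equal's default tolerance
all.equal(df1$var1, df1_back$var1)
```

Values are written as text, so the comparison is done with all.equal rather than identical.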

EDIT: Based on the updates to the post about the desired format, I made the following helper (quickly thrown together, so it can probably be simplified a lot):

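myser <- function(df) {
    rowCount <- nrow(df)
    dfNames <- names(df)
    dfNamesIndex <- length(dfNames)
    # Build one output line per row; the row index is used as the label
    sapply(seq_len(rowCount), function(rowIndex) {
        paste(rowIndex, '|', 
            # Each column contributes "name : value"; the pieces are joined with spaces
            paste(sapply(seq_len(dfNamesIndex), function(element) {
                c(dfNames[element], ':', df[rowIndex, element])
            }), collapse=' ')
        )
    })
}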

So the output looks like

a <- data.frame(x=1:10,y=rnorm(10))
writeLines(myser(a))
# 1 | x : 1 y : -0.231340933021948
# 2 | x : 2 y : 0.896777389870928
# 3 | x : 3 y : -0.434875004781075
# 4 | x : 4 y : -0.0269824962632977
# 5 | x : 5 y : 0.67654540494899
# 6 | x : 6 y : -1.96965253674725
# 7 | x : 7 y : 0.0863177759402661
# 8 | x : 8 y : -0.130116466571162
# 9 | x : 9 y : 0.418337557610229
# 10 | x : 10 y : -1.22890714891874

All that remains is to pass the gzfile connection to writeLines as its second argument to get the desired output.
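Writing the formatted lines through a gzfile connection could look like the sketch below. The helper format_rows is a hypothetical, more compact stand-in for the function above (it produces the same "index | name : value" layout), and the output path is an assumption.

```r
# Hypothetical compact formatter: one string per row, "i | name : value ..."
format_rows <- function(df) {
    vapply(seq_len(nrow(df)), function(i) {
        # One "name : value" fragment per column
        pairs <- vapply(names(df), function(nm) paste(nm, ":", df[i, nm]),
                        character(1))
        paste(i, "|", paste(pairs, collapse = " "))
    }, character(1))
}

a <- data.frame(x = 1:10, y = rnorm(10))

out_path <- file.path(tempdir(), "a.gz")  # assumed location

gz1 <- gzfile(out_path, "w")       # open a gzip-compressed text connection
writeLines(format_rows(a), gz1)    # the connection is the second argument
close(gz1)                         # closing flushes and finalises the file

# Reading the file back returns the same lines
head(readLines(gzfile(out_path)), 2)
```

Remember to close the connection; until then the compressed file may be incomplete.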

OTHER TIPS

To write something to a gzip file you need to "serialize" it to text. For R objects you can have a stab at that by using dput:

gz1 = gzfile("df1.gz","w")
dput(df1, gz1)
close(gz1)

However, you've just written a text representation of the data frame to the file. This will quite probably be less efficient than using save(df1,file="df1.RData") to save it to a native R data file. Ask yourself: why am I saving it as a .gz file?

In a quick test with some random numbers, the gz file was 54k and the .RData file was 34k.
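The comparison is easy to repeat. Here is a rough sketch, assuming a data frame of random numbers; the 5000-row size and the file locations are my own choices, so the sizes will not match the figures quoted above.

```r
# Compare a gzipped dput() dump with a native .RData file for the same object.
df1 <- data.frame(id = seq_len(5000), var1 = runif(5000), var2 = runif(5000))

txt_path <- file.path(tempdir(), "df1.gz")     # assumed location
bin_path <- file.path(tempdir(), "df1.RData")  # assumed location

# Text representation via dput, compressed with gzip
gz1 <- gzfile(txt_path, "w")
dput(df1, gz1)
close(gz1)

# Native R format; save() compresses by default
save(df1, file = bin_path)

# Sizes in bytes, in the order: gzipped text, .RData
file.size(c(txt_path, bin_path))
```

The .RData file also restores the object exactly via load(), whereas the text dump has to be parsed again.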

Another very simple way to do it is:

# We create the .csv file
write.csv(df1, "df1.csv")

# We compress it; gzip replaces df1.csv with df1.csv.gz (requires gzip on the PATH)
system("gzip df1.csv")

Got the idea from: http://blog.revolutionanalytics.com/2009/12/r-tip-save-time-and-space-by-compressing-data-files.html

You can use the gzip function in R.utils:

library(R.utils)
library(data.table)

# Write gzip file
df <- data.table(var1 = 'Compress me', var2 = ', please!')
fwrite(df, 'filename.csv', sep = ',')
gzip('filename.csv', destname = 'filename.csv.gz')

# Read gzip file (cmd= makes it explicit that this is a shell command)
fread(cmd = 'gzip -dc filename.csv.gz')
#           var1      var2
# 1: Compress me , please!

For tidyverse methods, adding the compression extension to the file name will perform the compression. From https://readr.tidyverse.org/reference/write_delim.html:

The write_*() functions will automatically compress outputs if an appropriate extension is given. At present, three extensions are supported, .gz for gzip compression, .bz2 for bzip2 compression and .xz for lzma compression.

library(tidyverse)
# tibble() is attached by the tidyverse, so data.table is not needed here
df <- tibble(var1 = 'Compress me', var2 = ', please!')
write_csv(df, "filename.csv.gz")
Licensed under: CC-BY-SA with attribution