Size of Rdata file compared to csv

https://stackoverflow.com/questions/16919867

31-05-2022

Question

The size of my .Rdata file is 92 MB.

However, the original CSV file is around 3 GB. I read it in with the usual read.csv().

How can that be?

Answer

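The comments already hinted at what is going on: an .RData file stores the data in R's binary serialization format and, by default, compresses it with gzip, whereas a CSV file is plain text with every value spelled out as characters. This is straightforward enough that a quick example makes it clear: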

R> X <- 1:1e5   # data, no repeats
R> save(X, file="/tmp/foo.RData")
R> write.csv(X, file="/tmp/foo.csv")
R> system("ls -l /tmp/foo*")
-rw-r--r-- 1 x y 1377797 Jun  4 09:11 /tmp/foo.csv
-rw-r--r-- 1 x y  212397 Jun  4 09:11 /tmp/foo.RData
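The transcript above shells out to `ls`, which only exists on Unix-like systems. A portable sketch of the same comparison, assuming only base R (`tempfile()`, `file.size()`, `readBin()`), might look like this. It also peeks at the first two bytes of the .RData file, which for a gzip stream are 0x1f 0x8b. Exact byte counts will likely differ from the transcript depending on the R version.

```r
# Sketch: same comparison as above, but portable (no `ls` needed).
X <- 1:1e5                                   # data, no repeats
rdata_file <- tempfile(fileext = ".RData")   # temporary paths instead of /tmp
csv_file   <- tempfile(fileext = ".csv")

save(X, file = rdata_file)                   # binary, gzip-compressed by default
write.csv(X, file = csv_file)                # plain text, one row name + value per line

sizes <- c(csv = file.size(csv_file), RData = file.size(rdata_file))
print(sizes)
print(sizes[["csv"]] / sizes[["RData"]])     # how many times larger the CSV is

# A gzip stream starts with the magic bytes 0x1f 0x8b.
print(readBin(rdata_file, what = "raw", n = 2))
```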

Now with data that repeats:

R> X <- rep(1,1e5)   # data, lots of repeats
R> write.csv(X, file="/tmp/bar.csv")
R> save(X, file="/tmp/bar.RData")
R> system("ls -lh /tmp/bar*")
-rw-r--r-- 1 x y 966K Jun  4 09:12 /tmp/bar.csv
-rw-r--r-- 1 x y 1.3K Jun  4 09:12 /tmp/bar.RData
R>

So the CSV file ends up roughly 6.5 to 743 times larger than the .RData file, depending on how well the data compress. And that is before we make the CSV even more "expensive" by forcing several decimals to be printed.
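To pull the two effects apart, here is a sketch using hypothetical data and base R only. It saves the repetitive vector once with the default compression and once with `save()`'s `compress = FALSE`, then writes random doubles (from `runif()`) to a CSV, where each value is printed with many digits even though the binary form needs only 8 bytes per double.

```r
# Sketch: compression vs. text encoding, with hypothetical data.
X <- rep(1, 1e5)                             # lots of repeats

f_gz  <- tempfile(fileext = ".RData")
f_raw <- tempfile(fileext = ".RData")
save(X, file = f_gz)                         # default: gzip compression
save(X, file = f_raw, compress = FALSE)      # same object, no compression

# Uncompressed, 1e5 doubles need 8 bytes each (about 800 KB);
# the highly repetitive data shrink to a tiny gzip stream.
print(c(compressed = file.size(f_gz), uncompressed = file.size(f_raw)))

# Random doubles are printed with many digits in text form,
# so the CSV outgrows the 8-bytes-per-value binary representation.
set.seed(1)                                  # reproducible random draws
Y <- runif(1e5)
f_csv <- tempfile(fileext = ".csv")
write.csv(Y, file = f_csv)
print(file.size(f_csv) / 1e5)                # rough average bytes per CSV line
```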

License: CC BY-SA (with attribution)

Not affiliated with StackOverflow