Question

I'm importing a table from a fixed-width format .txt file in R. The table has about 100 columns and 200,000 rows (a few lines are shown below).

11111 2008  7 31 21 2008  8  1 21 3  4  6 18 4    7 0 12 0  0 0 0 0 1 0 0 0 0 0 0 0      5 0 0 7   5 0 1 0  2 0   0 0  0 0 0  2 0 0    0.0 5  14.9 0  14.9 0  14.0 0  16.5 0  14.9 0  15.6 0  15.3 0 0  15.6 0  15.6 0  17.6 0  16.1 0 17.10 0 1  97 0  0.60 0 1  15.1 0  986.6 0 1002.9 0  7 0  0.2 0
11111 2008  8  1  0 2008  8  1  0 4  7  6 18 4 98 0 1  9 0  0 0 2 0 1 0 0 0 0 0 0 0      5 0 0 7 0 0 0 1 0  2 0 260 0  1 0 0  2 0 0    0.0 5  14.4 0  14.4 0  13.0 0  14.9 0  14.9 0  15.2 0  14.6 0 0  15.2 0  14.8 0  16.1 0  15.7 0 16.10 0 1  93 0  1.20 0 1  14.1 0  986.1 0 1002.4 0  7 0  0.5 0
11111 2008  8  1  3 2008  8  1  3 5 10  6 18 4 98 0 1  3 0  0 0 1 0 0 0 0 0 0 0 0 0      5 0 0 7   5 0 1 0  2 0 200 0  1 0 0  4 0 0    0.0 5  25.8 0       7  14.4 0  26.0 0  26.0 0  19.8 0  17.0 0 0  19.8 0  15.2 0  20.1 0  20.1 0 17.10 0 1  74 0  6.00 0 1  15.1 0  984.5 0 1000.6 0  8 0  1.6 0
11111 2008  8  1  6 2008  8  1  6 6 13  6 18 4 98 0 1  7 0  6 0 1 0 0 0 1 0 0 0 0 0 1000 0 1 0 7   5 0 1 0  2 0 230 0  2 0 0  8 0 0    0.0 5  36.0 0       5       5  40.0 0  36.0 0  23.7 0  17.4 0 0  23.7 0  19.8 0  24.6 0  24.0 0 14.80 0 1  51 0 14.50 0 1  12.8 0  983.9 0  999.7 0  6 0  0.6 0
11111 2008  8  1  9 2008  8  1  9 7 16  6 18 4 96 0 0  9 0  9 0 0 0 0 0 2 0 0 0 0 0 1200 0 0 0 7   5 0   7 95 0 300 0  3 0 0 13 0 0    0.0 5  23.5 0       5       5  43.8 0  23.6 0  19.6 0  17.3 0 0  19.6 0  19.6 0  26.0 0  19.8 0 17.90 0 1  79 0  4.90 0 1  15.8 0  981.9 0  997.9 0  8 0  2.0 0

Right now I'm using the following code, which takes a long time to load the file (about 1 minute):

col_width <- c(5,5,3,3,3,5,3,3,3,2,
           3,3,3,2,3,2,2,3,2,3,
           2,2,2,2,2,2,2,2,2,2,
           2,5,2,2,2,2,2,2,2,2,
           2,3,2,4,2,3,2,2,3,2,
           2,7,2,6,2,6,2,6,2,6,
           2,6,2,6,2,6,2,2,6,2,
           6,2,6,2,6,2,6,2,2,4,
           2,6,2,2,6,2,7,2,7,2,
           3,2,5,2)

df.h.tomsk <- read.fwf(path, 
                       widths=col_width, 
                       header=FALSE, 
                       sep="\t", 
                       nrows=200000, 
                       comment.char="",
                       buffersize=5000)

Any suggestions for speeding this up? For example, is there something like fread from data.table that works with fixed-width files?

No correct solution

Other tips

Have you tried fread from the data.table package? Please paste a few lines of your file so it can be checked.
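
One possible workaround, sketched here under assumptions rather than given as a tested solution for this file: read every line as a single string and cut it into columns with vectorised substring() calls. This avoids the per-line splitting that tends to make read.fwf slow. The helper name read_fwf_fast and the demo lines are made up for illustration. The sketch uses base R's readLines for the reading step; according to the data.table documentation, fread with sep = "" also reads each line into a single character column, so it could probably be swapped in there.

```r
# Hypothetical helper (not part of any package): slice fixed-width lines
# into columns using one vectorised substring() call per column.
read_fwf_fast <- function(lines, widths) {
  ends   <- cumsum(widths)        # last character position of each field
  starts <- ends - widths + 1L    # first character position of each field
  cols <- lapply(seq_along(widths), function(i) {
    trimws(substring(lines, starts[i], ends[i]))
  })
  names(cols) <- paste0("V", seq_along(widths))
  df <- as.data.frame(cols, stringsAsFactors = FALSE)
  # Convert numeric-looking columns to numbers; strings stay character
  type.convert(df, as.is = TRUE)
}

# Tiny made-up example using the first four widths from the question
demo_lines <- c("11111 2008  7 31", "11111 2008  8  1")
demo <- read_fwf_fast(demo_lines, c(5, 5, 3, 3))

# For the real file (path and col_width as defined in the question):
# df.h.tomsk <- read_fwf_fast(readLines(path), col_width)
```

If adding a dependency is acceptable, the readr package also offers read_fwf() together with fwf_widths(), which may be worth timing against this approach.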

Licensed under: CC-BY-SA with attribution