Question

I'm importing a table from a fixed-width format .txt file in R. The table has about 100 columns and 200,000 rows (a few lines are shown below).

11111 2008  7 31 21 2008  8  1 21 3  4  6 18 4    7 0 12 0  0 0 0 0 1 0 0 0 0 0 0 0      5 0 0 7   5 0 1 0  2 0   0 0  0 0 0  2 0 0    0.0 5  14.9 0  14.9 0  14.0 0  16.5 0  14.9 0  15.6 0  15.3 0 0  15.6 0  15.6 0  17.6 0  16.1 0 17.10 0 1  97 0  0.60 0 1  15.1 0  986.6 0 1002.9 0  7 0  0.2 0
11111 2008  8  1  0 2008  8  1  0 4  7  6 18 4 98 0 1  9 0  0 0 2 0 1 0 0 0 0 0 0 0      5 0 0 7 0 0 0 1 0  2 0 260 0  1 0 0  2 0 0    0.0 5  14.4 0  14.4 0  13.0 0  14.9 0  14.9 0  15.2 0  14.6 0 0  15.2 0  14.8 0  16.1 0  15.7 0 16.10 0 1  93 0  1.20 0 1  14.1 0  986.1 0 1002.4 0  7 0  0.5 0
11111 2008  8  1  3 2008  8  1  3 5 10  6 18 4 98 0 1  3 0  0 0 1 0 0 0 0 0 0 0 0 0      5 0 0 7   5 0 1 0  2 0 200 0  1 0 0  4 0 0    0.0 5  25.8 0       7  14.4 0  26.0 0  26.0 0  19.8 0  17.0 0 0  19.8 0  15.2 0  20.1 0  20.1 0 17.10 0 1  74 0  6.00 0 1  15.1 0  984.5 0 1000.6 0  8 0  1.6 0
11111 2008  8  1  6 2008  8  1  6 6 13  6 18 4 98 0 1  7 0  6 0 1 0 0 0 1 0 0 0 0 0 1000 0 1 0 7   5 0 1 0  2 0 230 0  2 0 0  8 0 0    0.0 5  36.0 0       5       5  40.0 0  36.0 0  23.7 0  17.4 0 0  23.7 0  19.8 0  24.6 0  24.0 0 14.80 0 1  51 0 14.50 0 1  12.8 0  983.9 0  999.7 0  6 0  0.6 0
11111 2008  8  1  9 2008  8  1  9 7 16  6 18 4 96 0 0  9 0  9 0 0 0 0 0 2 0 0 0 0 0 1200 0 0 0 7   5 0   7 95 0 300 0  3 0 0 13 0 0    0.0 5  23.5 0       5       5  43.8 0  23.6 0  19.6 0  17.3 0 0  19.6 0  19.6 0  26.0 0  19.8 0 17.90 0 1  79 0  4.90 0 1  15.8 0  981.9 0  997.9 0  8 0  2.0 0

Right now I'm using the following code, which takes quite a long time to load the file (about 1 minute):

col_width <- c(5,5,3,3,3,5,3,3,3,2,
           3,3,3,2,3,2,2,3,2,3,
           2,2,2,2,2,2,2,2,2,2,
           2,5,2,2,2,2,2,2,2,2,
           2,3,2,4,2,3,2,2,3,2,
           2,7,2,6,2,6,2,6,2,6,
           2,6,2,6,2,6,2,2,6,2,
           6,2,6,2,6,2,6,2,2,4,
           2,6,2,2,6,2,7,2,7,2,
           3,2,5,2)

df.h.tomsk <- read.fwf(path, 
                       widths=col_width, 
                       header=FALSE, 
                       sep="\t", 
                       nrows=200000, 
                       comment.char="",
                       buffersize=5000)

Any suggestions for speeding this up? For example, is there something like fread from data.table that works with fixed-width files?

No accepted solution

Other suggestions

Have you tried fread() from the data.table package? Please paste a few lines of your file so it can be checked.
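As far as I know, fread() has no dedicated fixed-width mode, so one possible workaround is to read every record as a single string and cut the fields out by position. The sketch below does that with base R only. The helper name read_fwf_sliced and the demo data are made up for illustration and are not from the original post; timings on the real file have not been measured here.

```r
# Hypothetical helper (not from the original post): parse a fixed-width file
# by reading whole lines, then cutting each column out with substring().
read_fwf_sliced <- function(path, widths) {
  ends   <- cumsum(widths)          # position of the last character of each field
  starts <- ends - widths + 1       # position of the first character of each field
  lines  <- readLines(path)         # one string per record
  cols <- lapply(seq_along(widths), function(i) {
    field <- trimws(substring(lines, starts[i], ends[i]))
    # Convert to integer/numeric where possible; blank fields become NA
    type.convert(field, as.is = TRUE, na.strings = c("", "NA"))
  })
  names(cols) <- paste0("V", seq_along(widths))
  as.data.frame(cols, stringsAsFactors = FALSE)
}

# Tiny self-contained demo with made-up data (three fields of width 5, 5, 3)
tmp <- tempfile(fileext = ".txt")
writeLines(c("11111 2008  7", "11111 2008  8"), tmp)
demo <- read_fwf_sliced(tmp, c(5, 5, 3))
```

With the variables from the question this would be called as read_fwf_sliced(path, col_width). If the line reading itself turns out to be the slow part, it may be worth trying fread(path, sep = "\n", header = FALSE) in place of readLines(), or a packaged reader such as readr::read_fwf(path, readr::fwf_widths(col_width)); both are suggestions to benchmark rather than guaranteed improvements.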

Licensed under: CC-BY-SA with attribution