Question

I'm importing table from a fixed width format .txt file in R. This table has about 100 observations and 200000 lines (a few lines below).

11111 2008  7 31 21 2008  8  1 21 3  4  6 18 4    7 0 12 0  0 0 0 0 1 0 0 0 0 0 0 0      5 0 0 7   5 0 1 0  2 0   0 0  0 0 0  2 0 0    0.0 5  14.9 0  14.9 0  14.0 0  16.5 0  14.9 0  15.6 0  15.3 0 0  15.6 0  15.6 0  17.6 0  16.1 0 17.10 0 1  97 0  0.60 0 1  15.1 0  986.6 0 1002.9 0  7 0  0.2 0
11111 2008  8  1  0 2008  8  1  0 4  7  6 18 4 98 0 1  9 0  0 0 2 0 1 0 0 0 0 0 0 0      5 0 0 7 0 0 0 1 0  2 0 260 0  1 0 0  2 0 0    0.0 5  14.4 0  14.4 0  13.0 0  14.9 0  14.9 0  15.2 0  14.6 0 0  15.2 0  14.8 0  16.1 0  15.7 0 16.10 0 1  93 0  1.20 0 1  14.1 0  986.1 0 1002.4 0  7 0  0.5 0
11111 2008  8  1  3 2008  8  1  3 5 10  6 18 4 98 0 1  3 0  0 0 1 0 0 0 0 0 0 0 0 0      5 0 0 7   5 0 1 0  2 0 200 0  1 0 0  4 0 0    0.0 5  25.8 0       7  14.4 0  26.0 0  26.0 0  19.8 0  17.0 0 0  19.8 0  15.2 0  20.1 0  20.1 0 17.10 0 1  74 0  6.00 0 1  15.1 0  984.5 0 1000.6 0  8 0  1.6 0
11111 2008  8  1  6 2008  8  1  6 6 13  6 18 4 98 0 1  7 0  6 0 1 0 0 0 1 0 0 0 0 0 1000 0 1 0 7   5 0 1 0  2 0 230 0  2 0 0  8 0 0    0.0 5  36.0 0       5       5  40.0 0  36.0 0  23.7 0  17.4 0 0  23.7 0  19.8 0  24.6 0  24.0 0 14.80 0 1  51 0 14.50 0 1  12.8 0  983.9 0  999.7 0  6 0  0.6 0
11111 2008  8  1  9 2008  8  1  9 7 16  6 18 4 96 0 0  9 0  9 0 0 0 0 0 2 0 0 0 0 0 1200 0 0 0 7   5 0   7 95 0 300 0  3 0 0 13 0 0    0.0 5  23.5 0       5       5  43.8 0  23.6 0  19.6 0  17.3 0 0  19.6 0  19.6 0  26.0 0  19.8 0 17.90 0 1  79 0  4.90 0 1  15.8 0  981.9 0  997.9 0  8 0  2.0 0

Right now, I'm using the following code leading to a pretty long loading (about 1 minute):

col_width <- c(5,5,3,3,3,5,3,3,3,2,
           3,3,3,2,3,2,2,3,2,3,
           2,2,2,2,2,2,2,2,2,2,
           2,5,2,2,2,2,2,2,2,2,
           2,3,2,4,2,3,2,2,3,2,
           2,7,2,6,2,6,2,6,2,6,
           2,6,2,6,2,6,2,2,6,2,
           6,2,6,2,6,2,6,2,2,4,
           2,6,2,2,6,2,7,2,7,2,
           3,2,5,2)

df.h.tomsk <- read.fwf(path, 
                       widths=col_width, 
                       header=FALSE, 
                       sep="\t", 
                       nrows=200000, 
                       comment.char="",
                       buffersize=5000)

Any suggestion(s) to accelerate the process? For example is there something like fread from data.table working with fwf format?

No correct solution

OTHER TIPS

Have you tried using fread of library(data.table)? Please copy paste some lines of your file to check it...

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top