Import a txt file with the desired data structure in R

https://stackoverflow.com/questions/23359263

r
read.table

11-07-2023

Question

The text file looks like this:

#---*----1----*----2----*---
Name     Time.Period   Value
A           Jan 2013      10
B           Jan 2013      11
C           Jan 2013      12
A           Feb 2013       9
B           Feb 2013      11
C           Feb 2013      15
A           Mar 2013      10
B           Mar 2013       8
C           Mar 2013      13

I tried using read.table with readLines and count.fields, as shown below:

> path <- list.files()
> data <- read.table(text=readLines(path)[count.fields(path, blank.lines.skip=FALSE) == 4])
Warning message:
In readLines(path) : incomplete final line found on 'data1.txt'
> data
  V1  V2   V3 V4
1  A Jan 2013 10
2  B Jan 2013 11
3  C Jan 2013 12
4  A Feb 2013  9
5  B Feb 2013 11
6  C Feb 2013 15
7  A Mar 2013 10
8  B Mar 2013  8
9  C Mar 2013 13

The problem is that this gives four columns instead of three. I therefore reshaped the data as shown below, but I am looking for an alternative.

> library(zoo)
> data$Name <- as.character(data$V1)
> data$Time.Period <- as.yearmon(paste(data$V2, data$V3, sep=" "))
> data$Value <- as.numeric(data$V4)
> DATA <- data[, 5:7]
> DATA
  Name Time.Period Value
1    A    Jan 2013    10
2    B    Jan 2013    11
3    C    Jan 2013    12
4    A    Feb 2013     9
5    B    Feb 2013    11
6    C    Feb 2013    15
7    A    Mar 2013    10
8    B    Mar 2013     8
9    C    Mar 2013    13

Solution

You can use read.fwf to read fixed-width files. You need to specify the width of each column correctly, in characters.

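data <- read.fwf(path, widths = c(9, 11, 8), skip = 2,
                 col.names = c("Name", "Time.Period", "Value"),
                 strip.white = TRUE)

The key is how you specify the widths: a positive value reads that many characters and a negative value skips that many. The widths above follow the ruler in the question: Name occupies characters 1 to 9, Time.Period 10 to 20 and Value 21 to 28, and strip.white = TRUE trims the padding. I am assuming the file starts with the ruler line and the header line, hence skip = 2. header = TRUE does not help here, because read.fwf expects the names in the header line to be delimited by sep (a tab by default), so the column names are supplied through col.names instead. Change the widths accordingly if your file is laid out differently.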
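As a minimal sketch of how positive and negative widths behave, the lines below are copied from the question and read through a text connection. The column boundaries (1 to 9, 10 to 20, 21 to 28) are an assumption taken from the ruler and the right-aligned header, and the names parsed and no_period are only illustrative.

```r
# Sample lines copied from the question; line 1 is the ruler, line 2 the header.
lines <- c(
  "#---*----1----*----2----*---",
  "Name     Time.Period   Value",
  "A           Jan 2013      10",
  "B           Jan 2013      11",
  "C           Jan 2013      12",
  "A           Feb 2013       9"
)

# Positive widths: read characters 1-9, 10-20 and 21-28 (assumed layout).
con <- textConnection(lines)
parsed <- read.fwf(con, widths = c(9, 11, 8), skip = 2,
                   col.names = c("Name", "Time.Period", "Value"),
                   strip.white = TRUE, stringsAsFactors = FALSE)
close(con)

# Negative width: skip the 11 characters of Time.Period entirely.
con <- textConnection(lines)
no_period <- read.fwf(con, widths = c(9, -11, 8), skip = 2,
                      col.names = c("Name", "Value"),
                      strip.white = TRUE, stringsAsFactors = FALSE)
close(con)

str(parsed)
str(no_period)
```

Time.Period comes back as a character column here; converting it with zoo::as.yearmon, as in the question, would still be a separate step.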

You will have to change the widths if the file format changes, or come up with a regular expression that derives them from the first few rows. A better solution would be to enclose your strings in double quotes or, even better, to avoid this format altogether.
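One way the widths might be derived from the header row with a regular expression is sketched below. It assumes the first column is left-aligned and every other column is right-aligned under its header label, as in the sample. That is an assumption about this particular layout, not a general rule, and the variable names are illustrative.

```r
# Header line copied from the question.
header <- "Name     Time.Period   Value"

# Locate each run of non-space characters (the column labels).
g <- gregexpr("[^ ]+", header)
m <- g[[1]]
starts <- as.integer(m)
ends <- starts + attr(m, "match.length") - 1L

# Assumed layout: column 1 runs up to just before the second label starts;
# every later column ends where its (right-aligned) label ends.
breaks <- c(starts[2] - 1L, ends[-1])
widths <- diff(c(0L, breaks))

# The labels themselves can serve as column names.
col_names <- regmatches(header, g)[[1]]

print(widths)
print(col_names)
```

The resulting widths and col_names could then be passed to read.fwf together with a suitable skip value, so that the header line itself is not parsed as data.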

OTHER TIPS

?count.fields

As the R documentation states, count.fields counts the number of fields, as separated by sep, in each line of the file read. Filtering with count.fields(path, blank.lines.skip=FALSE) == 4 therefore drops the header row, which has only three fields.
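The field counts can be checked directly. The sketch below writes the header and two data rows from the question to a temporary file and counts their whitespace-separated fields. The ruler line is left out to keep the example small, and tmp, n_fields and kept are illustrative names.

```r
# Header and two data rows copied from the question (ruler line omitted).
lines <- c(
  "Name     Time.Period   Value",
  "A           Jan 2013      10",
  "B           Jan 2013      11"
)

tmp <- tempfile(fileext = ".txt")
writeLines(lines, tmp)

# With the default sep = "", fields are separated by runs of whitespace,
# so "Jan 2013" counts as two fields and each data row has four.
n_fields <- count.fields(tmp, blank.lines.skip = FALSE)
print(n_fields)

# Keeping only the four-field lines therefore discards the header.
kept <- lines[n_fields == 4]
print(kept)

unlink(tmp)
```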

Licensed under: CC-BY-SA with attribution
