Reading a file in R

https://stackoverflow.com/questions/9778482

r
read.table

25-05-2021
|

题

I would like to read a file in R as a table for a file that contains information in an odd format.

The file data.txt has the data written as:

04001400 HI 34.50 118.27 19480701 08 LST
         0   0   0   0   0   0   0   0   0   0   0   0
       MIS MIS MIS MIS MIS MIS MIS MIS MIS MIS MIS MIS
04001400 HI 34.50 118.27 19480801 08 LST
         0   0   0   0   0   0   0   0   0   0   0   0
       MIS MIS MIS MIS MIS MIS MIS MIS MIS MIS MIS MIS
04001400 HI 34.50 118.27 19480901 08 LST
         0   0   0   0   0   0   0   0   0   0   0   0
       MIS MIS MIS MIS MIS MIS MIS MIS MIS MIS MIS MIS

the first number is a station number, HI is a case, the third and fourth numbers are lat and long coordinates, the other number is year,month,day,, and the other number (08) is time zone, followed by LST meaning time frame. The following 24 numbers or in the above example the 0's and MIS are values for a particular region and time. I am trying to store the contents of the file as a table in this type of format of dimension [n x 31] (where 31 is the number of columns and n is the amount of rows total in the file):

04001400 HI 34.50 118.27 19480701 08 LST 0   0   0   0   0   0   0   0   0   0   0   0 MIS MIS MIS MIS MIS MIS MIS MIS MIS MIS MIS MIS

04001400 HI 34.50 118.27 19480801 08 LST 0   0   0   0   0   0   0   0   0   0   0   0 MIS MIS MIS MIS MIS MIS MIS MIS MIS MIS MIS MIS

04001400 HI 34.50 118.27 19480901 08 LST 0   0   0   0   0   0   0   0   0   0   0   0 MIS MIS MIS MIS MIS MIS MIS MIS MIS MIS MIS MIS

I have tried coding it this way based on the function read.table:

data <- read.table("data.txt", sep = c("\b", "\t", "\n"))

But it does not work as I have described above. Is there a way that I can do that? Thank you for your help.

解决方案

You can use scan to read multi-line data, especially since it is a specific format.

dat <- data.frame(scan("data.txt",
what = as.list(c("character","character","number","number",
                 "character","number","character",
                  rep("character",24))),
multi.line=TRUE))
names(dat) <- paste("V",1:ncol(dat),sep="")

which gives

> dat
        V1 V2    V3     V4       V5 V6  V7 V8 V9 V10 V11 V12 V13 V14 V15 V16
1 04001400 HI 34.50 118.27 19480701 08 LST  0  0   0   0   0   0   0   0   0
2 04001400 HI 34.50 118.27 19480801 08 LST  0  0   0   0   0   0   0   0   0
3 04001400 HI 34.50 118.27 19480901 08 LST  0  0   0   0   0   0   0   0   0
  V17 V18 V19 V20 V21 V22 V23 V24 V25 V26 V27 V28 V29 V30 V31
1   0   0   0 MIS MIS MIS MIS MIS MIS MIS MIS MIS MIS MIS MIS
2   0   0   0 MIS MIS MIS MIS MIS MIS MIS MIS MIS MIS MIS MIS
3   0   0   0 MIS MIS MIS MIS MIS MIS MIS MIS MIS MIS MIS MIS
> dim(dat)
[1]  3 31

You could, of course, give more informative names to the columns.

EDIT:

As Josh points out in the comments, my what argument was of the wrong format and caused all columns to be imported as character rather than some as character and some as numeric. It should have read:

dat <- data.frame(scan("data.txt",
what = list(character(), character(), numeric(), numeric(),
            character(), numeric(), character(),
            character(), character(), character(), character(),
            character(), character(), character(), character(),
            character(), character(), character(), character(),
            character(), character(), character(), character(),
            character(), character(), character(), character(),
            character(), character(), character(), character()),
multi.line=TRUE))
names(dat) <- paste("V",1:ncol(dat),sep="")

which gives the more appropriate:

> str(dat)

'data.frame':   3 obs. of  31 variables:
 $ V1 : Factor w/ 1 level "04001400": 1 1 1
 $ V2 : Factor w/ 1 level "HI": 1 1 1
 $ V3 : num  34.5 34.5 34.5
 $ V4 : num  118 118 118
 $ V5 : Factor w/ 3 levels "19480701","19480801",..: 1 2 3
 $ V6 : num  8 8 8
 $ V7 : Factor w/ 1 level "LST": 1 1 1
 $ V8 : Factor w/ 1 level "0": 1 1 1
 $ V9 : Factor w/ 1 level "0": 1 1 1
 $ V10: Factor w/ 1 level "0": 1 1 1
 $ V11: Factor w/ 1 level "0": 1 1 1
 $ V12: Factor w/ 1 level "0": 1 1 1
 $ V13: Factor w/ 1 level "0": 1 1 1
 $ V14: Factor w/ 1 level "0": 1 1 1
 $ V15: Factor w/ 1 level "0": 1 1 1
 $ V16: Factor w/ 1 level "0": 1 1 1
 $ V17: Factor w/ 1 level "0": 1 1 1
 $ V18: Factor w/ 1 level "0": 1 1 1
 $ V19: Factor w/ 1 level "0": 1 1 1
 $ V20: Factor w/ 1 level "MIS": 1 1 1
 $ V21: Factor w/ 1 level "MIS": 1 1 1
 $ V22: Factor w/ 1 level "MIS": 1 1 1
 $ V23: Factor w/ 1 level "MIS": 1 1 1
 $ V24: Factor w/ 1 level "MIS": 1 1 1
 $ V25: Factor w/ 1 level "MIS": 1 1 1
 $ V26: Factor w/ 1 level "MIS": 1 1 1
 $ V27: Factor w/ 1 level "MIS": 1 1 1
 $ V28: Factor w/ 1 level "MIS": 1 1 1
 $ V29: Factor w/ 1 level "MIS": 1 1 1
 $ V30: Factor w/ 1 level "MIS": 1 1 1
 $ V31: Factor w/ 1 level "MIS": 1 1 1

其他提示

Another way is

a <- read.table("sample.txt", fill=T);
aseq <- seq(1, dim(a)[1], by=3)
x <- data.frame(a[aseq, 1:7], a[aseq+1,], a[aseq+2,])

The 1:7 is needed because read.table() created NA columns.

许可以下： CC-BY-SA 和归因

不隶属于 StackOverflow