Question

Usually, read.table handles most of my data input problems without any trouble, as with this file:

China 2 3
USA 1 4

Sometimes, though, the data is maddening, for example:

Chia 2 3
United States 3 4

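Here read.table fails, because the country name in the first field contains a space and so the rows do not split into the same number of fields. Any assistance is appreciated.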

P.S. The data file is in .dat format.


Solution

First set up some test data:

# create test data
cat("Chia 2 3
United States 3 4
", file = "file.with.spaces.txt")

1) Read in the lines of the file created above, insert commas between the fields, and re-read:

L <- readLines("file.with.spaces.txt")
L2 <- sub("^(.*) +(\\S+) +(\\S+)$", "\\1,\\2,\\3", L) # 1
DF <- read.table(text = L2, sep = ",")

giving:

> DF
             V1 V2 V3
1          Chia  2  3
2 United States  3  4

2) Another approach. Using L from above, replace the last run of spaces with a comma, and do this twice (three fields means two separators):

 L2 <- L
 for(i in 1:2) L2 <- sub(" +(\\S+)$", ",\\1", L2) # 2
 DF <- read.table(text = L2, sep = ",")
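
The loop in the second approach generalizes to any number of trailing fields. The following is a minimal sketch, assuming that only the first field may contain spaces and that no field contains the separator; the function name `read_trailing_fields` and its arguments are hypothetical and not part of the original answer.

```r
# Hypothetical helper (illustrative name and arguments, not from the answer).
# Assumes only the first field may contain spaces and no field contains `sep`.
read_trailing_fields <- function(lines, n_trailing, sep = ",") {
  for (i in seq_len(n_trailing)) {
    # Replace the last run of spaces that precedes the final
    # non-space token with the separator.
    lines <- sub(" +(\\S+)$", paste0(sep, "\\1"), lines)
  }
  read.table(text = lines, sep = sep, stringsAsFactors = FALSE)
}

# Same rows as the test file, supplied inline so the example is self-contained
L <- c("Chia 2 3", "United States 3 4")
DF <- read_trailing_fields(L, n_trailing = 2)
```

Each pass converts one more separator, counting from the right, so `n_trailing` should equal the number of fields that follow the name.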


Other tips

If the column separator 'sep' is whitespace, read.table cannot distinguish spaces inside a name from spaces that separate columns. I'd suggest changing your country names to single strings, i.e., strings without spaces. Alternatively, use semicolons to separate your data columns and use:

data <- read.table("foo.dat", sep = ";")

If you have many rows in your .dat file, you can consider using regular expressions to find spaces between the columns and replace them with semicolons.
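
One possible way to do that replacement in R itself is sketched below. It assumes that every row ends with exactly two numeric fields; the inline `lines` vector stands in for the result of `readLines()` on your .dat file, and the variable names are illustrative.

```r
# Assumption: every row ends with exactly two numeric fields.
# `lines` stands in for readLines() on the .dat file.
lines <- c("Chia 2 3", "United States 3 4")

# Turn the spaces in front of the two trailing numbers into semicolons;
# spaces inside the country name are left untouched.
fixed <- sub(" +(-?[0-9.]+) +(-?[0-9.]+)$", ";\\1;\\2", lines)

data <- read.table(text = fixed, sep = ";", stringsAsFactors = FALSE)
```

Because the pattern is anchored at the end of the line and requires numeric tokens, a multi-word name such as "United States" stays in one column.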

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow