Question

Usually, read.table solves most of my data input problems, such as this one:

China 2 3
USA 1 4

Sometimes, though, the data is maddening, like this:

Chia 2 3
United States 3 4

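Here read.table cannot work: its default separator is whitespace, so the space in "United States" is treated as a column break and that row appears to have four fields instead of three. Any assistance is appreciated.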
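A quick way to see the mismatch is count.fields, which splits lines the same way read.table does by default. This is a sketch that assumes the two rows are typed in as a character vector rather than read from the .dat file:

```r
# The two problem rows, typed in directly for illustration
lines <- c("Chia 2 3", "United States 3 4")

# count.fields() splits on whitespace by default (sep = ""), like read.table()
con <- textConnection(lines)
n_fields <- count.fields(con)
close(con)
print(n_fields)  # 3 fields in the first row, 4 in the second

# Because the rows disagree, read.table() stops with an error
result <- tryCatch(read.table(text = lines), error = function(e) e)
print(inherits(result, "error"))
```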

P.S. The data file has the .dat extension.


Solution

First set up some test data:

# create test data
cat("Chia 2 3
United States 3 4
", file = "file.with.spaces.txt")

1) Read in the data created above, insert commas between the fields, and re-read it:

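L <- readLines("file.with.spaces.txt")
# keep everything before the last two space-separated tokens as field 1,
# then join the three fields with commas
L2 <- sub("^(.*) +(\\S+) +(\\S+)$", "\\1,\\2,\\3", L)
DF <- read.table(text = L2, sep = ",")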

giving:

> DF
             V1 V2 V3
1          Chia  2  3
2 United States  3  4
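The pattern above hard-codes three fields. A minimal sketch of a generalization, assuming only the first field can contain spaces and every later field is a single token without spaces: build the regex for n_tail trailing tokens and let the greedy ^(.*) absorb everything before them. The function name read_first_spaced is hypothetical, not part of base R.

```r
# Hypothetical helper (not a base R function): the first field may contain
# spaces; the last n_tail fields are single tokens without spaces.
read_first_spaced <- function(lines, n_tail = 2) {
  # sub() only supports backreferences \\1 to \\9, so at most 8 trailing fields
  stopifnot(n_tail >= 1, n_tail <= 8)
  # n_tail = 2 reproduces the pattern "^(.*) +(\\S+) +(\\S+)$" used above
  pattern <- paste0("^(.*)", strrep(" +(\\S+)", n_tail), "$")
  # n_tail = 2 gives the replacement "\\1,\\2,\\3"
  replacement <- paste0("\\", seq_len(n_tail + 1), collapse = ",")
  read.table(text = sub(pattern, replacement, lines), sep = ",")
}

DF <- read_first_spaced(c("Chia 2 3", "United States 3 4"))
print(DF)
```

Lines that do not match the pattern (for example, a line with a trailing space) are left unchanged by sub(), so read.table() will then report a field-count error rather than silently misreading them.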

2) Another approach: using L from above, replace the last run of spaces with a comma, twice (since there are three fields, two separators are needed):

L2 <- L
# each pass turns the last remaining run of spaces into a comma
for(i in 1:2) L2 <- sub(" +(\\S+)$", ",\\1", L2)
DF <- read.table(text = L2, sep = ",")
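Approach 2 generalizes the same way: for lines with n fields, apply the substitution n - 1 times. After each pass the comma-joined tail contains no spaces, so the next pass treats it as a single token. A minimal sketch, again assuming only the first field contains spaces; commas_from_right is a hypothetical name:

```r
# Hypothetical helper: turn the last (n_fields - 1) runs of spaces into commas,
# working from the right, so only the first field keeps its spaces.
commas_from_right <- function(lines, n_fields) {
  for (i in seq_len(n_fields - 1)) {
    # replace the run of spaces before the final space-free chunk of each line
    lines <- sub(" +(\\S+)$", ",\\1", lines)
  }
  lines
}

L2 <- commas_from_right(c("Chia 2 3", "United States 3 4"), n_fields = 3)
# L2 is now c("Chia,2,3", "United States,3,4")
DF <- read.table(text = L2, sep = ",")
print(DF)
```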


OTHER TIPS

If the column separator sep is whitespace, read.table logically cannot distinguish spaces inside a name from spaces that actually separate columns. I'd suggest changing your country names to single strings, i.e., strings without spaces. Alternatively, separate your data columns with semicolons and use:

data <- read.table("foo.dat", sep = ";")
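A small round trip, assuming the file already uses semicolons; a temporary file stands in for foo.dat here so the sketch runs on its own:

```r
# A temporary file stands in for foo.dat; the rows are semicolon-separated
path <- tempfile(fileext = ".dat")
writeLines(c("Chia;2;3", "United States;3;4"), path)

# With sep = ";", the space inside "United States" is just part of the field
data <- read.table(path, sep = ";")
print(data)
```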

If your .dat file has many rows, you could use regular expressions to find the spaces between the columns and replace them with semicolons.
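One way to do that conversion, sketched under the assumption that only the first column can contain spaces and the file has three columns; the temporary input and output files are stand-ins for your real .dat files:

```r
# Hypothetical input file with space-separated columns (stands in for the .dat file)
in_path <- tempfile(fileext = ".dat")
writeLines(c("Chia 2 3", "United States 3 4"), in_path)

# Replace only the last two runs of spaces on each line with semicolons,
# so spaces inside the first (name) column are kept
L <- readLines(in_path)
L_semi <- sub("^(.*) +(\\S+) +(\\S+)$", "\\1;\\2;\\3", L)

# Save the converted lines to a new file and read it back with sep = ";"
out_path <- tempfile(fileext = ".dat")
writeLines(L_semi, out_path)
data <- read.table(out_path, sep = ";")
print(data)
```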

Licensed under: CC-BY-SA with attribution