Fill = T won't work with single letters (?) [R]

https://stackoverflow.com/questions/22791762

r
fill

25-06-2023

Question

I'm using 'fill = T' on a file that has single letters separated by commas:

    Pred
1   T,T
2   NA
3   D
4   NA
5   NA
6   T
7   P,B
8   NA
9   NA

using the command:

sift <- read.table("/home/pred.txt", header=F, fill=TRUE, sep=',', stringsAsFactors=F)

I was hoping sift would turn out as:

    V1 V2
1    T  T
2 <NA>    
3    D    
4 <NA>   
5 <NA>   
6    T   
7    P  B
8 <NA>   
9 <NA>

However, it comes out like:

    V1 
1    T 
2 <NA>    
3    D    
4 <NA>   
5 <NA>   
6    T   
7    P 
8 <NA>   
9 <NA>

This code works when there are multiple sampleIDs (separated by a comma) in each row - but not for single letters. Does 'fill' work for single letters? Stupid question, I know.

Solution

So here is a workaround:

url  <- "https://dl.dropboxusercontent.com/s/bjb241s16t63ev8/pred.txt?dl=1&token_hash=AAEBzfCGgoeHgNTvhMSVoZK6qRGrdwwuDZB3h8lWTZNtkA"
df.1 <- read.table(url,header=F,sep=",",fill=T,stringsAsFactors=F)
dim(df.1)
# [1] 149792      1     <-- 149,792 rows and ** 1 ** column

df.2 <- read.table(url,header=F,sep=",",fill=T,stringsAsFactors=F, 
                   col.names=c("V1","V2"))
dim(df.2)
# [1] 149633      2     <-- 149,633 rows and ** 2 ** columns

head(df.2[which(nchar(df.2$V2)>0),])
#      V1 V2
# 1000  T  T
# 2419  T  T
# 3507  T  T
# 3766  T  D
# 4308  T  D
# 4545  T  D

read.table(...) creates a data frame with the number of columns determined by the first 5 lines of input (unless col.names is given and is longer). Since the first 5 lines in your file have only 1 field each, 1 column is what you get. Evidently, with sep="," and fill=T, read.table(...) then adds the "extra" data from longer lines as extra rows, which is why df.1 has more rows than df.2.
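The behaviour described above can be reproduced on a small scale. This is only a sketch: the file contents and the names `input`, `tmp` and `one_col` are made up for illustration and are not the asker's actual data.

```r
# Hypothetical miniature of the file: the first 5 lines have one field each,
# and a two-field line only appears later (line 7).
input <- c("T", "NA", "D", "NA", "NA", "T", "P,B", "NA")
tmp <- tempfile(fileext = ".txt")
writeLines(input, tmp)

# Same call as in the question: the column count is inferred from the first
# 5 lines, so only one column is created.
one_col <- read.table(tmp, header = FALSE, sep = ",", fill = TRUE,
                      stringsAsFactors = FALSE)

ncol(one_col)                    # a single column, V1
nrow(one_col) > length(input)    # the extra field from "P,B" lands on its own row
one_col$V1                       # inspect where "B" ended up

unlink(tmp)
```

If the two-field line is moved into the first 5 lines, the same call should produce two columns without any workaround.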

The workaround explicitly sets the number of columns by specifying column names, which could be anything, as long as there are as many names as the widest row has fields (here, length(col.names) = 2).
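When the maximum number of fields is not known in advance, one option is to count it first with count.fields() and build col.names from that. A minimal sketch, assuming a small made-up file; `tmp`, `n_col` and `sift2` are illustrative names, not part of the original answer:

```r
# Hypothetical sample file with at most two comma-separated fields per line.
tmp <- tempfile(fileext = ".txt")
writeLines(c("T", "NA", "D", "NA", "NA", "T", "P,B", "NA"), tmp)

# count.fields() returns the number of fields on each line;
# the maximum is the number of columns that is needed.
n_col <- max(count.fields(tmp, sep = ","), na.rm = TRUE)

# Supplying that many column names makes read.table() allocate all columns
# up front, so short lines are padded instead of long lines being wrapped.
sift2 <- read.table(tmp, header = FALSE, sep = ",", fill = TRUE,
                    stringsAsFactors = FALSE,
                    col.names = paste0("V", seq_len(n_col)))

dim(sift2)                          # one row per input line, n_col columns
sift2[nchar(sift2$V2) > 0, ]        # rows that actually have a second letter

unlink(tmp)
```

This reads the file twice, which may matter for very large inputs; hard-coding col.names as in the answer avoids the extra pass when the width is already known.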

Licensed under: CC-BY-SA with attribution
