Question

I have a very large .TSV file that I cannot read into R due to its size.

I want to read in only selected columns by header name, e.g. "HEALTH".

How can I go about doing this?

Solution

Have a look at the colClasses argument of read.table. An entry of "NULL" skips that column, while NA lets read.table determine the type automatically:

df <- read.table(header = TRUE, colClasses=c(NA, "NULL", NA), text = '
                 A B C
                 1 2 3
                 4 5 6')
df
#  A C
#1 1 3
#2 4 6
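
For an actual tab-separated file, the same idea might look like the sketch below. The temporary file, its column names (ID, HEALTH, NOTES) and its values are made up for illustration; only read.table, sep and colClasses come from the answer above.

```r
# Write a tiny tab-separated file to a temporary location (illustrative data only).
path <- tempfile(fileext = ".tsv")
writeLines(c("ID\tHEALTH\tNOTES",
             "1\t0.5\tfoo",
             "2\t0.7\tbar"), path)

# One colClasses entry per column, in file order:
# a class name such as "numeric" forces that type, "NULL" skips the column.
df <- read.table(path, header = TRUE, sep = "\t",
                 colClasses = c("integer", "numeric", "NULL"))
str(df)
```

Skipped columns are never stored in the resulting data frame, which should help with memory use on large files.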

Update:

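To select by name, first read in only the header line and then build the colClasses vector from it:

# read only the header line; colClasses = "character" keeps names such as
# "T" or "F" from being converted to logical values
header <- read.table(header = FALSE, nrows = 1, colClasses = "character", text = '
                 A B C
                 1 2 3
                 4 5 6')

# cols we want to select
take <- c('A', 'B')
# create vector for colClasses: NA keeps a column, 'NULL' skips it
takecols <- ifelse(unlist(header) %in% take, NA, 'NULL')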

# read selected cols
df <- read.table(header = TRUE, colClasses=takecols, text = '
                 A B C
                 1 2 3
                 4 5 6')
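
Applied to a file on disk, the two steps above could be wrapped into a small helper along these lines. This is only a sketch: read_cols_by_name is a hypothetical name, the file is assumed to be tab-separated with a single header line, and the sample data is invented for the demonstration.

```r
# Hypothetical helper: read only the named columns from a delimited file.
read_cols_by_name <- function(path, cols, sep = "\t") {
  # Read just the first line to learn the column names and their order.
  header <- scan(path, what = character(), sep = sep, nlines = 1, quiet = TRUE)

  # Fail early if a requested column is not present in the header.
  missing_cols <- setdiff(cols, header)
  if (length(missing_cols) > 0) {
    stop("Columns not found in header: ", paste(missing_cols, collapse = ", "))
  }

  # NA lets read.table guess the type; "NULL" skips the column entirely.
  col_classes <- ifelse(header %in% cols, NA, "NULL")
  read.table(path, header = TRUE, sep = sep, colClasses = col_classes)
}

# Usage with a small temporary file (illustrative data only).
tsv_path <- tempfile(fileext = ".tsv")
writeLines(c("ID\tHEALTH\tNOTES",
             "1\t0.5\tfoo",
             "2\t0.7\tbar"), tsv_path)
health <- read_cols_by_name(tsv_path, "HEALTH")
```

Columns are matched by position after the header is read, so the result keeps the file's column order rather than the order given in cols.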
Licensed under: CC-BY-SA with attribution