Question

I am looking for an R function that checks for the presence of particular columns, e.g.

cols <- c("a", "b", "c", "d")

in a matrix or data frame, and that inserts a column of NAs for every listed column that does not exist, at the position given in the vector cols. For example, given a matrix or data frame with named columns "a" and "d", it would insert columns "b" and "c", filled with NAs, before column "d", and it would delete any columns not listed in cols (e.g. a column "e"). What would be the easiest and fastest way to achieve this? I am dealing with a fairly large dataset of about 1 million rows. Or is there already a function that does this?


Solution

I would separate the creation step and the ordering step. Here is an example:

cols <- letters[1:4]
## initialize test data set
my.df <- data.frame(a = rnorm(100), d = rnorm(100), e = rnorm(100))
## exclude columns not in cols
## (drop = FALSE keeps a data frame even if only one column remains)
my.df <- my.df[, colnames(my.df) %in% cols, drop = FALSE]
## add missing columns filled with NA
my.df[, cols[!(cols %in% colnames(my.df))]] <- NA
## reorder
my.df <- my.df[, cols, drop = FALSE]
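
Since the question asks for a function, the three steps above could be wrapped into one. This is only a minimal sketch for data frames: the name conform_columns and its arguments are hypothetical, not part of base R or of the answer above.

```r
# Hypothetical helper; the name and signature are assumptions for illustration
conform_columns <- function(df, cols) {
  # keep only the requested columns that are actually present
  df <- df[, intersect(cols, colnames(df)), drop = FALSE]
  # add each missing column, filled with NA
  for (nm in setdiff(cols, colnames(df))) {
    df[[nm]] <- NA
  }
  # return the columns in the requested order
  df[, cols, drop = FALSE]
}

test.df <- data.frame(a = 1:3, d = 4:6, e = 7:9)
res <- conform_columns(test.df, c("a", "b", "c", "d"))
colnames(res)
```

Note that the added columns are of type logical, because a bare NA is logical in R.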

Other suggestions

Another approach I just discovered uses match, but it only works for matrices:

# original matrix
m <- cbind(a = 1:2, d = 3:4)
# required columns
coln <- c("a", "b", "c", "d")

# match() gives NA for missing names; an NA column index yields an NA column
m <- m[, match(coln, colnames(m))]
colnames(m) <- coln
m
#      a  b  c d
# [1,] 1 NA NA 3
# [2,] 2 NA NA 4
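
The match idea can also be put into a small function. The sketch below is an assumption-laden illustration, not a tested utility: the name match_columns is hypothetical, and drop = FALSE is added so that a one-row matrix is not collapsed into a vector.

```r
# Hypothetical helper; the name and arguments are assumptions for illustration
match_columns <- function(m, cols) {
  # match() returns NA for names not found in colnames(m);
  # an NA column index produces a column of NA when subsetting a matrix
  out <- m[, match(cols, colnames(m)), drop = FALSE]
  colnames(out) <- cols
  out
}

one.row <- matrix(1:3, nrow = 1, dimnames = list(NULL, c("a", "d", "e")))
res <- match_columns(one.row, c("a", "b", "c", "d"))
dim(res)
```

Columns that are not listed in cols (here "e") are never matched, so they are dropped without a separate step.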

Another possibility if your data is in a matrix:

# original matrix
m1 <- cbind(a = 1:2, d = 3:4)
m1
#      a d
# [1,] 1 3
# [2,] 2 4

# matrix with all columns, filled with NA
all.cols <- letters[1:4]
m2 <- matrix(nrow = nrow(m1), ncol = length(all.cols), dimnames = list(NULL, all.cols))
m2
#       a  b  c  d
# [1,] NA NA NA NA
# [2,] NA NA NA NA

# replace columns in 'NA matrix' with values from original matrix
m2[ , colnames(m1)] <- m1
m2
#      a  b  c d
# [1,] 1 NA NA 3
# [2,] 2 NA NA 4
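
If the original matrix also contains columns that are not wanted (such as "e" in the question), assigning by colnames(m1) would fail with a subscript error. One possible variation, sketched here with a hypothetical variable keep, copies only the columns that both matrices share:

```r
# original matrix, now with an extra column "e" that should be dropped
m1 <- cbind(a = 1:2, d = 3:4, e = 5:6)
all.cols <- letters[1:4]

# NA-filled target matrix with the required columns
m2 <- matrix(nrow = nrow(m1), ncol = length(all.cols),
             dimnames = list(NULL, all.cols))

# copy only the columns present in both matrices
# ('keep' is a name introduced for this sketch)
keep <- intersect(colnames(m1), all.cols)
m2[, keep] <- m1[, keep, drop = FALSE]
colnames(m2)
```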
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow