Question

I was wondering if you kind folks could answer a question I have. In the sample data below, column 1 holds a categorical variable and column 2 holds p-values.

x <- c(rep("A",0.1*10000),rep("B",0.2*10000),rep("C",0.65*10000),rep("D",0.05*10000))
categorical_data=as.matrix(sample(x,10000))
p_val=as.matrix(runif(10000,0,1))
combi=as.data.frame(cbind(categorical_data,p_val))
head(combi)

  V1                V2
1  A 0.484525170875713
2  C  0.48046557046473
3  C 0.228440979029983
4  B 0.216991128632799
5  C 0.521497668232769
6  D 0.358560319757089

I now want to take one of the categories, let's say "C", and create another variable in column 3 that is 1 if V1 is C and 0 if it isn't.

combi$NEWVAR[combi$V1=="C"] <-1
combi$NEWVAR[combi$V1!="C"] <-0

  V1                V2 NEWVAR
1  A 0.484525170875713 0
2  C  0.48046557046473 1
3  C 0.228440979029983 1
4  B 0.216991128632799 0
5  C 0.521497668232769 1
6  D 0.358560319757089 0

I'd like to do this for each of the categories in V1, looping over them with lapply:

variables=unique(combi$V1)

loopeddata=lapply(variables,function(x){
combi$NEWVAR[combi$V1==x] <-1
combi$NEWVAR[combi$V1!=x]<-0
}
)

My output however looks like this:

[[1]]
[1] 0

[[2]]
[1] 0

[[3]]
[1] 0

[[4]]
[1] 0

My desired output would look like the table in the second block of code, but produced for each category in turn: first the third column would be 1 for A and 0 for B, C and D; then 1 for B and 0 for A, C and D; and so on.

If anyone could help me out that would be very much appreciated.


Solution

How about something like this:

model.matrix(~ -1 + V1, data=combi)

Then you can cbind it to combi if you desire:

combi <- cbind(combi, model.matrix(~ -1 + V1, data=combi))
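
For reference, here is a minimal sketch of what that call produces, run on a small made-up data frame. The names `toy` and `dummies` and the values are illustrative only, not the asker's data. The `-1` in the formula drops the intercept, so every level of V1 gets its own 0/1 column.

```r
# Toy data standing in for combi (hypothetical values)
toy <- data.frame(V1 = c("A", "C", "C", "B", "C", "D"),
                  V2 = c(0.48, 0.48, 0.23, 0.22, 0.52, 0.36))

# One indicator column per level of V1; -1 removes the intercept
dummies <- model.matrix(~ -1 + V1, data = toy)
colnames(dummies)
# "V1A" "V1B" "V1C" "V1D"

# Attach the indicator columns to the original data
toy <- cbind(toy, dummies)
```

Each row has exactly one 1 across the new columns, in the column matching its V1 value.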

OTHER TIPS

model.matrix is definitely the way to do this in R. You can, however, also consider using table.

Here's an example using the result I get when using set.seed(1) (always use a seed when sharing example problems with random data).

LoopedData <- table(sequence(nrow(combi)), combi$V1)
head(LoopedData)
#    
#     A B C D
#   1 0 1 0 0
#   2 0 0 1 0
#   3 0 0 1 0
#   4 0 0 1 0
#   5 0 1 0 0
#   6 0 0 1 0

## If you want to bind it back with the original data
combi <- cbind(combi, as.data.frame.matrix(LoopedData))

head(combi)
#   V1                 V2 A B C D
# 1  B 0.0647124934475869 0 1 0 0
# 2  C  0.676612401846796 0 0 1 0
# 3  C  0.735371692571789 0 0 1 0
# 4  C  0.111299667274579 0 0 1 0
# 5  B 0.0466546178795397 0 1 0 0
# 6  C  0.130910312291235 0 0 1 0
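
As for why the lapply attempt in the question returns a list of 0s: a function returns the value of its last expression, and there that is the assignment `... <- 0`, whose value is 0. The assignments also only change a local copy of combi inside the function. Below is a minimal sketch of a version that returns the indicator vector itself, using a small made-up data frame (`toy`, `levels_v1` and `indicators` are illustrative names, not from the question).

```r
# Toy data standing in for combi (hypothetical values)
toy <- data.frame(V1 = c("A", "C", "C", "B", "C", "D"))

levels_v1 <- sort(unique(toy$V1))

# For each level, return a 0/1 vector rather than assigning inside the function
indicators <- lapply(levels_v1, function(x) as.integer(toy$V1 == x))
names(indicators) <- levels_v1

# Bind the indicator columns back onto the data
toy <- cbind(toy, as.data.frame(indicators))
```

This gives one column per category (A, B, C, D), the same layout as the table approach above.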
Licensed under: CC-BY-SA with attribution