Question

Suppose I have a data frame with 4 columns and a list (a character vector) that contains 3 of those column names:

#create data with 4 columns, a-d
a<-c(1,2,3)
b<-c(1,2,3)
c<-c(1,2,3)
d<-c(0.3,0.4,0.2)
data<-data.frame(a,b,c,d)
#create a list that doesn't include d
list<-c('a','b','c')

I want to run a loop where I calculate values based on the sums of those columns, one at a time, and then store this information as a table that gives me the id of each column that was worked with, and the calculated value.

Here is what I have attempted:

#make output vectors for a loop
output.id<-c() 
output.metric<-c()
#run loop
for(i in 1:length(list)){
    #name of which id in the list you are working on
    id<-list[i]
    #compute something based on the data contained within a vector of the data frame, referencing where you are in the list
    metric<- sum(data$list[i]*data$d)/sum(data$list[i])
    #save the name of which id you were working on and the computed value for each element i
    output.id<-c(output.id,id)
    output.metric<-c(output.metric,metric)
}

The problem is with the calculation of the metric. I want to select a column of the data based on which list item i I am working on. So, when list[i] is 'a', I want

metric<- sum(data$list[i]*data$d)/sum(data$list[i])

to be interpreted as

metric<- sum(data$a*data$d)/sum(data$a)

where 'list[i]' is replaced with 'a'

Is there a good way to do this?


Solution

The reason your code didn't work is that $ does not evaluate its right-hand side: data$list[i] looks for a column literally named list, finds none, and returns NULL. It should be replaced with data[[list[i]]]. However, this whole code could be rewritten in two lines, which will make it both shorter and more efficient. I've changed your variable names so you're not masking the built-in list and data functions:

dat <- data.frame(a=1:3, b=1:3, c=1:3, d=c(0.3,0.4,0.2))
lst <- c("a", "b", "c")
output.id <- lst
output.metric <- sapply(lst, function(x) sum(dat[,x]*dat$d)/sum(dat[,x]))
output.metric
#         a         b         c 
# 0.2833333 0.2833333 0.2833333

Another approach would be:

colSums(dat[,lst]*dat$d) / colSums(dat[,lst])
#         a         b         c 
# 0.2833333 0.2833333 0.2833333 
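
To make the difference between $ and [[ ]] concrete, here is a minimal sketch using the same toy data as above. The names via_dollar and via_brackets are only illustrative and do not come from the original code:

```r
# Toy data from the answer above (dat and lst instead of data and list)
dat <- data.frame(a = 1:3, b = 1:3, c = 1:3, d = c(0.3, 0.4, 0.2))
lst <- c("a", "b", "c")
i <- 1

# $ takes its right-hand side literally, so this looks for a column named "lst"
via_dollar <- dat$lst[i]        # no such column, so the result is NULL

# [[ ]] evaluates its argument first, so lst[i] becomes "a"
via_brackets <- dat[[lst[i]]]   # same values as dat$a

is.null(via_dollar)             # TRUE
identical(via_brackets, dat$a)  # TRUE

# The metric from the question, computed for the i-th column name
metric <- sum(dat[[lst[i]]] * dat$d) / sum(dat[[lst[i]]])
```

With the NULL result, the original expression reduces to 0/0, which is why the loop produced NaN rather than an error.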

OTHER TIPS

There is a problem with your indexing operation. You use the $ operator, where in this case you should use [] (or [[]]), because $ does not accept a column name stored in a variable. In general, you wouldn't have to use a for loop to achieve this, because many operations in R can be vectorized. But to show you how you could do it with a for loop:

#if you have to populate a vector in a for loop, it is good practice to initialize it with the correct or expected length
output.id<-character(length(list))       #the ids are column names, so this is a character vector
output.metric<-numeric(length(list))

for(i in seq_along(list)){

  id<-list[i]

  #note the difference in the following line, where I use [] instead of $ and id instead of list[i]

  metric<- sum(data[,id]*data$d)/sum(data[,id])

  output.id[i] <-  id              
  output.metric[i] <- metric
}

#this will create a data.frame with results
output <- data.frame(id = output.id, metric = output.metric)
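
For comparison, the same id/metric table can probably be built without an explicit loop. This is only a sketch, assuming the renamed objects dat and lst (rather than data and list) and using vapply, which behaves like sapply but checks that each result is a single number:

```r
# Same toy data, with names that do not mask base R's data() and list()
dat <- data.frame(a = 1:3, b = 1:3, c = 1:3, d = c(0.3, 0.4, 0.2))
lst <- c("a", "b", "c")

# One weighted-sum ratio per column name; numeric(1) declares the expected result shape
metric <- vapply(lst, function(x) sum(dat[[x]] * dat$d) / sum(dat[[x]]), numeric(1))

# Combine the ids and values into one table; unname() drops the a/b/c names
# so they do not end up as row names
output <- data.frame(id = lst, metric = unname(metric))
output
```

The loop above and this version should give the same table; the loop is easier to step through, while this form avoids managing the output vectors by hand.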

I suggest you read an R tutorial/introduction to learn more about subsetting etc.

Licensed under: CC-BY-SA with attribution