I have some data to reshape in R but cannot figure out how. Here is the scenario: I have test score data for a number of students from different schools. Here is some example data:

#Create example data: 
test <- data.frame("score" = c(1,10,20,40,20), "schoolid" = c(1,1,2,2,3))

This gives a data frame like this:

  score schoolid
1     1        1
2    10        1
3    20        2
4    40        2
5    20        3

So there is a school ID that identifies the school, and a test score for each student. For an analysis in a different program, I would like the data in a format like this:

                 Score student 1   Score student 2
School ID == 1          1                10
School ID == 2         20                40
School ID == 3         20                NA

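To reshape the data, I tried the base R reshape function and the cast function with the reshape2 library loaded, but both resulted in errors (note that reshape2 provides dcast and acast; cast belongs to the older reshape package):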

#Reshape function
reshape(test, v.names = test$score, idvar = test$schoolid, direction = "wide")
reshape(test, idvar = test$schoolid, direction = "wide")
# Error in `[.data.frame`(data, , idvar) : undefined columns selected

#Cast function
cast(test,test$schoolid~test$score)
# Error: could not find function "cast" (although ?cast works fine)

I guess the fact that the number of test scores differs between schools complicates the restructuring.

How can I reshape this data, and which function should I use?


Solution

Here are some solutions that use only base R. All three use this new studentno variable, which numbers the students within each school:

studentno <- with(test, ave(schoolid, schoolid, FUN = seq_along))
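To see what ave() produces here, the sketch below prints studentno for the example data and for a reordered copy (test_shuffled is a hypothetical data frame made up for illustration). ave() applies seq_along() within each school and returns the counters in the original row order, so the data does not need to be sorted by schoolid first.

```r
test <- data.frame(score = c(1, 10, 20, 40, 20), schoolid = c(1, 1, 2, 2, 3))

# ave() groups schoolid by its own values, applies seq_along() within each
# group, and puts the counters back in the original row order
studentno <- with(test, ave(schoolid, schoolid, FUN = seq_along))
print(studentno)  # [1] 1 2 1 2 1

# Hypothetical reordered data: rows are not sorted by school
test_shuffled <- data.frame(score = c(20, 1, 40, 10, 20),
                            schoolid = c(2, 1, 2, 1, 3))
studentno_shuffled <- with(test_shuffled,
                           ave(schoolid, schoolid, FUN = seq_along))
print(studentno_shuffled)  # [1] 1 1 2 2 1
```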

1) tapply

with(test, tapply(score, list(schoolid, studentno), c))

giving:

   1  2
1  1 10
2 20 40
3 20 NA
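tapply returns a matrix with the school IDs as row names. Since the goal is to pass the data to a different program, one option is to turn that matrix into a data frame and write it to a CSV file. This is a minimal sketch; the student.1/student.2 column names and the temporary output file are assumptions chosen for illustration.

```r
test <- data.frame(score = c(1, 10, 20, 40, 20), schoolid = c(1, 1, 2, 2, 3))
studentno <- with(test, ave(schoolid, schoolid, FUN = seq_along))

# Matrix: rows are schools, columns are within-school student numbers
m <- with(test, tapply(score, list(schoolid, studentno), c))

# Name the columns, then move the school IDs from row names into a column
colnames(m) <- paste0("student.", colnames(m))
wide <- data.frame(schoolid = as.numeric(rownames(m)), m, row.names = NULL)
print(wide)

# Write a CSV for the other program (a temp file here; use a real path in practice)
out <- tempfile(fileext = ".csv")
write.csv(wide, out, row.names = FALSE)
back <- read.csv(out)
```

Missing cells are written as the string NA, which the other program may need to be told to treat as missing.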

2) reshape

# rename score to student and append studentno column
test2 <- transform(test, student = score, score = NULL, studentno = studentno)
reshape(test2, direction = "wide", idvar = "schoolid", timevar = "studentno")

giving:

  schoolid student.1 student.2
1        1         1        10
3        2        20        40
5        3        20        NA
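The transform() step above is only there to get student.1/student.2 column names. As a sketch of an alternative, which also shows what went wrong in the question, reshape() expects column names as character strings, so v.names = "score" can be passed directly; the resulting columns are then called score.1 and score.2. Resetting the row names is optional cosmetics.

```r
test <- data.frame(score = c(1, 10, 20, 40, 20), schoolid = c(1, 1, 2, 2, 3))
test$studentno <- with(test, ave(schoolid, schoolid, FUN = seq_along))

# idvar, timevar and v.names are column *names* (strings), not vectors such
# as test$schoolid; passing the vector makes reshape() index columns by its
# values, which is what raises "undefined columns selected"
wide <- reshape(test, direction = "wide", idvar = "schoolid",
                timevar = "studentno", v.names = "score")
rownames(wide) <- NULL  # reshape() keeps the row name of each school's first row
print(wide)
```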

3) xtabs

xtabs also works, provided that no student has a score of 0 (xtabs fills empty cells with 0, and the next step converts every 0 to NA).

xt <- xtabs(score ~ schoolid + studentno, test)
xt[xt == 0] <- NA  # omit this step if it's OK to use 0 in place of NA
xt

giving:

        studentno
schoolid  1  2
       1  1 10
       2 20 40
       3 20   
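To illustrate the zero-score caveat: with a hypothetical score of 0 (test0 below is made-up data), the xt == 0 step would wrongly blank out that real score. One way around it, sketched here as a suggestion rather than part of the original answer, is to build a second xtabs of counts and set to NA only the cells that contain no student.

```r
# Hypothetical data: the second student at school 1 scored 0
test0 <- data.frame(score = c(1, 0, 20, 40, 20), schoolid = c(1, 1, 2, 2, 3))
test0$studentno <- with(test0, ave(schoolid, schoolid, FUN = seq_along))

# Sum of scores per cell (cells with no student also come out as 0) ...
xt <- xtabs(score ~ schoolid + studentno, test0)
# ... and number of students per cell (0 means the cell is truly empty)
n <- xtabs(~ schoolid + studentno, test0)

xt[n == 0] <- NA  # blank only the empty cells; the real score of 0 survives
print(xt)
```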

Other tips

You have to define the student ID somewhere, for example:

test <- data.frame("score" = c(1,10,20,40,20), "schoolid" = c(1,1,2,2,3))
test$studentid <- c(1,2,1,2,1)

library(reshape2)
dcast(test, schoolid ~ studentid, value.var = "score", mean)

giving (the empty cell shows NaN rather than NA because mean() of zero values is NaN):

  schoolid  1   2
1        1  1  10
2        2 20  40
3        3 20 NaN
Licensed under: CC-BY-SA with attribution