Question

I am trying to cast data using cast() from the Reshape library, but I am getting unexpected results. I start with a dataframe that has lots of data in it, and all_ia[all_ia$Student.ID == 102050,] returns

66     102050        1      Mar
67     102050        0      Dec
68     102050        1      May
69     102050        0      Feb

Where the variables are Student.ID, Proficiency.Level, and testmonth respectively.

There are some Student.IDs with a 5th month, Sep.

When I run all_ia.cast <- cast(all_ia, Student.ID ~ testmonth, value=c("Proficiency.Level"), fill=c("NA")) and then run all_ia.cast[all_ia.cast$Student.ID == 102050,], I get unexpected results:

1325    102050    1    1    1    1    NA

where the variables are Student.ID, Dec, Feb, Mar, May, Sep respectively. There is a warning when I run cast() which says Aggregation requires fun.aggregate: length used as default.

My question is, why is the fun.aggregate required and why are the Dec and Feb variables in the cast equal to 1 and not 0?

Thank you for your help!


Solution

It's because your casting formula Student.ID ~ testmonth does not contain all of the variables in your data.frame, i.e. Proficiency.Level is not included.

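This means, in general, that the casting has to perform an aggregation, and the aggregation function defaults to length. Since length counts the rows that fall into each cell rather than returning their values, every month with exactly one observation shows 1, whatever the proficiency level was.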
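To see why a count turns a 0 into a 1, here is a small base R sketch. It uses tapply() as a stand-in for the aggregation step, not cast() itself, and the data frame is made up to mirror the four rows shown in the question.

```r
# Hypothetical data mirroring the four rows for student 102050 in the question
all_ia <- data.frame(
  Student.ID = c(102050, 102050, 102050, 102050),
  Proficiency.Level = c(1, 0, 1, 0),
  testmonth = c("Mar", "Dec", "May", "Feb")
)

# Aggregating with length counts the rows in each student/month cell
counts <- tapply(all_ia$Proficiency.Level,
                 list(all_ia$Student.ID, all_ia$testmonth), length)

# Aggregating with mean returns the stored value when a cell holds one row
means <- tapply(all_ia$Proficiency.Level,
                list(all_ia$Student.ID, all_ia$testmonth), mean)

print(counts)
print(means)
```

With one row per cell, counts is 1 everywhere, while means reproduces the original 0/1 values.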

You seem to have a special case, where there is a one-to-one relationship between month and proficiency level for each student. Therefore you should choose an aggregation function that preserves the data, e.g. taking the mean. The following should work:

cast(all_ia, Student.ID ~ testmonth, value="Proficiency.Level", fun.aggregate=mean)

You don't supply test data, so this isn't tested.
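For anyone who wants to try this, here is a minimal sketch on made-up data, assuming the reshape package is installed. The data frame below is hypothetical and only mirrors the column names from the question. The value column is named as a string, and the aggregation function is passed separately through fun.aggregate.

```r
library(reshape)  # assumption: the reshape package is installed

# Hypothetical toy data; the second student only has a Sep record
all_ia <- data.frame(
  Student.ID = c(102050, 102050, 102050, 102050, 102051),
  Proficiency.Level = c(1, 0, 1, 0, 1),
  testmonth = factor(c("Mar", "Dec", "May", "Feb", "Sep"))
)

# Name the value column as a string and supply the aggregation function explicitly
wide <- cast(all_ia, Student.ID ~ testmonth,
             value = "Proficiency.Level", fun.aggregate = mean)

# Look up one student, as in the question
wide_df <- as.data.frame(wide)
row <- wide_df[wide_df$Student.ID == 102050, ]
print(row)
```

Because each student/month cell holds at most one row here, mean simply passes the original value through. If a student could have several records in the same month, a different aggregation function may be more appropriate.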

Licensed under: CC-BY-SA with attribution