Question

Three measurements (Time) are nested in network partners (NP), which are nested in persons (ID). The variable NP.T (created according to the answer mentioned here) indicates the number of network partners (with no missing value on Outcome) that a specific person (ID) has at a specific measurement (1 to 3).

This is an example of my dataset; the real one has thousands of rows.

   ID NP   Time Outcome  NP.T
1   1 11    1       4    2
2   1 12    1       2    2
3   1 11    2       3    2
4   1 12    2       3    2
5   1 11    3      NA    1
6   1 12    3       3    1
7   2 21    1       2    2
8   2 22    1       4    2
9   2 21    2      NA    1
10  2 22    2       4    1
11  2 21    3      NA    1
12  2 22    3       4    1
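A minimal sketch that rebuilds the example data in R, assuming the data frame is called mydata1 (the name used in the answer below). Here NP.T is recomputed by counting the non-missing Outcome values per ID and Time with base R's ave(); this may differ from the method in the answer the question links to.

```r
# Example data from the question (mydata1 is the name used in the answer)
mydata1 <- data.frame(
  ID      = rep(1:2, each = 6),
  NP      = c(11, 12, 11, 12, 11, 12, 21, 22, 21, 22, 21, 22),
  Time    = rep(rep(1:3, each = 2), times = 2),
  Outcome = c(4, 2, 3, 3, NA, 3, 2, 4, NA, 4, NA, 4)
)

# NP.T: number of network partners with a non-missing Outcome,
# counted separately within every ID x Time combination
mydata1$NP.T <- ave(as.numeric(!is.na(mydata1$Outcome)),
                    mydata1$ID, mydata1$Time, FUN = sum)
mydata1
```

Converting the logical vector to numeric before summing keeps NP.T a plain numeric column, as in the table.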

I want to calculate the following and don't know how to do it properly:

a) Mean and SD of the number of network partners (NP.T) at each measurement.

I'm also interested in the number of persons (IDs) who named at least one network partner at each measurement.

T1 -> 2 IDs named at least one Networkpartner

T2 -> 2 IDs named at least one NP

T3 -> 2 IDs named at least one NP
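One way to get these counts, sketched in base R under the assumption that "named" means "has at least one row with a non-missing Outcome at that time": keep only the rows with an observed Outcome and count the distinct IDs per Time. The objects named and n_per_time are illustrative names.

```r
# Example data from the question
mydata1 <- data.frame(
  ID      = rep(1:2, each = 6),
  NP      = c(11, 12, 11, 12, 11, 12, 21, 22, 21, 22, 21, 22),
  Time    = rep(rep(1:3, each = 2), times = 2),
  Outcome = c(4, 2, 3, 3, NA, 3, 2, 4, NA, 4, NA, 4)
)

# Rows where the network partner was actually named (Outcome observed)
named <- mydata1[!is.na(mydata1$Outcome), ]

# Number of distinct IDs with at least one named partner, per time point
n_per_time <- tapply(named$ID, named$Time, function(ids) length(unique(ids)))
n_per_time
```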

It may sound trivial in this example, but it's not in my sample. For the computation of the means, SDs, etc. at each time point, I want to take into account only those IDs who actually named at least one network partner at that specific time. IDs who didn't name any NP at that time shouldn't be part of the descriptive statistics for that time point. For clarification: an NA on the outcome variable means that this NP had not been named by its ID at that time point.
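A minimal sketch of that rule in base R, assuming NP.T has already been computed as in the table: because NP.T is constant within each ID and Time, the data can be collapsed to one row per person and time point, and pairs with NP.T equal to 0 (no partner named) are dropped. The name person_time is illustrative.

```r
# Example data from the question, including NP.T
mydata1 <- data.frame(
  ID      = rep(1:2, each = 6),
  NP      = c(11, 12, 11, 12, 11, 12, 21, 22, 21, 22, 21, 22),
  Time    = rep(rep(1:3, each = 2), times = 2),
  Outcome = c(4, 2, 3, 3, NA, 3, 2, 4, NA, 4, NA, 4),
  NP.T    = c(2, 2, 2, 2, 1, 1, 2, 2, 1, 1, 1, 1)
)

# One row per ID x Time (NP.T does not vary within these pairs)
person_time <- unique(mydata1[, c("ID", "Time", "NP.T")])

# Exclude persons who named no network partner at that time point
person_time <- person_time[person_time$NP.T > 0, ]
person_time
```

In this small example no pair has NP.T of 0, so all six person-time rows are kept.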

b) A repeated-measures ANOVA to find out whether the mean number of network partners changes over time.

Some expected results:

Mean.T1 = 2 <- as both IDs had named two NPs at T1

Mean.T2 = 1.5 <- as one ID had named two and the other one NP at T2

Mean.T3 = 1 <- as both IDs had named one NP at T3

n.T1 = 2

n.T2 = 2

n.T3 = 2
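A minimal base-R sketch that reproduces these expected values, assuming one NP.T value per ID and Time (as in the table) and treating NP.T of 0 as "no network partner named". The object names person_time and stats are illustrative.

```r
# Example data from the question, including NP.T
mydata1 <- data.frame(
  ID      = rep(1:2, each = 6),
  NP      = c(11, 12, 11, 12, 11, 12, 21, 22, 21, 22, 21, 22),
  Time    = rep(rep(1:3, each = 2), times = 2),
  Outcome = c(4, 2, 3, 3, NA, 3, 2, 4, NA, 4, NA, 4),
  NP.T    = c(2, 2, 2, 2, 1, 1, 2, 2, 1, 1, 1, 1)
)

# Collapse to one row per person and time point, drop persons with no partner
person_time <- unique(mydata1[, c("ID", "Time", "NP.T")])
person_time <- person_time[person_time$NP.T > 0, ]

# Descriptive statistics per time point, each person counted once
stats <- data.frame(
  Time = sort(unique(person_time$Time)),
  mean = as.vector(tapply(person_time$NP.T, person_time$Time, mean)),
  sd   = as.vector(tapply(person_time$NP.T, person_time$Time, sd)),
  n    = as.vector(tapply(person_time$ID, person_time$Time,
                          function(ids) length(unique(ids))))
)
stats
```

The means are 2, 1.5 and 1 with n = 2 at every time point; the SD at T2 is sd(c(2, 1)), about 0.707.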

The problem is that in the real dataset, persons named different numbers of network partners, so I don't know how to calculate the descriptive statistics in this case.


Solution

Part A (1 & 2)

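library(plyr)
# Split mydata1 by Time; for each time point take the mean and SD of NP.T
# over all rows, and count the distinct IDs
mydata3 <- ddply(mydata1, .(Time), summarize,
                 mean = mean(NP.T), sd = sd(NP.T), nobs = length(unique(ID)))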


 > mydata3
  Time mean        sd nobs
1    1  2.0 0.0000000    2
2    2  1.5 0.5773503    2
3    3  1.0 0.0000000    2
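Note that the ddply call above computes mean and sd over rows, and each person contributes one row per network partner listed at that time (including partners with a missing Outcome); nobs likewise counts every ID that has rows at that time, even if NP.T is 0. In this example every ID has exactly two rows per time, so the means match the expected ones, but the SD at T2 is not the SD across persons. A minimal base-R sketch comparing the two, assuming the person-level version should collapse to one row per ID and Time and drop NP.T of 0:

```r
# Example data from the question, including NP.T
mydata1 <- data.frame(
  ID      = rep(1:2, each = 6),
  NP      = c(11, 12, 11, 12, 11, 12, 21, 22, 21, 22, 21, 22),
  Time    = rep(rep(1:3, each = 2), times = 2),
  Outcome = c(4, 2, 3, 3, NA, 3, 2, 4, NA, 4, NA, 4),
  NP.T    = c(2, 2, 2, 2, 1, 1, 2, 2, 1, 1, 1, 1)
)

# Row-level SD, as in the ddply call: every row counts once
row_sd <- tapply(mydata1$NP.T, mydata1$Time, sd)

# Person-level SD: one row per ID x Time, persons with NP.T of 0 removed
person_time <- unique(mydata1[, c("ID", "Time", "NP.T")])
person_time <- person_time[person_time$NP.T > 0, ]
person_sd <- tapply(person_time$NP.T, person_time$Time, sd)

rbind(row_level = row_sd, person_level = person_sd)
```

At T2 the row-level SD is 0.577 (as in the output above), while the SD over the two persons, who named 2 and 1 network partners, is about 0.707. The same collapsed data frame could also be passed to ddply instead of mydata1.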

Part B:

myaov <- aov(mean ~ Time, data=mydata3)

> myaov

    Call:
       aov(formula = mean ~ Time, data = mydata3)

    Terms:
                    Time Residuals
    Sum of Squares   0.5       0.0
    Deg. of Freedom    1         1

    Residual standard error: 1.17148e-16 
    Estimated effects may be unbalanced
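The aov() call above fits the three time-point means with Time treated as a number, so it amounts to a straight-line fit through three points with one residual degree of freedom, not a repeated-measures ANOVA. A minimal sketch of a within-subject ANOVA with base R's aov() and an Error() term, assuming one NP.T value per ID and Time and that every ID is observed at all three time points: ID and Time are made factors, and Time is tested within the ID-by-Time stratum. If some IDs drop out at some time points (for example after removing NP.T of 0), a mixed model may be the more suitable tool; that is outside this sketch.

```r
# Example data from the question, including NP.T
mydata1 <- data.frame(
  ID      = rep(1:2, each = 6),
  NP      = c(11, 12, 11, 12, 11, 12, 21, 22, 21, 22, 21, 22),
  Time    = rep(rep(1:3, each = 2), times = 2),
  Outcome = c(4, 2, 3, 3, NA, 3, 2, 4, NA, 4, NA, 4),
  NP.T    = c(2, 2, 2, 2, 1, 1, 2, 2, 1, 1, 1, 1)
)

# One NP.T value per person and time point
person_time <- unique(mydata1[, c("ID", "Time", "NP.T")])
person_time$ID   <- factor(person_time$ID)
person_time$Time <- factor(person_time$Time)

# Repeated-measures ANOVA: Time is a within-subject factor
rm_fit <- aov(NP.T ~ Time + Error(ID / Time), data = person_time)
summary(rm_fit)
```

With only two persons this example has just 2 residual degrees of freedom for Time, so it only illustrates the model setup.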

Update: for the error "Error in is.list(by) : 'by' is missing", please check here for details. As mentioned on that page, this is not a problem of RStudio: the Hmisc package has its own summarize function, which masks summarize from plyr when Hmisc is loaded after plyr.

So either load Hmisc before plyr, or refer to plyr's function explicitly as plyr::summarize:

library(Hmisc)
library(plyr)   # loaded last, so plyr's summarize masks Hmisc's
mydata3 <- ddply(mydata1, .(Time), plyr::summarize,
                 mean = mean(NP.T), sd = sd(NP.T), nobs = length(unique(ID)))
Licensed under: CC-BY-SA with attribution