I have an imputed dataset that I'm analysing, and I'm trying to draw boxplots, but I can't wrap my head around the proper procedure.

My data (a sample; the original has 20 observations per imputation and 13 variables per group, with all values ranging from 0 to 25):

.imp  .id   FTE_RM  FTE_PD  OMZ_RM  OMZ_PD
1     1     25      25      24      24
1     2     4       0       2       6
1     3     11      5       3       2
1     4     12      3       3       3
2     1     20      15      15      15
2     2     4       1       2       3
2     3     0       0       0       6
2     4     20      0       0       0

.imp signifies the imputation round, .id the identifier for each observation.

I want to draw all the FTE_* variables in a single plot (and the OMZ_* variables in another), but I wonder what to do with all the imputations. Can I just include all values? The imputed data now has 500 observations. With, for instance, an ANOVA I'd need to average the ANOVA results by 5 to get back to 20 observations. But is this needed for a boxplot as well, since I only deal with medians, means, maxima and minima?

Such as:

library(reshape2)  # melt()
library(ggplot2)   # ggplot(), geom_boxplot()
data_melt <- melt(df[grep("^FTE_", colnames(df))])
ggplot(data_melt, aes(x = variable, y = value)) + geom_boxplot()

I've played a couple of times with ggplot, but consider myself a complete newbie.


Solution

I assume you want to keep the .imp and .id identifiers after melting, so use this instead:

data_melt <- melt(df,c(".imp",".id"))
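To see what this melt produces, here is a minimal sketch that rebuilds only the eight sample rows shown in the question (the real data is larger, so the row counts are an assumption that holds for this sample only). It uses reshape2's melt with id.vars named explicitly.

```r
library(reshape2)

# Sample rows copied from the question; the full dataset has more rows and columns
df <- data.frame(
  .imp   = c(1, 1, 1, 1, 2, 2, 2, 2),
  .id    = c(1, 2, 3, 4, 1, 2, 3, 4),
  FTE_RM = c(25, 4, 11, 12, 20, 4, 0, 20),
  FTE_PD = c(25, 0, 5, 3, 15, 1, 0, 0),
  OMZ_RM = c(24, 2, 3, 3, 15, 2, 0, 0),
  OMZ_PD = c(24, 6, 2, 3, 15, 3, 6, 0)
)

# Keep .imp and .id as identifier columns; every other column is stacked
# into a 'variable' / 'value' pair
data_melt <- melt(df, id.vars = c(".imp", ".id"))

# One row per (imputation, observation, variable) combination
str(data_melt)
```

Each original row contributes one long-format row per measured variable, and .imp stays available for grouping or filtering later.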

For completeness of the dataframe it probably helps to introduce a column that identifies the type - FTE vs. OMZ:

data_melt$type <- ifelse(grepl("FTE",data_melt$variable),"FTE","OMZ")

With this data.frame you can, for example, facet on the type (alternatively, you can filter data_melt to restrict it to a single type):

ggplot(data_melt, aes(x=variable, y=value))+geom_boxplot()+facet_wrap(~type,scales="free_x")
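The filter alternative mentioned above could look roughly like the sketch below, again using only the sample rows from the question. Mapping fill to .imp is my own addition, not part of the original answer: it simply draws one box per imputation side by side so the imputations can be compared visually, and it is not meant as a statement on how imputations should be pooled.

```r
library(reshape2)
library(ggplot2)

# Sample rows copied from the question; the full dataset has more rows and columns
df <- data.frame(
  .imp   = c(1, 1, 1, 1, 2, 2, 2, 2),
  .id    = c(1, 2, 3, 4, 1, 2, 3, 4),
  FTE_RM = c(25, 4, 11, 12, 20, 4, 0, 20),
  FTE_PD = c(25, 0, 5, 3, 15, 1, 0, 0),
  OMZ_RM = c(24, 2, 3, 3, 15, 2, 0, 0),
  OMZ_PD = c(24, 6, 2, 3, 15, 3, 6, 0)
)

data_melt <- melt(df, id.vars = c(".imp", ".id"))
data_melt$type <- ifelse(grepl("^FTE_", data_melt$variable), "FTE", "OMZ")

# Alternative to faceting: keep only the FTE_* variables
fte_only <- subset(data_melt, type == "FTE")

# Assumption: colour the boxes by imputation round to compare them side by side
p <- ggplot(fte_only, aes(x = variable, y = value, fill = factor(.imp))) +
  geom_boxplot() +
  labs(fill = "imputation")
print(p)
```

Dropping the fill mapping gives a single box per variable over all imputed values, which is what the faceted plot above does for both types at once.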

This gives one panel per type, with one boxplot per variable in each panel. EDIT: fixed the data mess-up

[image: resulting faceted boxplot]

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow