Question

I have the following list of data each has 10 samples. The values indicate binding strength of a particular molecule.

What I want so show is that 'x' is statistically different from 'y', 'z' and 'w'. Which it does if you look at X it has more values greater than zero (2.8,1.00,5.4, etc) than others.

I tried t-test, but all of them shows insignificant difference with high P-value.

What's the appropriate test for that?

Below is my code:

#!/usr/bin/Rscript
x   <-c(2.852672123,0.076840264,1.009542943,0.430716968,5.4016,0.084281843,0.065654548,0.971907344,3.325405405,0.606504718)
y   <- c(0.122615039,0.844203734,0.002128992,0.628740077,0.87752229,0.888600425,0.728667099,0.000375047,0.911153571,0.553786408);
z   <- c(0.766445916,0.726801899,0.389718652,0.978733927,0.405585807,0.408554832,0.799010791,0.737676439,0.433279599,0.947906524)
w   <- c(0.000124984,1.486637663,0.979713013,0.917105894,0.660855127,0.338574774,0.211689885,0.434050179,0.955522972,0.014195184)

t.test(x,y)
t.test(x,z)
Was it helpful?

Solution

You have not specified in what way you expect the samples to differ. One typically assumes you mean the mean differs across samples. In that case, the t-test is appropriate. While x has some high values, it also has some low values which pull the mean in. It seems what you thought was a significant difference (visually) is actually a larger variance.

If your question is about variance, then you need an F-test.

OTHER TIPS

The classic test for this type of data is analysis of variance. Analysis of variance tells you if the means of all four categories are the likely the same (failure to reject null hypothesis) or if at least one mean likely differs from the others (rejection of the null hypothesis).

If the anova is significant, you will often want to perform the Tukey HSD post-hoc test to figure out which category differs from the others. Tukey HSD yields p-values that are already adjusted for multiple comparisons.

library(ggplot2)
library(reshape2)

x <- c(2.852672123,0.076840264,1.009542943,0.430716968,5.4016,0.084281843,
       0.065654548,0.971907344,3.325405405,0.606504718)
y <- c(0.122615039,0.844203734,0.002128992,0.628740077,0.87752229,
       0.888600425,0.728667099,0.000375047,0.911153571,0.553786408);
z <- c(0.766445916,0.726801899,0.389718652,0.978733927,0.405585807,
       0.408554832,0.799010791,0.737676439,0.433279599,0.947906524)
w <- c(0.000124984,1.486637663,0.979713013,0.917105894,0.660855127,
       0.338574774,0.211689885,0.434050179,0.955522972,0.014195184)

dat = data.frame(x, y, z, w)
mdat = melt(dat)

anova_results = aov(value ~ variable, data=mdat)
summary(anova_results)
#             Df Sum Sq Mean Sq F value Pr(>F)
# variable     3   5.83  1.9431   2.134  0.113
# Residuals   36  32.78  0.9105               

The anova p-value is 0.113 and the Tukey test p-values for your "x" category are in a similar range. This is the quantification of your intuition that "x" is different from the others. Most researchers would find p = 0.11 to be suggestive but still have too high risk of being a false positive. Note that the large difference in means (diff column) along with the boxplot figure below might be more persuasive than the p-value.

TukeyHSD(anova_results)
#   Tukey multiple comparisons of means
#     95% family-wise confidence level
# 
# Fit: aov(formula = value ~ variable, data = mdat)
# 
# $variable
#            diff       lwr       upr     p adj
# y-x -0.92673335 -2.076048 0.2225815 0.1506271
# z-x -0.82314118 -1.972456 0.3261737 0.2342515
# w-x -0.88266565 -2.031981 0.2666492 0.1828672
# z-y  0.10359217 -1.045723 1.2529071 0.9948795
# w-y  0.04406770 -1.105247 1.1933826 0.9995981
# w-z -0.05952447 -1.208839 1.0897904 0.9990129

plot_1 = ggplot(mdat, aes(x=variable, y=value, colour=variable)) + 
         geom_boxplot() +
         geom_point(size=5, shape=1)
ggsave("plot_1.png", plot_1, height=3.5, width=7, units="in")

enter image description here

In your question you referred to the distributions being different b/c some of them had more values greater than 0. Defining the distributions according to the "number of values greater than 0", then you would use the binomial distribution (after converting the values to 1's and 0's). A function you could then use would be prop.test()

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top