Compute t.test between several columns conditioned on another column

https://stackoverflow.com/questions/17791437

03-06-2022

Question

I have a data frame, df:

     Uttaxeringskassa Delägare.Totalt Delägare.AndelKvinnor Utgifter.SjukhjälpPerMedlem
6877                0             207             31.400966                  10.6908213
3590                1             402                    NA                   5.1019900
3591                1             351             12.432420                   8.2592593
3592                1             378             11.838330                   9.0529101
3593                1             393                    NA                   7.1246819
3594                1             402             16.454333                   7.6791045
3595                1             403                    NA                   6.7890819
3596                0             401                    NA                   5.3341646
3597                0              39             15.384615                   2.2307692
3598                0              39             15.384615                   2.9230769
3599                0              38             13.157895                   0.6315789
3600                0              37             10.810811                   2.9729730
3601                0              35              5.714286                   2.7714286

Its dput output:

 structure(list(Uttaxeringskassa = c(0, 1, 1, 1, 1, 1, 1, 0, 0, 
0, 0, 0, 0), Delägare.Totalt = c(207, 402, 351, 378, 393, 402, 
403, 401, 39, 39, 38, 37, 35), Delägare.AndelKvinnor = c(31.4009661835749, 
NA, 12.43242, 11.83833, NA, 16.454333, NA, NA, 15.3846153846154, 
15.3846153846154, 13.1578947368421, 10.8108108108108, 5.71428571428571
), Utgifter.SjukhjälpPerMedlem = c(10.6908212560386, 5.10199004975124, 
8.25925925925926, 9.05291005291005, 7.12468193384224, 7.67910447761194, 
6.78908188585608, 5.33416458852868, 2.23076923076923, 2.92307692307692, 
0.631578947368421, 2.97297297297297, 2.77142857142857)), .Names = c("Uttaxeringskassa", 
"Delägare.Totalt", "Delägare.AndelKvinnor", "Utgifter.SjukhjälpPerMedlem"
), row.names = c("6877", "3590", "3591", "3592", "3593", "3594", 
"3595", "3596", "3597", "3598", "3599", "3600", "3601"), class = "data.frame")

I want to run a t.test for the difference in means on each column, with the observations split into two groups by the value of df$Uttaxeringskassa (0 or 1).

I am thinking of first melting df with reshape2:

library(reshape2)
dfmelt <- melt(df, id.vars = "Uttaxeringskassa",
               variable.name = "Variables", value.name = "Value")

and then running a t-test for the difference in means on each of the original columns.

Any suggestions?

Best Regards

Solution

You can do this with lapply alone, without melting first:

# Column 1 is the grouping variable; columns 2..ncol(df) are the variables to test
lapply(df[,2:ncol(df)], function(x) t.test(x ~ df$Uttaxeringskassa))

This returns a list of t.test results, one element per column:

$Delägare.Totalt

        Welch Two Sample t-test

data:  x by df$Uttaxeringskassa
t = -5.0681, df = 6.294, p-value = 0.001991
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -405.4746 -143.4302
sample estimates:
mean in group 0 mean in group 1 
       113.7143        388.1667 


$Delägare.AndelKvinnor

        Welch Two Sample t-test

data:  x by df$Uttaxeringskassa
t = 0.4533, df = 6.37, p-value = 0.6654
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -7.495586 10.963260
sample estimates:
mean in group 0 mean in group 1 
       15.30886        13.57503 


$Utgifter.SjukhjälpPerMedlem

        Welch Two Sample t-test

data:  x by df$Uttaxeringskassa
t = -2.4988, df = 8.246, p-value = 0.03618
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -6.5178601 -0.2783456
sample estimates:
mean in group 0 mean in group 1 
       3.936402        7.334505
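
If you would rather have one compact table than three printed reports, the list returned by lapply can be collapsed with sapply. The following is only a sketch: the names tests and summary_table and the output column names are my own choices, not part of the original answer, and the data frame is rebuilt from the dput output in the question so the snippet runs on its own.

```r
# Rebuild the example data from the dput output in the question.
df <- data.frame(
  c(0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0),
  c(207, 402, 351, 378, 393, 402, 403, 401, 39, 39, 38, 37, 35),
  c(31.4009661835749, NA, 12.43242, 11.83833, NA, 16.454333, NA, NA,
    15.3846153846154, 15.3846153846154, 13.1578947368421,
    10.8108108108108, 5.71428571428571),
  c(10.6908212560386, 5.10199004975124, 8.25925925925926,
    9.05291005291005, 7.12468193384224, 7.67910447761194,
    6.78908188585608, 5.33416458852868, 2.23076923076923,
    2.92307692307692, 0.631578947368421, 2.97297297297297,
    2.77142857142857)
)
names(df) <- c("Uttaxeringskassa", "Delägare.Totalt",
               "Delägare.AndelKvinnor", "Utgifter.SjukhjälpPerMedlem")

# Same call as in the answer: one Welch two-sample t-test per column,
# grouped by Uttaxeringskassa (df[-1] drops the grouping column).
tests <- lapply(df[-1], function(x) t.test(x ~ df$Uttaxeringskassa))

# Pull the pieces of each "htest" object into one row per tested column.
# Row names come from the names of the list, i.e. the column names of df.
summary_table <- data.frame(
  mean_group0 = sapply(tests, function(tt) tt$estimate[[1]]),
  mean_group1 = sapply(tests, function(tt) tt$estimate[[2]]),
  t           = sapply(tests, function(tt) tt$statistic[["t"]]),
  df          = sapply(tests, function(tt) tt$parameter[["df"]]),
  p_value     = sapply(tests, function(tt) tt$p.value)
)

print(summary_table)
```

Each row of summary_table corresponds to one of the reports printed above, so the numbers should agree with them up to rounding.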

Licensed under: CC-BY-SA with attribution
