Question

I've got a relatively simple problem, which I don't think I'm properly approaching using R.

I have a data frame with several observations, stored in rows, as well as a bunch of annotations that I don't want to lose, in other columns of the same data frame.

I would like to run a t-test across the values in several columns of the data frame, and have the results written to (ideally) the same data frame.

A simple example would be:

# Generate the data
experimentName <- paste(rep("name",20), c(1:20), sep="")
experimentAnno1 <- rep(paste(rep("anno",5), c(1:5), sep=""), 4)
a1 <- rnorm(n=20, mean=10, sd=5)
a2 <- rnorm(n=20, mean=11, sd=5)
a3 <- rnorm(n=20, mean=12, sd=5)
b1 <- rnorm(n=20, mean=20, sd=5)
b2 <- rnorm(n=20, mean=21, sd=5)
b3 <- rnorm(n=20, mean=19, sd=5)

sampledata <- cbind(experimentName, experimentAnno1, a1,a2,a3,b1,b2,b3)

So I've tried a very simple

ttestfun = function(x) t.test(x[,c("a1", "a2", "a3")], x[,c("b1", "b2", "b3")])$p.value
p.value = apply(sampledata, 1, ttestfun)

Which doesn't work :(

I've also tried a whole bunch of combinations of by(), melt(), apply() etc - all of which I think I'm doing somehow wrong.

The outcome I'm hoping to get is additional columns in the sampledata data frame which are:

# pValue
p.value
# LoConf
a$conf.int[1]
# UpConf
a$conf.int[2]

etc.

What is the most efficient way to do this?

Thanks in advance!


Solution

You'll need to make sampledata a data.frame first, to get numeric values in the "a" and "b" columns.

> sampledata <- data.frame(experimentName, experimentAnno1, a1,a2,a3,b1,b2,b3)
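To see why the original cbind() call gets in the way, here is a minimal sketch on two hypothetical toy vectors (id and a1 below are made up for illustration, not the question's data). cbind() on a mix of character and numeric vectors returns a character matrix, while data.frame() keeps each column's own type:

```r
# Hypothetical toy vectors, used only to illustrate the coercion
id <- c("name1", "name2")
a1 <- c(10.5, 11.2)

m <- cbind(id, a1)        # matrix: every cell is coerced to character
d <- data.frame(id, a1)   # data frame: each column keeps its own type

is.character(m)           # TRUE
is.numeric(d$a1)          # TRUE
```

t.test() needs numeric input, so the character matrix cannot be passed to it directly.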

If you are trying to get per-row statistics based on a Welch two-sample t-test, this way is fast and relatively simple.

> stats <- as.data.frame(do.call(rbind, lapply(1:nrow(sampledata), function(i){
    as.numeric(unlist(t.test(sampledata[i, 3:5], sampledata[i, 6:8]))[1:5])
    })))
> names(stats) <- c("t.stat", "param.df", "p.val", "ci.left", "ci.right")
> cbind(sampledata, stats)
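The five column names above correspond to named elements of the object that t.test() returns. As a rough check on a single row, the sketch below uses hypothetical values for x and y (not taken from the question) and pulls the same five numbers out by name rather than by position:

```r
x <- c(9.1, 12.4, 10.8)   # hypothetical "a" values for one row
y <- c(19.5, 22.0, 18.3)  # hypothetical "b" values for one row

tt <- t.test(x, y)        # var.equal = FALSE by default, i.e. the Welch test

tt$method                 # "Welch Two Sample t-test"

# Same five quantities as the stats columns, extracted by name
one_row <- c(t.stat   = unname(tt$statistic),
             param.df = unname(tt$parameter),
             p.val    = tt$p.value,
             ci.left  = tt$conf.int[1],
             ci.right = tt$conf.int[2])
one_row
```

Extracting by name is a little more verbose, but it does not depend on the order of elements inside the test result.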

Altri suggerimenti

Probably not the most efficient, but here's one way that builds on your initial effort.

Your example data:

experimentName <- paste(rep("name",20), c(1:20), sep="")
experimentAnno1 <- rep(paste(rep("anno",5), c(1:5), sep=""), 4)
a1 <- rnorm(n=20, mean=10, sd=5)
a2 <- rnorm(n=20, mean=11, sd=5)
a3 <- rnorm(n=20, mean=12, sd=5)
b1 <- rnorm(n=20, mean=20, sd=5)
b2 <- rnorm(n=20, mean=21, sd=5)
b3 <- rnorm(n=20, mean=19, sd=5)

I use data.frame rather than cbind so that the numbers stay numeric (cbind builds a matrix, which coerces everything to character).

# sampledata <- cbind(experimentName, experimentAnno1, a1,a2,a3,b1,b2,b3)
sampledata <- data.frame(experimentName, experimentAnno1, a1,a2,a3,b1,b2,b3)

It seems your goal is to test, within each row, the set a1, a2, a3 against the set b1, b2, b3.

Here are some sapply calls that get those values:

sampledata$pvalue <- sapply(1:nrow(sampledata), function(i) t.test(sampledata[i,c("a1", "a2", "a3")], sampledata[i,c("b1", "b2", "b3")])$p.value)

sampledata$LoConf <- sapply(1:nrow(sampledata), function(i) t.test(sampledata[i,c("a1", "a2", "a3")], sampledata[i,c("b1", "b2", "b3")])$conf.int[1])

sampledata$UpConf <- sapply(1:nrow(sampledata), function(i) t.test(sampledata[i,c("a1", "a2", "a3")], sampledata[i,c("b1", "b2", "b3")])$conf.int[2])
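The three calls above run the same t-test three times per row. One possible variation, sketched here on a small hypothetical data frame (toy, a_cols, b_cols and row_stats are illustrative names, and the seed is an assumption added only for reproducibility), runs the test once per row and keeps all three values:

```r
set.seed(1)  # assumption: fixed seed so the toy data is reproducible
toy <- data.frame(
  id = paste0("name", 1:4),
  a1 = rnorm(4, 10, 5), a2 = rnorm(4, 11, 5), a3 = rnorm(4, 12, 5),
  b1 = rnorm(4, 20, 5), b2 = rnorm(4, 21, 5), b3 = rnorm(4, 19, 5)
)

a_cols <- c("a1", "a2", "a3")
b_cols <- c("b1", "b2", "b3")

# One Welch t-test per row; unlist() turns each one-row slice into a plain vector
row_stats <- t(sapply(seq_len(nrow(toy)), function(i) {
  tt <- t.test(unlist(toy[i, a_cols]), unlist(toy[i, b_cols]))
  c(pvalue = tt$p.value, LoConf = tt$conf.int[1], UpConf = tt$conf.int[2])
}))

# Append the three result columns next to the existing annotations
toy <- cbind(toy, row_stats)
```

The annotation column (id here) is left untouched, and the same pattern should carry over to sampledata by swapping in its column names.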
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow