limma package (microarray): topTable does not assign an ID column for the probesets

StackOverflow https://stackoverflow.com/questions/22970546

  •  30-06-2023

Question

I followed a tutorial by Daniel Swan and it works perfectly well, but I'm facing a problem with the topTable function of the limma package.

The topTable function creates a probeset list, but this list has no "ID" header: the other columns are named, but the probeset column has no name (ID).

As a result, when I run:

gene.symbols <- getSYMBOL(probeset.list$ID, "hgu133plus2")

I get the following error:

  Error in .select(x, keys, columns, keytype = extraArgs[["kt"]], jointype = jointype): 
      'keys' must be a character vector

The topTable output is:

               logFC  AveExpr        t      P.Value    adj.P.Val        B
204779_s_at 7.367790 4.171707 72.77347 3.284937e-15 8.969850e-11 20.25762
207016_s_at 6.936667 4.027733 57.39252 3.694641e-14 5.044293e-10 19.44987
209631_s_at 5.192949 4.003992 51.24892 1.170273e-13 1.065182e-09 18.96660

My ExpressionSet was produced with the simpleaffy package (gcrma). I'm running R 3.0.2 under Windows 7 with the latest Bioconductor packages: simpleaffy_2.38.0, limma_3.18.13, and the annotation files hgu133plus2.db_2.10.1, hgu133plus2probe_2.13.0, hgu133plus2cdf_2.13.0.

I would be very thankful if somebody could help me.


Solution

The IDs are not stored as an ID column, but as the rownames of the table. Change the line to:

gene.symbols <- getSYMBOL(rownames(probeset.list), "hgu133plus2")

If you want there to be an ID column instead of using row names, you can assign one with:

probeset.list$ID = rownames(probeset.list)
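To make the cause of the error concrete, here is a minimal base-R sketch. It rebuilds the three rows shown in the question as a plain data frame (only two of the columns, as a stand-in; a real table would come from topTable) and shows that `$ID` is NULL while the row names are a character vector:

```r
# Stand-in for the topTable output shown in the question (first three rows,
# two of the columns); a real result would come from limma::topTable().
probeset.list <- data.frame(
  logFC     = c(7.367790, 6.936667, 5.192949),
  AveExpr   = c(4.171707, 4.027733, 4.003992),
  row.names = c("204779_s_at", "207016_s_at", "209631_s_at")
)

# There is no ID column, so `$ID` silently returns NULL. NULL is not a
# character vector, which is what the error message complains about.
id_is_missing <- is.null(probeset.list$ID)

# The probeset IDs live in the row names, which are a character vector.
ids <- rownames(probeset.list)
ids_are_character <- is.character(ids)

# Optional: copy them into an explicit ID column.
probeset.list$ID <- rownames(probeset.list)
```

After the last line, both `probeset.list$ID` and `rownames(probeset.list)` can be passed to getSYMBOL.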

According to the documentation of the topTable function, the ID column is created when the row names of fit are duplicated:

 If ‘fit’ had unique rownames, then the row.names of the above
 data.frame are the same in sorted order. Otherwise, the row.names
 of the data.frame indicate the row number in ‘fit’. If ‘fit’ had
 duplicated row names, then these are preserved in the ‘ID’ column
 of the data.frame, or in ‘ID0’ if ‘genelist’ already contained an
 ‘ID’ column.

In the other examples where you have seen ID used, there must have been duplicate names in the input. This makes sense because R data frames do not allow duplicated row names (but have no problem with duplicate IDs in a column).
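The row-name restriction can be checked in base R without limma. In this small sketch the gene names ("geneA", "geneB") and the `value` column are made up for illustration:

```r
df <- data.frame(value = c(1, 2, 3))

# Duplicate values in an ordinary column are accepted.
df$ID <- c("geneA", "geneA", "geneB")

# Assigning duplicated row names to a data frame raises an error
# (R also emits a warning about the non-unique value, suppressed here).
result <- suppressWarnings(tryCatch({
  rownames(df) <- c("geneA", "geneA", "geneB")
  "assigned"
}, error = function(e) "error"))

print(result)
```

This is why duplicated names have to be moved into a column such as ID rather than kept as row names.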

Other tips

I hope this piece of working code helps clarify things:

library(limma) # load the required libraries
library(siggenes)
library(cluster)
library(stats)

data <- read.table("AneurismDataAllProbesGenesisLog2NormalizedExperAndGenes.tab", sep = "\t", header = TRUE) # read from file

q = as.matrix(data) # convert the data to a matrix

b = as.matrix(cbind(data[, 2:10], data[, 11:14])) # adjacent data columns (the 13 samples)
m = normalizeQuantiles(b, ties=TRUE) # quantile normalization
f = data.frame(condition = c(0,0,0,0,0,0,0,0,0,1,1,1,1)) # design: 9 samples in condition 0, 4 in condition 1
fit = lmFit(m, f) # linear model
e = eBayes(fit) # empirical Bayes statistics
volcanoplot(e, coef=1, highlight=5, names=data$GeneName, xlab="Log Fold Change", ylab="Log Odds", pch=19, cex=0.67, col = "dark blue") # volcano plot
z = rownames(m) = data[, 1] # use the first column as the row names of the matrix
hc <- hclust(dist(m), "ave") # cluster dendrogram (average linkage)
plot(hc)
plot(hc, hang = -1)

print(e$coefficients) # output eBayes coefficients
print(e$p.value) # print the P values
topTable(e) # select the 10 most differentially expressed genes; the disadvantage is that it outputs only the gene row number and not the name
printresult <- topTable(e) # assign the result to a variable
write.csv(printresult, file = "eBayesTableAneurism", row.names = TRUE) # write to a file in the current folder
volcanoplot(e, coef=1, highlight=10, names=data[,1], xlab="Log Fold Change", ylab="Log Odds", pch=19, cex=0.67, col = "red") # volcano plot with gene names
volcanoplot(e, coef=1, highlight=5, names=data[,1], xlab="Log Fold Change", ylab="Log Odds", pch=19, cex=0.67, col = "blue") # volcano plot with gene names
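
The row-number output mentioned in the comments above appears because the matrix had no row names when lmFit was called. A minimal sketch of the alternative, assuming limma is installed and using simulated data (the probe names, group labels, and matrix size are hypothetical, not the aneurysm data set), sets the row names before fitting:

```r
library(limma)

set.seed(1)
# Simulated log-expression matrix: 20 hypothetical probes x 6 samples.
m <- matrix(rnorm(120), nrow = 20, ncol = 6)
rownames(m) <- paste0("probe_", seq_len(20)) # set names BEFORE lmFit

# Two groups of three samples; design with intercept + group effect.
group <- factor(rep(c("ctrl", "case"), each = 3), levels = c("ctrl", "case"))
design <- model.matrix(~ group)

fit <- eBayes(lmFit(m, design))

# coef = 2 is the case-vs-ctrl difference in this design.
tab <- topTable(fit, coef = 2, number = 5)

# With unique row names, the probe names are the row names of the result
# and can be copied into an ID column if one is wanted.
tab$ID <- rownames(tab)
print(tab)
```
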
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow