Question

I want to build a matrix from this data frame. Each entry should be 1 if the two genes appear together as a pair and 0 if they do not. So ADRA1D and ADK would have the value 1, as would every other listed pair. ADK and AR never appear together, so that entry should be 0.

tab <- read.table(text="ID  gene1   gene2
1   ADRA1D  ADK
2   ADRA1B  ADK
3   ADRA1A  ADK
4   ADRB1   ASIC1
5   ADRB1   ADK
6   ADRB2   ASIC1
7   ADRB2   ADK
8   AGTR1   ACHE
9   AGTR1   ADK
10  ALOX5   ADRB1
11  ALOX5   ADRB2
12  ALPPL2  ADRB1 
13  ALPPL2  ADRB2
14  AMY2A   AGTR1
15  AR  ADORA1
16  AR  ADRA1D
17  AR  ADRA1B
18  AR  ADRA1A
19  AR  ADRA2A
20  AR  ADRA2B", header=TRUE, stringsAsFactors=FALSE)

Ultimately, I want to build a phylogenetic tree, so I was thinking of starting from a matrix like that. How can I use the reshape library for this, given that I have no value column?

The code below does not work:

library(reshape)
ct=cast(tab,gene1~gene2)

Solution

If using reshape is not mandatory, I'd suggest taking a look at igraph. Here's one way to get a square adjacency matrix using the igraph package. We first convert the two relevant columns of your data frame into an igraph object, and get.adjacency then builds the matrix.

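library(igraph)
# Build a graph from the gene1 and gene2 columns (directed by default)
g <- graph.data.frame(tab[,c(2,3)])
# Adjacency matrix: rows are gene1, columns are gene2
get.adjacency(g)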

This gives you the adjacency matrix. You should definitely look into using igraph for the rest of your analysis.

16 x 16 sparse Matrix of class "dgCMatrix"
   [[ suppressing 16 column names ‘ADRA1D’, ‘ADRA1B’, ‘ADRA1A’ ... ]]

ADRA1D . . . . . . . . . . 1 . . . . .
ADRA1B . . . . . . . . . . 1 . . . . .
ADRA1A . . . . . . . . . . 1 . . . . .
ADRB1  . . . . . . . . . . 1 1 . . . .
ADRB2  . . . . . . . . . . 1 1 . . . .
AGTR1  . . . . . . . . . . 1 . 1 . . .
ALOX5  . . . 1 1 . . . . . . . . . . .
ALPPL2 . . . 1 1 . . . . . . . . . . .
AMY2A  . . . . . 1 . . . . . . . . . .
AR     1 1 1 . . . . . . . . . . 1 1 1
ADK    . . . . . . . . . . . . . . . .
ASIC1  . . . . . . . . . . . . . . . .
ACHE   . . . . . . . . . . . . . . . .
ADORA1 . . . . . . . . . . . . . . . .
ADRA2A . . . . . . . . . . . . . . . .
ADRA2B . . . . . . . . . . . . . . . .
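The printed matrix has 1s only in the gene1 row and gene2 column, so it is not symmetric. If the relation is meant to be mutual, an undirected graph may be closer to what the question asks for. Here is a minimal sketch, assuming a reasonably recent igraph in which graph_from_data_frame and as_adjacency_matrix are available (the newer names for graph.data.frame and get.adjacency). The edge list is a shortened, illustrative subset of tab, not the full data.

```r
library(igraph)

# Shortened subset of the question's data, for illustration only
tab <- data.frame(
  gene1 = c("ADRA1D", "ADRB1", "ADRB1", "AR", "AR"),
  gene2 = c("ADK", "ASIC1", "ADK", "ADORA1", "ADRA1D"),
  stringsAsFactors = FALSE
)

# directed = FALSE treats each pair as an unordered relation
g <- graph_from_data_frame(tab[, c("gene1", "gene2")], directed = FALSE)

# sparse = FALSE returns an ordinary dense matrix instead of a dgCMatrix
m <- as_adjacency_matrix(g, sparse = FALSE)

m["ADK", "ADRA1D"]   # 1: the pair is listed
m["ADRA1D", "ADK"]   # 1: same pair, mirrored
m["ADK", "AR"]       # 0: the pair is not listed
```

With the full tab from the question in place of the subset, the same calls should give a 16 x 16 symmetric matrix.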

An advantage of using igraph is that many graph-based distance calculation methods are now available to you. Do look into shortest.paths (called distances in newer igraph releases).
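Since the stated goal is a tree, one possible route is to cluster the genes on their shortest-path distances. The sketch below is an assumption about how that could look, not something the answer prescribes: it uses distances from igraph and base R's hclust, and hierarchical clustering here is only a stand-in for a proper phylogenetic method. The data is again an illustrative subset of tab.

```r
library(igraph)

# Shortened subset of the question's data, for illustration only
tab <- data.frame(
  gene1 = c("ADRA1D", "ADRB1", "ADRB1", "AR", "AR"),
  gene2 = c("ADK", "ASIC1", "ADK", "ADORA1", "ADRA1D"),
  stringsAsFactors = FALSE
)
g <- graph_from_data_frame(tab, directed = FALSE)

# Pairwise shortest-path lengths (number of edges) between genes
d <- distances(g)

# Unreachable pairs come back as Inf, which hclust cannot handle.
# Replacing them with a large finite value is an arbitrary choice.
d[is.infinite(d)] <- max(d[is.finite(d)]) + 1

# Average-linkage clustering on the distance matrix
hc <- hclust(as.dist(d), method = "average")
```

Calling plot(hc) afterwards draws the dendrogram. The linkage method and the handling of unreachable pairs both affect the resulting tree, so treat them as parameters to experiment with.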

Other tips

You can achieve this with the table function:

> table(tab$gene1, tab$gene2)

         ACHE ADK ADORA1 ADRA1A ADRA1B ADRA1D ADRA2A ADRA2B ADRB1 ADRB2 AGTR1 ASIC1
  ADRA1A    0   1      0      0      0      0      0      0     0     0     0     0
  ADRA1B    0   1      0      0      0      0      0      0     0     0     0     0
  ADRA1D    0   1      0      0      0      0      0      0     0     0     0     0
  ADRB1     0   1      0      0      0      0      0      0     0     0     0     1
  ADRB2     0   1      0      0      0      0      0      0     0     0     0     1
  AGTR1     1   1      0      0      0      0      0      0     0     0     0     0
  ALOX5     0   0      0      0      0      0      0      0     1     1     0     0
  ALPPL2    0   0      0      0      0      0      0      0     1     1     0     0
  AMY2A     0   0      0      0      0      0      0      0     0     0     1     0
  AR        0   0      1      1      1      1      1      1     0     0     0     0

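A two-way table already behaves like a matrix. Use unclass if you want a plain matrix without the table class, since as.matrix leaves the class in place.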

EDIT: for a square matrix with the same genes on the rows and the columns.

To get a square matrix from table, both arguments need the same set of levels. Here the values are character strings rather than factors, so there are no levels as such, but the same idea applies: every gene must occur at least once in gene1 and at least once in gene2.

For that, I suggest creating a vector of all your genes (I used sort(unique(c(unique(tab$gene1), unique(tab$gene2))))).

I merged "gene1" with this vector as an outer join (all = TRUE), so genes with no match are kept and their other columns are filled with NA. I then did the same for "gene2".

Now every gene occurs at least once in both "gene1" and "gene2", and you can call table. By default table ignores the NA entries, so the padding rows do not add any counts.

genes <- c('ACHE','ADK','ADORA1','ADRA1A','ADRA1B','ADRA1D','ADRA2A','ADRA2B','ADRB1','ADRB2','AGTR1','ALOX5','ALPPL2','AMY2A','AR','ASIC1')

df <- merge(tab, as.data.frame(genes), by.x = "gene1", by.y = "genes", all = TRUE)
df <- merge(df, as.data.frame(genes), by.x = "gene2", by.y = "genes", all = TRUE)

> table(df$gene1, df$gene2)

         ACHE ADK ADORA1 ADRA1A ADRA1B ADRA1D ADRA2A ADRA2B ADRB1 ADRB2 AGTR1 ALOX5 ALPPL2 AMY2A AR ASIC1
  ACHE      0   0      0      0      0      0      0      0     0     0     0     0      0     0  0     0
  ADK       0   0      0      0      0      0      0      0     0     0     0     0      0     0  0     0
  ADORA1    0   0      0      0      0      0      0      0     0     0     0     0      0     0  0     0
  ADRA1A    0   1      0      0      0      0      0      0     0     0     0     0      0     0  0     0
  ADRA1B    0   1      0      0      0      0      0      0     0     0     0     0      0     0  0     0
  ADRA1D    0   1      0      0      0      0      0      0     0     0     0     0      0     0  0     0
  ADRA2A    0   0      0      0      0      0      0      0     0     0     0     0      0     0  0     0
  ADRA2B    0   0      0      0      0      0      0      0     0     0     0     0      0     0  0     0
  ADRB1     0   1      0      0      0      0      0      0     0     0     0     0      0     0  0     1
  ADRB2     0   1      0      0      0      0      0      0     0     0     0     0      0     0  0     1
  AGTR1     1   1      0      0      0      0      0      0     0     0     0     0      0     0  0     0
  ALOX5     0   0      0      0      0      0      0      0     1     1     0     0      0     0  0     0
  ALPPL2    0   0      0      0      0      0      0      0     1     1     0     0      0     0  0     0
  AMY2A     0   0      0      0      0      0      0      0     0     0     1     0      0     0  0     0
  AR        0   0      1      1      1      1      1      1     0     0     0     0      0     0  0     0
  ASIC1     0   0      0      0      0      0      0      0     0     0     0     0      0     0  0     0
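A possibly shorter route to the same square table, offered as a sketch rather than as the answer's method, is to convert both columns to factors that share one set of levels, since table keeps unused levels. The result is still one-directional (row gene1, column gene2), so the last step mirrors it across the diagonal. The data is an illustrative subset of tab, and sym is a hypothetical name.

```r
# Shortened subset of the question's data, for illustration only
tab <- data.frame(
  gene1 = c("ADRA1D", "ADRB1", "ADRB1", "AR", "AR"),
  gene2 = c("ADK", "ASIC1", "ADK", "ADORA1", "ADRA1D"),
  stringsAsFactors = FALSE
)

# One shared, sorted set of levels for both columns
genes <- sort(unique(c(tab$gene1, tab$gene2)))

# table() keeps unused factor levels, so the result is square
m <- table(factor(tab$gene1, levels = genes),
           factor(tab$gene2, levels = genes))
m <- unclass(m)   # drop the "table" class, leaving an integer matrix

# Mirror across the diagonal and clip to 0/1 so that sym[a, b] == sym[b, a]
sym <- (m + t(m) > 0) * 1L

sym["ADK", "ADRA1D"]   # 1: the pair is listed (as ADRA1D, ADK)
sym["ADK", "AR"]       # 0: the pair is not listed
```

Clipping to 0/1 also guards against a pair being listed more than once, which would otherwise show up as a count above 1.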

Hope this helps, though it is probably not the best way to do it.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow