Question

I have a phylogenetic tree that shows genes and how they cluster together. It was plotted from a Euclidean distance matrix using the ape package. For more details, here is the earlier link.

Phylogenetic tree

Here is my data (gg.txt), which was converted to a gene matrix.

ID  gene1   gene2

1   ADRA1D  ADK
2   ADRA1B  ADK
3   ADRA1A  ADK
4   ADRB1   ASIC1
5   ADRB1   ADK
6   ADRB2   ASIC1
7   ADRB2   ADK
8   AGTR1   ACHE
9   AGTR1   ADK
10  ALOX5   ADRB1
11  ALOX5   ADRB2
12  ALPPL2  ADRB1
13  ALPPL2  ADRB2
14  AMY2A   AGTR1
15  AR  ADORA1
16  AR  ADRA1D
17  AR  ADRA1B
18  AR  ADRA1A
19  AR  ADRA2A
20  AR  ADRA2B

The final code to generate the tree is:

library(ape)

# Read the gene pairs and collect every distinct gene name
tab <- read.table("gg.txt", header=TRUE, stringsAsFactors=FALSE)
gene.names <- sort(unique(c(tab[,"gene1"], tab[,"gene2"])))

# Square 0/1 matrix: 1 where a gene1/gene2 pair occurs in the data
gene.matrix <- matrix(0L, nrow=length(gene.names), ncol=length(gene.names))
colnames(gene.matrix) <- gene.names
rownames(gene.matrix) <- gene.names
gene.matrix[as.matrix(tab[-1])] <- 1

## Calculating distances and clustering
## ("ward" was renamed "ward.D" in R 3.1.0)

d <- dist(gene.matrix, method="euclidean")
fit <- hclust(d, method="ward.D")
plot(as.phylo(fit))

We can see that 4 big clusters are formed. ALOX5, AR and ALPPL2 form one cluster. ADRA1A, ADRA1B, ADRA1D and AGTR1 form another. Similarly, there are 2 more clusters. Is there any way to put this information in a table, for example like the one below? Is there any software available to do that?

GENE   CLUSTER

ALOX5    1
AR       1
ALPPL2   1
ADRA1A   2
ADRA1B   2
ADRA1D   2
AGTR1    2
..
..
..

I have only shown 20 rows, but I have 21k rows, so that's the main concern.


Solution

As suggested by @JTT, cutree works great! This is what I was looking for.

cut <- cutree(fit, k=5)

cut

ACHE    ADK ADORA1 ADRA1A ADRA1B ADRA1D ADRA2A ADRA2B  ADRB1  ADRB2  AGTR1  ALOX5 ALPPL2  AMY2A     AR  ASIC1 
 1      1      1      2      2      2      1      1      3      3      2      4      4      1      5      1 
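
cutree returns a named integer vector, so getting the GENE / CLUSTER table asked for in the question is mostly a matter of reshaping it. The following is a minimal sketch, not necessarily how the original poster did it: it rebuilds the 20-row example inline so that it runs on its own, and the names clusters, cluster.table and the output file gene_clusters.txt are made up for illustration. It also assumes method="ward.D", the current name for what older R versions called "ward".

```r
# Rebuild the question's example data inline so the sketch is self-contained.
tab <- read.table(text = "
ID gene1 gene2
1 ADRA1D ADK
2 ADRA1B ADK
3 ADRA1A ADK
4 ADRB1 ASIC1
5 ADRB1 ADK
6 ADRB2 ASIC1
7 ADRB2 ADK
8 AGTR1 ACHE
9 AGTR1 ADK
10 ALOX5 ADRB1
11 ALOX5 ADRB2
12 ALPPL2 ADRB1
13 ALPPL2 ADRB2
14 AMY2A AGTR1
15 AR ADORA1
16 AR ADRA1D
17 AR ADRA1B
18 AR ADRA1A
19 AR ADRA2A
20 AR ADRA2B
", header = TRUE, stringsAsFactors = FALSE)

# Same 0/1 gene matrix and clustering as in the question.
gene.names <- sort(unique(c(tab$gene1, tab$gene2)))
gene.matrix <- matrix(0L, nrow = length(gene.names), ncol = length(gene.names),
                      dimnames = list(gene.names, gene.names))
gene.matrix[as.matrix(tab[-1])] <- 1L
fit <- hclust(dist(gene.matrix, method = "euclidean"), method = "ward.D")

# Cut the tree into k groups; the result is an integer vector named by gene.
clusters <- cutree(fit, k = 5)

# Reshape into a two-column table, sorted by cluster and then by gene name.
cluster.table <- data.frame(GENE = names(clusters),
                            CLUSTER = unname(clusters),
                            stringsAsFactors = FALSE)
cluster.table <- cluster.table[order(cluster.table$CLUSTER, cluster.table$GENE), ]
rownames(cluster.table) <- NULL
print(cluster.table)

# Write the table as tab-separated text (hypothetical file name, temp directory).
out.file <- file.path(tempdir(), "gene_clusters.txt")
write.table(cluster.table, out.file, sep = "\t", quote = FALSE, row.names = FALSE)
```

The same reshaping should apply unchanged to the full 21k-row data, since it only depends on the vector returned by cutree. The choice of k is left to the reader: the question describes 4 big clusters, while the accepted answer cuts at k=5.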
Licensed under: CC-BY-SA with attribution