Question

I have the following heatmap with the dendrogram like this.

The complete data are here.

The problem is that the dendrogram on the left is squished. How can I unsquish (expand) it without altering the column size of the heatmap?

enter image description here

It is generated with this code:

#!/usr/bin/Rscript
library(gplots);
library(RColorBrewer);


plot_hclust  <- function(inputfile,clust.height,type.order=c(),row.margins=70) {

    # Read data
    dat.bcd <- read.table(inputfile,na.strings=NA, sep="\t",header=TRUE);


    rownames(dat.bcd) <- do.call(paste,c(dat.bcd[c("Probes","Gene.symbol")],sep=" "))
    dat.bcd <- dat.bcd[,!names(dat.bcd) %in% c("Probes","Gene.symbol")] 
    dat.bcd <- dat.bcd

    # Clustering and distance function
    hclustfunc <- function(x) hclust(x, method="complete")
    distfunc <- function(x) dist(x,method="maximum")


    # Select based on FC, as long as any of them >= anylim

    anylim <- 2.0
    dat.bcd <- dat.bcd[ apply(dat.bcd, 1,function(x) any (x >= anylim)), ]


    # Clustering functions
    height <- clust.height; 

    # Define output file name
    heatout <- paste("tmp.pafc.heat.",anylim,".h",height,".pdf",sep="");


    # Compute distance and clusteirn function
    d.bcd <- distfunc(dat.bcd)
    fit.bcd <- hclustfunc(d.bcd)


    # Cluster by height
    #cutree and rect.huclust has to be used in tandem
    clusters <- cutree(fit.bcd, h=height) 
    nofclust.height <-  length(unique(as.vector(clusters)));

    myorder <- colnames(dat.bcd); 
    if (length(type.order)>0) {
     myorder <- type.order
    }

    # Define colors
    #hmcols <- rev(brewer.pal(11,"Spectral"));
    hmcols <- rev(redgreen(2750));
    selcol <- colorRampPalette(brewer.pal(12,"Set3"))
    selcol2 <- colorRampPalette(brewer.pal(9,"Set1"))
    sdcol= selcol(5);
    clustcol.height = selcol2(nofclust.height);

    # Plot heatmap
    pdf(file=heatout,width=20,height=50); # for FC.lim >=2
    heatmap.2(as.matrix(dat.bcd[,myorder]),Colv=FALSE,density.info="none",lhei=c(0.1,4),dendrogram="row",scale="row",RowSideColors=clustcol.height[clusters],col=hmcols,trace="none", margin=c(30,row.margins), hclust=hclustfunc,distfun=distfunc,lwid=c(1.5,2.0),keysize=0.3);
    dev.off();


}
#--------------------------------------------------
# ENd of functions 
#-------------------------------------------------- 

plot_hclust("http://pastebin.com/raw.php?i=ZaGkPTGm",clust.height=3,row.margins=70);
Was it helpful?

Solution

In your case the data has long tail, which is expected for gene expression data (lognormal).

data <- read.table(file='http://pastebin.com/raw.php?i=ZaGkPTGm', 
                   header=TRUE, row.names=1)

mat <- as.matrix(data[,-1]) # -1 removes the first column containing gene symbols

As you can see from the quantile distribution that the genes with the highest expression extend the range from 1.5 to above 300.

quantile(mat)

#     0%     25%     50%     75%    100% 
#  0.000   0.769   1.079   1.544 346.230 

When the hierarchical clustering is performed on unscaled data the resulting dendrogram may show bias towards the values with the highest expression, as seen in your example. This merits either a logarithmic or z-score transformation, among many (reference). Your dataset contains values == 0, which is a problem for log-transformation since log(0) is undefined.

Z-score transformation (reference) is implemented within heatmap.2, but it's important to note that the function computes the distance matrix and runs clustering algorithm before scaling the data. Hence the option scale='row' doesn't influence the clustering results, see my earlier post (differences in heatmap/clustering defaults in R) for more details.

I would propose that you scale your data before running heatmap.2:

# scale function transforms columns by default hence the need for transposition.
z <- t(scale(t(mat))) 

quantile(z)

#         0%        25%        50%        75%       100% 
# -2.1843994 -0.6646909 -0.2239677  0.3440102  2.2640027 

# set custom distance and clustering functions
hclustfunc <- function(x) hclust(x, method="complete")
distfunc <- function(x) dist(x,method="maximum")

# obtain the clusters
fit <- hclustfunc(distfunc(z))
clusters <- cutree(fit, 5) 

# require(gplots)
pdf(file='heatmap.pdf', height=50, width=10)
heatmap.2(z, trace='none', dendrogram='row', Colv=F, scale='none', 
             hclust=hclustfunc, distfun=distfunc, col=greenred(256), symbreak=T,
             margins=c(10,20), keysize=0.5, labRow=data$Gene.symbol,
             lwid=c(1,0.05,1), lhei=c(0.03,1), lmat=rbind(c(5,0,4),c(3,1,2)),
             RowSideColors=as.character(clusters))
dev.off()

Also, see the additional posts here and here, which explain how to set the layout of the heatmap via lmat, lwid and lhei parameters.

The resulting heatmap is shown below (row and column labels are omitted):

enter image description here

OTHER TIPS

As far as I can tell, you may have some outliers in your data set (the objects at the very bottom). Try the following:

  1. remove outliers from the data set
  2. logscale your distances, to give less emphasis to extreme values

Zscore transformation is easy with 'fheatmap' package by using 'scale' parameter. Check out 'fheatmap' package. Height of the dendograms can be expanded by increasing the with of the canvas(pdf). http://cran.r-project.org/web/packages/fheatmap/index.html

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top