Question

I want to make a simple phylogenetic tree for a marine biology course as an educational example. I have a list of species with their taxonomic ranks:

Group <- c("Benthos","Benthos","Benthos","Benthos","Benthos","Benthos","Zooplankton","Zooplankton","Zooplankton","Zooplankton",
"Zooplankton","Zooplankton","Fish","Fish","Fish","Fish","Fish","Fish","Phytoplankton","Phytoplankton","Phytoplankton","Phytoplankton")
Domain <- rep("Eukaryota", length(Group))
Kingdom <- c(rep("Animalia", 18), rep("Chromalveolata", 4))
Phylum <- c("Annelida","Annelida","Arthropoda","Arthropoda","Porifera","Sipunculida","Arthropoda","Arthropoda","Arthropoda",
"Arthropoda","Echinodermata","Chordata","Chordata","Chordata","Chordata","Chordata","Chordata","Chordata","Heterokontophyta",
"Heterokontophyta","Heterokontophyta","Dinoflagellata")
Class <- c("Polychaeta","Polychaeta","Malacostraca","Malacostraca","Demospongiae","NA","Malacostraca","Malacostraca",
"Malacostraca","Maxillopoda","Ophiuroidea","Actinopterygii","Chondrichthyes","Chondrichthyes","Chondrichthyes","Actinopterygii",
"Actinopterygii","Actinopterygii","Bacillariophyceae","Bacillariophyceae","Prymnesiophyceae","NA")
Order <- c("NA","NA","Amphipoda","Cumacea","NA","NA","Amphipoda","Decapoda","Euphausiacea","Calanoida","NA","Gadiformes",
"NA","NA","NA","NA","Gadiformes","Gadiformes","NA","NA","NA","NA")
Species <- c("Nephtys sp.","Nereis sp.","Gammarus sp.","Diastylis sp.","Axinella sp.","Ph. Sipunculida","Themisto abyssorum","Decapod larvae (Zoea)",
"Thysanoessa sp.","Centropages typicus","Ophiuroidea larvae","Gadus morhua eggs / larvae","Etmopterus spinax","Amblyraja radiata",
"Chimaera monstrosa","Clupea harengus","Melanogrammus aeglefinus","Gadus morhua","Thalassiosira sp.","Cylindrotheca closterium",
"Phaeocystis pouchetii","Ph. Dinoflagellata")   
dat <- data.frame(Group, Domain, Kingdom, Phylum, Class, Order, Species)
dat

I would like to get a dendrogram (cluster analysis) that uses Domain as the first cutting point, Kingdom as the second, Phylum as the third, and so on. Missing values should be ignored (no cutting point, a straight line instead). Group should be used as a coloring category for the labels.

I am a bit uncertain how to make a distance matrix from this data frame. There are a lot of phylogenetic tree packages for R, but they seem to want Newick data, DNA, or other advanced information. Help with this would be appreciated.


Solution

It's probably a bit lame to answer my own question, but I found an easier solution. Maybe it helps someone one day.

library(ape)
taxa <- as.phylo(~Kingdom/Phylum/Class/Order/Species, data = dat)

col.grp <- merge(data.frame(Species = taxa$tip.label), dat[c("Species", "Group")], by = "Species", sort = FALSE)

cols <- ifelse(col.grp$Group == "Benthos", "burlywood4", ifelse(col.grp$Group == "Zooplankton", "blueviolet", ifelse(col.grp$Group == "Fish", "dodgerblue", ifelse(col.grp$Group == "Phytoplankton", "darkolivegreen2", ""))))

plot(taxa, type = "cladogram", tip.col = cols)
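
The colour assignment could also be written as a lookup instead of nested ifelse() calls. The following is only a sketch on hypothetical stand-in data (toy and tip.label are made up here to mimic dat and taxa$tip.label). It uses match() so that the colours follow the order of the tip labels explicitly:

```r
# Hypothetical stand-ins for dat[c("Species", "Group")] and taxa$tip.label
toy <- data.frame(
  Species = c("Nephtys sp.", "Gadus morhua", "Thalassiosira sp.", "Thysanoessa sp."),
  Group   = c("Benthos", "Fish", "Phytoplankton", "Zooplankton"),
  stringsAsFactors = FALSE
)
tip.label <- c("Thalassiosira sp.", "Nephtys sp.", "Thysanoessa sp.", "Gadus morhua")

# One colour per group (the same colours as in the answer)
grp.cols <- c(Benthos = "burlywood4", Zooplankton = "blueviolet",
              Fish = "dodgerblue", Phytoplankton = "darkolivegreen2")

# match() returns, for each tip label, its row in toy,
# so grp is ordered like the tips rather than like the data frame
grp  <- as.character(toy$Group[match(tip.label, toy$Species)])

# Look the colours up by group name
cols <- unname(grp.cols[grp])
cols
```

With the real objects, the same idea would use dat in place of toy and taxa$tip.label in place of tip.label.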

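Note that all columns have to be factors. Since R 4.0.0, data.frame() no longer converts strings to factors by default, so dat may need to be created with stringsAsFactors = TRUE. This demonstrates the workflow with R: it takes a week to find out something, although the code itself is just a couple of lines =)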
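
If the columns turn out to be character vectors, one possible way to convert them all at once is shown below. This is a minimal sketch on a made-up three-row data frame (toy is hypothetical, standing in for dat):

```r
# Hypothetical stand-in for dat, with character columns
toy <- data.frame(
  Kingdom = c("Animalia", "Animalia", "Chromalveolata"),
  Phylum  = c("Chordata", "Arthropoda", "Dinoflagellata"),
  Species = c("Gadus morhua", "Gammarus sp.", "Ph. Dinoflagellata"),
  stringsAsFactors = FALSE
)

# Convert every column to a factor; assigning into toy[] keeps the data frame structure
toy[] <- lapply(toy, factor)

# Check that all columns are now factors
all(vapply(toy, is.factor, logical(1)))
```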


OTHER TIPS

If you wanted to draw the tree by hand (this is probably not the best way to do it), you could start as follows (it is not a complete answer: the colours are missing, and the edges are too long). This assumes that the data has already been sorted.

# Data: remove Group
dat <- data.frame(Domain, Kingdom, Phylum, Class, Order, Species)

# Start a new plot
par(mar=c(0,0,0,0))
plot(NA, xlim=c(0,ncol(dat)+1), ylim=c(0,nrow(dat)+1), 
  type="n", axes=FALSE, xlab="", ylab="", main="")

# Compute the position of each node and find all the edges to draw
positions <- NULL
links <- NULL
for(k in 1:ncol(dat)) {
  y <- tapply(1:nrow(dat), dat[,k], mean)
  y <- y[ names(y) != "NA" ]
  positions <- rbind( positions, data.frame(
    name = names(y),
    x = k,
    y = y
  ))
}
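# lapply() always returns a list; apply() would return a matrix if no row had missing ranks
links <- lapply( seq_len(nrow(dat)), function(i) {
  u <- as.matrix(dat)[i, ]   # row i as a character vector
  u <- u[ !is.na(u) & u != "NA" ]
  cbind(u[-length(u)],u[-1])
} )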
links <- do.call(rbind, links)
rownames(links) <- NULL
links <- unique(links[ order(links[,1], links[,2]), ])

# Draw the edges
for(i in 1:nrow(links)) {
  from <- positions[links[i,1],]
  to   <- positions[links[i,2],]
  lines( c(from$x, from$x, to$x), c(from$y, to$y, to$y) )
}

# Add the text
text(positions$x, positions$y, labels=positions$name)
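
To illustrate the position step used above: tapply() averages the row indices of all rows that share a taxon, so each node is drawn at the mean height of the rows below it. A minimal sketch on a hypothetical three-row column (cls is made up for illustration):

```r
# Hypothetical Class column for three rows that are already sorted
cls <- c("Polychaeta", "Polychaeta", "Malacostraca")

# Mean row index per taxon = vertical position of that taxon's node
y <- tapply(seq_along(cls), cls, mean)
y
```

Here the two Polychaeta rows (1 and 2) share a node halfway between them, while Malacostraca sits at its single row. This is also why the data has to be sorted: rows of the same taxon must be adjacent for the mean to fall among them.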
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow