Jaccard cluster confidence interval

https://stackoverflow.com/questions/22642964

21-06-2023
|

Question

I am hoping someone will give me advice on how to get confidence intervals from a Jaccard cluster using R. I have species data from the 1970's and from today at four sites. When I run the following code I get a great graph that shows that one of my present day sites is closer to the historical than another site. I'm sure people are going to ask about significance. I've seen similar confidence intervals on phylogenetic trees but I'm not sure how to get these kind of results. I assume I do this with a bootstrap test but I'm not sure how to get results from boot() or how to put them on my cluster graph. Any advice would be greatly appreciated.

My code to make the cluster:

historicalwo <-read.csv("/users/Victoria/Desktop/Stat Documents/historicalwo.csv", 
row.names =   1)

 jaccard2 <- vegdist (historical, method = "jaccard")

 plot (hclust (jaccard2), hang = -1,main = "Sites clustered by Jaccard similarity",axes = FALSE,
 ylab = "")

then I made a .csv of the jaccard results with 3 columns, site 1, site 2 and the jaccard index of the two sites.

jaccardboot <-read.csv("/users/Victoria/Desktop/Stat Documents/jaccardboot.csv", header = TRUE)

bs <- function(formula, data, indices) {
d <- data[indices,] 
fit <- lm(formula, data=d)
return(coef(fit)) } 

results <- boot(data=jaccardboot,statistic=bs,
            R=100, formula=site1~jaccard+site2)

results

I get: Error in boot(data = jaccardboot, statistic = bs, R = 100, formula = site1 ~ : number of items to replace is not a multiple of replacement length In addition: There were 50 or more warnings (use warnings() to see the first 50)

Solution

Through a stroke of luck I stumbled upon a reasonable answer to my questions. First I transposed my data and then I used pvclust using Ward method and binary as the distance. This simulates a jaccard index. The results did not cluster like my previous example but at least now I have statistical significance. If anyone knows why this cluster might differ from my jaccard cluster I am all ears.

    swo <-read.csv("/users/Victoria/Desktop/Stat Documents/siteswo1.csv", header = TRUE, row.names = 1)
    result <- pvclust(swo, method.dist="binary", method.hclust="ward", nboot=1000)
    plot(result)
    pvrect(result, alpha=0.95)

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow