Question

I found a nice example of plotting convex hull shapes using ggplot with ddply here: Drawing outlines around multiple geom_point groups with ggplot

I thought I'd try something similar--create something like an Ashby Diagram--to practice with the data.table package:

test<-function()
{
library(data.table)
library(ggplot2)

set.seed(1)

Here I define a simple table:

dt<-data.table(xdata=runif(15),ydata=runif(15),level=rep(c("a","b","c"),each=5),key="level")

And then I define the hull positions by level:

hulls<-dt[,as.integer(chull(.SD)),by=level]
setnames(hulls,"V1","hcol")

So then my thought was to merge hulls with dt, so that I could eventually manipulate hulls to get in the proper form for ggplot (shown below for reference):

ashby<-ggplot(dt,aes(x=xdata,y=ydata,color=level))+
        geom_point()+
        geom_line()+
        geom_polygon(data=hulls,aes(fill=level))
}

But it seems that any way I try to merge hulls and dt, I get an error. For example, merge(hulls,dt) produces the error as shown in footnote 1.

This seems like it should be simple, and I'm sure I'm just missing something obvious. Any pointer to a similar post, or thoughts on how to prep hulls for ggplot, would be greatly appreciated. Or if you think it's best to stick with the ddply approach, please let me know.

Example undesired output:

test<-function(){
    library(data.table)
    library(ggplot2)
    dt<-data.table(xdata=runif(15),ydata=runif(15),level=rep(c("a","b","c"),each=5),key="level")
    set.seed(1)
    hulls<-dt[,as.integer(chull(.SD)),by=level]
    setnames(hulls,"V1","hcol")
    setkey(dt, 'level') #setting the key seems unneeded
    setkey(hulls, 'level')
    hulls<-hulls[dt, allow.cartesian = TRUE]
    ggplot(dt,aes(x=xdata,y=ydata,color=level))+
            geom_point()+
            geom_polygon(data=hulls,aes(fill=level))
}

This results in a mess of criss-crossing polygons.

Footnote 1:

Error in vecseq(f__, len__, if (allow.cartesian) NULL else as.integer(max(nrow(x), : Join results in 60 rows; more than 15 = max(nrow(x),nrow(i)). Check for duplicate key values in i, each of which join to the same group in x over and over again. If that's ok, try including j and dropping by (by-without-by) so that j runs for each group to avoid the large allocation. If you are sure you wish to proceed, rerun with allow.cartesian=TRUE. Otherwise, please search for this error message in the FAQ, Wiki, Stack Overflow and datatable-help for advice.


Solution

Here is one way to do it. First, generate some random data:

library(ggplot2)
library(data.table)
# You have to set the seed _before_ you generate random data, not after
set.seed(1) 
dt <- data.table(xdata=runif(15), ydata=runif(15), level=rep(c("a","b","c"), each=5),
  key="level")

The key step is selecting the hull rows within each level:

hulls <- dt[, .SD[chull(xdata, ydata)], by = level]
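
For comparison, the same hull rows can also be selected with data.table's `.I` symbol, which holds the row numbers of the current group within the full table. This is only a sketch of an alternative and not part of the original answer; the names `rows` and `hulls_i` are made up for illustration.

```r
library(data.table)

set.seed(1)
dt <- data.table(xdata = runif(15), ydata = runif(15),
                 level = rep(c("a", "b", "c"), each = 5), key = "level")

# chull() returns positions within the group; indexing .I with them
# converts those positions into row numbers of the full table.
rows <- dt[, .I[chull(xdata, ydata)], by = level]

# Subset dt once, using the collected row numbers (column V1).
hulls_i <- dt[rows$V1]

# The .SD version from the answer, for comparison.
hulls_sd <- dt[, .SD[chull(xdata, ydata)], by = level]
```

Both tables should contain the same points in the same order; only the column order differs, because `by` moves `level` to the front.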

Plotting the result:

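# alpha is a fixed value, so it goes outside aes(); inside aes() it would
# be treated as a mapped variable and add a spurious legend entry.
ggplot(dt, aes(x = xdata, y = ydata, color = level)) +
    geom_point() +
    geom_polygon(data = hulls, aes(fill = level), alpha = 0.5)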

This produces a plot of the points with a shaded convex hull around each level.

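This works because chull returns a vector of row indices identifying the points that form the convex hull, in clockwise order. Within each group, .SD is the subset of dt for that level, so .SD[chull(xdata, ydata)] keeps only the hull rows, already ordered for drawing. data.table then stacks the per-level results into a single table with a level column, which is the form geom_polygon expects.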
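
To make the difference from the question's attempt concrete, here is a small sketch that computes both versions side by side. The name `idx` is introduced only for illustration; the rest mirrors the data and calls used above.

```r
library(data.table)

set.seed(1)
dt <- data.table(xdata = runif(15), ydata = runif(15),
                 level = rep(c("a", "b", "c"), each = 5), key = "level")

# What the question computed: hull positions per level, but no coordinates.
# Each value is a position within that level's five rows.
idx <- dt[, .(hcol = chull(xdata, ydata)), by = level]

# What the plot needs: the hull rows themselves, in hull order.
hulls <- dt[, .SD[chull(xdata, ydata)], by = level]

print(idx)    # columns: level, hcol
print(hulls)  # columns: level, xdata, ydata
```

Because `idx` has several rows per level and `dt` has five, joining them on `level` alone pairs every hull index with every point in that level. That is where the "Join results in 60 rows" error comes from, and with `allow.cartesian = TRUE` it is also why the polygons criss-cross.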

Licensed under: CC-BY-SA with attribution