Question

Running e.g. cv.glmnet on a dataset gives me (by default) 100 different models. Now, if my dataset had missing data, I could do multiple imputation (say 10 imputations) and run cv.glmnet on each of the imputations.

If I disregard the actual coefficient values for each of the models, and just look at the selected features (i.e. sets of column names), some models are submodels of others.

Code like this imitates the results somewhat:

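# Simulate 1000 "models", each a random subset of 100 variable names
usevars <- paste0("var", 1:100)
mdls <- replicate(1000, {
        numVars <- sample.int(length(usevars), 1)
        sample(usevars, numVars)
    }, simplify = FALSE)  # always return a list, never a matrix
names(mdls) <- paste0("mdl", 1:1000)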

Now, it's easy enough to get the parent-child relations for submodels in this respect. It is also possible to only include 'direct parenthood' (i.e. if model A is child of B and B is child of C, then don't include the relation between A and C).
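For reference, a minimal base-R sketch of how such relations could be computed. The four tiny models here are made up for illustration, and the names sub and direct are just placeholders, not part of any package:

```r
# Small hand-made example; each model is a character vector of variable names.
mdls <- list(
  mdl1 = c("var1"),
  mdl2 = c("var1", "var2"),
  mdl3 = c("var1", "var2", "var3"),
  mdl4 = c("var2", "var4")
)

n <- length(mdls)
# sub[i, j] is TRUE when model i is a proper submodel of model j.
sub <- matrix(FALSE, n, n, dimnames = list(names(mdls), names(mdls)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    sub[i, j] <- i != j &&
      length(mdls[[i]]) < length(mdls[[j]]) &&
      all(mdls[[i]] %in% mdls[[j]])
  }
}

# (m %*% m)[i, j] counts the models k with i inside k and k inside j.
# A relation is 'direct' when no such intermediate model exists.
m <- sub * 1
direct <- sub & (m %*% m) == 0
```

In this toy case mdl1 is a submodel of both mdl2 and mdl3, but only the mdl1-mdl2 and mdl2-mdl3 relations survive as direct ones. The double loop is quadratic in the number of models, which is fine for a thousand models but would need rethinking for much larger collections.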

Finally, I come to my problem: I've used igraph to plot these models and their (direct) relations. I did not, however, find a layout that could group the nodes based on another variable (in this case the model size): in this setting it seems like a good idea to create this graph holding 'bands' of models with the same model size (number of variables in the model).

What I ended up doing was more or less calculating the position of each node myself through a kludge of code (that I'm too embarrassed about to post here), but I always kept wondering whether I had simply missed a better / out-of-the-box solution.

My own code resulted in graphs like this one (you can ignore the colours and the labels - just know that the horizontal axis holds the model size):

[figure: graph of the models, with model size on the horizontal axis]

Suggestions for achieving this sort of graph more elegantly than, well, doing all the hard work myself, are greatly appreciated.


Solution

The Fruchterman-Reingold layout algorithm in the development version of igraph (that is, 0.6, which is not officially released yet, but you can ask Gábor on the mailing list to send you a copy) has two hidden (i.e. as yet undocumented) parameters: miny and maxy. They let you constrain the Y coordinate of each node to a range, so you can use them to create layers.
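A minimal sketch of the idea, assuming an igraph release in which these bounds are available as the miny and maxy arguments of layout_with_fr() (older versions name the function layout.fruchterman.reingold). The nine-node graph and the size vector are made up for illustration:

```r
library(igraph)

# Toy hierarchy of nested models: 1 -> {2,3,4} -> {5,6,7} -> 8 -> 9
adj <- matrix(0, 9, 9)
adj[1, 2:4] <- 1
adj[2:4, 5:7] <- 1
adj[5:7, 8] <- 1
adj[8, 9] <- 1
size <- c(1, rep(2, 3), rep(3, 3), 4, 5)  # band (model size) of each node

g <- graph_from_adjacency_matrix(adj, mode = "directed")

# Giving each node the same lower and upper bound pins its y coordinate
# to its band; the x coordinates are still placed by the algorithm.
set.seed(1)
lay <- layout_with_fr(g, miny = size, maxy = size)
plot(g, layout = lay)
```

To put the bands along the horizontal axis instead, the same vectors could be passed as minx and maxx.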

Alternatively, I'm working on an implementation of the Sugiyama layered graph layout method for igraph right now and I will merge it to the development tree in a day or two (if things go well), and then you can try that.
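For readers on an igraph release that ships this layout as layout_with_sugiyama(), a sketch of how it might be used. This assumes the function accepts a layers vector and returns a list with a layout element; the graph and the size vector are illustrative only:

```r
library(igraph)

# Toy hierarchy of nested models: 1 -> {2,3,4} -> {5,6,7} -> 8 -> 9
adj <- matrix(0, 9, 9)
adj[1, 2:4] <- 1
adj[2:4, 5:7] <- 1
adj[5:7, 8] <- 1
adj[8, 9] <- 1
size <- c(1, rep(2, 3), rep(3, 3), 4, 5)  # layer index of each node, from 1

g <- graph_from_adjacency_matrix(adj, mode = "directed")

# Passing 'layers' assigns every node to the layer of its model size,
# so nodes of equal size end up on the same horizontal line.
lay <- layout_with_sugiyama(g, layers = size)
plot(g, layout = lay$layout)
```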

OTHER TIPS

You can do this with the option to constrain the Fruchterman-Reingold algorithm in qgraph. To show this, I first create a small adjacency matrix of nested models:

adj <- matrix(0,9,9)

adj[1,2:4] <- 1
adj[2:4,5:7] <- 1
adj[5:7,8] <- 1
adj[8,9] <- 1 

mod <- c(1,rep(2,3),rep(3,3),4,5)

Here adj is the adjacency matrix and mod is a vector containing the level of each model (how deeply it is nested).

In qgraph you can plot the graph of an adjacency matrix by calling the qgraph() function on it. Setting the argument layout="spring" selects the Fruchterman-Reingold algorithm, and with layout.par you can supply a list of parameters for Fruchterman-Reingold.

With the parameter constraints we can constrain the layout. This must be a matrix with 2 columns and one row for each node. The first element of each row is the x-coordinate and the second the y-coordinate. An NA entry means that coordinate is free to move; a value means that coordinate is fixed to that location.

You'd have to try different things on the scale of the y positions to see what works best. Here I just multiply the mod vector by the number of nodes to get a good looking graph:

library("qgraph")
Lpar <- list(constraints = cbind(NA,nrow(adj)*mod))
L <- qgraph(adj,layout="spring",layout.par=Lpar)$layout

Here we also saved the layout in an object L, which can be used as layout in igraph as well.
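A short sketch of that hand-off, assuming both qgraph and igraph are installed. DoNotPlot = TRUE is used here only to compute the layout without drawing the qgraph figure, and g is a name introduced for this illustration:

```r
library(qgraph)
library(igraph)

adj <- matrix(0, 9, 9)
adj[1, 2:4] <- 1
adj[2:4, 5:7] <- 1
adj[5:7, 8] <- 1
adj[8, 9] <- 1
mod <- c(1, rep(2, 3), rep(3, 3), 4, 5)

# Same constrained spring layout as above, computed without plotting.
Lpar <- list(constraints = cbind(NA, nrow(adj) * mod))
L <- qgraph(adj, layout = "spring", layout.par = Lpar, DoNotPlot = TRUE)$layout

# Build the equivalent igraph object and reuse the coordinates.
g <- graph_from_adjacency_matrix(adj, mode = "directed")
plot(g, layout = L)
```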


Licensed under: CC-BY-SA with attribution