How to delete certain nodes from a regression tree built by `ctree()` from `party` package

StackOverflow https://stackoverflow.com/questions/23535424

  •  17-07-2023

Question

I've built a regression tree using ctree() from the party package. Many nodes in the fitted model have equal probabilities for every class of the dependent variable (e.g. class A = 0.33, class B = 0.33, class C = 0.33). I want to remove these nodes from the model. The tree package has the snip.tree() command, which takes the node numbers to be deleted from the model, but it does not recognize trees built with ctree(). Is there a way to delete certain nodes from a tree built using ctree()?

I have used the model:

library(party) # provides ctree()
library(tree)  # provides snip.tree()
rv.mod1 <- ctree(ldclas ~ L2 + L3 + L4 + L5 + L6 + ele + ndvi + nd_var + nd_ps, data = rv, controls = ctree_control(minsplit = 0, minbucket = 0))
pr.rv.mod1 <- snip.tree(rv.mod1, nodes = nn2.rv.mod1$nodes)

nn2.rv.mod1$nodes is a vector of the nodes to be deleted from the rv.mod1 model. But I get an error:

Error in snip.tree(rv.mod1, nodes = nn2.rv.mod1$nodes) : 
  not legitimate tree

Solution

I don't think there is a direct way to do this, but I will propose a "hack" using the weights argument in ctree.

Let's start with a reproducible example:

library(party)
irisct <- ctree(Species ~ ., data = iris)
plot(irisct)


Now, suppose you want to get rid of node number 5. You can do the following:

NewWeights <- rep(1, nrow(iris)) # Weights vector that will be passed to the `weights` argument of `ctree`
Node <- 5 # Selecting node #5
n <- nodes(irisct, Node)[[1]] # Retrieving that node; `n$weights` marks the observations that fall in it
NewWeights[which(as.logical(n$weights))] <- 0 # Setting these weights to zero, so `ctree` will disregard those observations
irisct2 <- ctree(Species ~ ., data = iris, weights = NewWeights) # Creating the new tree with the new weights
plot(irisct2)


Note how nodes 2, 6 and 7 (now numbered 2, 4 and 5 because there are fewer splits) kept exactly the same distributions and splitting conditions.

I didn't test it for all nodes, but it seems to work fairly well.
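
Since the question has a whole vector of nodes to drop, the same trick could be wrapped in a small function. This is only a sketch: `refit_without_nodes` is a hypothetical helper (it is not part of party), and the 0.5 cutoff used to pick "uninformative" terminal nodes is an arbitrary assumption, not something from the question. It relies on `nodes()`, `where()` and `treeresponse()` from party, and like the hack above it refits the tree rather than pruning it in place.

```r
library(party)

# Hypothetical helper (not part of party): refit a ctree after giving zero
# weight to every observation that falls in the nodes listed in `drop_nodes`.
# `fit` must have been built from the same `formula` and `data`, without weights.
refit_without_nodes <- function(fit, formula, data, drop_nodes, ...) {
  w <- rep(1, nrow(data))
  for (nd in nodes(fit, drop_nodes)) {
    w[nd$weights > 0] <- 0  # observations in this node are ignored on refit
  }
  ctree(formula, data = data, weights = w, ...)
}

irisct <- ctree(Species ~ ., data = iris)

# One possible way to choose the nodes: terminal nodes whose largest class
# probability does not exceed an assumed cutoff of 0.5.
term_id  <- where(irisct)                      # terminal node ID per observation
max_prob <- sapply(treeresponse(irisct), max)  # largest class probability per observation
weak     <- unique(term_id[max_prob <= 0.5])

if (length(weak) > 0) {
  irisct3 <- refit_without_nodes(irisct, Species ~ ., iris, drop_nodes = weak)
  print(irisct3)
}
```

Because the tree is refit from scratch, the remaining splits are not guaranteed to stay the same, so it is worth comparing the new tree with the old one before relying on it.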

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow