Question

I have a regression-related question, but I am not sure how to proceed. Consider the following dataset, with A, B, C, and D as the attributes (features) and a decision variable Dec for each row:

  A   B   C   D   Dec
  a1  b1  c1  d1  Y
  a1  b2  c2  d2  N
  a2  b2  c3  d2  N
  a2  b1  c3  d1  N
  a1  b3  c2  d3  Y
  a1  b1  c1  d2  N
  a1  b1  c4  d1  Y

Given such data, I want to figure out the most compact rules for which Dec evaluates to Y. For example, A=a1 AND B=b1 AND D=d1 => Y.

I would also like to specify a precision threshold for these rules, so that I can filter them as needed. For example, I would like to see all rules with at least 90% precision; this allows better compaction of the rule set. The rule above has 100% precision, whereas B=b1 AND D=d1 => Y has only 2/3 ≈ 67% precision (it errs on the 4th row).
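As an illustration of this precision filter, here is a minimal brute-force sketch in Python over the toy data (the function name `rules_above_precision` is mine, not a library routine): it enumerates every attribute=value conjunction that actually occurs in the data and keeps those whose precision for Dec=Y meets the threshold.

```python
from itertools import combinations

# The toy dataset from the question: attributes A-D and decision Dec.
rows = [
    {"A": "a1", "B": "b1", "C": "c1", "D": "d1", "Dec": "Y"},
    {"A": "a1", "B": "b2", "C": "c2", "D": "d2", "Dec": "N"},
    {"A": "a2", "B": "b2", "C": "c3", "D": "d2", "Dec": "N"},
    {"A": "a2", "B": "b1", "C": "c3", "D": "d1", "Dec": "N"},
    {"A": "a1", "B": "b3", "C": "c2", "D": "d3", "Dec": "Y"},
    {"A": "a1", "B": "b1", "C": "c1", "D": "d2", "Dec": "N"},
    {"A": "a1", "B": "b1", "C": "c4", "D": "d1", "Dec": "Y"},
]
ATTRS = ["A", "B", "C", "D"]

def rules_above_precision(rows, min_precision):
    """Enumerate every attribute=value conjunction that occurs in the
    data and keep those whose precision for Dec=Y meets the bar."""
    results = []
    for k in range(1, len(ATTRS) + 1):
        for attrs in combinations(ATTRS, k):
            # candidate value combinations come from the data itself
            seen = {tuple(r[a] for a in attrs) for r in rows}
            for values in seen:
                matched = [r for r in rows
                           if all(r[a] == v for a, v in zip(attrs, values))]
                hits = sum(1 for r in matched if r["Dec"] == "Y")
                precision = hits / len(matched)
                if hits > 0 and precision >= min_precision:
                    results.append((dict(zip(attrs, values)), precision))
    return results

for rule, p in rules_above_precision(rows, 0.9):
    print(rule, f"precision={p:.0%}")
```

Note this sketch only filters by precision; it does not yet prune redundant supersets of a rule (the "compaction" part), which would be a post-processing step.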

Vaguely, I can see that this is similar to building a decision tree and finding the paths that end in Y. If I understand correctly, building a regression model would tell me which attributes matter most, but I need the combinations of actual attribute values that lead to Y.

The attributes are multi-valued, but that is not a hard constraint; I could even assume they are boolean.

Is there a library in existing tools such as Weka or R that can help me?

Regards


Solution

I don't think this is a regression problem. It looks like a classification problem, where you are trying to classify each row as Y or N. You could build an ensemble learner such as AdaBoost and see how the decisions vary from tree to tree, or you could fit something like an elastic-net logistic regression and inspect the final weights.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow