Pregunta

Here is my problem. I need to implement a multi target decision tree algorithm. A multi target is an extension of multi label learning where the labels are not binary but can be continuous, categorical and so on. For example a label vector for a multi label classification problem could look like this {1,0,1,0,0,0,1}, while for a multi target could look like this {2,35,3,-2,24}. My problem is this. If i have a label that takes 3 discrete values how do i represent them in a vector? Lets say i have a label called job and takes 3 values, mechanic,teacher and athlete. How can i code this label in order to use it in a vector? At each node in a decision tree in order to find my split, i need to compute the mean vector of all the label vectors in this node ( i am using the variance method equation to find my split). If i had binary label this would be easy because adding 0s and 1s doesn't pose any problem. If i code these 3 jobs with 0,1,2, then this is problem because adding a label vector that has the label athlete, counts more than adding a vector that has the job mechanic and the mean vector is inaccurate.

Lets take this example. I have these 3 labels:

          job: {mechanic,teacher,athlete}
          married:{yes,no}
          age:  continuous value

It is easy to say that the married label can be coded as {0,1} and the age label as a continuous number. But how can i code the job label? Coding it as {0,1,2} causes the next problem. Imagine 2 label vectors in a node: {0,0,45} which corresponds to mechanic,married and 45 years old and {2,1,48} which corresponds to athlete,not married,45 years old. The mean vector is {1,0.5,46.5}. With this vector i can predict that the age of the instance that falls in to that node is 46.5, i can say that the instance in not married (with a rule that says greater or equal than 0.5 is 1) and i can say that its job is a teacher. The teacher job is totally wrong while the others are OK. You see now the problem of coding categorical labels. An help or advice??? Thanks :D

¿Fue útil?

Solución

How about taking all your discrete values of a feature and transform them all into features if values more than 2, for example:

job: {mechanic, teacher, athlete}
married:{yes, no}
age:  continuous value

will result in an 5-dimensional vecor

(mechanic 0/1, teacher 0/1, athlete 0/1, married 0/1, age 0-inf)

Licenciado bajo: CC-BY-SA con atribución
No afiliado a StackOverflow
scroll top