Correlate an array of categorical features to a binary outcome
31-10-2019
Question
I have a data set that looks like this:
target,items
1,[i1,i3]
1,[i4,i5,i9]
0,[i1]
...
The variable target is a 0/1 outcome. The feature "items" is a variable-length set of items. Each item is a categorical value (one of i1, i2, ..., i_N). There is no order or relationship between the items. A business example would be "the set of products in a cart, with the outcome being whether the customer abandons the cart".
The data is approximately 1,000,000 by 5,000: I have about 1 million examples, and N is approximately 5,000.
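To make the layout concrete, here is a minimal sketch of how rows in this format could be parsed into a sparse multi-hot representation (one column per item, storing only the columns that are present). The sample rows, the `parse_row` helper, and the variable names are hypothetical and only mirror the three example lines above; the real file format may differ.

```python
# Hypothetical sample rows mirroring the format shown in the question.
raw_rows = [
    "1,[i1,i3]",
    "1,[i4,i5,i9]",
    "0,[i1]",
]


def parse_row(line):
    """Split 'target,[a,b,c]' into (int target, frozenset of item names)."""
    target_text, items_text = line.split(",", 1)
    inner = items_text.strip().strip("[]")
    items = frozenset(part.strip() for part in inner.split(",") if part.strip())
    return int(target_text), items


parsed = [parse_row(line) for line in raw_rows]

# Assign each distinct item a column index (sorted so the mapping is reproducible).
all_items = sorted({item for _, items in parsed for item in items})
vocabulary = {item: col for col, item in enumerate(all_items)}

# Sparse multi-hot encoding: for each row, the column indices of the items present.
targets = [target for target, _ in parsed]
row_columns = [sorted(vocabulary[item] for item in items) for _, items in parsed]

for target, cols in zip(targets, row_columns):
    print(target, cols)
```

Storing only the present columns matters at this scale: a dense 1,000,000 by 5,000 indicator matrix would be mostly zeros, since each row contains only a handful of items.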
I want to find the items that influence (or lead to) target = 1. I don't have any extra features to add. What type of statistical analysis or machine learning modelling technique should I use?
No correct solution
Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange