Question

I have a data set that looks like this:

target,items
1,[i1,i3]
1,[i4,i5,i9]
0,[i1]
...

The variable target is a binary (0/1) outcome. The feature "items" is a variable-length set of items. Each item is a categorical value (one of: i1, i2, ..., i_N), and there is no order or relationship between the items. A business example would be "the set of products in a cart, with the outcome being whether the customer abandons the cart".

The data is approximately 1,000,000 by 5,000: I have ~1 million examples, and N is approximately 5,000.
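
To make the layout concrete, here is a minimal sketch of how rows in this format could be read into a sparse 0/1 indicator representation (one column per item). The function names `parse_row` and `to_sparse_rows` are hypothetical, and the sketch assumes each line looks exactly like the sample rows shown earlier; it only illustrates the data structure, not any particular analysis.

```python
import re


def parse_row(line):
    """Parse one line such as '1,[i4,i5,i9]' into (target, set_of_items)."""
    # Split only on the first comma: everything after it is the item list.
    target_text, items_text = line.strip().split(",", 1)
    # Item names are whatever sits between brackets and commas.
    items = set(re.findall(r"[^\[\],\s]+", items_text))
    return int(target_text), items


def to_sparse_rows(lines):
    """Return (targets, rows, vocab).

    rows[k] is the sorted list of column indices equal to 1 in example k,
    and vocab maps each item name to its column index (assigned in order
    of first appearance). Storing only the non-zero columns avoids
    materialising a dense 1,000,000 x 5,000 matrix.
    """
    vocab = {}
    targets, rows = [], []
    for line in lines:
        target, items = parse_row(line)
        # sorted(items) keeps column assignment deterministic.
        cols = sorted(vocab.setdefault(item, len(vocab)) for item in sorted(items))
        targets.append(target)
        rows.append(cols)
    return targets, rows, vocab


# The three sample rows from the question (header line omitted).
sample = ["1,[i1,i3]", "1,[i4,i5,i9]", "0,[i1]"]
targets, rows, vocab = to_sparse_rows(sample)
print(targets)
print(rows)
print(vocab)
```

Each example then corresponds to a row with a handful of ones among roughly 5,000 columns, which is the shape I mean by "1,000,000 by 5,000".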

I want to find the items that influence (or lead to) target = 1. I don't have any extra features to add. What type of statistical analysis or machine learning modelling technique should I use?

No correct solution

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange