Data Mining: grouping based on two text values (IDs) and one numeric (ratio)

https://stackoverflow.com/questions/18705223

28-06-2022

Question

For a music project, I want to find which groups of artists users listen to. I have extracted three columns from the database: the ID of the artist, the ID of the user, and the share (ratio) of that user's streams that go to that artist. For example, half of the plays from user 15 are of artist 12:

12 | 15 | 0.5

What I hope to find is a method for grouping artists into clusters, e.g. to find out that users who tend to listen to artist 12 also listen to artists 65, 74, and 34.

I wonder what kinds of methods can be used for this grouping, and whether there are any good resources for this approach (Python or Ruby would be great).

Solution 2

Sounds like a classic matrix factorization task to me, only with a weighted matrix instead of a binary one.

Because of that, some fast algorithms may not be applicable, since they support binary matrices only.
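
As a rough illustration of factorizing a weighted (non-binary) matrix, here is a minimal sketch using a truncated SVD from NumPy. SVD is only one of several possible factorizations and is not necessarily what the answer had in mind. The sample ratios, the artist IDs beyond those in the question, and the choice of k = 2 factors are all assumptions made for the example.

```python
import numpy as np

# Hypothetical user x artist matrix of play ratios
# (rows: users, columns: artists). The values are made up.
artists = [12, 65, 74, 34, 99]
ratios = np.array([
    [0.5, 0.3, 0.2, 0.0, 0.0],
    [0.4, 0.4, 0.2, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.6, 0.4],
    [0.0, 0.0, 0.0, 0.5, 0.5],
])

k = 2  # number of latent "taste" factors; an assumption, tune on real data

# Truncated SVD: ratios ~= (U[:, :k] * s[:k]) @ Vt[:k, :]
U, s, Vt = np.linalg.svd(ratios, full_matrices=False)
user_factors = U[:, :k] * s[:k]   # one row of k weights per user
artist_factors = Vt[:k, :]        # one column of k loadings per artist

# Rank-k reconstruction of the original ratios.
approx = user_factors @ artist_factors

# Assign each artist to the factor on which it loads most strongly.
dominant = np.abs(artist_factors).argmax(axis=0)
groups = {}
for artist, factor in zip(artists, dominant):
    groups.setdefault(int(factor), []).append(artist)

print(groups)
```

On this toy matrix the two factors separate artists 12, 65, and 74 from artists 34 and 99. Real data would be far larger and sparser, so a sparse solver and a non-negative or implicit-feedback factorization may be a better fit.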

Don't ask for sources on Stack Overflow: asking for off-site resources (tools, libraries, ...) is off-topic.

Other tips

Imagine your data as a matrix with users as rows and artists as columns, with each cell containing the ratio.
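
A minimal sketch of building that matrix from (artist, user, ratio) rows, using only the standard library. The sample rows other than `12 | 15 | 0.5` and the helper name `build_matrix` are hypothetical.

```python
# Hypothetical rows in the column order from the question:
# (artist_id, user_id, ratio). Only the first row comes from the question.
rows = [
    (12, 15, 0.5),
    (65, 15, 0.3),
    (74, 15, 0.2),
    (12, 16, 0.6),
    (65, 16, 0.4),
    (99, 17, 1.0),
]


def build_matrix(rows):
    """Return (users, artists, matrix): users as rows, artists as columns."""
    users = sorted({user for _, user, _ in rows})
    artists = sorted({artist for artist, _, _ in rows})
    user_index = {u: i for i, u in enumerate(users)}
    artist_index = {a: j for j, a in enumerate(artists)}

    # Cells with no plays stay at 0.0.
    matrix = [[0.0] * len(artists) for _ in users]
    for artist, user, ratio in rows:
        matrix[user_index[user]][artist_index[artist]] = ratio
    return users, artists, matrix


users, artists, matrix = build_matrix(rows)
for user, row in zip(users, matrix):
    print(user, row)
```

For a real dataset most cells will be zero, so a sparse representation (for example a SciPy sparse matrix) is probably more practical than nested lists.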

A straightforward analysis would be to use clustering on the (possibly very large) column vectors. Check out the Python library scikit-learn. I can also recommend using the IPython notebook for interactive data analysis.
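
One way to cluster the column vectors is k-means from scikit-learn, sketched below. The choice of k-means, the number of clusters, and the sample matrix are assumptions made for illustration. The answer does not name a specific clustering algorithm.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical user x artist matrix of play ratios
# (rows: users, columns: artists). The values are made up.
artists = [12, 65, 74, 34, 99]
ratios = np.array([
    [0.5, 0.3, 0.2, 0.0, 0.0],
    [0.4, 0.4, 0.2, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.6, 0.4],
    [0.0, 0.0, 0.0, 0.5, 0.5],
])

# Each artist is described by its column: the share it gets from every user.
artist_vectors = ratios.T

# n_clusters=2 is an assumption; on real data it has to be chosen or tuned.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = model.fit_predict(artist_vectors)

# Collect artist IDs per cluster label.
clusters = {}
for artist, label in zip(artists, labels):
    clusters.setdefault(int(label), []).append(artist)

print(clusters)
```

On this toy input, artists that share listeners end up in the same cluster. With many users the vectors become very high-dimensional, so reducing dimensionality first or using a cosine-based distance may work better than plain k-means.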

Your problem is known as "market-basket analysis" or "affinity correlation". Check out "Best Python clustering library to use for product data analysis".
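
A minimal market-basket style sketch in plain Python: each user's set of artists is treated as a basket, and for every ordered artist pair (a, b) the confidence "users who listen to a also listen to b" is computed. The sample rows, the function name `artist_rules`, and the thresholds `min_ratio` and `min_support` are assumptions. Note that turning ratios into baskets discards the weights.

```python
from collections import defaultdict
from itertools import permutations

# Hypothetical rows: (artist_id, user_id, ratio).
rows = [
    (12, 15, 0.5), (65, 15, 0.3), (74, 15, 0.2),
    (12, 16, 0.6), (65, 16, 0.4),
    (12, 17, 0.7), (34, 17, 0.3),
    (99, 18, 1.0),
]


def artist_rules(rows, min_ratio=0.1, min_support=2):
    """Return {(a, b): confidence} for rules 'listens to a => listens to b'."""
    # One basket per user: artists with at least min_ratio of the user's plays.
    baskets = defaultdict(set)
    for artist, user, ratio in rows:
        if ratio >= min_ratio:
            baskets[user].add(artist)

    artist_count = defaultdict(int)  # number of users per artist
    pair_count = defaultdict(int)    # number of users per ordered artist pair
    for basket in baskets.values():
        for a in basket:
            artist_count[a] += 1
        for a, b in permutations(sorted(basket), 2):
            pair_count[(a, b)] += 1

    # confidence(a => b) = users with both a and b / users with a
    return {
        (a, b): n / artist_count[a]
        for (a, b), n in pair_count.items()
        if n >= min_support
    }


rules = artist_rules(rows)
for (a, b), confidence in sorted(rules.items()):
    print(f"{a} => {b}: {confidence:.2f}")
```

Counting every pair grows quadratically with basket size, so for a large catalogue a dedicated frequent-itemset algorithm such as Apriori or FP-Growth would likely be needed.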

License: CC-BY-SA with attribution

Not affiliated with StackOverflow