Going from report to feature matrix

https://datascience.stackexchange.com/questions/5214

16-10-2019

Question

I am starting to play around with data mining / machine learning and I am stuck on a problem that's probably easy.

So I have a report that lists the URL and the number of visits a person made. So a combination of IP and URL results in a number of visits.

Now I want to run the k-means clustering algorithm on this so I thought I could approach it like this:

This is my data:

url      ip    visits

abc.be   123   5
abc.be/a 123   2
abc.be/b 123   2
abc.be/b 321   4

And I would turn it into a feature vector/matrix like so:

abc.be  abc.be/a   abc.be/b   impressions
   1       0          0          5
   0       1          0          2
   0       0          1          2
   0       0          1          4

But I am stuck on how to transform my data set to a feature matrix. Any help would be appreciated.

Solution

I don't understand what you mean by:

"So I have a report that lists the URL and the number of visits a person made. So a combination of IP and URL results in a number of visits."

Assuming that you equate an IP with a user, and you wish to cluster users by their URL visitation frequencies, your matrix, M, would have

One row per IP (user)
One column for each URL that you are tracking (your features)
and the entries in M would be "visits" of a given URL by a particular IP

Given these assumptions, and your report, M would be the following, with 0 wherever an IP has no recorded visits to a URL:

    abc.be  abc.be/a  abc.be/b
123   5        2         2
321   0        0         4
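
To build M programmatically, here is a minimal sketch. It assumes the report has already been loaded as (url, ip, visits) tuples. The function name build_feature_matrix is only illustrative, and duplicate (ip, url) rows are assumed to be summed.

```python
# Rows of the report from the question: (url, ip, visits).
report = [
    ("abc.be", "123", 5),
    ("abc.be/a", "123", 2),
    ("abc.be/b", "123", 2),
    ("abc.be/b", "321", 4),
]


def build_feature_matrix(rows):
    """Pivot (url, ip, visits) rows into one row per IP, one column per URL."""
    urls = sorted({url for url, _, _ in rows})  # column order
    ips = sorted({ip for _, ip, _ in rows})     # row order

    # Accumulate visits per (ip, url) pair; duplicates are summed (assumption).
    counts = {}
    for url, ip, visits in rows:
        counts[(ip, url)] = counts.get((ip, url), 0) + visits

    # Pairs that never appear in the report become 0 visits.
    matrix = [[counts.get((ip, url), 0) for url in urls] for ip in ips]
    return ips, urls, matrix


ips, urls, M = build_feature_matrix(report)
for ip, row in zip(ips, M):
    print(ip, row)
```

Each row of M is then the feature vector for one IP, which is the shape a k-means implementation typically expects. If the report is in a pandas DataFrame instead, `DataFrame.pivot_table(index="ip", columns="url", values="visits", aggfunc="sum", fill_value=0)` should give the same layout.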

Licensed under: CC-BY-SA with attribution
