Going from report to feature matrix
16-10-2019
Question
I am starting to play around with data mining / machine learning and I am stuck on a problem that's probably easy.
I have a report that lists each URL and the number of visits a person made to it, so each combination of IP and URL maps to a number of visits.
Now I want to run the k-means clustering algorithm on this so I thought I could approach it like this:
This is my data:
url        ip    visits
abc.be     123   5
abc.be/a   123   2
abc.be/b   123   2
abc.be/b   321   4
And I would turn it into a feature vector/matrix like so:
abc.be  abc.be/a  abc.be/b  impressions
1       0         0         5
0       1         0         2
0       0         1         2
0       0         1         4
But I am stuck on how to transform my data set to a feature matrix. Any help would be appreciated.
Solution
I'm not sure what you mean by:
"I have a report that lists each URL and the number of visits a person made to it, so each combination of IP and URL maps to a number of visits."
Assuming that you equate an IP with a user, and you wish to cluster users by their URL visitation frequencies, your matrix, M, would have:
- One row per IP (user)
- One column for each URL that you are tracking (your features)
- Entries in M equal to the "visits" of a given URL by a particular IP (0 where the report has no row for that IP and URL)
Given these assumptions, and your report, M would be:
ip    abc.be  abc.be/a  abc.be/b
123   5       2         2
321   0       0         4
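As a rough illustration of how M could be built from the report, here is a minimal Python sketch using only the standard library. The names `report` and `build_feature_matrix` are made up for this example, the IPs are kept as strings, and it assumes that repeated (IP, URL) pairs should be summed and that missing pairs count as 0 visits.

```python
from collections import defaultdict

# Rows of the report from the question: (url, ip, visits).
report = [
    ("abc.be", "123", 5),
    ("abc.be/a", "123", 2),
    ("abc.be/b", "123", 2),
    ("abc.be/b", "321", 4),
]


def build_feature_matrix(rows):
    """Return (ips, urls, matrix) where matrix[i][j] is the number of
    visits to urls[j] by ips[i]. Hypothetical helper for illustration."""
    # Sum visits per (ip, url) in case a pair appears more than once.
    counts = defaultdict(int)
    for url, ip, visits in rows:
        counts[(ip, url)] += visits

    # Fix a row order (IPs) and a column order (URLs).
    ips = sorted({ip for _, ip, _ in rows})
    urls = sorted({url for url, _, _ in rows})

    # Pairs that never occur in the report get 0 visits.
    matrix = [[counts.get((ip, url), 0) for url in urls] for ip in ips]
    return ips, urls, matrix


ips, urls, M = build_feature_matrix(report)
print(urls)
for ip, row in zip(ips, M):
    print(ip, row)
```

Each row of M is then one user's feature vector, which is the shape a k-means implementation typically expects. If you already use pandas, `DataFrame.pivot_table` with `index="ip"`, `columns="url"`, `values="visits"` and `fill_value=0` should give an equivalent table.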