Question

Let's say I would like to build a system modelling the behaviour of visitors to a city.

For argument's sake, the city has 5 places of interest: A, B, C, D, and E. All are equally likely to be the first place to visit, and all are within easy reach of one another.

I am interested in drawing conclusions resembling the following:

  • "Users who visit C commonly go on to visit B."
  • "Users who visit A hardly ever go on to visit D."
  • "Users who visit B are equally likely to visit C and E."

My problems as I understand them are as follows:

  1. I don't know anything about graph theory. (But I am prepared to read up on it).
  2. I'm uncertain of the best way to store this kind of data. If not an SQL DB, what?
  3. What sort of operations am I going to be performing on the data I end up with? Could I use a general-purpose language like Ruby?

Thank you for any guidance.

Solution

The right kind of storage depends on the sort of data you have. If it's just what you describe here, you can represent each journey as a string, with one letter per place visited, in order:

ABCB
DCDE
...

This fits well in a database, but such a list can be stored by any means, whichever is most easily available to you. You probably don't even need the entire list. An accumulated version might be sufficient, where you store each distinct string exactly once, along with its count:

ABDC  177
DEA   2996
...

For such a table a database is appropriate, but it's still simple enough to be stored in a plain file.
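As a rough illustration of the accumulated form, here is a minimal Python sketch. The sample journeys and the `to_lines` helper are made up for the example, and the tab-separated layout is just one possible plain-file format.

```python
from collections import Counter

# Hypothetical sample data: one string per visitor journey.
journeys = ["ABCB", "DCDE", "ABCB", "DEA", "DEA", "DEA"]

# Store each distinct journey once, along with how often it occurred.
journey_counts = Counter(journeys)


def to_lines(counts):
    """Format the counts as 'journey<TAB>count' lines, most frequent first."""
    return [f"{journey}\t{count}" for journey, count in counts.most_common()]


lines = to_lines(journey_counts)
print("\n".join(lines))
```

The resulting lines could be written to a text file as they are, or inserted into a two-column database table.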

For examining the data you don't need graph theory; read up on statistics and machine learning instead. The first thing you want to analyze is the correlation between the various places. You can do that using simple string operations, e.g. count the occurrences of the substring "AD" to find out how often people go from A to D. As for the language: you want to calculate and visualize correlations, so pick something where that kind of thing isn't too hard. This could be something specialized like Matlab or R, or something more general like Python with Matplotlib and scikit-learn. I don't know about Ruby.
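The substring counting can be sketched in Python as follows. This is only an illustration under a few assumptions: the counts are invented, the function names are hypothetical, and "go on to visit" is read as "visit immediately afterwards" (a visit anywhere later in the journey would need a different count).

```python
from collections import Counter, defaultdict

# Hypothetical accumulated data: journey string -> number of visitors.
journey_counts = {"ABCB": 177, "DCDE": 40, "CB": 60, "CE": 20}


def transition_counts(journey_counts):
    """Count how often each place is immediately followed by another."""
    transitions = Counter()
    for journey, n in journey_counts.items():
        # Pair every stop with the stop that follows it.
        for src, dst in zip(journey, journey[1:]):
            transitions[(src, dst)] += n
    return transitions


def transition_probabilities(transitions):
    """Turn counts into the share of departures from each place."""
    totals = Counter()
    for (src, _), n in transitions.items():
        totals[src] += n
    probs = defaultdict(dict)
    for (src, dst), n in transitions.items():
        probs[src][dst] = n / totals[src]
    return probs


transitions = transition_counts(journey_counts)
probs = transition_probabilities(transitions)

for dst, p in sorted(probs["C"].items()):
    print(f"C -> {dst}: {p:.2f}")
```

A high share for one destination corresponds to a statement like "users who visit C commonly go on to visit B", and a pair that never shows up (here A followed by D) corresponds to "hardly ever".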

Licensed under: CC-BY-SA with attribution