Question

I have a collection of articles on which I want to run word-frequency and trend analysis.

The articles are tagged with date, author, theme and subject. I want to use these tags to slice the data so that I can get the most common words used by a specific author (or group of authors), or within specific themes or subjects, both overall and over time (as a trend).

How should I design this database (relational or otherwise), or should I create a data cube instead?

Solution

Rizzoma.com built this with CouchDB (a NoSQL document store) and Sphinx (a full-text search engine). You can build it another way if you prefer, or try an existing solution and replicate its approach.
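
If you do want to try the relational route, here is a minimal sketch of one possible layout. It is not how Rizzoma does it, and it does not use CouchDB or Sphinx. It assumes Python's built-in sqlite3 module, ISO-formatted dates (YYYY-MM-DD), and a naive regex tokenizer; the table names (article, word_count), the function names and the sample rows are all made up for illustration.

```python
import re
import sqlite3
from collections import Counter

# Hypothetical sample rows: (published, author, theme, subject, text)
SAMPLE = [
    ("2020-01-15", "alice", "tech", "databases", "data cube data model"),
    ("2020-02-10", "alice", "tech", "search", "search index data"),
    ("2020-02-20", "bob", "science", "search", "search search trend"),
]

def build_db(articles):
    """Load articles into an in-memory SQLite database with per-article word counts."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE article (
            id INTEGER PRIMARY KEY,
            published TEXT, author TEXT, theme TEXT, subject TEXT
        );
        CREATE TABLE word_count (
            article_id INTEGER REFERENCES article(id),
            word TEXT,
            n INTEGER
        );
        CREATE INDEX idx_word_count_article ON word_count(article_id);
    """)
    for published, author, theme, subject, text in articles:
        cur = conn.execute(
            "INSERT INTO article (published, author, theme, subject) VALUES (?, ?, ?, ?)",
            (published, author, theme, subject),
        )
        # Naive tokenizer (an assumption): lowercase runs of letters and apostrophes.
        counts = Counter(re.findall(r"[a-z']+", text.lower()))
        conn.executemany(
            "INSERT INTO word_count (article_id, word, n) VALUES (?, ?, ?)",
            [(cur.lastrowid, word, n) for word, n in counts.items()],
        )
    return conn

def top_words(conn, authors, limit=10):
    """Most common words across all articles by the given authors."""
    placeholders = ",".join("?" * len(authors))
    query = f"""
        SELECT w.word, SUM(w.n) AS total
        FROM word_count w JOIN article a ON a.id = w.article_id
        WHERE a.author IN ({placeholders})
        GROUP BY w.word
        ORDER BY total DESC, w.word ASC
        LIMIT ?
    """
    return conn.execute(query, (*authors, limit)).fetchall()

def word_trend(conn, word):
    """Monthly totals for one word, using the YYYY-MM prefix of the date."""
    query = """
        SELECT substr(a.published, 1, 7) AS month, SUM(w.n) AS total
        FROM word_count w JOIN article a ON a.id = w.article_id
        WHERE w.word = ?
        GROUP BY month
        ORDER BY month
    """
    return conn.execute(query, (word,)).fetchall()
```

Filtering by theme or subject would follow the same pattern as top_words, with a different column in the WHERE clause. Storing one row per (article, word) keeps the slicing in plain SQL; whether that scales to your corpus is something you would need to measure.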

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow