Question

I have a bunch of articles on which I want to do word-frequency and trend analysis.

The articles are tagged with date, author, theme and subject. I want to use these tags to slice the data so that I can get the most common words used by a specific author (or group of authors), theme(s) or subject(s), both overall and over time (trend).

How would I design this database (relational or other), or should I create a data cube?
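One common relational answer to this is a small star schema. An `article` table holds the tags, and a `word_count` fact table holds per-article token counts, so any slice becomes a `JOIN` plus `GROUP BY`. A data cube would precompute these aggregates, while this layout computes them on demand. The sketch below uses Python's built-in `sqlite3`. The table and column names, the one-author/one-theme/one-subject-per-article assumption, the naive lowercase tokenizer and the sample rows are all illustrative assumptions, not part of the question.

```python
import re
import sqlite3
from collections import Counter

# Hypothetical schema. Assumes one author, theme and subject per article;
# multi-valued tags would need their own link tables.
SCHEMA = """
CREATE TABLE article (
    id        INTEGER PRIMARY KEY,
    published TEXT NOT NULL,  -- ISO date such as '2013-05-17'
    author    TEXT NOT NULL,
    theme     TEXT NOT NULL,
    subject   TEXT NOT NULL
);
CREATE TABLE word_count (
    article_id INTEGER NOT NULL REFERENCES article(id),
    word       TEXT NOT NULL,
    n          INTEGER NOT NULL,
    PRIMARY KEY (article_id, word)
);
CREATE INDEX word_count_by_word ON word_count(word);
"""

SLICE_COLUMNS = {"author", "theme", "subject"}  # whitelist for the dynamic column name


def tokenize(text):
    # Naive tokenizer: lowercase runs of letters/apostrophes, no stemming or stop words.
    return re.findall(r"[a-z']+", text.lower())


def load(conn, articles):
    """Insert each article's tags and its per-word counts."""
    for a in articles:
        cur = conn.execute(
            "INSERT INTO article (published, author, theme, subject) VALUES (?, ?, ?, ?)",
            (a["published"], a["author"], a["theme"], a["subject"]),
        )
        counts = Counter(tokenize(a["text"]))
        conn.executemany(
            "INSERT INTO word_count (article_id, word, n) VALUES (?, ?, ?)",
            [(cur.lastrowid, w, n) for w, n in counts.items()],
        )


def top_words(conn, column, values, limit=10):
    """Most frequent words over articles whose `column` is one of `values`."""
    if column not in SLICE_COLUMNS:
        raise ValueError(f"cannot slice by {column!r}")
    marks = ",".join("?" * len(values))  # one placeholder per value
    sql = f"""
        SELECT wc.word, SUM(wc.n) AS total
        FROM word_count AS wc JOIN article AS a ON a.id = wc.article_id
        WHERE a.{column} IN ({marks})
        GROUP BY wc.word
        ORDER BY total DESC, wc.word
        LIMIT ?
    """
    return conn.execute(sql, (*values, limit)).fetchall()


def monthly_trend(conn, word):
    """Total occurrences of `word` per calendar month (YYYY-MM)."""
    sql = """
        SELECT substr(a.published, 1, 7) AS month, SUM(wc.n)
        FROM word_count AS wc JOIN article AS a ON a.id = wc.article_id
        WHERE wc.word = ?
        GROUP BY month
        ORDER BY month
    """
    return conn.execute(sql, (word,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
load(conn, [
    {"published": "2013-01-05", "author": "ann", "theme": "tech", "subject": "db",
     "text": "Cubes and tables, tables and cubes."},
    {"published": "2013-02-11", "author": "bob", "theme": "tech", "subject": "search",
     "text": "Search engines index words."},
    {"published": "2013-02-20", "author": "ann", "theme": "tech", "subject": "db",
     "text": "Tables again."},
])
print(top_words(conn, "author", ["ann"], limit=3))  # [('tables', 3), ('and', 2), ('cubes', 2)]
print(monthly_trend(conn, "tables"))                # [('2013-01', 2), ('2013-02', 1)]
```

Slicing by theme or subject is the same query with a different whitelisted column. Materializing these `GROUP BY` results ahead of time, which is essentially what a cube does, is only worth it if on-demand queries become too slow.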

Solution

Rizzoma.com built this with CouchDB (NoSQL) and Sphinx (a full-text search engine). You can build it another way if you prefer, or test the existing solution and replicate it.
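CouchDB views are built with map/reduce. A map function emits keys, which can be arrays, and a reduce such as the built-in `_sum` aggregates them. The `group_level` query option then groups results on a prefix of an array key. The Python below is only an in-memory imitation of that idea. It is not CouchDB's API and not Rizzoma's actual code, and full-text search (Sphinx's role) is not modelled. The key order `(author, word, month)`, the function names and the sample articles are assumptions for illustration.

```python
import re
from collections import Counter


def map_article(article):
    """Emit ((author, word, month), 1) per token, like a map function emitting array keys."""
    month = article["published"][:7]
    for word in re.findall(r"[a-z']+", article["text"].lower()):
        yield (article["author"], word, month), 1


def reduce_sum(pairs):
    """Sum values for identical keys (the job a built-in _sum reduce does)."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return totals


def group_level(totals, level):
    """Re-aggregate on the first `level` key elements, in the spirit of group_level."""
    grouped = Counter()
    for key, value in totals.items():
        grouped[key[:level]] += value
    return grouped


articles = [
    {"published": "2013-01-05", "author": "ann", "text": "Cubes and tables, tables and cubes."},
    {"published": "2013-02-11", "author": "bob", "text": "Search engines index words."},
    {"published": "2013-02-20", "author": "ann", "text": "Tables again."},
]
totals = reduce_sum(pair for a in articles for pair in map_article(a))
per_author_word = group_level(totals, 2)     # overall counts per (author, word)
print(per_author_word[("ann", "tables")])    # 3
print(totals[("ann", "tables", "2013-01")])  # 2 (one point on ann's trend for "tables")
```

Putting `word` before `month` in the key is the choice that matters here. One view can then answer both "words per author" (group on two elements) and "trend of a word for an author" (full key). Because CouchDB returns view rows sorted by key rather than by value, picking the top N words would still need a sort on the client side.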

Licensed under: CC-BY-SA with attribution