Question

I have a collection of articles on which I want to run word-frequency and trend analysis.

The articles are tagged with date, author, theme and subject. I want to use these tags to slice the data so that I can get the most common words for a specific author (or group of authors), theme(s) or subject(s), both overall and over time (trend).

How should I design this database (relational or otherwise), or should I build a data cube instead?
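
One relational option is a small star schema: a fact table of per-article word counts joined to a dimension table that holds the tags. The sketch below uses Python's built-in sqlite3 module; the table and column names, the assumption of exactly one author, theme and subject per article, and the naive regex tokenizer are all illustrative choices, not something the question specifies.

```python
import re
import sqlite3
from collections import Counter

# Hypothetical star schema: `article` holds the tags (one author, theme and
# subject per article is assumed); `word_count` is the fact table with one
# row per (article, word) and that word's count in the article.
SCHEMA = """
CREATE TABLE article (
    id       INTEGER PRIMARY KEY,
    pub_date TEXT NOT NULL,   -- ISO 8601 date, e.g. '2013-05-14'
    author   TEXT NOT NULL,
    theme    TEXT NOT NULL,
    subject  TEXT NOT NULL
);
CREATE TABLE word_count (
    article_id INTEGER NOT NULL REFERENCES article(id),
    word       TEXT NOT NULL,
    n          INTEGER NOT NULL,
    PRIMARY KEY (article_id, word)
);
CREATE INDEX word_count_word ON word_count(word);
"""


def tokenize(text):
    # Naive tokenizer: lowercase runs of letters/apostrophes, no stemming or stop words.
    return re.findall(r"[a-z']+", text.lower())


def load(conn, articles):
    # Insert each article's tags, then its per-word counts.
    for a in articles:
        cur = conn.execute(
            "INSERT INTO article (pub_date, author, theme, subject) VALUES (?, ?, ?, ?)",
            (a["date"], a["author"], a["theme"], a["subject"]),
        )
        counts = Counter(tokenize(a["text"]))
        conn.executemany(
            "INSERT INTO word_count (article_id, word, n) VALUES (?, ?, ?)",
            [(cur.lastrowid, word, n) for word, n in counts.items()],
        )
    conn.commit()


def top_words(conn, authors, limit=10):
    # Most common words, overall, for one author or a group of authors.
    placeholders = ",".join("?" * len(authors))
    sql = f"""
        SELECT w.word, SUM(w.n) AS total
        FROM word_count AS w JOIN article AS a ON a.id = w.article_id
        WHERE a.author IN ({placeholders})
        GROUP BY w.word
        ORDER BY total DESC, w.word
        LIMIT ?
    """
    return conn.execute(sql, (*authors, limit)).fetchall()


def trend(conn, word, theme):
    # Monthly totals of one word within one theme (the "over time" slice).
    sql = """
        SELECT substr(a.pub_date, 1, 7) AS month, SUM(w.n)
        FROM word_count AS w JOIN article AS a ON a.id = w.article_id
        WHERE w.word = ? AND a.theme = ?
        GROUP BY month
        ORDER BY month
    """
    return conn.execute(sql, (word, theme)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    load(conn, [
        {"date": "2013-01-10", "author": "alice", "theme": "tech",
         "subject": "db", "text": "Cubes and cubes of data"},
        {"date": "2013-02-03", "author": "bob", "theme": "tech",
         "subject": "db", "text": "Data data everywhere"},
    ])
    print(top_words(conn, ["alice", "bob"], limit=2))  # [('data', 3), ('cubes', 2)]
    print(trend(conn, "data", "tech"))                 # [('2013-01', 1), ('2013-02', 2)]
```

Each slice is just a WHERE clause plus a GROUP BY on a tag column or on the month prefix of the date, so this layout already behaves like a small, non-materialized cube. SQLite has no GROUP BY CUBE or ROLLUP; databases such as PostgreSQL do, if you later want precomputed subtotals across tag combinations. Multi-valued tags (several authors or themes per article) would need separate link tables.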

Solution

Rizzoma.com built this with CouchDB (NoSQL) and Sphinx (a full-text search engine). You could build it another way if you prefer, or study an existing solution and replicate it.
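
The answer does not say how Rizzoma structured its data, so the following is only a rough illustration of the CouchDB style of doing this: a plain-Python emulation of a map/reduce view keyed [author, word, month], grouped the way CouchDB's built-in `_sum` reduce and its `group_level` query parameter group rows by key prefix. It is not CouchDB client code; the key layout and function names are hypothetical, and a real view would be a JavaScript map function stored in a design document.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Naive tokenizer: lowercase runs of letters/apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def map_article(doc):
    # Stands in for a CouchDB map function. Hypothetical key layout:
    # emit([author, word, "YYYY-MM"], 1) once per token.
    month = doc["date"][:7]
    for word in tokenize(doc["text"]):
        yield (doc["author"], word, month), 1


def build_view(docs):
    # CouchDB keeps view rows sorted by key; a sorted list mimics that.
    return sorted(row for doc in docs for row in map_article(doc))


def reduce_sum(rows, group_level, prefix=()):
    # Mimics querying the view with the built-in _sum reduce and
    # group_level=N, restricted to keys that start with `prefix`
    # (in CouchDB that restriction would be a startkey/endkey range).
    totals = defaultdict(int)
    for key, value in rows:
        if key[:len(prefix)] == prefix:
            totals[key[:group_level]] += value
    return dict(totals)


def top_words_for_author(rows, author, limit=10):
    # group_level=2 -> one total per (author, word). Views are ordered by
    # key, not by value, so ranking by count happens here on the client.
    totals = reduce_sum(rows, group_level=2, prefix=(author,))
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0][1]))
    return [(key[1], n) for key, n in ranked[:limit]]


def word_trend(rows, author, word):
    # group_level=3 -> one total per (author, word, month).
    totals = reduce_sum(rows, group_level=3, prefix=(author, word))
    return sorted((key[2], n) for key, n in totals.items())


if __name__ == "__main__":
    rows = build_view([
        {"date": "2013-01-10", "author": "alice", "text": "Cubes and cubes of data"},
        {"date": "2013-02-03", "author": "alice", "text": "More data, more cubes"},
    ])
    print(top_words_for_author(rows, "alice", limit=2))  # [('cubes', 3), ('data', 2)]
    print(word_trend(rows, "alice", "cubes"))             # [('2013-01', 2), ('2013-02', 1)]
```

Group level 2 gives overall totals per author and word; level 3 adds the month for the trend. Because view rows are ordered by key rather than by value, ranking the most common words happens on the client, and slicing by theme or subject would need another view whose key starts with that tag.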

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow