Question

I have a collection of articles on which I want to run word-frequency and trend analysis.

The articles are tagged with date, author, theme, and subject. I want to use these tags to slice the data so that I can find the most common words for a specific author (or group of authors), theme(s), or subject(s), both overall and over time (trends).

How should I design this database (relational or otherwise), or should I build a data cube instead?
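For the relational option, one common layout is a star-schema-like pair of tables: one row per article carrying its tags, and one row per (article, word) pair holding that word's count. Slicing by tag is then a `JOIN` plus `GROUP BY`, and a trend is the same query grouped by month. The sketch below uses SQLite only because it ships with Python; the table names, the naive tokenizer, and the assumption of exactly one theme and one subject per article are all hypothetical (multiple themes per article would need a separate link table).

```python
import re
import sqlite3
from collections import Counter

# Hypothetical schema: article tags in one table, per-article word counts in another.
SCHEMA = """
CREATE TABLE article (
    id       INTEGER PRIMARY KEY,
    pub_date TEXT NOT NULL,   -- ISO date, e.g. '2012-03-15'
    author   TEXT NOT NULL,
    theme    TEXT NOT NULL,   -- assumes one theme per article
    subject  TEXT NOT NULL    -- assumes one subject per article
);
CREATE TABLE word_count (
    article_id INTEGER NOT NULL REFERENCES article(id),
    word       TEXT NOT NULL,
    n          INTEGER NOT NULL,
    PRIMARY KEY (article_id, word)
);
CREATE INDEX word_count_word ON word_count(word);
"""

def tokenize(text):
    """Lower-case and keep runs of ASCII letters (deliberately naive)."""
    return re.findall(r"[a-z]+", text.lower())

def add_article(conn, art_id, pub_date, author, theme, subject, text):
    """Store the article's tags and its per-word counts."""
    conn.execute("INSERT INTO article VALUES (?, ?, ?, ?, ?)",
                 (art_id, pub_date, author, theme, subject))
    counts = Counter(tokenize(text))
    conn.executemany("INSERT INTO word_count VALUES (?, ?, ?)",
                     [(art_id, w, n) for w, n in counts.items()])

def top_words(conn, authors, limit=10):
    """Most common words across a group of authors, overall."""
    marks = ",".join("?" * len(authors))
    sql = f"""
        SELECT wc.word, SUM(wc.n) AS total
        FROM word_count wc JOIN article a ON a.id = wc.article_id
        WHERE a.author IN ({marks})
        GROUP BY wc.word
        ORDER BY total DESC, wc.word
        LIMIT ?
    """
    return conn.execute(sql, (*authors, limit)).fetchall()

def word_trend(conn, word, theme):
    """Monthly totals of one word within one theme (a trend line)."""
    sql = """
        SELECT substr(a.pub_date, 1, 7) AS month, SUM(wc.n)
        FROM word_count wc JOIN article a ON a.id = wc.article_id
        WHERE wc.word = ? AND a.theme = ?
        GROUP BY month
        ORDER BY month
    """
    return conn.execute(sql, (word, theme)).fetchall()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    add_article(conn, 1, "2012-01-10", "alice", "tech", "databases",
                "Cubes and tables, tables and cubes.")
    add_article(conn, 2, "2012-02-05", "bob", "tech", "search",
                "Search engines index tables.")
    add_article(conn, 3, "2012-02-20", "alice", "tech", "search",
                "Full text search.")
    print(top_words(conn, ["alice"], limit=3))
    print(word_trend(conn, "tables", "tech"))
```

These `GROUP BY` queries compute a slice of the cube on demand. If they become too slow at your data volume, precomputing the same aggregates (for example per month, author, theme, and word) into a summary table is essentially what a data cube would give you.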

Solution

Rizzoma.com built this with CouchDB (NoSQL) and Sphinx (a full-text search engine). You can build it another way if you prefer, or study the existing solution and replicate it.
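How Rizzoma actually structured its data isn't described here, so the following is only an assumption about how a CouchDB-style approach could work. CouchDB views are built from a map function that emits keys and a reduce (such as the built-in `_sum`); querying with `group_level` groups results by a prefix of an array key. The pure-Python sketch below emulates that idea with a hypothetical key of `[author, word, month]`: grouping at level 2 gives per-author word totals, and level 3 gives the monthly trend. It does not talk to CouchDB or Sphinx, and the function names are illustrative.

```python
import re
from collections import Counter

def map_article(doc):
    """Map step (CouchDB-style): emit ((author, word, 'YYYY-MM'), 1) per token."""
    month = doc["date"][:7]
    for word in re.findall(r"[a-z]+", doc["text"].lower()):
        yield (doc["author"], word, month), 1

def reduce_view(docs, group_level, prefix=()):
    """Emulates a `_sum` reduce queried with group_level, restricted to keys
    starting with `prefix` (the role startkey/endkey would play)."""
    totals = Counter()
    for doc in docs:
        for key, value in map_article(doc):
            if key[:len(prefix)] == tuple(prefix):
                totals[key[:group_level]] += value
    return totals

def top_words_for(docs, author, n=3):
    """Rank one author's words by count; view rows are ordered by key,
    so ranking by value is done here on the client."""
    grouped = reduce_view(docs, group_level=2, prefix=(author,))
    ranked = sorted(grouped.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(key[1], count) for key, count in ranked[:n]]

if __name__ == "__main__":
    docs = [
        {"date": "2012-01-10", "author": "alice", "text": "Cubes and tables, tables and cubes."},
        {"date": "2012-02-20", "author": "alice", "text": "Full text search."},
        {"date": "2012-02-05", "author": "bob", "text": "Search engines index tables."},
    ]
    print(top_words_for(docs, "alice"))
    print(dict(reduce_view(docs, group_level=3, prefix=("alice", "tables"))))
```

The order of the key components decides which slices are cheap: with author first, per-author queries are simple prefixes, while a per-theme or cross-author trend would need a second view with a different key order.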

Licensed under: CC-BY-SA with attribution