문제

I have a bunch of articles, on which I want to do word frequency and trend analysis.

The articles are tagged with date, author, theme and subject. I want to use these tags to slice the data so that I can get the most common words used for a specific author (or group of authors), theme(s) or subject(s). Overall and over time (trend).

How would I design this database (relational or other) or should I create a data cube?

도움이 되었습니까?

해결책

Rizzoma.com made this with couchDB (noSQL) and Sphinx (fulltext search engine). You can try to make it in another way, if you want, or test existing solution and repeat it.

라이센스 : CC-BY-SA ~와 함께 속성
제휴하지 않습니다 StackOverflow
scroll top