Question

I am working on a pet project where I'm storing various text files. I have set up my app to save the tags as a single string in one of my collections, for example:

tags: "Linux Apache WSGI"

Storing them and searching for them works just fine. My question comes up when I want to build something like a tag cloud, count all the various tags, or make a dynamic selection system based on tags: what is the best way to break them up to work with? Or should I be storing them some other way?

Logically I could scan through every record, collect all the tags, split them on spaces, and then cache the result somehow. Maybe that's the right answer, but I wanted to ask for the community's wisdom.

I'm using pymongo to interact with my database.


Solution

Or should I be storing them some other way?

The standard way to store tags is as an array. In your case, the document would look something like:

tags: ['linux', 'apache', 'wsgi']
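As a rough illustration of moving from the space-separated string to an array, the sketch below splits the string before the document is saved. The helper name `split_tags`, the lowercasing, and the removal of duplicates are assumptions made for this example, not something the question requires.

```python
def split_tags(tag_string):
    """Turn 'Linux Apache WSGI' into ['linux', 'apache', 'wsgi'].

    Lowercasing and de-duplication are assumptions for this sketch;
    drop them if tag case or repeats matter in your app.
    """
    tags = []
    for tag in tag_string.lower().split():
        if tag not in tags:  # keep first occurrence, preserve order
            tags.append(tag)
    return tags


# Hypothetical document shape: the "name" field is made up for illustration.
document = {"name": "notes.txt", "tags": split_tags("Linux Apache WSGI")}
```

With tags stored this way, a MongoDB query on a single value, such as `{"tags": "linux"}`, matches documents whose array contains that value, so searching should keep working.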

... what is the best way to break them up to work with?

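This is what Map/Reduce is designed for. It effectively "scans every record": the map step emits each tag found in a document, and the reduce step adds up the emissions for each tag. The output of a Map/Reduce job is another collection that you can query.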
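To make the map and reduce steps concrete, here is a minimal pure-Python sketch of the same idea over in-memory documents. It is not MongoDB's own Map/Reduce API; the function names and the sample documents are invented for illustration.

```python
from collections import defaultdict


def map_tags(document):
    """Map step: emit one (tag, 1) pair for every tag on a document."""
    for tag in document.get("tags", []):
        yield tag, 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted values for each tag."""
    totals = defaultdict(int)
    for tag, value in pairs:
        totals[tag] += value
    return dict(totals)


def count_tags(documents):
    """Run the map step over every document, then reduce the results."""
    pairs = (pair for doc in documents for pair in map_tags(doc))
    return reduce_counts(pairs)


# Made-up sample data standing in for the collection.
documents = [
    {"name": "a.txt", "tags": ["linux", "apache", "wsgi"]},
    {"name": "b.txt", "tags": ["linux", "wsgi"]},
    {"name": "c.txt", "tags": ["linux"]},
]
tag_counts = count_tags(documents)
```

On a real server the scan would run inside MongoDB rather than in the client. Newer MongoDB versions also offer an aggregation pipeline (`$unwind` followed by `$group`) that can produce the same per-tag counts.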

However, there's another way to do this: keep "counters" and update them as you write. When you save a new document, you also increment the counter for each tag on that document.
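A sketch of the counter approach might look like the following. It assumes `files` and `tag_counts` are pymongo `Collection` objects (both names are hypothetical) and that each counter is a document of the form `{"_id": <tag>, "count": <n>}`, which is just one possible layout.

```python
def save_with_tag_counters(files, tag_counts, document):
    """Insert a document, then bump one counter per distinct tag.

    `files` and `tag_counts` are assumed to be pymongo Collection
    objects; the counter layout {"_id": tag, "count": n} is an
    assumption made for this sketch.
    """
    files.insert_one(document)
    for tag in set(document.get("tags", [])):
        # $inc with upsert=True creates the counter on first use.
        tag_counts.update_one(
            {"_id": tag},
            {"$inc": {"count": 1}},
            upsert=True,
        )
```

Reading the tag cloud is then a plain query on the counter collection. The trade-off is that the insert and the increments are separate writes, so the counters can drift if one of them fails, and removing or retagging a document needs a matching decrement.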

Licensed under: CC-BY-SA with attribution