How to build “Tagging” support using CouchDB?

https://stackoverflow.com/questions/211118

03-07-2019
|

Question

I'm using the following view function to iterate over all items in the database (in order to find a tag), but I think the performance is very poor if the dataset is large. Any other approach?

def by_tag(tag):
return  '''
        function(doc) {
            if (doc.tags.length > 0) {
                for (var tag in doc.tags) {
                    if (doc.tags[tag] == "%s") {
                        emit(doc.published, doc)
                    }
                }
            }
        };
        ''' % tag

Solution

Disclaimer: I didn't test this and don't know if it can perform better.

Create a single perm view:

function(doc) {
  for (var tag in doc.tags) {
    emit([tag, doc.published], doc)
  }
};

And query with _view/your_view/all?startkey=['your_tag_here']&endkey=['your_tag_here', {}]

Resulting JSON structure will be slightly different but you will still get the publish date sorting.

OTHER TIPS

You can define a single permanent view, as Bahadir suggests. when doing this sort of indexing, though, don't output the doc for each key. Instead, emit([tag, doc.published], null). In current release versions you'd then have to do a separate lookup for each doc, but SVN trunk now has support for specifying "include_docs=True" in the query string and CouchDB will automatically merge the docs into your view for you, without the space overhead.

You are very much on the right track with the view. A list of thoughts though:

View generation is incremental. If you're read traffic is greater than you're write traffic, then your views won't cause an issue at all. People that are concerned about this generally shouldn't be. Frame of reference, you should be worried if you're dumping hundreds of records into the view without an update.

Emitting an entire document will slow things down. You should only emit what is necessary for use of the view.

Not sure what the val == "%s" performance would be, but you shouldn't over think things. If there's a tag array you should emit the tags. Granted if you expect a tags array that will contain non-strings, then ignore this.

# Works on CouchDB 0.8.0
from couchdb import Server # http://code.google.com/p/couchdb-python/

byTag = """
function(doc) {
if (doc.type == 'post' && doc.tags) {
    doc.tags.forEach(function(tag) {
        emit(tag, doc);
    });
}
}
"""

def findPostsByTag(self, tag):
    server = Server("http://localhost:1234")
    db = server['my_table']
    return [row for row in db.query(byTag, key = tag)]

The byTag map function returns the data with each unique tag in the "key", then each post with that tag in value, so when you grab key = "mytag", it will retrieve all posts with the tag "mytag".

I've tested it against about 10 entries and it seems to take about 0.0025 seconds per query, not sure how efficient it is with large data sets..

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow