Question

I decided to try out CouchDB today, using the ~9GB of Amazon Review data here: http://snap.stanford.edu/data/web-Movies.html

What I am trying to do is find the least helpful users of all time: the people who have written the largest number of reviews that other people found unhelpful. (Are they Amazon's greatest trolls, or just disagreeable? I want to see.)

I've written a map function that emits the userId for every review whose helpfulness difference is over 5, and a reduce function that sums those emits to count how often each user appears.

// map function:
function(doc){
  var unhelpfulness = doc.helpfulness[1] - doc.helpfulness[0];
  if(unhelpfulness > 5){
    emit(doc.userId, 1);
  }
}

// reduce function:
function(keys, values){
  return sum(values);
}

This gives me a view of userId : number of unhelpful reviews.
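
To make the shape of that output concrete, here is a minimal sketch that runs the same map and reduce logic over a few made-up documents in plain JavaScript, outside CouchDB. The sample documents, the helper names `runMap` and `runGroupedReduce`, and the assumption that `helpfulness` is a two-element array are all illustrative, not taken from the dataset.

```javascript
// Hypothetical sample documents; field names follow the map function above.
// helpfulness is assumed to be a two-element array, as the map function implies.
const docs = [
  { userId: "A1", helpfulness: [0, 8] },
  { userId: "A1", helpfulness: [1, 9] },
  { userId: "B2", helpfulness: [7, 8] },
  { userId: "C3", helpfulness: [2, 10] },
];

// Simulate the map step: collect the [key, value] pairs that emit() would produce.
function runMap(documents) {
  const emitted = [];
  for (const doc of documents) {
    const unhelpfulness = doc.helpfulness[1] - doc.helpfulness[0];
    if (unhelpfulness > 5) {
      emitted.push([doc.userId, 1]);
    }
  }
  return emitted;
}

// Simulate the reduce step as queried with group=true: sum the values per key.
function runGroupedReduce(emitted) {
  const totals = {};
  for (const [key, value] of emitted) {
    totals[key] = (totals[key] || 0) + value;
  }
  return totals;
}

const viewOutput = runGroupedReduce(runMap(docs));
console.log(viewOutput);
```

Each key in the result is a userId and each value is that user's count of qualifying reviews, which is the data the next step needs to sort by value.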

I want to take this output and then reprocess it with more map reduce, to find out who's written the most unhelpful reviews. How do I do this? Can I export a view as another table or something? Or am I just thinking about this problem in the wrong way?


Solution

You are on the right track. CouchDB does not allow view results to be sorted by value, but it has list functions that can be used to perform operations on the results of a view. From the CouchDB book:

Just as show functions convert documents to arbitrary output formats, CouchDB list functions allow you to render the output of view queries in any format. The powerful iterator API allows for flexibility to filter and aggregate rows on the fly, as well as output raw transformations for an easy way to make Atom feeds, HTML lists, CSV files, config files, or even just modified JSON.

So we will use a list function to filter and aggregate. In your design document, create a list function like so:

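function(head, req) {
  var row;
  var rows = [];

  // Collect every row of the view result.
  while ((row = getRow())) {
    rows.push(row);
  }

  // Sort by value (the review count), largest first.
  rows.sort(function(a, b) { return b.value - a.value; });

  // Send back only the top row.
  send(JSON.stringify(rows[0]));
}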

Now if you query

/your-database/_design/your-design-doc-name/_list/your-list-name/your-view-name?group=true

You should get the row for the user who has written the most unhelpful reviews. CouchDB makes it easy to find a troll :)
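
If you want a ranking rather than a single user, the same idea extends to the top N rows. The following is only a sketch: `topRows` is a hypothetical helper, the sample rows are made up, and the `n` query parameter in the commented list function is an assumption of this example, not something CouchDB defines.

```javascript
// Return the n rows with the largest values, sorted descending.
// Rows are shaped like view rows: { key: userId, value: count }.
// The input array is copied first so it is not reordered in place.
function topRows(rows, n) {
  return rows
    .slice()
    .sort(function (a, b) { return b.value - a.value; })
    .slice(0, n);
}

// Inside CouchDB the list function could look roughly like this. It is shown
// as a comment because getRow and send only exist in the CouchDB query server,
// and topRows would have to be inlined into the function body there:
//
// function (head, req) {
//   var row, rows = [];
//   while ((row = getRow())) { rows.push(row); }
//   var n = parseInt(req.query.n, 10) || 10; // "n" is a made-up parameter
//   send(JSON.stringify(topRows(rows, n)));
// }

// Made-up rows, standing in for the grouped view output.
var sample = [
  { key: "A1", value: 2 },
  { key: "C3", value: 5 },
  { key: "B2", value: 3 }
];

var topTwo = topRows(sample, 2);
console.log(JSON.stringify(topTwo));
```

Note that both versions buffer every row of the view in memory before sorting, which is fine for a grouped view with one row per user but worth keeping in mind for very large results.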

Licensed under: CC-BY-SA with attribution