Question

Excuse the potential n00bness of this question - still trying to get my head around this non-relational NoSQL stuff.

I've been super impressed with the performance and simplicity of ElasticSearch, but I've got a mapping (borderline NoSQL theory) question to answer before I dive too deeply into the implementation.

Let's continue to use the Twitter examples ElasticSearch have in their documentation.

Basically, we know a tweet belongs to a user, and a user has many tweets. The objects look something like this:

user  = {'screen_name':'d2kagw', 'id_str':'1234567890', 'favourites_count':'15', ...}
tweet = {'message':'lorem lipsum...', 'user_id_str':'1234567890', ...}

What I'm wondering is: can the tweet object have a reference to the user object? I want to be able to write queries like:

{'query': {
  'term':{'message':'lipsum'},
  'range':{'user.favourites_count':{'from':10, 'to':30}}
}}

I would like this to return the matching tweets with their user objects included in the response (vs. having to lazy load them later).

Am I asking too much of it?

Am I expected to throw all the user data into the tweet object if I want to query the data in that way?

In my implementation (which doesn't use Twitter; this was just an elegant example) I need to have the two datasets as different indexes due to the various ways I have to query the data, so I'm not sure if I can use an object type AND have the index structure I require.

Thanks in advance for your help.


Solution

ElasticSearch doesn't really support the table joins that we are so used to in the SQL world. The closest it gets is the Has Child query, which limits results in one table based on the presence of a record in another table, and even that is limited to a one-to-many (parent-children) relationship.
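As a rough illustration of what a Has Child query looks like, here is a sketch of the request body built as a plain Python dict. It assumes a parent/child mapping already exists in which "tweet" is a child type of "user"; that type name is an assumption taken from the question's example, and how the relationship is declared differs between ElasticSearch versions. Note that the query returns the parents (users), not the tweets.

```python
import json

# Hypothetical child type name, taken from the question's Twitter example.
CHILD_TYPE = "tweet"

# Request body: find users that have at least one child tweet
# whose message matches "lipsum". The hits are user documents only.
has_child_query = {
    "query": {
        "has_child": {
            "type": CHILD_TYPE,
            "query": {"match": {"message": "lipsum"}},
        }
    }
}

print(json.dumps(has_child_query, indent=2))
```

Because only the parent side comes back, this does not give the "tweets with their users attached" result the question asks for.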

So, a common approach in this world would be to denormalize everything and query one index at a time.
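A minimal sketch of that denormalization, using the objects from the question: the user is copied into each tweet before indexing, so a single query against the tweet index can filter on user fields and get the user back in the same hit. The helper name `denormalize` and the `users_by_id` lookup are hypothetical, and `favourites_count` is assumed to be stored as a number so the range filter compares numerically. The query uses the `bool`/`match`/`range` form with `gte`/`lte` from more recent versions of the query DSL.

```python
import json

def denormalize(tweet, users_by_id):
    """Return a copy of the tweet with its user embedded under 'user'."""
    doc = dict(tweet)
    doc["user"] = dict(users_by_id[tweet["user_id_str"]])
    return doc

# Example data modelled on the question (favourites_count as a number).
users_by_id = {
    "1234567890": {
        "screen_name": "d2kagw",
        "id_str": "1234567890",
        "favourites_count": 15,
    }
}
tweet = {"message": "lorem lipsum...", "user_id_str": "1234567890"}

# This is the document that would be indexed into the tweet index.
doc = denormalize(tweet, users_by_id)

# One query against one index: text match on the tweet,
# range filter on the embedded user field.
query = {
    "query": {
        "bool": {
            "must": [
                {"match": {"message": "lipsum"}},
                {"range": {"user.favourites_count": {"gte": 10, "lte": 30}}},
            ]
        }
    }
}

print(json.dumps(doc, indent=2))
print(json.dumps(query, indent=2))
```

The trade-off is that a change to a user (a new `favourites_count`, say) has to be written to every tweet document that embeds that user.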

Licensed under: CC-BY-SA with attribution