Question

I'm using ElasticSearch as a data store and I'm wondering about how to structure my data. Coming from MySQL my natural instinct is to split everything into different types ("tables"), but I'm unsure if there is anything to be gained from it.

For example, I have an article with comments in it and I want to keep track of users who have clicked "like" on the comment. Should I simply keep the array of user ids in a nested array inside the comments of the article, or should I move the comments out into a separate comment type? And what about the array of users who have liked the comment, should that be a separate type as well?

{
    "article": {
        "properties": {
            ...
            "comments": {
                "properties": {
                    ...
                    "likes": { "type": "string" } // array of UUIDs
                }
            }
        }
    }
}

Is there a problem with having nested arrays inside nested arrays from an efficiency perspective? And is it better to use nested arrays/objects or separate types when using ElasticSearch as a data store?


Solution

This is kind of a broad question, and the usual answer is "it depends". I would say there are two main things you need to consider when planning the structure for your data.

One is your access pattern - what types of searches you are going to need, and what kind of aggregations (if any) you will want to run on your data. Try to map out your use cases to check that you can achieve them with the structure you have in mind.

The second is the update pattern. This is sometimes overlooked in favor of the access pattern, but it has important implications. For instance, if the article itself doesn't change much but can have a lot of comments, you might get better performance keeping comments as separate documents (and a separate type), since you don't need to reindex the article on each comment. (Remember that updating a document in Elasticsearch actually re-indexes the whole document.)
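To make the update-pattern point concrete, here is a rough sketch in plain Python (no Elasticsearch client involved) comparing how much data has to be re-indexed when a comment is added under each model. The field names, the article size, and the helper functions embedded_update and separate_update are all hypothetical and only there for illustration.

```python
import json

# Hypothetical article with embedded comments (field names are illustrative).
article = {
    "title": "Structuring data in Elasticsearch",
    "body": "long article text " * 200,
    "comments": [
        {"id": f"c{i}", "text": "nice post", "likes": []} for i in range(50)
    ],
}


def embedded_update(article, comment):
    """Embedded model: adding a comment produces a new full article document."""
    updated = dict(article)
    updated["comments"] = article["comments"] + [comment]
    return updated  # the whole document is what gets re-indexed


def separate_update(article_id, comment):
    """Separate model: only the new comment document is indexed."""
    return dict(comment, article_id=article_id)


new_comment = {"id": "c50", "text": "thanks", "likes": ["user-1"]}

# Serialized size is used here as a crude proxy for re-indexing work.
embedded_bytes = len(json.dumps(embedded_update(article, new_comment)))
separate_bytes = len(json.dumps(separate_update("a1", new_comment)))

print("embedded model re-indexes", embedded_bytes, "bytes")
print("separate model indexes", separate_bytes, "bytes")
```

Serialized size is only a stand-in for the real cost, but the gap grows with the length of the article and the number of existing comments, which is why a comment-heavy, rarely edited article tends to favor separate comment documents.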

I also recommend looking at this article - http://www.elasticsearch.org/blog/managing-relations-inside-elasticsearch/ and getting familiar with the difference between nested objects and parent-child types (the latter are better when you have different update patterns for the parent and the child).
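As a rough illustration of the two options, the mappings could look something like the Python dicts below. This is a sketch that assumes the older typed-mapping syntax the question uses ("type": "string", multiple types per index); newer Elasticsearch versions replaced these with text/keyword fields and a join field, so check the documentation for your version. The variable names are hypothetical and the other fields of the article and comment are left out.

```python
import json

# Option 1: nested objects. Comments live inside the article document,
# so changing any comment re-indexes the whole article.
nested_mapping = {
    "article": {
        "properties": {
            "comments": {
                "type": "nested",
                "properties": {
                    "likes": {"type": "string"},  # array of UUIDs
                },
            }
        }
    }
}

# Option 2: parent-child. Comments are separate documents that point at
# their parent article, so they can be added or updated independently.
parent_child_mapping = {
    "article": {"properties": {}},
    "comment": {
        "_parent": {"type": "article"},
        "properties": {
            "likes": {"type": "string"},  # array of UUIDs
        },
    },
}

print(json.dumps(nested_mapping, indent=4))
print(json.dumps(parent_child_mapping, indent=4))
```

In both cases the likes stay a plain array of ids on the comment; whether they deserve their own type depends on the same access and update questions as above.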

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow