Question

I am storing book titles in Elasticsearch, and each title can belong to several stores. Like this:

{
    "books": [
        {
            "id": 1,
            "title": "Title 1",
            "store": "store1" 
        },
        {             
            "id": 2,
            "title": "Title 1",
            "store": "store2" 
        },
        {             
            "id": 3,
            "title": "Title 1",
            "store": "store3" 
        },
        {             
            "id": 4,
            "title": "Title 2",
            "store": "store2" 
        },
        {             
            "id": 5,
            "title": "Title 2",
            "store": "store3" 
        }
    ]
}

How can I get all the books grouped by title, with one result per group (one row per title, so I can get all the ids and stores for that title)?

Based on the data above, I want to get two results, each containing all of its ids and stores.

Expected results:

{
"hits":{
    "total" : 2,
    "hits" : [
        {                
            "0" : {
                "title" : "Title 1",
                "group": [
                     {
                         "id": 1,
                         "store": "store1"
                     },
                     {
                         "id": 2,
                         "store": "store2"
                     },
                     {
                         "id": 3,
                         "store": "store3"
                     }
                ]
            }
        },
        {                
            "1" : {
                "title" : "Title 2",
                "group": [
                     {
                         "id": 4,
                         "store": "store2"
                     },
                     {
                         "id": 5,
                         "store": "store3"
                     }
                ]
            }
        }
    ]
}
}

Solution

What you are looking for is not possible in Elasticsearch, at least not with the current version (1.1).

There is a long-standing open issue for this feature, with a lot of +1s and demand behind it.

As for official statements: Simon says it requires a lot of refactoring and, although it is planned, there is no way of saying when it will be implemented, let alone shipped.

A similar statement was made by Clinton Gormley in his webinar: field grouping needs a lot of effort to be done right, especially since Elasticsearch is sharded and distributed by nature. It would not be that big a deal if you ignored sharding, but Elasticsearch wants to ship only features that scale with the complete system and work as well on hundreds of machines as they do on a single box.

If you're not tied to Elasticsearch, Solr offers such a feature.

Otherwise, probably the best solution at the moment is to do this client side. That is, query for some documents, do the grouping on your client and, if needed, fetch more results to satisfy your desired group size (as far as I know, this is what Solr does under the hood).
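A minimal sketch of that client-side grouping, in Python. It assumes the hits have already been fetched and reduced to plain dicts with the `id`, `title` and `store` fields from the question; `group_by_title` is a name made up for this illustration, and fetching further pages to fill a group is left out.

```python
from collections import OrderedDict


def group_by_title(docs):
    """Group flat book documents into one entry per title, in first-seen order."""
    groups = OrderedDict()
    for doc in docs:
        # Create the group the first time a title is seen, then append to it.
        entry = groups.setdefault(doc["title"], {"title": doc["title"], "group": []})
        entry["group"].append({"id": doc["id"], "store": doc["store"]})
    return list(groups.values())


# The five documents from the question.
books = [
    {"id": 1, "title": "Title 1", "store": "store1"},
    {"id": 2, "title": "Title 1", "store": "store2"},
    {"id": 3, "title": "Title 1", "store": "store3"},
    {"id": 4, "title": "Title 2", "store": "store2"},
    {"id": 5, "title": "Title 2", "store": "store3"},
]

grouped = group_by_title(books)
for entry in grouped:
    print(entry["title"], entry["group"])
```

The grouping itself is cheap; the awkward part of this approach is pagination, since a page of raw hits can contain any number of groups.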

Not exactly what you wanted, but you could also go for aggregations: create one bucket per title and run a sub-aggregation on the id field. You won't get the store values with this, but you could retrieve them from your datastore once you have the ids. Note that a terms aggregation buckets on the indexed terms, so this only groups by the whole title if the title field is not analyzed.

{
    "aggs" : {
        "titles" : {
            "terms" : { "field" : "title" },
            "aggs": {
                "ids": {
                    "terms": { "field" : "id" }
                }
            }
        }
    }
}
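The response to a nested terms aggregation like this comes back as buckets inside buckets. As a rough sketch, the ids per title could be pulled out as follows. `sample_response` is hand-written for illustration and trimmed to the relevant keys, not real server output, and `ids_by_title` is a made-up helper name.

```python
def ids_by_title(response):
    """Map each title bucket key to the keys of the id buckets nested under it."""
    result = {}
    for title_bucket in response["aggregations"]["titles"]["buckets"]:
        result[title_bucket["key"]] = [
            id_bucket["key"] for id_bucket in title_bucket["ids"]["buckets"]
        ]
    return result


# Illustrative response shape for the "titles" / "ids" aggregation names used above.
sample_response = {
    "aggregations": {
        "titles": {
            "buckets": [
                {
                    "key": "Title 1",
                    "doc_count": 3,
                    "ids": {"buckets": [
                        {"key": 1, "doc_count": 1},
                        {"key": 2, "doc_count": 1},
                        {"key": 3, "doc_count": 1},
                    ]},
                },
                {
                    "key": "Title 2",
                    "doc_count": 2,
                    "ids": {"buckets": [
                        {"key": 4, "doc_count": 1},
                        {"key": 5, "doc_count": 1},
                    ]},
                },
            ]
        }
    }
}

title_ids = ids_by_title(sample_response)
print(title_ids)
```

Keep in mind that a terms aggregation returns only a limited number of buckets by default, so its `size` option may need raising for larger data sets.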

Edit: It seems that, with the top_hits aggregation, result grouping could be implemented soon.

OTHER TIPS

You can get the desired result above by nesting a top_hits aggregation inside a terms aggregation. For example (the field names here come from a different data set, not the books above):

{
    "aggs": {
        "set": {
            "terms": {
                "field": "id"
            },
            "aggs": {
                "color": {
                    "terms": {
                        "field": "color"
                    },
                    "aggs": {
                        "products": {
                            "top_hits": {
                                "_source": {
                                    "include": ["size"]
                                }
                            }
                        }
                    }
                },
                "product": {
                    "top_hits": {
                        "_source": {
                            "include": ["productDetails"]
                        },
                        "size": 1
                    }
                }
            }
        }
    }
}
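Applied to the books from the question, the same idea might look like the sketch below: a terms aggregation on `title` with a top_hits sub-aggregation that returns the documents in each bucket. The aggregation names `by_title` and `docs`, the helper names, and the hand-written `sample_response` are assumptions for illustration only. It also assumes `title` is indexed so that the whole title is a single term.

```python
def build_group_query(field="title", hits_per_group=10):
    """Build a search body with one terms bucket per value and its top documents."""
    return {
        "size": 0,  # skip the plain hit list; only the aggregation is needed
        "aggs": {
            "by_title": {
                "terms": {"field": field},
                "aggs": {"docs": {"top_hits": {"size": hits_per_group}}},
            }
        },
    }


def parse_groups(response):
    """Turn the bucketed response into the title/group layout from the question."""
    groups = []
    for bucket in response["aggregations"]["by_title"]["buckets"]:
        members = [
            {"id": hit["_source"]["id"], "store": hit["_source"]["store"]}
            for hit in bucket["docs"]["hits"]["hits"]
        ]
        groups.append({"title": bucket["key"], "group": members})
    return groups


# Illustrative response, trimmed to the keys the parser reads.
sample_response = {
    "aggregations": {"by_title": {"buckets": [
        {"key": "Title 1", "doc_count": 3, "docs": {"hits": {"hits": [
            {"_source": {"id": 1, "title": "Title 1", "store": "store1"}},
            {"_source": {"id": 2, "title": "Title 1", "store": "store2"}},
            {"_source": {"id": 3, "title": "Title 1", "store": "store3"}},
        ]}}},
        {"key": "Title 2", "doc_count": 2, "docs": {"hits": {"hits": [
            {"_source": {"id": 4, "title": "Title 2", "store": "store2"}},
            {"_source": {"id": 5, "title": "Title 2", "store": "store3"}},
        ]}}},
    ]}}
}

query = build_group_query()
groups = parse_groups(sample_response)
print(groups)
```

The query body would be sent with whatever client is in use; only the dict construction and the response parsing are shown here.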

Along the same lines as SQL's GROUP BY, Elasticsearch provides aggregations.

With aggregation queries, Elasticsearch responds with buckets.

One bucket corresponds to one category (group).

I had the same problem, and the best solution I found was to change the mapping: make the "store" field a nested type. This fits because you have a many-to-many relation. That way you can still apply sorting and pagination. I hope this helps.
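One possible reading of that suggestion, as a rough sketch: index one document per title and keep the per-store entries in a nested field. The field name `stores` is an assumption made for this illustration, and only the `properties` part of the mapping is shown, because how it is wrapped in a create-index or put-mapping request depends on the Elasticsearch version.

```python
import json

# Hypothetical mapping fragment: "stores" is an assumed field name.
mapping_properties = {
    "properties": {
        "stores": {"type": "nested"}
    }
}

# The three "Title 1" rows from the question, folded into a single document.
title_1_doc = {
    "title": "Title 1",
    "stores": [
        {"id": 1, "store": "store1"},
        {"id": 2, "store": "store2"},
        {"id": 3, "store": "store3"},
    ],
}

print(json.dumps(mapping_properties, indent=2))
print(json.dumps(title_1_doc, indent=2))
```

With this layout each search hit is already one title, so the usual `from`/`size` pagination counts titles rather than individual store rows.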

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow