We're interested in working with high-cardinality indexes, which are known to be a problem for Elasticsearch.

You have already told us that for a query such as

select count(distinct high_cardinality_field) from my_table

you already have some optimizations for the count. Will it be possible someday to write something like:

select count_via_hyperloglog(high_cardinality_field) from my_table

with count_via_hyperloglog implemented as a UDF or something similar, as is possible right now in ES via ES plugins?


Solution

In Crate, this feature is on our backlog as an additional aggregation function that uses the HyperLogLog algorithm. We plan to derive the naming from Presto: http://prestodb.io/docs/current/functions/aggregate.html. Your example will then probably look like:

select approx_distinct(high_cardinality_field) from my_table

However, one possible performance improvement, limited to a single field per table, is to cluster your table on the high-cardinality field, as described under https://crate.io/docs/current/sql/ddl.html#routing
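
For intuition about what an approx_distinct-style aggregate would compute, here is a minimal Python sketch of HyperLogLog. It is not Crate's or Elasticsearch's implementation: the class name, the SHA-1 based 64-bit hash, and the default of 2^12 registers are assumptions chosen for illustration only.

```python
import hashlib
import math


class HyperLogLog:
    """Illustrative HyperLogLog counter (hypothetical class, not a library API)."""

    def __init__(self, p=12):
        self.p = p                      # number of bits used to pick a register
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m
        # Bias-correction constant from the HyperLogLog paper (valid for m >= 128).
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, value):
        # 64-bit hash derived from SHA-1; the hash choice is an assumption of this sketch.
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)                    # top p bits select the register
        rest = h & ((1 << (64 - self.p)) - 1)       # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self):
        # Harmonic-mean based raw estimate.
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        # Small-range correction: fall back to linear counting.
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros > 0:
            estimate = self.m * math.log(self.m / zeros)
        return int(round(estimate))


if __name__ == "__main__":
    hll = HyperLogLog()
    for i in range(50000):
        hll.add("author-%d" % (i % 20000))   # 20000 distinct values, with repeats
    print("approximate distinct count:", hll.count())
```

With 4096 registers the expected relative error is roughly 1.04 / sqrt(4096), about 1.6%, while memory use stays fixed no matter how many distinct values are added. That fixed memory cost is the trade-off that makes this approach attractive for high-cardinality fields.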

Other tips

High-cardinality counting with HyperLogLog is planned for Elasticsearch 1.1.0; the documentation is already up: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-metrics-cardinality-aggregation.html

Example:

{
    "aggs" : {
        "author_count" : {
            "cardinality" : {
                "field" : "author"
            }
        }
    }
}

As for something like a UDF, you can use scripts, e.g. by combining a filter aggregation with a script filter:

{
    "aggs": {
        "in_stock_products": {
            "filter": {
                "script": {
                    "script": "doc['price'].value > minPrice",
                    "params": {
                        "minPrice": 5
                    }
                }
            },
            "aggs": {
                "avg_price": {
                    "avg": {
                        "field": "price"
                    }
                }
            }
        }
    }
}
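
To make clear what the request above computes, here is a rough Python equivalent that runs over an in-memory list of documents. The function name and the sample documents are hypothetical, and the returned dictionary only approximates the shape of an Elasticsearch aggregation response.

```python
def avg_price_in_stock(docs, min_price=5):
    """Mimic the filter + avg aggregation on a list of dicts (illustrative only)."""
    # "filter" step: keep documents whose price exceeds the minPrice parameter.
    matching = [d["price"] for d in docs if d["price"] > min_price]
    # "avg" sub-aggregation over the filtered bucket; None when the bucket is empty.
    avg = sum(matching) / len(matching) if matching else None
    return {
        "in_stock_products": {
            "doc_count": len(matching),
            "avg_price": {"value": avg},
        }
    }


if __name__ == "__main__":
    sample = [{"price": 3}, {"price": 10}, {"price": 20}]  # made-up data
    print(avg_price_in_stock(sample))
```

The script plays the role of the predicate in the list comprehension, and params supplies its free variable (minPrice), so the same script can be reused with different thresholds.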
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow