Problem

We're interested in working with high-cardinality fields, which are known to be a problem for Elasticsearch.

You have already told us that for a query such as

select count(distinct high_cardinality_field) from my_table

you have some optimizations in place for the count. Will it be possible someday to write something like:

select count_via_hyperloglog(high_cardinality_field) from my_table

with count_via_hyperloglog available as a UDF or something similar, as is possible right now in ES via ES plugins?


Solution

In Crate, this feature is on our backlog as an additional aggregation function that uses the HyperLogLog algorithm. We plan to derive the naming from Presto: http://prestodb.io/docs/current/functions/aggregate.html. Your example will then probably look like:

select approx_distinct(high_cardinality_field) from my_table
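To give a feel for what such an aggregation computes, here is a minimal HyperLogLog sketch in Python. It is only an illustration of the algorithm, not Crate's or Presto's implementation: the class name, the choice of SHA-1 as the hash, and the default of 2^12 registers are all assumptions made for this example.

```python
import hashlib
import math


class HyperLogLog:
    """Illustrative HyperLogLog counter (hypothetical, not a Crate API)."""

    def __init__(self, p=12):
        # p index bits give m = 2**p registers; the alpha formula below
        # is the one usually quoted for m >= 128, so keep p >= 7.
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, value):
        # Take 64 bits of a SHA-1 digest as the hash (assumption: any
        # well-mixed 64-bit hash would do).
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        width = 64 - self.p
        idx = h >> width                  # first p bits pick the register
        rest = h & ((1 << width) - 1)     # remaining bits
        # Position of the leftmost 1-bit in `rest`, counted from 1.
        rank = width - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self):
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            estimate = self.m * math.log(self.m / zeros)
        return int(round(estimate))


hll = HyperLogLog()
for i in range(100_000):
    hll.add("user-%d" % i)
print(hll.count())  # an estimate close to, but usually not exactly, 100000
```

Memory use is fixed by the number of registers, no matter how many distinct values are added, which is why this kind of estimate suits high-cardinality fields. Adding a value that was already seen never changes the result.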

In the meantime, a possible performance improvement for one specific field per table is to cluster your table on the high-cardinality field, as described under https://crate.io/docs/current/sql/ddl.html#routing
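One plausible reading of why clustering helps: if rows are routed to shards by the value of that field, every distinct value lives on exactly one shard, so the per-shard distinct counts can simply be added up. The Python sketch below only simulates that idea. The function names are hypothetical, and CRC32 modulo the shard count is a stand-in for whatever routing hash Crate actually uses.

```python
import zlib


def shard_for(value, num_shards):
    # Stand-in routing function (assumption): hash the routing value
    # and take it modulo the number of shards.
    return zlib.crc32(str(value).encode("utf-8")) % num_shards


def distinct_per_shard(values, num_shards):
    # Each shard keeps the set of distinct values routed to it.
    shards = [set() for _ in range(num_shards)]
    for v in values:
        shards[shard_for(v, num_shards)].add(v)
    return shards


# 5000 rows, 500 distinct values, 4 shards
values = ["user-%d" % (i % 500) for i in range(5000)]
shards = distinct_per_shard(values, 4)

# Equal values always land on the same shard, so the per-shard sets
# are disjoint and their sizes add up to the global distinct count.
total = sum(len(s) for s in shards)
print(total == len(set(values)))
```

Without routing on that field, the same value can show up on several shards and the per-shard results have to be merged rather than summed.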

Other tips

High-cardinality counting with HyperLogLog is planned for Elasticsearch 1.1.0, and the documentation is already up: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-metrics-cardinality-aggregation.html

Example:

{
    "aggs" : {
        "author_count" : {
            "cardinality" : {
                "field" : "author"
            }
        }
    }
}
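If you build such requests from code, a small helper like the following Python sketch can generate the same body. The helper name is hypothetical. The optional `precision_threshold` setting is, as far as I recall, described in the linked documentation for trading memory against accuracy, so treat it as an assumption and check it against your version.

```python
import json


def cardinality_request(name, field, precision_threshold=None):
    # Build the body of a search request with one cardinality aggregation.
    agg = {"field": field}
    if precision_threshold is not None:
        # Assumed optional setting; verify it in the linked documentation.
        agg["precision_threshold"] = precision_threshold
    return {"aggs": {name: {"cardinality": agg}}}


body = cardinality_request("author_count", "author", precision_threshold=1000)
print(json.dumps(body, indent=4))
```

Calling `cardinality_request("author_count", "author")` without the threshold produces exactly the JSON shown above.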

As for something like a UDF, you can use scripts, e.g. by combining a filter aggregation with a script filter:

{
    "aggs": {
        "in_stock_products": {
            "filter": {
                "script": {
                    "script": "doc['price'].value > minPrice",
                    "params": {
                        "minPrice": 5
                    }
                }
            },
            "aggs": {
                "avg_price": {
                    "avg": {
                        "field": "price"
                    }
                }
            }
        }
    }
}
License: CC-BY-SA with attribution