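How can Elasticsearch (ES) run de-duplicating queries on a specific field, covering operations such as aggregation, grouping, paging, and sum-style statistics?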
Getting all distinct values
To get every value that a given field takes, use the terms bucket aggregation, as in the following example:
GET {index}/_search
{
  "size": 0,
  "aggs": {
    "distinct_aggs": {
      "terms": {
        "field": "status"
      }
    }
  }
}
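The request above returns the distinct values of the status field in the given index. Setting the top-level size to 0 means that no documents come back in hits, because only the aggregation result matters here; with size set to 1, one matching document would be returned. Note that the terms aggregation returns only the top 10 buckets by document count unless its own size parameter is raised. The response looks like this: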
{
  "took": 2,
  "timed_out": false,
  "_shards": { "total": 3, "successful": 3, "skipped": 0, "failed": 0 },
  "hits": { "total": 58439, "max_score": 0, "hits": [] },
  "aggregations": {
    "distinct_aggs": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        { "key": 3, "doc_count": 46619 },
        { "key": 2, "doc_count": 11810 },
        { "key": 1, "doc_count": 10 }
      ]
    }
  }
}
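As a rough illustration of how a client might read this result, here is a minimal Python sketch. It works on a plain dict copied from the response above (trimmed to the relevant keys) and does not talk to a cluster. The helper names distinct_values and is_complete are made up for this example and are not part of any Elasticsearch client.

```python
# Sample response copied from the example above, trimmed to the relevant keys.
response = {
    "hits": {"total": 58439, "max_score": 0, "hits": []},
    "aggregations": {
        "distinct_aggs": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": 3, "doc_count": 46619},
                {"key": 2, "doc_count": 11810},
                {"key": 1, "doc_count": 10},
            ],
        }
    },
}


def distinct_values(resp, agg_name):
    """Return the bucket keys of a terms aggregation, in response order."""
    buckets = resp["aggregations"][agg_name]["buckets"]
    return [bucket["key"] for bucket in buckets]


def is_complete(resp, agg_name):
    """True when no documents were counted outside the returned buckets."""
    return resp["aggregations"][agg_name]["sum_other_doc_count"] == 0


values = distinct_values(response, "distinct_aggs")
complete = is_complete(response, "distinct_aggs")
print(values, complete)
```

Checking sum_other_doc_count is a cheap way to notice that the bucket list was cut off and that the size of the terms aggregation may need to be raised.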
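Paging after de-duplication
Paging needs a sort order. Building on the example above, add the size parameter (the number of buckets to return) and the order parameter to the terms aggregation. Note that the _term sort key used here was deprecated in Elasticsearch 6.0 in favor of _key.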
GET {index}/_search
{
  "size": 0,
  "aggs": {
    "distinct_aggs": {
      "terms": {
        "field": "item_id",
        "size": 1000,
        "order": {
          "_term": "asc"
        }
      }
    }
  }
}
The output is:
{
  "took": 1,
  "timed_out": false,
  "_shards": { "total": 3, "successful": 3, "skipped": 0, "failed": 0 },
  "hits": { "total": 58463, "max_score": 0, "hits": [] },
  "aggregations": {
    "distinct_aggs": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        { "key": 1, "doc_count": 32 },
        { "key": 2, "doc_count": 11811 },
        { "key": 3, "doc_count": 46620 },
        ...
      ]
    }
  }
}
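The terms aggregation takes a size and an order but no offset, so the request above returns the first N sorted buckets rather than an arbitrary page. One possible workaround, sketched below in Python, is to request enough buckets to cover the wanted page and slice the list on the client. This is an assumption about how paging could be layered on top, not something the request above does by itself. The helpers terms_page_body and slice_page are hypothetical names, and the bucket list is the one from the output above.

```python
def terms_page_body(field, page, page_size):
    """Build a request body asking for enough buckets to cover `page` (1-based)."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return {
        "size": 0,  # no documents in hits, aggregation only
        "aggs": {
            "distinct_aggs": {
                "terms": {
                    "field": field,
                    "size": page * page_size,  # buckets for pages 1..page
                    "order": {"_term": "asc"},  # same sort key as the example
                }
            }
        },
    }


def slice_page(buckets, page, page_size):
    """Keep only the buckets that belong to the requested page."""
    start = (page - 1) * page_size
    return buckets[start:start + page_size]


# Buckets copied from the output above.
buckets = [
    {"key": 1, "doc_count": 32},
    {"key": 2, "doc_count": 11811},
    {"key": 3, "doc_count": 46620},
]

body = terms_page_body("item_id", page=2, page_size=2)
page_two = slice_page(buckets, page=2, page_size=2)
print(body["aggs"]["distinct_aggs"]["terms"]["size"], page_two)
```

The cost grows with the page number, since every earlier page is fetched again, so this only suits fields with a modest number of distinct values.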
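Aggregation with sum statistics
Buckets can also be sorted, ascending or descending, by a statistic computed per bucket, such as the sum of a field. For example: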
GET {index}/_search
{
  "size": 0,
  "aggs": {
    "item_terms": {
      "terms": {
        "field": "item_id",
        "size": 1000,
        "order": [
          { "gmv_stat": "desc" },
          { "gmv_180d": "desc" }
        ]
      },
      "aggs": {
        "gmv_stat": {
          "sum": {
            "field": "gmv"
          }
        },
        "gmv_180d": {
          "sum": {
            "script": "doc['gmv_90d'].value*2"
          }
        }
      }
    }
  }
}
The response is:
{
  ...
  "aggregations": {
    "item_terms": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 260,
      "buckets": [
        {
          "key": 23388,
          "doc_count": 18,
          "gmv_stat": { "value": 176220 },
          "gmv_180d": { "value": 89732 }
        },
        {
          "key": 96117,
          "doc_count": 16,
          "gmv_stat": { "value": 129306 },
          "gmv_180d": { "value": 56988 }
        },
        ...
      ]
    }
  }
}
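Each bucket in this response nests its metrics under the sub-aggregation names, which is awkward to consume directly. The Python sketch below flattens the two buckets shown above into plain rows and checks that they arrive in the order requested (gmv_stat descending, then gmv_180d descending). The helper names flatten_buckets and is_sorted_desc are invented for this example, and only the two buckets visible in the response are used.

```python
# The two buckets shown in the response above.
buckets = [
    {"key": 23388, "doc_count": 18,
     "gmv_stat": {"value": 176220}, "gmv_180d": {"value": 89732}},
    {"key": 96117, "doc_count": 16,
     "gmv_stat": {"value": 129306}, "gmv_180d": {"value": 56988}},
]


def flatten_buckets(buckets, metric_names):
    """Turn each bucket into a flat dict: key, doc_count, and one entry per metric."""
    rows = []
    for bucket in buckets:
        row = {"key": bucket["key"], "doc_count": bucket["doc_count"]}
        for name in metric_names:
            row[name] = bucket[name]["value"]
        rows.append(row)
    return rows


def is_sorted_desc(rows, metric_names):
    """Check that rows are ordered by the metrics, descending, in priority order."""
    keys = [tuple(row[name] for name in metric_names) for row in rows]
    return all(a >= b for a, b in zip(keys, keys[1:]))


metrics = ["gmv_stat", "gmv_180d"]
rows = flatten_buckets(buckets, metrics)
ordered = is_sorted_desc(rows, metrics)
print(rows[0], ordered)
```

Comparing tuples mirrors the multi-criteria order array in the request: the second metric only breaks ties on the first.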