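How can Elasticsearch (ES) run deduplication-style queries on a specific field, covering operations such as aggregation, grouping, pagination, and sum-like statistics?
Getting all distinct values
To get every value that a field can take, use the terms aggregation, which is a bucket aggregation. For example: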
GET {index}/_search
{
  "size": 0,
  "aggs": {
    "distinct_aggs": {
      "terms": {
        "field": "status"
      }
    }
  }
}
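The example above returns the distinct values of the status field in the given index. The top-level size is set to 0, so the search returns no documents: we only care about the aggregation result, not the document contents. With size set to 1, one document would be returned. Keep in mind that a terms aggregation returns only the top 10 buckets by default, so a field with more distinct values needs a larger size inside the terms block. The response looks like this: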
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 58439,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "distinct_aggs": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": 3,
          "doc_count": 46619
        },
        {
          "key": 2,
          "doc_count": 11810
        },
        {
          "key": 1,
          "doc_count": 10
        }
      ]
    }
  }
}
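Reading the distinct values out of such a response can be sketched in Python as follows. This is a minimal illustration that works on a plain dict, not on a live cluster; the helper names distinct_values and is_complete are made up for this example, and sample_response is a trimmed copy of the response shown above.

```python
from typing import Any, Dict, List


def distinct_values(response: Dict[str, Any], agg_name: str) -> List[Any]:
    """Return the bucket keys of a terms aggregation, in the order ES returned them."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [bucket["key"] for bucket in buckets]


def is_complete(response: Dict[str, Any], agg_name: str) -> bool:
    """True when no documents fell into buckets that were left out of the response."""
    return response["aggregations"][agg_name]["sum_other_doc_count"] == 0


# Trimmed copy of the response above (hypothetical variable name).
sample_response = {
    "aggregations": {
        "distinct_aggs": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": 3, "doc_count": 46619},
                {"key": 2, "doc_count": 11810},
                {"key": 1, "doc_count": 10},
            ],
        }
    }
}

print(distinct_values(sample_response, "distinct_aggs"))
print(is_complete(sample_response, "distinct_aggs"))
```

Checking sum_other_doc_count is a cheap way to notice that the bucket limit cut off some values; a non-zero value suggests raising the size inside the terms block.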
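Pagination after deduplication
Pagination needs a sort order. Building on the example above, add a size parameter inside the terms block to control how many buckets are returned, and an order parameter to sort them. The example uses the _term sort key; on Elasticsearch 6.0 and later, _key is the preferred name for the same ordering.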
GET {index}/_search
{
  "size": 0,
  "aggs": {
    "distinct_aggs": {
      "terms": {
        "field": "item_id",
        "size": 1000,
        "order": {
          "_term": "asc"
        }
      }
    }
  }
}
The output looks like this:
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 58463,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "distinct_aggs": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": 1,
          "doc_count": 32
        },
        {
          "key": 2,
          "doc_count": 11811
        },
        {
          "key": 3,
          "doc_count": 46620
        },
        ...
      ]
    }
  }
}
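The terms aggregation has no offset parameter, so one possible way to serve pages from the request above is to fetch up to size sorted buckets once and slice them on the client. The Python sketch below assumes that approach; build_terms_request and page_of_buckets are hypothetical helper names, and the request body simply mirrors the example with the field and bucket limit made configurable.

```python
from typing import Any, Dict, List


def build_terms_request(field: str, max_buckets: int) -> Dict[str, Any]:
    """Build the request body from the example for a given field and bucket limit."""
    return {
        "size": 0,  # no documents, aggregation result only
        "aggs": {
            "distinct_aggs": {
                "terms": {
                    "field": field,
                    "size": max_buckets,
                    "order": {"_term": "asc"},  # same sort key as the example
                }
            }
        },
    }


def page_of_buckets(
    buckets: List[Dict[str, Any]], page: int, page_size: int
) -> List[Dict[str, Any]]:
    """Return the 1-based page of an already sorted bucket list."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    start = (page - 1) * page_size
    return buckets[start:start + page_size]


# Stand-in for the sorted buckets of a response (made-up data).
demo_buckets = [{"key": key, "doc_count": 1} for key in range(1, 8)]

second_page = page_of_buckets(demo_buckets, page=2, page_size=3)
print([bucket["key"] for bucket in second_page])
```

This only pages through the buckets that were actually returned, so the bucket limit passed to build_terms_request has to cover the deepest page you intend to show.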
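Sum statistics in aggregations
Buckets can also be sorted, ascending or descending, by a statistic computed per bucket, such as the sum of a field. For example: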
GET {index}/_search
{
  "size": 0,
  "aggs": {
    "item_terms": {
      "terms": {
        "field": "item_id",
        "size": 1000,
        "order": [
          {
            "gmv_stat": "desc"
          },
          {
            "gmv_180d": "desc"
          }
        ]
      },
      "aggs": {
        "gmv_stat": {
          "sum": {
            "field": "gmv"
          }
        },
        "gmv_180d": {
          "sum": {
            "script": "doc['gmv_90d'].value*2"
          }
        }
      }
    }
  }
}
The response looks like this:
{
  ...
  "aggregations": {
    "item_terms": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 260,
      "buckets": [
        {
          "key": 23388,
          "doc_count": 18,
          "gmv_stat": {
            "value": 176220
          },
          "gmv_180d": {
            "value": 89732
          }
        },
        {
          "key": 96117,
          "doc_count": 16,
          "gmv_stat": {
            "value": 129306
          },
          "gmv_180d": {
            "value": 56988
          }
        },
        ...
      ]
    }
  }
}
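To use these per-bucket sums in application code, the nested response can be flattened into one row per bucket. The Python sketch below is one way to do that, assuming the response is available as a dict; bucket_rows is a hypothetical helper, and sample_response holds only the two buckets visible in the response above.

```python
from typing import Any, Dict, List


def bucket_rows(
    response: Dict[str, Any], agg_name: str, metric_names: List[str]
) -> List[Dict[str, Any]]:
    """Flatten terms buckets and their sub-aggregation values into flat rows."""
    rows = []
    for bucket in response["aggregations"][agg_name]["buckets"]:
        row = {"key": bucket["key"], "doc_count": bucket["doc_count"]}
        for name in metric_names:
            # Each sum sub-aggregation is returned as {"value": <number>}.
            row[name] = bucket[name]["value"]
        rows.append(row)
    return rows


# The two buckets shown in the response above (hypothetical variable name).
sample_response = {
    "aggregations": {
        "item_terms": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 260,
            "buckets": [
                {
                    "key": 23388,
                    "doc_count": 18,
                    "gmv_stat": {"value": 176220},
                    "gmv_180d": {"value": 89732},
                },
                {
                    "key": 96117,
                    "doc_count": 16,
                    "gmv_stat": {"value": 129306},
                    "gmv_180d": {"value": 56988},
                },
            ],
        }
    }
}

rows = bucket_rows(sample_response, "item_terms", ["gmv_stat", "gmv_180d"])
for row in rows:
    print(row)
```

The rows keep the order ES returned, so with the request above they arrive sorted by gmv_stat in descending order, with gmv_180d as the tie-breaker.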