Common Elasticsearch Query and Filter APIs and Pitfalls Worth Noting

Introduction

This article introduces some ES query and filter APIs, along with a few issues worth noting.

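In ES most requests are queries. Filter context comes mainly from the filter (and must_not) clauses of a bool query. The constant_score query and aggregations such as the filter aggregation can also create a filter context.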

Filters do not calculate relevance scores and their results can be cached, so prefer filters whenever you do not need scoring.

In practice, this means putting the clause in the filter of a bool query, as described in detail below.

For an introduction to the REST API for queries and filters, see the article "Elasticsearch Queries and Filters Demystified".

Data preparation: bulk

First, use a bulk request to add some test data:

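import com.alibaba.fastjson.JSON;
import org.apache.http.HttpHost;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;

import java.io.IOException;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.util.LinkedList;
import java.util.List;
import java.util.Random;

public class BulkTest {

    private static final String[] homes = {"河北省", "山西省", "辽宁省", "吉林省", "江苏省", "浙江省", "安徽省", "福建省", "江西省", "山东省", "河南省", "湖北省", "湖南省", "广东省", "海南省", "四川省", "贵州省", "云南省", "陕西省", "甘肃省", "青海省", "黑龙江省", "台湾省", "北京市", "天津市", "上海市", "重庆市", "广西壮族自治区", "西藏自治区", "宁夏回族自治区", "新疆维吾尔自治区", "内蒙古自治区", "香港特别行政区", "澳门特别行政区"};

    // Assumed equivalents of the author's DatetimeUtil helper, which is not shown in the article
    private static final DateTimeFormatter YYYYMMDDHHMMSS_FORMATTER = DateTimeFormatter.ofPattern("yyyyMMddHHmmss");
    private static final DateTimeFormatter DATE_TIME_FORMATTER = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    private RestHighLevelClient client;

    @Before
    public void setUp() {
        HttpHost host = new HttpHost("localhost", 9200, "http");
        client = new RestHighLevelClient(RestClient.builder(host));
    }

    @After
    public void tearDown() throws IOException {
        client.close();
    }

    @Test
    public void add() throws IOException {
        BulkRequest request = new BulkRequest();
        List<UserInfo> userInfos = UserInfo.getUserInfo(10000);
        String indexName = "user";
        for(UserInfo userInfo : userInfos){
            // Use the generated id as the document _id and the JSON-serialized object as _source
            IndexRequest indexRequest = new IndexRequest(indexName).id(userInfo.id).source(JSON.toJSONString(userInfo), XContentType.JSON);
            request.add(indexRequest);
        }
        BulkResponse response = client.bulk(request, RequestOptions.DEFAULT);
        // A bulk request can fail partially, so check the per-item results
        if (response.hasFailures()) {
            System.out.println(response.buildFailureMessage());
        }
    }

    private static class UserInfo{
        private String id;
        private Long createTime;
        private String createTimeStr;
        private short status;
        private String home;
        private String option;

        // Generates `size` random users whose creation times fall within the next 1000 days
        public static List<UserInfo> getUserInfo(int size){
            LinkedList<UserInfo> userInfos = new LinkedList<>();
            LocalDateTime now = LocalDateTime.now();
            Random random = new Random();
            int count = 1;
            for(int i=0;i<size;i++){
                UserInfo userInfo = new UserInfo();
                LocalDateTime localDateTime = now.plusDays(random.nextInt(1000));
                // id = yyyyMMddHHmmss timestamp + 5-digit counter
                userInfo.setId(String.format("%s%05d",localDateTime.format(YYYYMMDDHHMMSS_FORMATTER),count++));
                userInfo.setCreateTimeStr(localDateTime.format(DATE_TIME_FORMATTER));
                // epoch milliseconds, using the system default time zone
                userInfo.setCreateTime(localDateTime.atZone(ZoneId.systemDefault()).toInstant().toEpochMilli());
                userInfo.setHome(homes[random.nextInt(homes.length)]);
                userInfo.setOption(homes[random.nextInt(homes.length)]);
                userInfo.setStatus((short) random.nextInt(10));
                userInfos.add(userInfo);
            }
            return userInfos;
        }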

        public String getId() {
            return id;
        }

        public void setId(String id) {
            this.id = id;
        }

        public Long getCreateTime() {
            return createTime;
        }

        public void setCreateTime(Long createTime) {
            this.createTime = createTime;
        }

        public String getCreateTimeStr() {
            return createTimeStr;
        }

        public void setCreateTimeStr(String createTimeStr) {
            this.createTimeStr = createTimeStr;
        }

        public short getStatus() {
            return status;
        }

        public void setStatus(short status) {
            this.status = status;
        }

        public String getHome() {
            return home;
        }

        public void setHome(String home) {
            this.home = home;
        }

        public String getOption() {
            return option;
        }

        public void setOption(String option) {
            this.option = option;
        }
    }
}

bool query

The Java REST API provides a QueryBuilders factory class for creating each kind of query builder.

The most important clause in a bool query is filter, which filters documents without scoring them.

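You can also use the familiar must, should and must_not clauses. A must clause has to match. A must_not clause must not match. For should, at least one should clause has to match when the query has no must or filter clause; otherwise should clauses only affect scoring.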
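The following plain-Java sketch shows how these clauses combine. It is not Elasticsearch code. Documents are modeled as Maps and clauses as Predicates, which are assumptions made only for illustration, and scoring is ignored. The sketch encodes the default minimum_should_match rule of recent (7.x) versions.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Plain-Java model of bool query matching; scoring is ignored.
// Documents are Maps and clauses are Predicates (assumptions for illustration).
class BoolMatchSketch {

    static boolean matches(Map<String, Object> doc,
                           List<Predicate<Map<String, Object>>> must,
                           List<Predicate<Map<String, Object>>> filter,
                           List<Predicate<Map<String, Object>>> should,
                           List<Predicate<Map<String, Object>>> mustNot) {
        // must and filter: every clause has to match (filter just skips scoring)
        if (!must.stream().allMatch(p -> p.test(doc))) return false;
        if (!filter.stream().allMatch(p -> p.test(doc))) return false;
        // must_not: no clause may match
        if (mustNot.stream().anyMatch(p -> p.test(doc))) return false;
        // Default minimum_should_match: 1 if there are should clauses but no
        // must/filter clauses, otherwise 0 (should then only affects scoring)
        int minimumShouldMatch = (!should.isEmpty() && must.isEmpty() && filter.isEmpty()) ? 1 : 0;
        long matchedShould = should.stream().filter(p -> p.test(doc)).count();
        return matchedShould >= minimumShouldMatch;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = Map.of("status", 5, "home", "四川省");
        Predicate<Map<String, Object>> status5 = d -> Integer.valueOf(5).equals(d.get("status"));
        Predicate<Map<String, Object>> fromHebei = d -> "河北省".equals(d.get("home"));
        // filter matches, should does not: still a hit because a filter clause is present
        System.out.println(matches(doc, List.of(), List.of(status5), List.of(fromHebei), List.of()));
        // the same should clause on its own is required, so this is not a hit
        System.out.println(matches(doc, List.of(), List.of(), List.of(fromHebei), List.of()));
    }
}
```

Running main prints true and then false. The same non-matching should clause is optional when a filter clause is present, but it is required when it is the only clause.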

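Here is the code. This and the following test methods assume the client field from BulkTest and an index-name constant such as INDEX_NAME = "user":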

@Test
public void boolQueryBuilder() throws IOException {
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();

    BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
    boolQueryBuilder.filter(QueryBuilders.termQuery("status", 5));// filter context inside the bool query

    searchSourceBuilder.query(boolQueryBuilder);
    searchSourceBuilder.from(0);// offset of the first hit, starting at 0
    searchSourceBuilder.size(20);// number of hits to return (default 10)
    searchSourceBuilder.sort("createTime", SortOrder.DESC);// sort by createTime, newest first

    SearchRequest searchRequest = new SearchRequest(INDEX_NAME);
    searchRequest.source(searchSourceBuilder);
    SearchResponse search = client.search(searchRequest, RequestOptions.DEFAULT);
    SearchHits hits = search.getHits();
    for (SearchHit hit : hits) {
        System.out.println(hit.getSourceAsString());
    }
}

The code above uses a filter to find documents whose status field is 5. It also sets the starting offset and the number of hits to return.

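By default, results start at offset 0 and 10 hits are returned. Avoid deep pagination with from/size. Each shard has to collect from + size hits, and by default from + size may not exceed 10,000 (the index.max_result_window setting). For deep pagination, use scroll or search after, which are described later.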
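You can estimate the cost of deep pagination with simple arithmetic. This sketch assumes the usual query-then-fetch flow: every shard returns its top from + size hits, and the coordinating node merges all of them. The shard count and page numbers are hypothetical.

```java
// Rough arithmetic for from/size pagination (query-then-fetch):
// each shard returns its top (from + size) hits, and the coordinating node
// merges shards * (from + size) entries before keeping `size` of them.
class DeepPagingCost {

    // Default value of the index.max_result_window index setting
    static final int MAX_RESULT_WINDOW = 10_000;

    /** Number of sorted entries the coordinating node has to merge. */
    static long entriesMerged(int shards, int from, int size) {
        return (long) shards * (from + size);
    }

    /** Whether a from/size pair is accepted with the default max_result_window. */
    static boolean allowedByDefault(int from, int size) {
        return (long) from + size <= MAX_RESULT_WINDOW;
    }

    public static void main(String[] args) {
        // page 1 vs. page 450 (size 20) on a hypothetical 5-shard index
        System.out.println(entriesMerged(5, 0, 20));
        System.out.println(entriesMerged(5, 8980, 20));
        System.out.println(allowedByDefault(8980, 20));
        System.out.println(allowedByDefault(10000, 20));
    }
}
```

Page 450 is still accepted, but the coordinating node merges 45,000 entries to return 20 hits, compared with 100 for the first page.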

You can also set the sort field with sort.

term and terms queries

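A term query performs an exact match. terms works essentially like term, but it accepts several values, and a document matches if any one of them matches exactly.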

You can use a term query on its own, but it is better to put it in the filter of a bool query whenever possible.

@Test
public void term() throws IOException {
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();

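    // Exact match on the keyword sub-field created by dynamic mapping
    TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("home.keyword", "四川省");
    // A term query on the analyzed text field "home" would normally find nothing:
//        TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("home", "四川省");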
    searchSourceBuilder.query(termQueryBuilder);
    System.out.println(searchSourceBuilder.toString());

    SearchRequest searchRequest = new SearchRequest(INDEX_NAME);
    searchRequest.source(searchSourceBuilder);

    SearchResponse search = client.search(searchRequest, RequestOptions.DEFAULT);
    SearchHits hits = search.getHits();
    for (SearchHit hit : hits) {
        System.out.println(hit.getSourceAsString());
    }
}

@Test
public void terms() throws IOException {
    BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
    TermsQueryBuilder termsQueryBuilder = QueryBuilders.termsQuery("status", 5,6);
    boolQueryBuilder.filter(termsQueryBuilder);
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.query(boolQueryBuilder);

    SearchRequest searchRequest = new SearchRequest(INDEX_NAME);
    searchRequest.source(searchSourceBuilder);

    SearchResponse search = client.search(searchRequest, RequestOptions.DEFAULT);
    SearchHits hits = search.getHits();
    for (SearchHit hit : hits) {
        System.out.println(hit.getSourceAsString());
    }
}

Note: if the data is definitely there but a term query finds nothing, check the mapping of that field.

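Many people don't define a mapping and rely on dynamic mapping instead. When a new string field is added dynamically, ES maps it as a text field and adds a keyword sub-field under fields. That sub-field has ignore_above set to 256, so strings longer than 256 characters are not indexed in the keyword sub-field at all. They are not truncated.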

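In that case, query the field-name.keyword sub-field instead of field-name. The text field is analyzed into separate terms, so a term query for the whole value does not match it. This is also usually why queries on date-time strings return inaccurate results.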
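The following simplified model shows why. It is not Elasticsearch code. textTerms only approximates the standard analyzer: each Han character becomes its own token, and other runs of letters and digits are lowercased. That approximation is enough to show two things. First, a term query for 四川省 misses the text field but hits the keyword sub-field. Second, a date-time string gets split into pieces. All class and method names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of what default dynamic mapping indexes for a string value:
// an analyzed text field plus a keyword sub-field with ignore_above: 256.
class DynamicStringMappingSketch {

    static final int IGNORE_ABOVE = 256;

    /** Approximate terms of the text field (rough stand-in for the standard analyzer). */
    static List<String> textTerms(String value) {
        List<String> terms = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (int i = 0; i < value.length(); ) {
            int cp = value.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN) {
                flush(word, terms);
                terms.add(new String(Character.toChars(cp))); // one token per Han character
            } else if (Character.isLetterOrDigit(cp)) {
                word.appendCodePoint(Character.toLowerCase(cp));
            } else {
                flush(word, terms); // whitespace and punctuation split tokens
            }
        }
        flush(word, terms);
        return terms;
    }

    private static void flush(StringBuilder word, List<String> terms) {
        if (word.length() > 0) {
            terms.add(word.toString());
            word.setLength(0);
        }
    }

    /** Terms of field.keyword: the whole value, or nothing when it exceeds ignore_above. */
    static List<String> keywordTerms(String value) {
        if (value.length() > IGNORE_ABOVE) {
            return List.of();
        }
        return List.of(value);
    }

    /** A term query is not analyzed: it looks for exactly this term. */
    static boolean termMatches(List<String> indexedTerms, String queryValue) {
        return indexedTerms.contains(queryValue);
    }

    public static void main(String[] args) {
        String home = "四川省";
        System.out.println(textTerms(home));
        System.out.println(termMatches(textTerms(home), home));
        System.out.println(termMatches(keywordTerms(home), home));
    }
}
```

main prints [四, 川, 省], false and true. In the same model, "2020-08-11 00:00:00" is split into 2020, 08, 11, 00, 00 and 00. That explains why queries against the text version of a date-time string behave unexpectedly.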

Range query

The range query is simple and very commonly used. gte/lte are inclusive bounds; gt/lt are exclusive:

@Test
public void range() throws IOException {
    RangeQueryBuilder rangeQueryBuilder = QueryBuilders.rangeQuery("status").gte(4).lte(7);

    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.query(rangeQueryBuilder);

    SearchRequest searchRequest = new SearchRequest(INDEX_NAME);
    searchRequest.source(searchSourceBuilder);
    SearchResponse search = client.search(searchRequest, RequestOptions.DEFAULT);
    SearchHits hits = search.getHits();
    for (SearchHit hit : hits) {
        System.out.println(hit.getSourceAsString());
    }
}
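Range bounds behave like ordinary comparisons. This plain-Java sketch is an illustration, not the query implementation, and RangeSemanticsSketch is a hypothetical name. It also shows a consequence for keyword fields: their values are compared as strings. String order matches chronological order only for fixed-width, zero-padded formats such as yyyy-MM-dd HH:mm:ss.

```java
// Sketch of range semantics: gte/lte are inclusive bounds.
// On a keyword field the bounds are compared as strings (lexicographically).
class RangeSemanticsSketch {

    /** Inclusive check, like rangeQuery(field).gte(gte).lte(lte). */
    static <T extends Comparable<T>> boolean inRange(T value, T gte, T lte) {
        return value.compareTo(gte) >= 0 && value.compareTo(lte) <= 0;
    }

    public static void main(String[] args) {
        // numeric field: status between 4 and 7, bounds included
        System.out.println(inRange(4, 4, 7));
        System.out.println(inRange(8, 4, 7));
        // zero-padded "yyyy-MM-dd HH:mm:ss" strings compare in time order
        System.out.println(inRange("2020-08-12 09:30:00", "2020-08-11 00:00:00", "2020-12-31 23:59:59"));
        // without padding, "2020-8-12" sorts after "2020-12-31" and falls out of range
        System.out.println(inRange("2020-8-12", "2020-08-11", "2020-12-31"));
    }
}
```

main prints true, false, true, false. The last line is false even though August 12 lies inside the range chronologically.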

ids

Query documents by a set of IDs:

@Test
public void ids() throws IOException {
    // Replace with _id values that actually exist in your index
    IdsQueryBuilder idsQueryBuilder = QueryBuilders.idsQuery().addIds("20210313173331","20211101173331");

    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.query(idsQueryBuilder);
    SearchRequest searchRequest = new SearchRequest(INDEX_NAME);

    searchRequest.source(searchSourceBuilder);
    SearchResponse search = client.search(searchRequest, RequestOptions.DEFAULT);
    SearchHits hits = search.getHits();
    for (SearchHit hit : hits) {
        System.out.println(hit.getSourceAsString());
    }
}
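addIds must be given exact _id values. BulkTest builds each _id from a timestamp followed by a five-digit counter. A bare timestamp such as 20210313173331 therefore matches only if some document was indexed with exactly that _id. The sketch below reproduces the ID format. It assumes the timestamp part uses the pattern yyyyMMddHHmmss, and UserIdFormatSketch is a hypothetical name.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Reproduces the "%s%05d" ID format from BulkTest:
// a yyyyMMddHHmmss timestamp (assumed pattern) followed by a zero-padded counter.
class UserIdFormatSketch {

    static final DateTimeFormatter YYYYMMDDHHMMSS = DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    static String userId(LocalDateTime createTime, int count) {
        return String.format("%s%05d", createTime.format(YYYYMMDDHHMMSS), count);
    }

    public static void main(String[] args) {
        // first generated document created at 2021-03-13 17:33:31
        System.out.println(userId(LocalDateTime.of(2021, 3, 13, 17, 33, 31), 1));
    }
}
```

main prints 2021031317333100001, a 19-character ID rather than the 14-character timestamp used in the query above.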

exists

Find documents that contain a given field:

@Test
public void exists() throws IOException {
    // Check whether the field exists; "hello" is not a field of the test data, so no hits are returned
//        ExistsQueryBuilder existsQueryBuilder = QueryBuilders.existsQuery("status");
    ExistsQueryBuilder existsQueryBuilder = QueryBuilders.existsQuery("hello");
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.query(existsQueryBuilder);

    SearchRequest searchRequest = new SearchRequest(INDEX_NAME);
    searchRequest.source(searchSourceBuilder);
    SearchResponse search = client.search(searchRequest, RequestOptions.DEFAULT);
    SearchHits hits = search.getHits();
    for (SearchHit hit : hits) {
        System.out.println(hit.getSourceAsString());
    }
}
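For null and empty values, what counts as "existing" is less obvious. The sketch below is a simplified model, not Elasticsearch code, and ExistsSketch is a hypothetical name. It treats the following as absent: a missing key, null, an empty array, and an array containing only nulls. It treats an empty string as present, following the exists query documentation. It does not model other reasons a field can have no indexed value, such as index: false, a value longer than ignore_above, or a malformed value with ignore_malformed.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Simplified model of when an exists query treats a field in _source as present.
class ExistsSketch {

    static boolean fieldExists(Map<String, ?> source, String field) {
        Object value = source.get(field);
        if (value == null) {
            return false; // missing key or explicit null
        }
        if (value instanceof Collection) {
            // [] and [null, null] have nothing to index; [null, "foo"] does
            return ((Collection<?>) value).stream().anyMatch(Objects::nonNull);
        }
        return true; // any other value, including an empty string
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("status", 5);
        doc.put("option", null);
        doc.put("tags", Arrays.asList());
        doc.put("home", "");
        for (String field : new String[] {"status", "option", "tags", "home", "hello"}) {
            System.out.println(field + " -> " + fieldExists(doc, field));
        }
    }
}
```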

match

@Test
public void match() throws IOException {
    MatchQueryBuilder matchQueryBuilder = QueryBuilders.matchQuery("home", "四川");

    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.query(matchQueryBuilder);

    SearchRequest searchRequest = new SearchRequest(INDEX_NAME);
    searchRequest.source(searchSourceBuilder);
    SearchResponse search = client.search(searchRequest, RequestOptions.DEFAULT);
    SearchHits hits = search.getHits();
    for (SearchHit hit : hits) {
        System.out.println(hit.getSourceAsString());
    }
}

multi_match

multi_match is similar to match, but it searches several fields at once.

@Test
public void multiMatch() throws IOException {
    MultiMatchQueryBuilder multiMatchQueryBuilder = QueryBuilders.multiMatchQuery("四川", "home", "option");

    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.query(multiMatchQueryBuilder);

    SearchRequest searchRequest = new SearchRequest(INDEX_NAME);
    searchRequest.source(searchSourceBuilder);
    SearchResponse search = client.search(searchRequest, RequestOptions.DEFAULT);
    SearchHits hits = search.getHits();
    for (SearchHit hit : hits) {
        System.out.println(hit.getSourceAsString());
    }
}
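Both match and multi_match analyze the query text first. The simplified model below ignores scoring, and MatchSketch is a hypothetical name. It assumes values made up only of CJK characters, which the standard analyzer splits into one token per character. Under match's default OR operator, any shared character is enough, so a query for 河北 also matches 河南省. multi_match (default type best_fields) matches a document if any listed field matches.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified model of match / multi_match matching (no scoring).
// Assumes pure-CJK text analyzed into one token per character.
class MatchSketch {

    static Set<String> analyze(String text) {
        Set<String> tokens = new LinkedHashSet<>();
        text.codePoints().forEach(cp -> tokens.add(new String(Character.toChars(cp))));
        return tokens;
    }

    /** match with the default OR operator: any shared token is a hit. */
    static boolean match(String fieldValue, String query) {
        Set<String> docTokens = analyze(fieldValue);
        return analyze(query).stream().anyMatch(docTokens::contains);
    }

    /** multi_match: the document matches if match succeeds on any listed field. */
    static boolean multiMatch(Map<String, String> doc, String query, List<String> fields) {
        return fields.stream()
                .map(doc::get)
                .anyMatch(value -> value != null && match(value, query));
    }

    public static void main(String[] args) {
        System.out.println(match("四川省", "四川"));
        System.out.println(match("河南省", "河北"));
        Map<String, String> doc = Map.of("home", "河北省", "option", "四川省");
        System.out.println(multiMatch(doc, "四川", List.of("home", "option")));
    }
}
```

All three lines print true. The second line shows how loose single-character matching can be.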

scroll

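Sometimes, for example when computing statistics, you need to retrieve all matching data. If the data set is very large and would require deep pagination, a plain query no longer works. This is where scroll comes in.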

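Scroll is efficient mainly because it does not need a global sort, especially when sorted by _doc. In the query phase, each shard only has to search its own data and return its next batch of matching documents.

Scroll keeps this search context alive for a period of time (the scroll keep-alive). The results reflect the state of the index at the time of the initial request, so you can walk through the full result set batch by batch.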
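The snapshot behavior of scroll can be modeled in a few lines. This in-memory sketch is only an analogy, and ScrollSketch and its methods are hypothetical. The first request freezes the matching documents. Every later call returns the next batch from that frozen copy. An empty batch means the scroll is exhausted, which is the same loop condition ScrollTest uses.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// In-memory analogy of a scroll: the first request takes a point-in-time copy
// of the matching documents, and each call returns the next batch from it.
class ScrollSketch<T> {

    private final List<T> snapshot;
    private final int batchSize;
    private int cursor = 0;

    ScrollSketch(List<T> liveIndex, int batchSize) {
        this.snapshot = Collections.unmodifiableList(new ArrayList<>(liveIndex)); // frozen copy
        this.batchSize = batchSize;
    }

    /** Next batch; an empty list means the scroll is exhausted. */
    List<T> next() {
        int end = Math.min(cursor + batchSize, snapshot.size());
        List<T> batch = new ArrayList<>(snapshot.subList(cursor, end));
        cursor = end;
        return batch;
    }

    public static void main(String[] args) {
        List<String> index = new ArrayList<>(List.of("a", "b", "c", "d", "e"));
        ScrollSketch<String> scroll = new ScrollSketch<>(index, 2);
        index.add("f"); // indexed after the scroll started
        List<String> batch;
        while (!(batch = scroll.next()).isEmpty()) {
            System.out.println(batch);
        }
    }
}
```

Running main prints [a, b], [c, d] and [e]. The document f, added after the scroll started, is never returned.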

import org.apache.http.HttpHost;
import org.elasticsearch.action.search.ClearScrollRequest;
import org.elasticsearch.action.search.ClearScrollResponse;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchScrollRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.RangeQueryBuilder;
import org.elasticsearch.search.Scroll;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.fetch.subphase.FetchSourceContext;
import org.junit.Before;
import org.junit.Test;

import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class ScrollTest {

    private RestHighLevelClient client;

    @Before
    public void setUp() {
        HttpHost host = new HttpHost("127.0.0.1", 9200, "http");
        client = new RestHighLevelClient(RestClient.builder(host));
    }

    @Test
    public void scroll() throws IOException {
        long start = System.currentTimeMillis();
        FileOutputStream fileOutputStream = new FileOutputStream("F:\\tmp\\long_scroll9.txt");
        BufferedOutputStream bufferedOutputStream = new BufferedOutputStream(fileOutputStream);
        final Scroll scroll = new Scroll(TimeValue.timeValueMinutes(1L));
        SearchRequest searchRequest = new SearchRequest("user");
        searchRequest.scroll(scroll);
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
        // Number of hits returned per scroll batch (the default is 10)
        searchSourceBuilder.size(1000);

        // Fetch only the specified _source fields
        String[] fields = {"id","createTime"};
        FetchSourceContext sourceContext = new FetchSourceContext(true,fields,null);
        searchSourceBuilder.fetchSource(sourceContext);

        BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
//        RangeQueryBuilder lastlogintime = QueryBuilders.rangeQuery("createTimeStr");
//        lastlogintime.gte("2019-08-11 00:00:00");

        RangeQueryBuilder lastlogintime = QueryBuilders.rangeQuery("createTime");
        lastlogintime.gte(1597298833674L);

        boolQueryBuilder.filter(lastlogintime);
//        boolQueryBuilder.must(lastlogintime);
        searchSourceBuilder.query(boolQueryBuilder);
        // Sorting by _doc skips the global sort, the cheapest order for scroll
        searchSourceBuilder.sort("_doc");

        searchRequest.source(searchSourceBuilder);


        SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
        String scrollId = searchResponse.getScrollId();
        int count = 0;
        SearchHit[] searchHits = searchResponse.getHits().getHits();
        count+=searchHits.length;
//        print(searchHits);
        print(bufferedOutputStream,searchHits);

        while (searchHits != null && searchHits.length > 0) {
            SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId);
            scrollRequest.scroll(scroll);
            searchResponse = client.scroll(scrollRequest, RequestOptions.DEFAULT);
            scrollId = searchResponse.getScrollId();
            searchHits = searchResponse.getHits().getHits();
            count+=searchHits.length;
//            print(searchHits);
            print(bufferedOutputStream,searchHits);
        }
        bufferedOutputStream.close();
        System.out.println(count);
        ClearScrollRequest clearScrollRequest = new ClearScrollRequest();
        clearScrollRequest.addScrollId(scrollId);
        ClearScrollResponse clearScrollResponse = client.clearScroll(clearScrollRequest, RequestOptions.DEFAULT);
        boolean succeeded = clearScrollResponse.isSucceeded();
        System.out.println(succeeded);
        System.out.println(System.currentTimeMillis() - start);
    }

    private static void print(BufferedOutputStream bufferedOutputStream, SearchHit[] searchHits) throws IOException {
        for (SearchHit hit : searchHits) {
            bufferedOutputStream.write(hit.getSourceAsString().getBytes());
            bufferedOutputStream.write("\n".getBytes());
        }
    }

    private static void print(SearchHit[] searchHits){
        for(SearchHit searchHit : searchHits){
            System.out.println(searchHit.getSourceAsString());
        }
    }
}

search after

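Scroll has its own limitations. For example, when a very large number of documents match in the query phase, the whole process can become very slow. In addition, every open scroll context holds resources on the cluster until it expires or is cleared.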

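In that case, consider search after. Like scroll, it pages through the results batch by batch, but it keeps no search context. Each request is an ordinary search that starts after the sort values of the previous page's last hit. Because of this it sees the latest refreshed data in real time, which also means changes made between requests can shift the results. It requires an explicit sort.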
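The paging logic of search after can be sketched in memory. In this model, SearchAfterSketch, Doc and page are hypothetical. Hits are sorted by createTime and then by a unique id, and each page starts strictly after the last hit of the previous page. That mirrors passing getSortValues() to searchAfter. The list is read again on every call, so documents added between pages can show up, unlike with scroll. Without the unique tie-breaker, documents that share a createTime at a page boundary could be skipped or repeated.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// In-memory model of search_after paging with a unique tie-breaker.
class SearchAfterSketch {

    static final class Doc {
        final long createTime;
        final String id;

        Doc(long createTime, String id) {
            this.createTime = createTime;
            this.id = id;
        }

        @Override
        public String toString() {
            return id + "@" + createTime;
        }
    }

    // Sort by createTime, then by the unique id as a tie-breaker
    static final Comparator<Doc> SORT =
            Comparator.comparingLong((Doc d) -> d.createTime).thenComparing((Doc d) -> d.id);

    /** One page: hits sorted strictly after `after` (null = first page), at most `size`. */
    static List<Doc> page(List<Doc> index, Doc after, int size) {
        return index.stream()
                .sorted(SORT)
                .filter(d -> after == null || SORT.compare(d, after) > 0)
                .limit(size)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Doc> index = new ArrayList<>(List.of(
                new Doc(200, "d"), new Doc(100, "b"), new Doc(100, "a"), new Doc(100, "c")));
        Doc after = null;
        List<Doc> hits;
        while (!(hits = page(index, after, 2)).isEmpty()) {
            System.out.println(hits);
            // corresponds to hits[hits.length - 1].getSortValues() in SearchAfterTest
            after = hits.get(hits.size() - 1);
        }
    }
}
```

Running main prints [a@100, b@100] and then [c@100, d@200]. Every document appears exactly once, even though three of them share the same createTime.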

import org.apache.http.HttpHost;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.RangeQueryBuilder;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.junit.Before;
import org.junit.Test;

import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class SearchAfterTest {

    private static RestHighLevelClient client;

    @Before
    public void setUp() {
        HttpHost host = new HttpHost("127.0.0.1", 9200, "http");
        client = new RestHighLevelClient(RestClient.builder(host));
    }

    @Test
    public void search() throws IOException {
        long start = System.currentTimeMillis();
        Object[] objects = null;
        FileOutputStream fileOutputStream = new FileOutputStream("F:\\tmp\\safter_long_filter6.txt");
        BufferedOutputStream bufferedOutputStream = new BufferedOutputStream(fileOutputStream);
        boolean type = true;
        while (type) {
            SearchHit[] hits = searchAfter(client, objects);
            if(hits.length == 0){
                break;
            }
            objects = hits[hits.length-1].getSortValues();
            if (hits.length < 1000) {
                type = false;
            }
            writeData(bufferedOutputStream,hits);
        }
        bufferedOutputStream.close();
        System.out.println(System.currentTimeMillis() - start);
        client.close();
    }

    private static void writeData(BufferedOutputStream bufferedOutputStream, SearchHit[] searchHits) throws IOException {
        for (SearchHit hit : searchHits) {
            bufferedOutputStream.write(hit.getSourceAsString().getBytes());
            bufferedOutputStream.write("\n".getBytes());
        }
    }

    public static SearchHit[] searchAfter(RestHighLevelClient client, Object[] objects) throws IOException {
        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();

        BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();

//        RangeQueryBuilder lastlogintime = QueryBuilders.rangeQuery("createTimeStr");
        RangeQueryBuilder lastlogintime = QueryBuilders.rangeQuery("createTimeStr.keyword");
        lastlogintime.gte("2020-08-11 00:00:00");
//        RangeQueryBuilder lastlogintime = QueryBuilders.rangeQuery("createTime");
//        lastlogintime.gte(1597298833674L);
//        lastlogintime.gte(1597298833674L).lte(1597298833674L);
        boolQueryBuilder.filter(lastlogintime);

        sourceBuilder.query(boolQueryBuilder);
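        sourceBuilder.size(1000);
        // search_after needs an explicit sort: without one, getSortValues() has nothing to continue from.
        // Sort by createTime plus a unique tie-breaker; "id.keyword" assumes the default dynamic mapping of the id field.
        sourceBuilder.sort("createTime");
        sourceBuilder.sort("id.keyword");
//        sourceBuilder.sort("_id", SortOrder.DESC);
        if(objects != null) {
            sourceBuilder.searchAfter(objects);
        }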
        SearchRequest searchRequest = new SearchRequest();
        searchRequest.indices("user");
        searchRequest.source(sourceBuilder);
        SearchResponse response = client.search(searchRequest, RequestOptions.DEFAULT);
        SearchHit[] hits = response.getHits().getHits();
        return hits;
    }
}

When reprinting, please credit the source: https://daima100.com/6750.html
