Common Elasticsearch Query and Filter APIs, and Issues Worth Noting
Introduction
This article introduces some common Elasticsearch query and filter APIs (using the Java High Level REST Client) along with some issues worth noting.
In Elasticsearch most clauses run in query context; filter context mainly appears in the filter clause of a bool query, and it can also show up inside aggregations.
Filters do not compute relevance scores and their results can be cached, so prefer filters whenever you do not need scoring.
In practice that means putting clauses into the filter of a bool query, which is covered in detail below.
For an introduction to the corresponding query/filter REST API, see Elasticsearch查询过滤解惑.
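As a quick illustration (not from the original article), the minimal sketch below only builds queries and prints the DSL that the Java client generates for a must clause versus a filter clause; the class name FilterContextDemo is made up for this example, and the status field is the one used in the test data later in the post.
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.builder.SearchSourceBuilder;

public class FilterContextDemo {
    public static void main(String[] args) {
        // Same term query, once in query context (must) and once in filter context (filter).
        SearchSourceBuilder scored = new SearchSourceBuilder()
                .query(QueryBuilders.boolQuery().must(QueryBuilders.termQuery("status", 5)));
        SearchSourceBuilder filtered = new SearchSourceBuilder()
                .query(QueryBuilders.boolQuery().filter(QueryBuilders.termQuery("status", 5)));
        System.out.println(scored);   // the term query appears under bool.must   -> scored
        System.out.println(filtered); // the term query appears under bool.filter -> not scored, cacheable
    }
}
Printing the builders is a cheap way to see exactly what JSON will be sent before wiring up a client.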
Preparing test data with bulk
First, we use the bulk API to index some test data:
import com.alibaba.fastjson.JSON;
import org.apache.http.HttpHost;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;
import org.junit.Before;
import org.junit.Test;
import java.io.IOException;
import java.time.LocalDateTime;
import java.util.LinkedList;
import java.util.List;
import java.util.Random;

public class BulkTest {
    private static final String[] homes = {"河北省", "山西省", "辽宁省", "吉林省", "江苏省", "浙江省", "安徽省", "福建省", "江西省", "山东省", "河南省", "湖北省", "湖南省", "广东省", "海南省", "四川省", "贵州省", "云南省", "陕西省", "甘肃省", "青海省", "黑龙江省", "台湾省", "北京市", "天津市", "上海市", "重庆市", "广西壮族自治区", "西藏自治区", "宁夏回族自治区", "新疆维吾尔自治区", "内蒙古自治区", "香港特别行政区", "澳门特别行政区"};

    private RestHighLevelClient client;

    @Before
    public void setUp() {
        HttpHost host = new HttpHost("localhost", 9200, "http");
        client = new RestHighLevelClient(RestClient.builder(host));
    }

    @Test
    public void add() throws IOException {
        BulkRequest request = new BulkRequest();
        IndexRequest indexRequest;
        List<UserInfo> userInfos = UserInfo.getUserInfo(10000);
        String indexName = "user";
        for (UserInfo userInfo : userInfos) {
            indexRequest = new IndexRequest(indexName).id(userInfo.id).source(JSON.toJSONString(userInfo), XContentType.JSON);
            request.add(indexRequest);
        }
        client.bulk(request, RequestOptions.DEFAULT);
    }

    private static class UserInfo {
        private String id;
        private Long createTime;
        private String createTimeStr;
        private short status;
        private String home;
        private String option;

        // Generate `size` random users. DatetimeUtil is a project-local date helper
        // (formatters and epoch-millisecond conversion) that is not shown in this article.
        public static List<UserInfo> getUserInfo(int size) {
            LinkedList<UserInfo> userInfos = new LinkedList<>();
            LocalDateTime now = LocalDateTime.now();
            Random random = new Random();
            int count = 1;
            for (int i = 0; i < size; i++) {
                UserInfo userInfo = new UserInfo();
                LocalDateTime localDateTime = now.plusDays(random.nextInt(1000));
                userInfo.setId(String.format("%s%05d", localDateTime.format(DatetimeUtil.YYYYMMDDHHMMSS_FORMATTER), count++));
                userInfo.setCreateTimeStr(localDateTime.format(DatetimeUtil.DATE_TIME_FORMATTER));
                userInfo.setCreateTime(DatetimeUtil.getLocalDateTimeMill(localDateTime));
                userInfo.setHome(homes[random.nextInt(homes.length)]);
                userInfo.setOption(homes[random.nextInt(homes.length)]);
                userInfo.setStatus((short) random.nextInt(10));
                userInfos.add(userInfo);
            }
            return userInfos;
        }

        public String getId() {
            return id;
        }
        public void setId(String id) {
            this.id = id;
        }
        public Long getCreateTime() {
            return createTime;
        }
        public void setCreateTime(Long createTime) {
            this.createTime = createTime;
        }
        public String getCreateTimeStr() {
            return createTimeStr;
        }
        public void setCreateTimeStr(String createTimeStr) {
            this.createTimeStr = createTimeStr;
        }
        public short getStatus() {
            return status;
        }
        public void setStatus(short status) {
            this.status = status;
        }
        public String getHome() {
            return home;
        }
        public void setHome(String home) {
            this.home = home;
        }
        public String getOption() {
            return option;
        }
        public void setOption(String option) {
            this.option = option;
        }
    }
}
bool query
The Java REST client provides a QueryBuilders factory class that creates the various query builders.
For our purposes the most important clause of a bool query is filter, which runs in filter context.
You can of course also use the familiar must (all clauses must match), should (at least one clause should match), and must_not (clauses must not match).
Here is the code:
@Test
public void boolQueryBuilder() throws IOException {
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
    boolQueryBuilder.filter(QueryBuilders.termQuery("status", "5")); // filter context inside the bool query
    searchSourceBuilder.query(boolQueryBuilder);
    searchSourceBuilder.from(0);   // start offset, 0 by default
    searchSourceBuilder.size(20);  // page size, 10 by default
    searchSourceBuilder.sort("createTime", SortOrder.DESC); // sort field and order
    SearchRequest searchRequest = new SearchRequest(INDEX_NAME);
    searchRequest.source(searchSourceBuilder);
    SearchResponse search = client.search(searchRequest, RequestOptions.DEFAULT);
    SearchHits hits = search.getHits();
    for (SearchHit hit : hits) {
        System.out.println(hit.getSourceAsString());
    }
}
The code above uses a filter to find documents whose status field is 5, and sets the starting offset and the number of hits to return.
By default the offset is 0 and 10 hits are returned. Do not use from/size for deep paging; if you need to page deeply, use scroll or search_after, described later.
You can also specify a sort field with sort, as shown below.
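As a small add-on that is not in the original article, here is one way from/size paging is often wrapped; it is only a sketch and assumes the same client and index ("user" / INDEX_NAME) as the tests above. Keep in mind that from + size must stay below the index.max_result_window setting (10,000 by default), which is exactly why deep paging needs scroll or search_after.
// Hypothetical helper: fetch one page of status == 5 users, newest first.
// Assumes the same `client` and index name as the surrounding tests.
public SearchHits fetchPage(int page, int pageSize) throws IOException {
    SearchSourceBuilder source = new SearchSourceBuilder()
            .query(QueryBuilders.boolQuery().filter(QueryBuilders.termQuery("status", 5)))
            .from(page * pageSize)   // must stay below index.max_result_window (10,000 by default)
            .size(pageSize)
            .trackTotalHits(true)    // report the exact total instead of capping it at 10,000
            .sort("createTime", SortOrder.DESC);
    SearchRequest request = new SearchRequest("user").source(source);
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    System.out.println("total matches: " + response.getHits().getTotalHits().value);
    return response.getHits();
}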
term and terms queries
A term query is an exact match. terms is essentially the same, except it accepts multiple values, and a document matches if any one of those values matches exactly.
You can use a term query on its own, but it is usually better to put it inside the filter of a bool query.
@Test
public void term() throws IOException {
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("home.keyword", "四川省");
    // TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("home", "四川省");
    searchSourceBuilder.query(termQueryBuilder);
    System.out.println(searchSourceBuilder.toString());
    SearchRequest searchRequest = new SearchRequest(INDEX_NAME);
    searchRequest.source(searchSourceBuilder);
    SearchResponse search = client.search(searchRequest, RequestOptions.DEFAULT);
    SearchHits hits = search.getHits();
    for (SearchHit hit : hits) {
        System.out.println(hit.getSourceAsString());
    }
}
@Test
public void terms() throws IOException {
    BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
    TermsQueryBuilder termsQueryBuilder = QueryBuilders.termsQuery("status", 5, 6);
    boolQueryBuilder.filter(termsQueryBuilder);
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.query(boolQueryBuilder);
    SearchRequest searchRequest = new SearchRequest(INDEX_NAME);
    searchRequest.source(searchSourceBuilder);
    SearchResponse search = client.search(searchRequest, RequestOptions.DEFAULT);
    SearchHits hits = search.getHits();
    for (SearchHit hit : hits) {
        System.out.println(hit.getSourceAsString());
    }
}
Note: if the data is clearly there but a term query finds nothing, it is time to check the mapping of the field in question.
Many people skip defining a mapping or rely on dynamic mapping. When a string field is added dynamically, Elasticsearch maps it as text and adds a keyword sub-field under fields with ignore_above: 256 (values longer than 256 characters are not indexed as keyword).
In that case you have to query field-name.keyword instead of field-name itself. Inaccurate results when querying date/time strings are very often caused by the same thing.
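If you want to avoid the text/keyword surprise altogether, one option (not shown in the original article) is to create the index with an explicit mapping before bulk-loading, so that fields like home and createTimeStr get the types you actually query on. A rough sketch, assuming the same client and the user index; CreateIndexRequest here is org.elasticsearch.client.indices.CreateIndexRequest:
// Hypothetical setup step: define the mapping explicitly instead of relying on dynamic mapping.
CreateIndexRequest createIndex = new CreateIndexRequest("user");
createIndex.mapping(
        "{\n" +
        "  \"properties\": {\n" +
        "    \"home\":          { \"type\": \"keyword\" },\n" +
        "    \"option\":        { \"type\": \"keyword\" },\n" +
        "    \"status\":        { \"type\": \"short\" },\n" +
        "    \"createTime\":    { \"type\": \"date\" },\n" +
        "    \"createTimeStr\": { \"type\": \"date\", \"format\": \"yyyy-MM-dd HH:mm:ss\" }\n" +
        "  }\n" +
        "}",
        XContentType.JSON);
client.indices().create(createIndex, RequestOptions.DEFAULT);
With a mapping like this, term queries on home work directly, without the .keyword suffix, and date ranges on createTimeStr behave like real dates rather than string comparisons.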
Range queries
Range queries are simple and very commonly used:
@Test
public void range() throws IOException {
    RangeQueryBuilder rangeQueryBuilder = QueryBuilders.rangeQuery("status").gte(4).lte(7);
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.query(rangeQueryBuilder);
    SearchRequest searchRequest = new SearchRequest(INDEX_NAME);
    searchRequest.source(searchSourceBuilder);
    SearchResponse search = client.search(searchRequest, RequestOptions.DEFAULT);
    SearchHits hits = search.getHits();
    for (SearchHit hit : hits) {
        System.out.println(hit.getSourceAsString());
    }
}
ids
Query documents by a set of ids:
@Test
public void ids() throws IOException {
    IdsQueryBuilder idsQueryBuilder = QueryBuilders.idsQuery().addIds("20210313173331", "20211101173331");
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.query(idsQueryBuilder);
    SearchRequest searchRequest = new SearchRequest(INDEX_NAME);
    searchRequest.source(searchSourceBuilder);
    SearchResponse search = client.search(searchRequest, RequestOptions.DEFAULT);
    SearchHits hits = search.getHits();
    for (SearchHit hit : hits) {
        System.out.println(hit.getSourceAsString());
    }
}
exists
Query for documents that contain a given field:
@Test
public void exists() throws IOException {
    // check whether documents contain the given field
    // ExistsQueryBuilder existsQueryBuilder = QueryBuilders.existsQuery("status");
    ExistsQueryBuilder existsQueryBuilder = QueryBuilders.existsQuery("hello");
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.query(existsQueryBuilder);
    SearchRequest searchRequest = new SearchRequest(INDEX_NAME);
    searchRequest.source(searchSourceBuilder);
    SearchResponse search = client.search(searchRequest, RequestOptions.DEFAULT);
    SearchHits hits = search.getHits();
    for (SearchHit hit : hits) {
        System.out.println(hit.getSourceAsString());
    }
}
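The inverse question, which documents are missing a field, is not covered by the code above; a common way to express it is to wrap exists in must_not. A sketch in the same style as the other tests (same client and INDEX_NAME assumed; the method name is made up):
// Hypothetical example: documents that do NOT have an "option" field.
@Test
public void missingField() throws IOException {
    BoolQueryBuilder missing = QueryBuilders.boolQuery()
            .mustNot(QueryBuilders.existsQuery("option"));
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder().query(missing);
    SearchRequest searchRequest = new SearchRequest(INDEX_NAME);
    searchRequest.source(searchSourceBuilder);
    SearchResponse search = client.search(searchRequest, RequestOptions.DEFAULT);
    System.out.println(search.getHits().getTotalHits().value);
}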
match
A match query is a full-text query: the query text is analyzed and matched against the analyzed text field, which is why searching for "四川" below also hits documents whose home is "四川省":
@Test
public void match() throws IOException {
    MatchQueryBuilder matchQueryBuilder = QueryBuilders.matchQuery("home", "四川");
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.query(matchQueryBuilder);
    SearchRequest searchRequest = new SearchRequest(INDEX_NAME);
    searchRequest.source(searchSourceBuilder);
    SearchResponse search = client.search(searchRequest, RequestOptions.DEFAULT);
    SearchHits hits = search.getHits();
    for (SearchHit hit : hits) {
        System.out.println(hit.getSourceAsString());
    }
}
multi_match
multi_match is much like match, but it searches across multiple fields.
@Test
public void multiMatch() throws IOException {
    MultiMatchQueryBuilder multiMatchQueryBuilder = QueryBuilders.multiMatchQuery("四川", "home", "option");
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.query(multiMatchQueryBuilder);
    SearchRequest searchRequest = new SearchRequest(INDEX_NAME);
    searchRequest.source(searchSourceBuilder);
    SearchResponse search = client.search(searchRequest, RequestOptions.DEFAULT);
    SearchHits hits = search.getHits();
    for (SearchHit hit : hits) {
        System.out.println(hit.getSourceAsString());
    }
}
scroll
Sometimes, for example when computing statistics, you need to walk over all matching data. If the data set is very large, that means deep paging, and a plain from/size query no longer works well; this is where scroll comes in.
Scroll is efficient because it does not have to redo a global sort for an ever-larger from/size window: in the query phase each shard only walks its own data set and returns the next batch of matching document ids.
Elasticsearch keeps a search context for the scroll alive for the requested keep-alive period, so you can page through the full result set.
import org.apache.http.HttpHost;
import org.elasticsearch.action.search.ClearScrollRequest;
import org.elasticsearch.action.search.ClearScrollResponse;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchScrollRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.RangeQueryBuilder;
import org.elasticsearch.search.Scroll;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.fetch.subphase.FetchSourceContext;
import org.junit.Before;
import org.junit.Test;
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class ScrollTest {
    private RestHighLevelClient client;

    @Before
    public void setUp() {
        HttpHost host = new HttpHost("127.0.0.1", 9200, "http");
        client = new RestHighLevelClient(RestClient.builder(host));
    }

    @Test
    public void scroll() throws IOException {
        long start = System.currentTimeMillis();
        FileOutputStream fileOutputStream = new FileOutputStream("F:\\tmp\\long_scroll9.txt");
        BufferedOutputStream bufferedOutputStream = new BufferedOutputStream(fileOutputStream);
        final Scroll scroll = new Scroll(TimeValue.timeValueMinutes(1L)); // keep-alive for the scroll context
        SearchRequest searchRequest = new SearchRequest("user");
        searchRequest.scroll(scroll);
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
        // fetch only the specified source fields
        String[] fields = {"id", "createTime"};
        FetchSourceContext sourceContext = new FetchSourceContext(true, fields, null);
        searchSourceBuilder.fetchSource(sourceContext);
        BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
        // RangeQueryBuilder lastlogintime = QueryBuilders.rangeQuery("createTimeStr");
        // lastlogintime.gte("2019-08-11 00:00:00");
        RangeQueryBuilder lastlogintime = QueryBuilders.rangeQuery("createTime");
        lastlogintime.gte(1597298833674L);
        boolQueryBuilder.filter(lastlogintime);
        // boolQueryBuilder.must(lastlogintime);
        searchSourceBuilder.query(boolQueryBuilder);
        // searchSourceBuilder.sort("_doc"); // sorting by _doc is the cheapest order for a scroll
        searchRequest.source(searchSourceBuilder);
        SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
        String scrollId = searchResponse.getScrollId();
        int count = 0;
        SearchHit[] searchHits = searchResponse.getHits().getHits();
        count += searchHits.length;
        // print(searchHits);
        print(bufferedOutputStream, searchHits);
        while (searchHits != null && searchHits.length > 0) {
            SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId);
            scrollRequest.scroll(scroll);
            searchResponse = client.scroll(scrollRequest, RequestOptions.DEFAULT);
            scrollId = searchResponse.getScrollId();
            searchHits = searchResponse.getHits().getHits();
            count += searchHits.length;
            // print(searchHits);
            print(bufferedOutputStream, searchHits);
        }
        bufferedOutputStream.close();
        System.out.println(count);
        // release the scroll context once we are done
        ClearScrollRequest clearScrollRequest = new ClearScrollRequest();
        clearScrollRequest.addScrollId(scrollId);
        ClearScrollResponse clearScrollResponse = client.clearScroll(clearScrollRequest, RequestOptions.DEFAULT);
        boolean succeeded = clearScrollResponse.isSucceeded();
        System.out.println(succeeded);
        System.out.println(System.currentTimeMillis() - start);
    }

    private static void print(BufferedOutputStream bufferedOutputStream, SearchHit[] searchHits) throws IOException {
        for (SearchHit hit : searchHits) {
            bufferedOutputStream.write(hit.getSourceAsString().getBytes());
            bufferedOutputStream.write("\n".getBytes());
        }
    }

    private static void print(SearchHit[] searchHits) {
        for (SearchHit searchHit : searchHits) {
            System.out.println(searchHit.getSourceAsString());
        }
    }
}
search after
Scroll has its own limitations: when the query phase matches a very large number of ids, the whole process can become quite slow, and the scroll context has to be kept alive on the cluster.
In that case consider search_after. It pages through results in a similar spirit, but it is stateless and real-time: each request passes the sort values of the last hit of the previous page as a cursor, which is why an explicit sort is required.
import org.apache.http.HttpHost;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.RangeQueryBuilder;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.sort.SortOrder;
import org.junit.Before;
import org.junit.Test;
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class SearchAfterTest {
    private static RestHighLevelClient client;

    @Before
    public void setUp() {
        HttpHost host = new HttpHost("127.0.0.1", 9200, "http");
        client = new RestHighLevelClient(RestClient.builder(host));
    }

    @Test
    public void search() throws IOException {
        long start = System.currentTimeMillis();
        Object[] objects = null;
        FileOutputStream fileOutputStream = new FileOutputStream("F:\\tmp\\safter_long_filter6.txt");
        BufferedOutputStream bufferedOutputStream = new BufferedOutputStream(fileOutputStream);
        boolean type = true;
        while (type) {
            SearchHit[] hits = searchAfter(client, objects);
            if (hits.length == 0) {
                break;
            }
            // the sort values of the last hit become the cursor for the next page
            objects = hits[hits.length - 1].getSortValues();
            if (hits.length < 1000) {
                type = false;
            }
            writeData(bufferedOutputStream, hits);
        }
        bufferedOutputStream.close();
        System.out.println(System.currentTimeMillis() - start);
        client.close();
    }

    private static void writeData(BufferedOutputStream bufferedOutputStream, SearchHit[] searchHits) throws IOException {
        for (SearchHit hit : searchHits) {
            bufferedOutputStream.write(hit.getSourceAsString().getBytes());
            bufferedOutputStream.write("\n".getBytes());
        }
    }

    public static SearchHit[] searchAfter(RestHighLevelClient client, Object[] objects) throws IOException {
        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
        BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
        // RangeQueryBuilder lastlogintime = QueryBuilders.rangeQuery("createTimeStr");
        RangeQueryBuilder lastlogintime = QueryBuilders.rangeQuery("createTimeStr.keyword");
        lastlogintime.gte("2020-08-11 00:00:00");
        // RangeQueryBuilder lastlogintime = QueryBuilders.rangeQuery("createTime");
        // lastlogintime.gte(1597298833674L);
        // lastlogintime.gte(1597298833674L).lte(1597298833674L);
        boolQueryBuilder.filter(lastlogintime);
        sourceBuilder.query(boolQueryBuilder);
        sourceBuilder.size(1000);
        // search_after needs an explicit, deterministic sort so that every hit carries sort values
        sourceBuilder.sort("_id", SortOrder.DESC);
        if (objects != null) {
            sourceBuilder.searchAfter(objects);
        }
        SearchRequest searchRequest = new SearchRequest();
        searchRequest.indices("user");
        searchRequest.source(sourceBuilder);
        SearchResponse response = client.search(searchRequest, RequestOptions.DEFAULT);
        SearchHit[] hits = response.getHits().getHits();
        return hits;
    }
}