本章,我将通过引入一个实战案例,带领新手童鞋了解下Elasticsearch的基本使用,请先确保按照上一章的讲解搭建好了Elasticsearch开发环境。
一、案例背景
假设我们有一个电商网站,现在需要基于Elasticsearch构建一个商品管理后台系统。这个系统需要提供以下功能:
- 对商品信息进行CRUD(增删改查)操作;
- 执行简单的结构化查询;
- 执行简单的全文检索,以及复杂的phrase(短语)检索;
- 对于全文检索的结果,进行高亮显示;
- 对数据进行简单的聚合分析
我将基于Elasticsearch,带领新手童鞋实现上述功能,从而掌握ES的基本使用。
二、CRUD操作
我们先来进行商品的增删改查操作。
2.1 新增商品
新增商品数据,其实就是新增一条document,ES的语法如下:
#index表示索引名,type是类型名,id是document的唯一标识
PUT /{index}/{type}/{id}
{
#document数据
}
我们新建三条document:
PUT /ecommerce/product/1
{
"name" : "gaolujie yagao",
"desc" : "gaoxiao meibai",
"price" : 30,
"producer" : "gaolujie producer",
"tags": [ "meibai", "fangzhu" ]
}
PUT /ecommerce/product/2
{
"name" : "jiajieshi yagao",
"desc" : "youxiao fangzhu",
"price" : 25,
"producer" : "jiajieshi producer",
"tags": [ "fangzhu" ]
}
PUT /ecommerce/product/3
{
"name" : "zhonghua yagao",
"desc" : "caoben zhiwu",
"price" : 40,
"producer" : "zhonghua producer",
"tags": [ "qingxin" ]
}
Elasticsearch会自动建立Index和Type,同时,ES默认会对document的每个field都建立倒排索引,让其可以被搜索。
2.2 查询商品
查询商品数据,ES语法如下:
#index表示索引名,type是类型名,id是document的唯一标识
GET /{index}/{type}/{id}
比如,我们执行GET /ecommerce/product/1
,返回如下信息:
{
"_index" : "ecommerce",
"_type" : "product",
"_id" : "1",
"_version" : 1,
"_seq_no" : 0,
"_primary_term" : 1,
"found" : true,
"_source" : {
"name" : "gaolujie yagao",
"desc" : "gaoxiao meibai",
"price" : 30,
"producer" : "gaolujie producer",
"tags" : [
"meibai",
"fangzhu"
]
}
}
2.3 修改商品
修改商品数据,有两种方式: 替换 、 更新 。
替换方式 的ES语法如下:
#index表示索引名,type是类型名,id是document的唯一标识
PUT /{index}/{type}/{id}
{
#document的所有字段
}
比如,我们执行:
PUT /ecommerce/product/1
{
"name" : "jiaqiangban gaolujie yagao",
"desc" : "gaoxiao meibai",
"price" : 30,
"producer" : "gaolujie producer",
"tags": [ "meibai", "fangzhu" ]
}
返回如下信息:
{
"_index" : "ecommerce",
"_type" : "product",
"_id" : "1",
"_version" : 2,
"result" : "updated",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 2,
"_primary_term" : 1
}
注意,使用替换方式去修改商品信息时,document的所有field都需要带上。
我们再来看下更新方式。 更新方式 的语法如下:
#index表示索引名,type是类型名,id是document的唯一标识
POST /{index}/{type}/{id}/_update
{
"doc":{
#document的指定field
}
}
比如,我们执行:
POST /ecommerce/product/1/_update
{
"doc": {
"name": "jiaqiangban gaolujie yagao"
}
}
返回如下信息:
{
"_index" : "ecommerce",
"_type" : "product",
"_id" : "1",
"_version" : 4,
"result" : "noop",
"_shards" : {
"total" : 0,
"successful" : 0,
"failed" : 0
},
"_seq_no" : 4,
"_primary_term" : 1
}
2.4 删除商品
删除商品数据,ES语法如下:
#index表示索引名,type是类型名,id是document的唯一标识
DELETE /{index}/{type}/{id}
比如,我们执行DELETE /ecommerce/product/1
,返回如下信息:
{
"_index" : "ecommerce",
"_type" : "product",
"_id" : "1",
"_version" : 5,
"result" : "deleted",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 5,
"_primary_term" : 1
}
三、数据检索
本节,我们来讲解下Elasticsearch中的各种常见的数据检索方式。
3.1 query string search
语法: GET /{index}/{type}/_search
我们在很多网站进行搜索的时候,搜索词一般都是跟在search参数后面,以query string的http请求形式发起检索。比如,我们要搜索商品名中包含“yagao”关键子的所有商品,并将结果按照售价排序。那么就可以执行以下语句:
GET /ecommerce/product/_search?q=name:yagao&sort=price:desc
返回结果:
{
"took" : 530,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "ecommerce",
"_type" : "product",
"_id" : "3",
"_score" : null,
"_source" : {
"name" : "zhonghua yagao",
"desc" : "caoben zhiwu",
"price" : 40,
"producer" : "zhonghua producer",
"tags" : [
"qingxin"
]
},
"sort" : [
40
]
},
{
"_index" : "ecommerce",
"_type" : "product",
"_id" : "1",
"_score" : null,
"_source" : {
"name" : "gaolujie yagao",
"desc" : "gaoxiao meibai",
"price" : 30,
"producer" : "gaolujie producer",
"tags" : [
"meibai",
"fangzhu"
]
},
"sort" : [
30
]
},
{
"_index" : "ecommerce",
"_type" : "product",
"_id" : "2",
"_score" : null,
"_source" : {
"name" : "jiajieshi yagao",
"desc" : "youxiao fangzhu",
"price" : 25,
"producer" : "jiajieshi producer",
"tags" : [
"fangzhu"
]
},
"sort" : [
25
]
}
]
}
}
我们对上面的一些关键返回字段的含义说明下:
took: 耗费时长(毫秒)
timed_out: 是否超时
_shards :数据拆成的分片数,所以对于搜索请求,会打到所有的primary shard(或者是primary shard对应的某个replica shard);
hits.total :查询结果的数量,3个document;
hits.max_score :score的含义,就是document对于一个search的相关度的匹配分数,越相关,就越匹配,分数也高;
hits.hits :包含了匹配搜索的document的详细数据。
query string search适用于一些临时、快速的检索请求,如果查询请求很复杂,那么query string search是不太适用的,所以在生产环境中,几乎很少使用 query string search 。
3.2 query DSL
所谓DSL,就是 Domain Specified Language ,是Elasticsearch中很常用的一种检索方式。
语法:
GET /{index}/{type}/_search
{
"query":{}
}
我们通过“查询商品名中包含yagao关键子的所有商品,并将结果按照售价排序”这个示例来看下如何使用:
GET /ecommerce/product/_search
{
"query" : {
"match" : {
"name" : "yagao"
}
},
"sort": [
{ "price": "desc" }
]
}
可以看到,所有查询请求参数全部放到了一个http requstbody里面,所以可以构建复杂的查询,适合生产环境使用。
3.3 query filter
query filter主要用于对数据进行过滤。比如,我们要搜索商品名称包含”yagao“,且售价大于25元的商品:
GET /ecommerce/product/_search
{
"query" : {
"bool" : {
"must" : {
"match" : {
"name" : "yagao"
}
},
"filter" : {
"range" : {
"price" : { "gt" : 25 }
}
}
}
}
}
上述bool里面可以封装多个查询条件。
3.4 full-text search
full-text search,就是全文检索。比如我们要查询producer字段中包含”producer“或”jiajieshi“的所有商品:
GET /ecommerce/product/_search
{
"query" : {
"match" : {
"producer" : "producer jiajieshi"
}
}
}
可以看到,我们在上述的”match“里面有空格分隔开了"producer jiajieshi",这样只要producer字段中包含了上述任意一个字符,就都会被检索到。结果如下:
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 1.1143606,
"hits" : [
{
"_index" : "ecommerce",
"_type" : "product",
"_id" : "2",
"_score" : 1.1143606,
"_source" : {
"name" : "jiajieshi yagao",
"desc" : "youxiao fangzhu",
"price" : 25,
"producer" : "jiajieshi producer",
"tags" : [
"fangzhu"
]
}
},
{
"_index" : "ecommerce",
"_type" : "product",
"_id" : "1",
"_score" : 0.13353139,
"_source" : {
"name" : "gaolujie yagao",
"desc" : "gaoxiao meibai",
"price" : 30,
"producer" : "gaolujie producer",
"tags" : [
"meibai",
"fangzhu"
]
}
},
{
"_index" : "ecommerce",
"_type" : "product",
"_id" : "3",
"_score" : 0.13353139,
"_source" : {
"name" : "zhonghua yagao",
"desc" : "caoben zhiwu",
"price" : 40,
"producer" : "zhonghua producer",
"tags" : [
"qingxin"
]
}
}
]
}
}
注意,采用全文检索时,返回的每一项数据中有一个“_score”字段,这个就是“相关度分数”的意思,分数越高,则越接近检索词。
3.5 phrase search
phrase search(短语搜索),跟全文检索刚好相反。在全文检索会将输入的搜索串拆解开来,去倒排索引里面去一一匹配,只要能匹配上任意一个拆解后的单词,就可以作为结果返回。而phrase search则要求输入的关键字必须在出现指定的文本中,一模一样才算匹配。
举个例子,我们要查询producer字段中包含关键字”jiajieshi producer“的所有商品:
GET /ecommerce/product/_search
{
"query" : {
"match_phrase" : {
"producer" : "jiajieshi producer"
}
}
}
返回结果如下,可以看到,只包含一条producer字段为“jiajieshi producer”的记录:
{
"took" : 48,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.1143606,
"hits" : [
{
"_index" : "ecommerce",
"_type" : "product",
"_id" : "2",
"_score" : 1.1143606,
"_source" : {
"name" : "jiajieshi yagao",
"desc" : "youxiao fangzhu",
"price" : 25,
"producer" : "jiajieshi producer",
"tags" : [
"fangzhu"
]
}
}
]
}
}
3.6 highlight search
highlight search(高亮搜索结果)。举个例子,我们希望检索name字段包含“yaogao”的所有商品,然后对检索结果中的“yaogao”文本进行高亮:
GET /ecommerce/product/_search
{
"query" : {
"match" : {
"name" : "yagao"
}
},
"highlight": {
"fields" : {
"name" : {}
}
}
}
返回结果如下,可以看到,匹配项的结果里多了个“highlight”字段,里面用<em>
高亮了匹配的文本:
{
"took" : 97,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 0.13353139,
"hits" : [
{
"_index" : "ecommerce",
"_type" : "product",
"_id" : "1",
"_score" : 0.13353139,
"_source" : {
"name" : "gaolujie yagao",
"desc" : "gaoxiao meibai",
"price" : 30,
"producer" : "gaolujie producer",
"tags" : [
"meibai",
"fangzhu"
]
},
"highlight" : {
"name" : [
"gaolujie <em>yagao</em>"
]
}
},
{
"_index" : "ecommerce",
"_type" : "product",
"_id" : "2",
"_score" : 0.13353139,
"_source" : {
"name" : "jiajieshi yagao",
"desc" : "youxiao fangzhu",
"price" : 25,
"producer" : "jiajieshi producer",
"tags" : [
"fangzhu"
]
},
"highlight" : {
"name" : [
"jiajieshi <em>yagao</em>"
]
}
},
{
"_index" : "ecommerce",
"_type" : "product",
"_id" : "3",
"_score" : 0.13353139,
"_source" : {
"name" : "zhonghua yagao",
"desc" : "caoben zhiwu",
"price" : 40,
"producer" : "zhonghua producer",
"tags" : [
"qingxin"
]
},
"highlight" : {
"name" : [
"zhonghua <em>yagao</em>"
]
}
}
]
}
}
四、数据分析
这一节,我们来看下如何利用Elasticsearch进行数据分析。首先,我们还是录入以下document数据:
PUT /ecommerce/product/1
{
"name" : "gaolujie yagao",
"desc" : "gaoxiao meibai",
"price" : 30,
"producer" : "gaolujie producer",
"tags": [ "meibai", "fangzhu" ]
}
PUT /ecommerce/product/2
{
"name" : "jiajieshi yagao",
"desc" : "youxiao fangzhu",
"price" : 25,
"producer" : "jiajieshi producer",
"tags": [ "fangzhu" ]
}
PUT /ecommerce/product/3
{
"name" : "zhonghua yagao",
"desc" : "caoben zhiwu",
"price" : 40,
"producer" : "zhonghua producer",
"tags": [ "qingxin" ]
}
注意,要使用聚合分析,需要将文本field的fielddata
属性设置为true,所以还要执行:
PUT /ecommerce/_mapping
{
"properties": {
"tags": {
"type": "text",
"fielddata": true
}
}
}
关于mapping,我后续章节会详细讲解,这里读者只要跟着这么操作就行了。
4.1 聚合分析
语法:
GET /{index}/{type}/_search
{
"aggs": {
"聚合名称": {
"terms": { "field": "分组字段" }
}
}
}
我们的第一个需求是:计算每个tag下的商品数量。比如,"fangzhu"这个tag,一共有两个商品包含它。
请求如下,意思是新建一个名称为“group_by_tags”的聚合,按照tags这个字段进行分组:
GET /ecommerce/product/_search
{
"aggs": {
"group_by_tags": {
"terms": { "field": "tags" }
}
}
}
响应如下:
{
"took" : 6,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "ecommerce",
"_type" : "product",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"name" : "gaolujie yagao",
"desc" : "gaoxiao meibai",
"price" : 30,
"producer" : "gaolujie producer",
"tags" : [
"meibai",
"fangzhu"
]
}
},
{
"_index" : "ecommerce",
"_type" : "product",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"name" : "jiajieshi yagao",
"desc" : "youxiao fangzhu",
"price" : 25,
"producer" : "jiajieshi producer",
"tags" : [
"fangzhu"
]
}
},
{
"_index" : "ecommerce",
"_type" : "product",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"name" : "zhonghua yagao",
"desc" : "caoben zhiwu",
"price" : 40,
"producer" : "zhonghua producer",
"tags" : [
"qingxin"
]
}
}
]
},
"aggregations" : {
"group_by_tags" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "fangzhu",
"doc_count" : 2
},
{
"key" : "meibai",
"doc_count" : 1
},
{
"key" : "qingxin",
"doc_count" : 1
}
]
}
}
}
我们关键看:
"aggregations" : {
"group_by_tags" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "fangzhu",
"doc_count" : 2
},
{
"key" : "meibai",
"doc_count" : 1
},
{
"key" : "qingxin",
"doc_count" : 1
}
]
}
}
上面的“buckets”就是按照tags进行分组的结果,key和doc_count就表示每个tag对应的数量。
我们再来个复杂点的需求:按照指定的价格范围区间进行分组,然后在每组内再按照tag进行分组,最后再计算组内的平均价格。
请求:
GET /ecommerce/product/_search
{
"size": 0,
"aggs": {
"group_by_price": {
"range": {
"field": "price",
"ranges": [
{
"from": 0,
"to": 20
},
{
"from": 20,
"to": 40
},
{
"from": 40,
"to": 50
}
]
},
"aggs": {
"group_by_tags": {
"terms": {
"field": "tags"
},
"aggs": {
"average_price": {
"avg": {
"field": "price"
}
}
}
}
}
}
}
}
响应:
{
"took" : 11,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"group_by_price" : {
"buckets" : [
{
"key" : "0.0-20.0",
"from" : 0.0,
"to" : 20.0,
"doc_count" : 0,
"group_by_tags" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [ ]
}
},
{
"key" : "20.0-40.0",
"from" : 20.0,
"to" : 40.0,
"doc_count" : 2,
"group_by_tags" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "fangzhu",
"doc_count" : 2,
"average_price" : {
"value" : 27.5
}
},
{
"key" : "meibai",
"doc_count" : 1,
"average_price" : {
"value" : 30.0
}
}
]
}
},
{
"key" : "40.0-50.0",
"from" : 40.0,
"to" : 50.0,
"doc_count" : 1,
"group_by_tags" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "qingxin",
"doc_count" : 1,
"average_price" : {
"value" : 40.0
}
}
]
}
}
]
}
}
}
五、总结
本章,我通过一个简单的案例讲解了Elasticsearch的数据增删改查,基本的数据检索和数据分析,相信新手童鞋们对Elasticsearch的基本操作已经掌握了。下一章,我将讲解Elasticsearch的分布式架构。
Java 面试宝典是大明哥全力打造的 Java 精品面试题,它是一份靠谱、强大、详细、经典的 Java 后端面试宝典。它不仅仅只是一道道面试题,而是一套完整的 Java 知识体系,一套你 Java 知识点的扫盲贴。
它的内容包括:
- 大厂真题:Java 面试宝典里面的题目都是最近几年的高频的大厂面试真题。
- 原创内容:Java 面试宝典内容全部都是大明哥原创,内容全面且通俗易懂,回答部分可以直接作为面试回答内容。
- 持续更新:一次购买,永久有效。大明哥会持续更新 3+ 年,累计更新 1000+,宝典会不断迭代更新,保证最新、最全面。
- 覆盖全面:本宝典累计更新 1000+,从 Java 入门到 Java 架构的高频面试题,实现 360° 全覆盖。
- 不止面试:内容包含面试题解析、内容详解、知识扩展,它不仅仅只是一份面试题,更是一套完整的 Java 知识体系。
- 宝典详情:https://www.yuque.com/chenssy/sike-java/xvlo920axlp7sf4k
- 宝典总览:https://www.yuque.com/chenssy/sike-java/yogsehzntzgp4ly1
- 宝典进展:https://www.yuque.com/chenssy/sike-java/en9ned7loo47z5aw
目前 Java 面试宝典累计更新 400+ 道,总字数 42w+。大明哥还在持续更新中,下图是大明哥在 2024-12 月份的更新情况:
想了解详情的小伙伴,扫描下面二维码加大明哥微信【daming091】咨询
同时,大明哥也整理一套目前市面最常见的热点面试题。微信搜[大明哥聊 Java]或扫描下方二维码关注大明哥的原创公众号[大明哥聊 Java] ,回复【面试题】 即可免费领取。