2023-08-08
Original author: Ressmix  Original URL: https://www.tpvlog.com/article/146

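In this chapter, I will walk beginners through the basic usage of Elasticsearch using a hands-on case study. Before starting, make sure you have set up the Elasticsearch development environment as described in the previous chapter.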

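1. Case Background

Suppose we run an e-commerce website and need to build a product management back-office system on top of Elasticsearch. The system must support the following:

  1. CRUD (create, read, update, delete) operations on product information;
  2. simple structured queries;
  3. simple full-text search, as well as more complex phrase search;
  4. highlighting of full-text search results;
  5. simple aggregation analysis of the data.

I will implement these features with Elasticsearch so that beginners can pick up the basics of ES along the way.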

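2. CRUD Operations

Let's start with creating, reading, updating, and deleting products.

2.1 Adding a Product

Adding a product simply means indexing a new document. The ES syntax is:

    # index is the index name, type is the type name, id is the document's unique identifier
    PUT /{index}/{type}/{id}
    {
        # the document data
    }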

Let's create three documents:

    PUT /ecommerce/product/1
    {
        "name" : "gaolujie yagao",
        "desc" :  "gaoxiao meibai",
        "price" :  30,
        "producer" :      "gaolujie producer",
        "tags": [ "meibai", "fangzhu" ]
    }
    
    PUT /ecommerce/product/2
    {
        "name" : "jiajieshi yagao",
        "desc" :  "youxiao fangzhu",
        "price" :  25,
        "producer" :      "jiajieshi producer",
        "tags": [ "fangzhu" ]
    }
    
    PUT /ecommerce/product/3
    {
        "name" : "zhonghua yagao",
        "desc" :  "caoben zhiwu",
        "price" :  40,
        "producer" :      "zhonghua producer",
        "tags": [ "qingxin" ]
    }

Elasticsearch creates the index and type automatically. By default, ES also builds an inverted index for every field of a document so that the field is searchable.
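
To make the idea of an inverted index concrete, here is a minimal Python sketch. It is only an illustration under simplifying assumptions (lowercasing and whitespace splitting instead of a real analyzer, and a hypothetical `build_inverted_index` helper); it is not how Elasticsearch stores its index internally.

```python
from collections import defaultdict

# Product names taken from the three documents above, keyed by document id.
names = {
    1: "gaolujie yagao",
    2: "jiajieshi yagao",
    3: "zhonghua yagao",
}

def build_inverted_index(docs):
    """Map each whitespace-separated term to the sorted ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

inverted = build_inverted_index(names)
print(inverted["yagao"])      # ids of documents whose name contains "yagao"
print(inverted["jiajieshi"])  # ids of documents whose name contains "jiajieshi"
```

Looking up a term in this mapping gives the matching document ids directly, which is why a search does not need to scan every document.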

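2.2 Retrieving a Product

To retrieve a product, the ES syntax is:

    # index is the index name, type is the type name, id is the document's unique identifier
    GET /{index}/{type}/{id}

For example, running GET /ecommerce/product/1 returns: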

    {
      "_index" : "ecommerce",
      "_type" : "product",
      "_id" : "1",
      "_version" : 1,
      "_seq_no" : 0,
      "_primary_term" : 1,
      "found" : true,
      "_source" : {
        "name" : "gaolujie yagao",
        "desc" : "gaoxiao meibai",
        "price" : 30,
        "producer" : "gaolujie producer",
        "tags" : [
          "meibai",
          "fangzhu"
        ]
      }
    }

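2.3 Updating a Product

There are two ways to modify a product: full replacement and partial update.

The syntax for full replacement is:

    # index is the index name, type is the type name, id is the document's unique identifier
    PUT /{index}/{type}/{id}
    {
        # all fields of the document
    }

For example, we run: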

    PUT /ecommerce/product/1
    {
        "name" : "jiaqiangban gaolujie yagao",
        "desc" :  "gaoxiao meibai",
        "price" :  30,
        "producer" :      "gaolujie producer",
        "tags": [ "meibai", "fangzhu" ]
    }

The response is:

    {
      "_index" : "ecommerce",
      "_type" : "product",
      "_id" : "1",
      "_version" : 2,
      "result" : "updated",
      "_shards" : {
        "total" : 2,
        "successful" : 1,
        "failed" : 0
      },
      "_seq_no" : 2,
      "_primary_term" : 1
    }

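Note that a replacement must include every field of the document; any field left out of the request body is absent from the new version.

Now let's look at partial update. Its syntax is:

    # index is the index name, type is the type name, id is the document's unique identifier
    POST /{index}/{type}/{id}/_update
    {
        "doc":{
            # the fields to change
        }
    }

For example, we run: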

    POST /ecommerce/product/1/_update
    {
      "doc": {
        "name": "jiaqiangban gaolujie yagao"
      }
    }

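The response is shown below. The result is noop because the document already has this name value, so the update changes nothing: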

    {
      "_index" : "ecommerce",
      "_type" : "product",
      "_id" : "1",
      "_version" : 4,
      "result" : "noop",
      "_shards" : {
        "total" : 0,
        "successful" : 0,
        "failed" : 0
      },
      "_seq_no" : 4,
      "_primary_term" : 1
    }
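
The difference between the two update styles can be sketched in plain Python. This is a simplified model under stated assumptions: the in-memory `store` dict and the `replace_document` and `partial_update` helpers are hypothetical, and the merge is shallow, whereas a real partial update also merges nested objects.

```python
import copy

def replace_document(store, doc_id, new_doc):
    """Full replacement: the stored document becomes exactly new_doc."""
    store[doc_id] = copy.deepcopy(new_doc)
    return "updated"

def partial_update(store, doc_id, partial):
    """Partial update: merge the given fields; report "noop" if nothing changes."""
    current = store[doc_id]
    merged = {**current, **partial}
    if merged == current:
        return "noop"
    store[doc_id] = merged
    return "updated"

store = {
    "1": {"name": "gaolujie yagao", "desc": "gaoxiao meibai", "price": 30,
          "producer": "gaolujie producer", "tags": ["meibai", "fangzhu"]}
}

first = partial_update(store, "1", {"name": "jiaqiangban gaolujie yagao"})
second = partial_update(store, "1", {"name": "jiaqiangban gaolujie yagao"})
fields_after_update = sorted(store["1"])

# A replacement that leaves fields out drops them from the stored document.
replace_document(store, "1", {"name": "jiaqiangban gaolujie yagao"})
fields_after_replace = sorted(store["1"])

print(first, second)
print(fields_after_update)
print(fields_after_replace)
```

The second identical update reports noop, and the incomplete replacement leaves only the name field, which is why a replacement has to carry the whole document.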

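2.4 Deleting a Product

To delete a product, the ES syntax is:

    # index is the index name, type is the type name, id is the document's unique identifier
    DELETE /{index}/{type}/{id}

For example, running DELETE /ecommerce/product/1 returns: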

    {
      "_index" : "ecommerce",
      "_type" : "product",
      "_id" : "1",
      "_version" : 5,
      "result" : "deleted",
      "_shards" : {
        "total" : 2,
        "successful" : 1,
        "failed" : 0
      },
      "_seq_no" : 5,
      "_primary_term" : 1
    }
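
The four CRUD requests in this section are ordinary HTTP calls, so they can also be issued from code. The sketch below only builds the requests with Python's standard library and does not send them. The address `http://localhost:9200` and the `build_request` helper are assumptions for illustration; adjust them to your environment.

```python
import json
import urllib.request

# Assumption: a local single-node cluster listening at this address.
BASE_URL = "http://localhost:9200"

def build_request(method, path, body=None):
    """Build (but do not send) an HTTP request for the given ES endpoint."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if body is not None else {}
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)

product = {"name": "gaolujie yagao", "desc": "gaoxiao meibai", "price": 30,
           "producer": "gaolujie producer", "tags": ["meibai", "fangzhu"]}

requests_to_send = [
    build_request("PUT", "/ecommerce/product/1", product),           # create or replace
    build_request("GET", "/ecommerce/product/1"),                    # read
    build_request("POST", "/ecommerce/product/1/_update",
                  {"doc": {"name": "jiaqiangban gaolujie yagao"}}),  # partial update
    build_request("DELETE", "/ecommerce/product/1"),                 # delete
]

for req in requests_to_send:
    print(req.get_method(), req.full_url)

# To actually send one against a running cluster:
# with urllib.request.urlopen(requests_to_send[1]) as resp:
#     print(json.load(resp))
```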

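3. Searching Data

This section covers the common ways to search data in Elasticsearch.

3.1 query string search

Syntax: GET /{index}/{type}/_search

On many websites, the search terms are passed as parameters in the URL, so the search is issued as an HTTP request with a query string. For example, to find all products whose name contains the keyword "yagao" and sort the results by price in descending order, run:

    GET /ecommerce/product/_search?q=name:yagao&sort=price:desc

Response: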

    {
      "took" : 530,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 3,
          "relation" : "eq"
        },
        "max_score" : null,
        "hits" : [
          {
            "_index" : "ecommerce",
            "_type" : "product",
            "_id" : "3",
            "_score" : null,
            "_source" : {
              "name" : "zhonghua yagao",
              "desc" : "caoben zhiwu",
              "price" : 40,
              "producer" : "zhonghua producer",
              "tags" : [
                "qingxin"
              ]
            },
            "sort" : [
              40
            ]
          },
          {
            "_index" : "ecommerce",
            "_type" : "product",
            "_id" : "1",
            "_score" : null,
            "_source" : {
              "name" : "gaolujie yagao",
              "desc" : "gaoxiao meibai",
              "price" : 30,
              "producer" : "gaolujie producer",
              "tags" : [
                "meibai",
                "fangzhu"
              ]
            },
            "sort" : [
              30
            ]
          },
          {
            "_index" : "ecommerce",
            "_type" : "product",
            "_id" : "2",
            "_score" : null,
            "_source" : {
              "name" : "jiajieshi yagao",
              "desc" : "youxiao fangzhu",
              "price" : 25,
              "producer" : "jiajieshi producer",
              "tags" : [
                "fangzhu"
              ]
            },
            "sort" : [
              25
            ]
          }
        ]
      }
    }

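The key fields in the response mean the following:
took: time taken, in milliseconds;
timed_out: whether the request timed out;
_shards: the number of shards the search ran on. The index is split into shards, and a search request is sent to every primary shard (or to one of its replica shards);
hits.total: the number of matching documents, 3 here;
hits.max_score: the highest relevance score among the matches. The score measures how well a document matches the search: the more relevant the document, the higher the score. It is null here because the results are sorted by price rather than by score;
hits.hits: the detailed data of the matching documents.

query string search suits ad hoc, quick searches. It does not fit complex queries, so it is rarely used in production.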
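
As a small illustration of how such a request is put together, the following Python sketch assembles the path and query string from its parts. The `query_string_search_path` helper and its parameter names are hypothetical; only `urllib.parse.urlencode` comes from the standard library.

```python
from urllib.parse import urlencode

def query_string_search_path(index, doc_type, field, term, sort_field, order="desc"):
    """Build the path and query string for a query string search."""
    params = {"q": f"{field}:{term}", "sort": f"{sort_field}:{order}"}
    # safe=":" keeps the colons readable; other special characters are percent-encoded.
    return f"/{index}/{doc_type}/_search?" + urlencode(params, safe=":")

path = query_string_search_path("ecommerce", "product", "name", "yagao", "price")
print(path)
```

Every extra condition has to be squeezed into the URL this way, which is part of why this style becomes unwieldy for complex queries.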

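3.2 query DSL

DSL stands for Domain Specific Language; it is a very common way to search in Elasticsearch.

Syntax:

    GET /{index}/{type}/_search
    {
        "query":{}
    }

Let's see how it works using the same example: find all products whose name contains the keyword "yagao" and sort the results by price: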

    GET /ecommerce/product/_search
    {
        "query" : {
            "match" : {
                "name" : "yagao"
            }
        },
        "sort": [
            { "price": "desc" }
        ]
    }

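As you can see, all query parameters go into the HTTP request body, which makes it possible to build complex queries. This makes the query DSL suitable for production use.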

3.3 query filter

A query filter is used to filter data. For example, to find products whose name contains "yagao" and whose price is greater than 25:

    GET /ecommerce/product/_search
    {
        "query" : {
            "bool" : {
                "must" : {
                    "match" : {
                        "name" : "yagao" 
                    }
                },
                "filter" : {
                    "range" : {
                        "price" : { "gt" : 25 } 
                    }
                }
            }
        }
    }

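The bool query above can combine multiple conditions: clauses under must contribute to the relevance score, while clauses under filter only include or exclude documents.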
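
Because the request body is plain JSON, it is easy to generate from code. The sketch below builds the same body in Python and then checks locally which of the three sample products satisfy both conditions. The `build_bool_query` helper is hypothetical, and the local check uses simple whitespace splitting as a stand-in for real text analysis.

```python
import json

def build_bool_query(field, text, price_gt):
    """Build a request body with one scoring clause (must) and one filter clause."""
    return {
        "query": {
            "bool": {
                "must": {"match": {field: text}},
                "filter": {"range": {"price": {"gt": price_gt}}},
            }
        }
    }

body = build_bool_query("name", "yagao", 25)
print(json.dumps(body, indent=4))

# Local check of which sample products satisfy both conditions.
products = {1: ("gaolujie yagao", 30), 2: ("jiajieshi yagao", 25), 3: ("zhonghua yagao", 40)}
matching_ids = [pid for pid, (name, price) in products.items()
                if "yagao" in name.split() and price > 25]
print(matching_ids)
```

Product 2 is excluded because "gt" is a strict comparison and its price is exactly 25.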

3.4 full-text search

full-text search means searching analyzed text. For example, to find all products whose producer field contains "producer" or "jiajieshi":

    GET /ecommerce/product/_search
    {
        "query" : {
            "match" : {
                "producer" : "producer jiajieshi"
            }
        }
    }

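Note that the text in match, "producer jiajieshi", contains two terms separated by a space. A document is returned as long as its producer field contains any one of these terms. The result: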

    {
      "took" : 3,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 3,
          "relation" : "eq"
        },
        "max_score" : 1.1143606,
        "hits" : [
          {
            "_index" : "ecommerce",
            "_type" : "product",
            "_id" : "2",
            "_score" : 1.1143606,
            "_source" : {
              "name" : "jiajieshi yagao",
              "desc" : "youxiao fangzhu",
              "price" : 25,
              "producer" : "jiajieshi producer",
              "tags" : [
                "fangzhu"
              ]
            }
          },
          {
            "_index" : "ecommerce",
            "_type" : "product",
            "_id" : "1",
            "_score" : 0.13353139,
            "_source" : {
              "name" : "gaolujie yagao",
              "desc" : "gaoxiao meibai",
              "price" : 30,
              "producer" : "gaolujie producer",
              "tags" : [
                "meibai",
                "fangzhu"
              ]
            }
          },
          {
            "_index" : "ecommerce",
            "_type" : "product",
            "_id" : "3",
            "_score" : 0.13353139,
            "_source" : {
              "name" : "zhonghua yagao",
              "desc" : "caoben zhiwu",
              "price" : 40,
              "producer" : "zhonghua producer",
              "tags" : [
                "qingxin"
              ]
            }
          }
        ]
      }
    }

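Note that with full-text search, each hit carries a "_score" field. This is the relevance score: the higher it is, the better the document matches the search terms.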

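3.5 phrase search

phrase search works differently from full-text search. Full-text search splits the input string into terms, looks each one up in the inverted index, and returns a document if any single term matches. phrase search instead requires the terms of the input to appear in the field together and in the same order; only then does the document count as a match.

For example, to find all products whose producer field contains the phrase "jiajieshi producer":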

    GET /ecommerce/product/_search
    {
        "query" : {
            "match_phrase" : {
                "producer" : "jiajieshi producer"
            }
        }
    }

The response is shown below. It contains only one record, the one whose producer field is "jiajieshi producer":

    {
      "took" : 48,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 1,
          "relation" : "eq"
        },
        "max_score" : 1.1143606,
        "hits" : [
          {
            "_index" : "ecommerce",
            "_type" : "product",
            "_id" : "2",
            "_score" : 1.1143606,
            "_source" : {
              "name" : "jiajieshi yagao",
              "desc" : "youxiao fangzhu",
              "price" : 25,
              "producer" : "jiajieshi producer",
              "tags" : [
                "fangzhu"
              ]
            }
          }
        ]
      }
    }
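
The contrast between match (3.4) and match_phrase (3.5) can be sketched in a few lines of Python. This is a rough model under stated assumptions: `tokenize`, `match_any`, and `match_phrase` are hypothetical helpers, tokenization is plain whitespace splitting, and scoring is ignored entirely.

```python
def tokenize(text):
    """Lowercase and split on whitespace (a rough stand-in for a real analyzer)."""
    return text.lower().split()

def match_any(field_text, query):
    """match-style: true if the field contains at least one query term."""
    field_terms = set(tokenize(field_text))
    return any(term in field_terms for term in tokenize(query))

def match_phrase(field_text, query):
    """match_phrase-style: true if the query terms appear adjacent and in order."""
    field_terms, query_terms = tokenize(field_text), tokenize(query)
    n = len(query_terms)
    return any(field_terms[i:i + n] == query_terms
               for i in range(len(field_terms) - n + 1))

producers = {1: "gaolujie producer", 2: "jiajieshi producer", 3: "zhonghua producer"}
query = "jiajieshi producer"

any_hits = [pid for pid, text in producers.items() if match_any(text, query)]
phrase_hits = [pid for pid, text in producers.items() if match_phrase(text, query)]
print(any_hits)     # every producer contains the term "producer"
print(phrase_hits)  # only one producer contains the full phrase
```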

3.6 highlight search

highlight search highlights the matched text in the results. For example, to find all products whose name field contains "yagao" and highlight "yagao" in the results:

    GET /ecommerce/product/_search
    {
        "query" : {
            "match" : {
                "name" : "yagao"
            }
        },
        "highlight": {
            "fields" : {
                "name" : {}
            }
        }
    }

The response is shown below. Each hit now has an extra "highlight" field, in which the matched text is wrapped in <em> tags:

    {
      "took" : 97,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 3,
          "relation" : "eq"
        },
        "max_score" : 0.13353139,
        "hits" : [
          {
            "_index" : "ecommerce",
            "_type" : "product",
            "_id" : "1",
            "_score" : 0.13353139,
            "_source" : {
              "name" : "gaolujie yagao",
              "desc" : "gaoxiao meibai",
              "price" : 30,
              "producer" : "gaolujie producer",
              "tags" : [
                "meibai",
                "fangzhu"
              ]
            },
            "highlight" : {
              "name" : [
                "gaolujie <em>yagao</em>"
              ]
            }
          },
          {
            "_index" : "ecommerce",
            "_type" : "product",
            "_id" : "2",
            "_score" : 0.13353139,
            "_source" : {
              "name" : "jiajieshi yagao",
              "desc" : "youxiao fangzhu",
              "price" : 25,
              "producer" : "jiajieshi producer",
              "tags" : [
                "fangzhu"
              ]
            },
            "highlight" : {
              "name" : [
                "jiajieshi <em>yagao</em>"
              ]
            }
          },
          {
            "_index" : "ecommerce",
            "_type" : "product",
            "_id" : "3",
            "_score" : 0.13353139,
            "_source" : {
              "name" : "zhonghua yagao",
              "desc" : "caoben zhiwu",
              "price" : 40,
              "producer" : "zhonghua producer",
              "tags" : [
                "qingxin"
              ]
            },
            "highlight" : {
              "name" : [
                "zhonghua <em>yagao</em>"
              ]
            }
          }
        ]
      }
    }
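
To show what the highlight field contains, here is a minimal Python sketch that wraps matched terms in tags. It is an approximation only: the `highlight` helper is hypothetical and uses a regular expression on whole words, which is much simpler than the highlighters Elasticsearch ships with.

```python
import re

def highlight(text, query, pre_tag="<em>", post_tag="</em>"):
    """Wrap every whole-word occurrence of a query term in the given tags."""
    terms = [re.escape(term) for term in query.lower().split()]
    if not terms:
        return text
    pattern = re.compile(r"\b(" + "|".join(terms) + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: f"{pre_tag}{m.group(0)}{post_tag}", text)

names = ["gaolujie yagao", "jiajieshi yagao", "zhonghua yagao"]
highlighted = [highlight(name, "yagao") for name in names]
for line in highlighted:
    print(line)
```

The `pre_tag` and `post_tag` parameters default to the same <em> tags seen in the response above; a front end can style them or swap in its own markup.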

4. Data Analysis

In this section, we look at how to analyze data with Elasticsearch. First, index the following documents again:

    PUT /ecommerce/product/1
    {
        "name" : "gaolujie yagao",
        "desc" :  "gaoxiao meibai",
        "price" :  30,
        "producer" :      "gaolujie producer",
        "tags": [ "meibai", "fangzhu" ]
    }
    
    PUT /ecommerce/product/2
    {
        "name" : "jiajieshi yagao",
        "desc" :  "youxiao fangzhu",
        "price" :  25,
        "producer" :      "jiajieshi producer",
        "tags": [ "fangzhu" ]
    }
    
    PUT /ecommerce/product/3
    {
        "name" : "zhonghua yagao",
        "desc" :  "caoben zhiwu",
        "price" :  40,
        "producer" :      "zhonghua producer",
        "tags": [ "qingxin" ]
    }

Note that to aggregate on a text field, its fielddata property must be set to true, so we also run:

    PUT /ecommerce/_mapping
    {
      "properties": {
        "tags": {
          "type": "text",
          "fielddata": true
        }
      }
    }

I will cover mappings in detail in a later chapter; for now, just follow along.

4.1 Aggregations

Syntax:

    GET /{index}/{type}/_search
    {
      "aggs": {
        "<aggregation name>": {
          "terms": { "field": "<field to group by>" }
        }
      }
    }

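Our first requirement: count the number of products under each tag. For example, two products carry the tag "fangzhu".

The request below creates an aggregation named "group_by_tags" that groups documents by the tags field: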

    GET /ecommerce/product/_search
    {
      "aggs": {
        "group_by_tags": {
          "terms": { "field": "tags" }
        }
      }
    }

Response:

    {
      "took" : 6,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 3,
          "relation" : "eq"
        },
        "max_score" : 1.0,
        "hits" : [
          {
            "_index" : "ecommerce",
            "_type" : "product",
            "_id" : "1",
            "_score" : 1.0,
            "_source" : {
              "name" : "gaolujie yagao",
              "desc" : "gaoxiao meibai",
              "price" : 30,
              "producer" : "gaolujie producer",
              "tags" : [
                "meibai",
                "fangzhu"
              ]
            }
          },
          {
            "_index" : "ecommerce",
            "_type" : "product",
            "_id" : "2",
            "_score" : 1.0,
            "_source" : {
              "name" : "jiajieshi yagao",
              "desc" : "youxiao fangzhu",
              "price" : 25,
              "producer" : "jiajieshi producer",
              "tags" : [
                "fangzhu"
              ]
            }
          },
          {
            "_index" : "ecommerce",
            "_type" : "product",
            "_id" : "3",
            "_score" : 1.0,
            "_source" : {
              "name" : "zhonghua yagao",
              "desc" : "caoben zhiwu",
              "price" : 40,
              "producer" : "zhonghua producer",
              "tags" : [
                "qingxin"
              ]
            }
          }
        ]
      },
      "aggregations" : {
        "group_by_tags" : {
          "doc_count_error_upper_bound" : 0,
          "sum_other_doc_count" : 0,
          "buckets" : [
            {
              "key" : "fangzhu",
              "doc_count" : 2
            },
            {
              "key" : "meibai",
              "doc_count" : 1
            },
            {
              "key" : "qingxin",
              "doc_count" : 1
            }
          ]
        }
      }
    }

The part that matters is:

    "aggregations" : {
      "group_by_tags" : {
        "doc_count_error_upper_bound" : 0,
        "sum_other_doc_count" : 0,
        "buckets" : [
          {
            "key" : "fangzhu",
            "doc_count" : 2
          },
          {
            "key" : "meibai",
            "doc_count" : 1
          },
          {
            "key" : "qingxin",
            "doc_count" : 1
          }
        ]
      }
    }

The "buckets" array above is the result of grouping by tags: key is the tag, and doc_count is the number of products carrying it.
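
The grouping behind these buckets can be reproduced with a few lines of Python. This is just a sketch of the counting logic on the three sample documents; the `terms_aggregation` helper is hypothetical and says nothing about how Elasticsearch computes the result across shards.

```python
from collections import Counter

# Tags of the three sample products, keyed by document id.
product_tags = {
    1: ["meibai", "fangzhu"],
    2: ["fangzhu"],
    3: ["qingxin"],
}

def terms_aggregation(tags_by_doc):
    """Count, for each tag, how many documents carry it."""
    counts = Counter()
    for tags in tags_by_doc.values():
        counts.update(set(tags))  # a document counts once per distinct tag
    # Order by descending count, then by tag name.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": tag, "doc_count": n} for tag, n in ordered]

buckets = terms_aggregation(product_tags)
print(buckets)
```

Note that product 1 carries two tags, so it is counted in two buckets; the doc_count values therefore add up to 4 even though there are only 3 documents.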


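Now a more complex requirement: group products by the given price ranges, then group by tag within each range, and finally compute the average price within each group.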

Request:

    GET /ecommerce/product/_search
    {
      "size": 0,
      "aggs": {
        "group_by_price": {
          "range": {
            "field": "price",
            "ranges": [
              {
                "from": 0,
                "to": 20
              },
              {
                "from": 20,
                "to": 40
              },
              {
                "from": 40,
                "to": 50
              }
            ]
          },
          "aggs": {
            "group_by_tags": {
              "terms": {
                "field": "tags"
              },
              "aggs": {
                "average_price": {
                  "avg": {
                    "field": "price"
                  }
                }
              }
            }
          }
        }
      }
    }

Response:

    {
      "took" : 11,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 3,
          "relation" : "eq"
        },
        "max_score" : null,
        "hits" : [ ]
      },
      "aggregations" : {
        "group_by_price" : {
          "buckets" : [
            {
              "key" : "0.0-20.0",
              "from" : 0.0,
              "to" : 20.0,
              "doc_count" : 0,
              "group_by_tags" : {
                "doc_count_error_upper_bound" : 0,
                "sum_other_doc_count" : 0,
                "buckets" : [ ]
              }
            },
            {
              "key" : "20.0-40.0",
              "from" : 20.0,
              "to" : 40.0,
              "doc_count" : 2,
              "group_by_tags" : {
                "doc_count_error_upper_bound" : 0,
                "sum_other_doc_count" : 0,
                "buckets" : [
                  {
                    "key" : "fangzhu",
                    "doc_count" : 2,
                    "average_price" : {
                      "value" : 27.5
                    }
                  },
                  {
                    "key" : "meibai",
                    "doc_count" : 1,
                    "average_price" : {
                      "value" : 30.0
                    }
                  }
                ]
              }
            },
            {
              "key" : "40.0-50.0",
              "from" : 40.0,
              "to" : 50.0,
              "doc_count" : 1,
              "group_by_tags" : {
                "doc_count_error_upper_bound" : 0,
                "sum_other_doc_count" : 0,
                "buckets" : [
                  {
                    "key" : "qingxin",
                    "doc_count" : 1,
                    "average_price" : {
                      "value" : 40.0
                    }
                  }
                ]
              }
            }
          ]
        }
      }
    }
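
The nested result can be checked by hand with a short Python sketch that applies the same three steps to the sample data. The `range_terms_avg` helper is hypothetical. One detail worth noting is that each range includes its "from" value and excludes its "to" value, which is why the product priced at 40 lands in the 40-50 bucket rather than the 20-40 bucket.

```python
from collections import defaultdict

products = [
    {"price": 30, "tags": ["meibai", "fangzhu"]},
    {"price": 25, "tags": ["fangzhu"]},
    {"price": 40, "tags": ["qingxin"]},
]
ranges = [(0, 20), (20, 40), (40, 50)]

def range_terms_avg(docs, price_ranges):
    """Group by half-open price range [from, to), then by tag, then average the price."""
    result = {}
    for low, high in price_ranges:
        prices_by_tag = defaultdict(list)
        for doc in docs:
            if low <= doc["price"] < high:  # "from" is inclusive, "to" is exclusive
                for tag in doc["tags"]:
                    prices_by_tag[tag].append(doc["price"])
        result[f"{low}-{high}"] = {
            tag: sum(prices) / len(prices) for tag, prices in prices_by_tag.items()
        }
    return result

averages = range_terms_avg(products, ranges)
print(averages)
```

The "fangzhu" average in the 20-40 range is (30 + 25) / 2 = 27.5, matching the average_price value in the response.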

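5. Summary

In this chapter, I used a simple case study to cover CRUD operations, basic search, and basic data analysis in Elasticsearch. Beginners should now have a grasp of the basic operations. In the next chapter, I will cover Elasticsearch's distributed architecture.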

