
Elasticsearch 7.5: Configuring the IK Chinese Analyzer with Pinyin Analysis

Programmer Article Site, 2022-07-09 18:50:00

1. Installing the plugins

1.1 Install the plugins

Pinyin analyzer: https://github.com/medcl/elasticsearch-analysis-pinyin
IK Chinese analyzer: https://github.com/medcl/elasticsearch-analysis-ik

Install the plugin releases that match your Elasticsearch version exactly. This article uses:

  • Elasticsearch 7.5.1
  • elasticsearch-analysis-ik 7.5.1
  • elasticsearch-analysis-pinyin 7.5.1

From the Elasticsearch installation directory, install the two plugins online, one after the other:

./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.5.1/elasticsearch-analysis-ik-7.5.1.zip

./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-pinyin/releases/download/v7.5.1/elasticsearch-analysis-pinyin-7.5.1.zip
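The two commands differ only in the plugin name and the version. Below is a minimal Python sketch that builds them for a given Elasticsearch version. It assumes that both projects follow the same GitHub release URL pattern shown above for the version you need, which has not been verified for every release. `plugin_url` and `install_commands` are hypothetical helper names.

```python
# Hypothetical helpers. ASSUMPTION: the release URL pattern used above
# (.../releases/download/v<version>/elasticsearch-analysis-<name>-<version>.zip)
# also holds for the version you pass in.

PLUGINS = ("ik", "pinyin")


def plugin_url(name: str, version: str) -> str:
    """Return the release zip URL for the medcl analysis-<name> plugin."""
    repo = f"https://github.com/medcl/elasticsearch-analysis-{name}"
    return f"{repo}/releases/download/v{version}/elasticsearch-analysis-{name}-{version}.zip"


def install_commands(es_version: str) -> list:
    """Plugin versions must equal the Elasticsearch version exactly."""
    return [f"./bin/elasticsearch-plugin install {plugin_url(n, es_version)}" for n in PLUGINS]


if __name__ == "__main__":
    for cmd in install_commands("7.5.1"):
        print(cmd)
```

For `7.5.1` this reproduces the two commands above, IK first.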

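After installing, restart Elasticsearch and check that the analyzers work. If everything is in order, each test request below returns a list of tokens.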
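Before testing the analyzers, you can confirm that both plugins were loaded. The `_cat/plugins` API lists the installed plugins per node, and on a single node `./bin/elasticsearch-plugin list` shows the same information. Below is a minimal Python sketch that makes this check. It assumes the `data:9200` host used in this article and assumes the plugins register under the names `analysis-ik` and `analysis-pinyin`. `fetch_plugins` and `check_plugins` are hypothetical helpers. The check flags a plugin that is missing or whose version differs from the Elasticsearch version.

```python
import json
import urllib.request

# ASSUMPTIONS: the host used in this article, and these plugin names.
REQUIRED = {"analysis-ik", "analysis-pinyin"}


def fetch_plugins(host: str = "http://data:9200") -> list:
    """GET /_cat/plugins?format=json: one row per (node, plugin) pair."""
    with urllib.request.urlopen(f"{host}/_cat/plugins?format=json") as resp:
        return json.load(resp)


def check_plugins(rows: list, es_version: str) -> list:
    """Return a list of problems; an empty list means every node looks fine."""
    if not rows:
        return ["no plugins reported by any node"]
    problems = []
    for node in sorted({r["name"] for r in rows}):
        # component = plugin name, version = plugin version on this node
        installed = {r["component"]: r["version"] for r in rows if r["name"] == node}
        for plugin in sorted(REQUIRED):
            if plugin not in installed:
                problems.append(f"{node}: {plugin} missing")
            elif installed[plugin] != es_version:
                problems.append(f"{node}: {plugin} is {installed[plugin]}, expected {es_version}")
    return problems
```

Usage: `check_plugins(fetch_plugins(), "7.5.1")`. Nodes without any plugins do not appear in `_cat/plugins`, so this check cannot detect them.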

1.2 Test the IK Chinese analyzer

POST http://data:9200/_analyze
{
    "analyzer":"ik_smart",
    "text":"新型冠状病毒"
}

Result:

{
    "tokens": [
        {
            "token": "新型",
            "start_offset": 0,
            "end_offset": 2,
            "type": "CN_WORD",
            "position": 0
        },
        {
            "token": "冠状病毒",
            "start_offset": 2,
            "end_offset": 6,
            "type": "CN_WORD",
            "position": 1
        }
    ]
}
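You can also send the same `_analyze` request from a script. Below is a minimal sketch that uses only the Python standard library. It assumes the `data:9200` host from this article. `analyze` and `token_strings` are hypothetical names.

```python
import json
import urllib.request


def analyze(text: str, analyzer: str, host: str = "http://data:9200") -> dict:
    """POST /_analyze with the given analyzer and return the decoded JSON reply."""
    body = json.dumps({"analyzer": analyzer, "text": text}, ensure_ascii=False).encode("utf-8")
    req = urllib.request.Request(
        f"{host}/_analyze",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def token_strings(response: dict) -> list:
    """Keep only the token text, in the order Elasticsearch returned it."""
    return [t["token"] for t in response.get("tokens", [])]
```

For example, `token_strings(analyze("新型冠状病毒", "ik_smart"))` sends the request from 1.2, and passing `"pinyin"` as the analyzer repeats the test in 1.3. Applied to the response above, `token_strings` returns `["新型", "冠状病毒"]`.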

1.3 Test the pinyin analyzer

POST http://data:9200/_analyze
{
    "analyzer":"pinyin",
    "text":"新型冠状病毒"
}

Result:

{
    "tokens": [
        {
            "token": "xin",
            "start_offset": 0,
            "end_offset": 0,
            "type": "word",
            "position": 0
        },
        {
            "token": "xxgzbd",
            "start_offset": 0,
            "end_offset": 0,
            "type": "word",
            "position": 0
        },
        {
            "token": "xing",
            "start_offset": 0,
            "end_offset": 0,
            "type": "word",
            "position": 1
        },
        {
            "token": "guan",
            "start_offset": 0,
            "end_offset": 0,
            "type": "word",
            "position": 2
        },
        {
            "token": "zhuang",
            "start_offset": 0,
            "end_offset": 0,
            "type": "word",
            "position": 3
        },
        {
            "token": "bing",
            "start_offset": 0,
            "end_offset": 0,
            "type": "word",
            "position": 4
        },
        {
            "token": "du",
            "start_offset": 0,
            "end_offset": 0,
            "type": "word",
            "position": 5
        }
    ]
}
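In the pinyin result, several tokens share a position. By default the plugin emits the full pinyin of each character. It also emits the first-letter abbreviation of the whole text (`xxgzbd`) at position 0. The `keep_first_letter: false` option used in section 2 turns that abbreviation off. Grouping the tokens by position makes this easier to see. `tokens_by_position` below is a hypothetical helper.

```python
from collections import defaultdict


def tokens_by_position(response: dict) -> dict:
    """Group token strings that share a position, preserving their order."""
    grouped = defaultdict(list)
    for t in response.get("tokens", []):
        grouped[t["position"]].append(t["token"])
    return dict(grouped)
```

Applied to the response above, it returns `{0: ["xin", "xxgzbd"], 1: ["xing"], 2: ["guan"], 3: ["zhuang"], 4: ["bing"], 5: ["du"]}`.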

2. Configuring the analyzer

Next, change the index's analyzer. All of the following operations are performed on the song index.

2.1 Close the index

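Close the index first. Analysis settings cannot be changed on an open index, and the settings update in the next step fails with an error if the index is still open.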

POST http://data:9200/song/_close
{

}

2.2 Configure IK + pinyin analysis

Then define a custom analyzer. Here I use ik_smart + pinyin:

PUT http://data:9200/song/_settings
{
    "index": {
        "analysis": {
            "analyzer": {
                "ik_pinyin_analyzer": {
                    "type": "custom",
                    "tokenizer": "ik_smart",
                    "filter": ["pinyin_filter"]
                }
            },
            "filter": {
                "pinyin_filter": {
                    "type": "pinyin",
                    "keep_first_letter": false
                }
            }
        }
    }
}

Alternatively, use ik_max_word + pinyin. Apply only one of the two settings, since both define the same ik_pinyin_analyzer:

PUT http://data:9200/song/_settings
{
    "index": {
        "analysis": {
            "analyzer": {
                "ik_pinyin_analyzer": {
                    "type": "custom",
                    "tokenizer": "ik_max_word",
                    "filter": ["pinyin_filter"]
                }
            },
            "filter": {
                "pinyin_filter": {
                    "type": "pinyin",
                    "keep_first_letter": false
                }
            }
        }
    }
}
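The two settings bodies differ only in the tokenizer. Below is a minimal Python sketch that builds either body, for cases where you generate the request from code instead of pasting it. `ik_pinyin_settings` is a hypothetical name. It writes `filter` as a list, which is the form the Elasticsearch documentation uses for custom analyzers. It rejects any tokenizer other than the two IK ones.

```python
import json


def ik_pinyin_settings(tokenizer: str = "ik_smart") -> dict:
    """Body for PUT /song/_settings; tokenizer must be "ik_smart" or "ik_max_word"."""
    if tokenizer not in ("ik_smart", "ik_max_word"):
        raise ValueError(f"unsupported IK tokenizer: {tokenizer}")
    return {
        "index": {
            "analysis": {
                "analyzer": {
                    "ik_pinyin_analyzer": {
                        "type": "custom",
                        "tokenizer": tokenizer,
                        "filter": ["pinyin_filter"],
                    }
                },
                "filter": {
                    # keep_first_letter=False drops the first-letter abbreviation token
                    "pinyin_filter": {"type": "pinyin", "keep_first_letter": False}
                },
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(ik_pinyin_settings("ik_max_word"), indent=4))
```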

2.3 Reopen the index

POST http://data:9200/song/_open
{

}
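The three steps of section 2 can also be scripted. Below is a minimal Python sketch that uses only the standard library. It assumes the `data:9200` host and the song index from this article. `send_http`, `apply_analysis`, and `verify` are hypothetical names. The index is reopened in a `finally` clause, so a rejected settings update does not leave song closed. `verify` runs text through the new analyzer with the index-level `_analyze` endpoint. The HTTP sender is passed in as a parameter so the sequence can be checked without a live cluster.

```python
import json
import urllib.request
from typing import Callable, Optional

# ASSUMPTION: the host used throughout this article.
HOST = "http://data:9200"


def send_http(method: str, path: str, body: Optional[dict] = None) -> dict:
    """Send one request to Elasticsearch and decode the JSON reply."""
    headers = {}
    data = None
    if body is not None:
        data = json.dumps(body, ensure_ascii=False).encode("utf-8")
        headers["Content-Type"] = "application/json"
    req = urllib.request.Request(HOST + path, data=data, headers=headers, method=method)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


Sender = Callable[..., dict]


def apply_analysis(index: str, settings: dict, send: Sender = send_http) -> None:
    """Close the index, update its analysis settings, and always reopen it."""
    send("POST", f"/{index}/_close")
    try:
        send("PUT", f"/{index}/_settings", settings)
    finally:
        # Reopen even if the update was rejected, so the index is not left closed.
        send("POST", f"/{index}/_open")


def verify(index: str, text: str, send: Sender = send_http) -> list:
    """Analyze text with ik_pinyin_analyzer via the index-level _analyze API."""
    reply = send("POST", f"/{index}/_analyze",
                 {"analyzer": "ik_pinyin_analyzer", "text": text})
    return [t["token"] for t in reply.get("tokens", [])]
```

Usage: load the JSON body from 2.2 into a dict, then call `apply_analysis("song", settings)` followed by `verify("song", "新型冠状病毒")`. Defining the analyzer only registers it on the index. A field must reference it in its mapping before it is used at index time.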