
Fixing problems installing the Chinese analysis plugin on Elasticsearch 5.4

17-12-26


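1: Download the prebuilt package. Make sure its version matches your Elasticsearch version.

2: Unpack it, move the unpacked folder into the plugins directory under the elasticsearch directory, and rename it to analysis-ik.

3: Copy the whole config directory inside analysis-ik into the config directory under the elasticsearch directory, and rename the copy to ik.

4: Restart elasticsearch.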
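Steps 2 and 3 above can be sketched as a small shell function. This is only an illustration under stated assumptions: the function name install_ik and both of its arguments (the Elasticsearch home directory and the already unpacked package folder) are hypothetical, and the restart in step 4 is left out because it depends on how the node is run.

```shell
# Sketch of steps 2-3. Both arguments are assumptions for illustration:
#   $1 = Elasticsearch home directory (must already contain plugins/ and config/)
#   $2 = folder produced by unpacking the IK package
install_ik() {
  es_home=$1
  unpacked=$2
  plugin_dir="$es_home/plugins/analysis-ik"
  conf_dir="$es_home/config/ik"

  # Refuse to touch an earlier install; mv/cp would otherwise nest directories.
  if [ -e "$plugin_dir" ] || [ -e "$conf_dir" ]; then
    echo "analysis-ik or config/ik already exists under $es_home" >&2
    return 1
  fi

  # Step 2: place the unpacked folder under plugins/ as analysis-ik.
  mv "$unpacked" "$plugin_dir" || return 1

  # Step 3: copy the plugin's config directory to config/ik.
  cp -R "$plugin_dir/config" "$conf_dir" || return 1
}

# Usage (paths are placeholders):
#   install_ik /path/to/elasticsearch /path/to/unpacked-ik
```

The existence check is a design choice of this sketch rather than part of the original steps: without it, running the function twice would move or copy the folders inside the ones created by the first run.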

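The first installation method described on the official site kept failing with errors, so this manual approach was the only option left.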

II. Using the analyzers

1. IK ships with two analyzers:

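ik_max_word: splits the text at the finest granularity, producing as many terms as possible.

ik_smart: splits the text at the coarsest granularity; text already assigned to one term is not reused by another term.

The example below shows the difference between them.

ik_smart:

Enter the following in a terminal:

curl -XGET 'http://192.168.198.223:9200/_analyze?pretty&analyzer=ik_smart' -d '五星红旗迎风飘扬'

It returns: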

{
  "tokens" : [
    {
      "token" : "五星红旗",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "迎风",
      "start_offset" : 4,
      "end_offset" : 6,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "飘扬",
      "start_offset" : 6,
      "end_offset" : 8,
      "type" : "CN_WORD",
      "position" : 2
    }
  ]
}

ik_max_word:

Enter the following in a terminal:

curl -XGET 'http://192.168.198.223:9200/_analyze?pretty&analyzer=ik_max_word' -d '五星红旗迎风飘扬'

It returns:

{
  "tokens" : [
    {
      "token" : "五星红旗",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "五星",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "五",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "TYPE_CNUM",
      "position" : 2
    },
    {
      "token" : "星",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "CN_CHAR",
      "position" : 3
    },
    {
      "token" : "红旗",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 4
    },
    {
      "token" : "迎风",
      "start_offset" : 4,
      "end_offset" : 6,
      "type" : "CN_WORD",
      "position" : 5
    },
    {
      "token" : "飘扬",
      "start_offset" : 6,
      "end_offset" : 8,
      "type" : "CN_WORD",
      "position" : 6
    },
    {
      "token" : "飘",
      "start_offset" : 6,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 7
    },
    {
      "token" : "扬",
      "start_offset" : 7,
      "end_offset" : 8,
      "type" : "CN_WORD",
      "position" : 8
    }
  ]
}

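My understanding: ik_max_word segments the text more finely (nine tokens versus three for ik_smart in the example above), so it is somewhat less efficient.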
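To compare the two analyzers without reading the full JSON, the token values can be pulled out of the pretty-printed response. The following is a minimal sketch, not part of the original setup: ik_tokens and ik_analyze are hypothetical helper names, ES_URL is an assumed variable holding the node address, and the sed expression assumes the one-field-per-line layout shown in the responses above.

```shell
# Hypothetical helper: read a pretty-printed _analyze response on standard
# input and print one token per line.
ik_tokens() {
  sed -n 's/.*"token" *: *"\([^"]*\)".*/\1/p'
}

# Hypothetical wrapper around the same request used in this article.
#   $1 = analyzer name (ik_smart or ik_max_word)
#   $2 = text to analyze
# ES_URL is an assumed variable, e.g. http://192.168.198.223:9200
ik_analyze() {
  curl -s -XGET "$ES_URL/_analyze?pretty&analyzer=$1" -d "$2" | ik_tokens
}

# Usage (requires a running node with the plugin installed):
#   ES_URL=http://192.168.198.223:9200
#   ik_analyze ik_smart '五星红旗迎风飘扬'
#   ik_analyze ik_max_word '五星红旗迎风飘扬'
```

Applied to the two responses shown earlier, ik_tokens would list three tokens for ik_smart and nine for ik_max_word.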
