Elasticsearch Learning Notes (35): Elasticsearch Index Management

Basic index operations

Creating an index

PUT /{index}
{
  "settings": {},
  "mappings": {
    "properties": {
    }
  }
}

Example of creating an index:

PUT /test_index
{
  "settings": {
    "number_of_replicas": 1,
    "number_of_shards": 5
  },
  "mappings": {
    "properties": {
      "field1": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      },
      "ctime": {
        "type": "date"
      }
    }
  }
}
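
As a quick sanity check, you can fetch the index back; the response should contain the settings and mappings just defined (the exact response layout varies by version):

GET /test_index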

Updating index settings

PUT /{index}/_settings
{
    "settings": {}
}

PUT /test_index/_settings
{
  "settings": {
    "number_of_replicas": 2
  }
}
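
Only dynamic settings such as number_of_replicas can be changed on an existing index; static settings such as number_of_shards cannot be updated this way. To confirm the change took effect:

GET /test_index/_settings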

Deleting an index

DELETE /{index}

The delete index API can also be applied to multiple indices with a comma-separated list, or to all indices by passing _all or * as the index name (be careful with this!).

To disable deleting indices via wildcards or _all, set action.destructive_requires_name to true in the configuration. This setting can also be changed via the cluster update settings API.
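
For example, a sketch of changing it through the cluster update settings API (persistent settings survive a full cluster restart, transient ones do not):

PUT /_cluster/settings
{
  "persistent": {
    "action.destructive_requires_name": true
  }
}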

Changing analyzers and defining your own

Elasticsearch ships with a range of built-in analyzers that can be used in any index without further configuration:

standard analyzer:
The standard analyzer divides text into terms on word boundaries, as defined by the Unicode Text Segmentation algorithm. It removes most punctuation, lowercases terms, and supports removing stop words.
simple analyzer:
The simple analyzer divides text into terms whenever it encounters a character that is not a letter, and lowercases all terms.
whitespace analyzer:
The whitespace analyzer divides text into terms whenever it encounters a whitespace character. It does not lowercase terms.
stop analyzer:
The stop analyzer is like the simple analyzer, but also supports removing stop words.
keyword analyzer:
The keyword analyzer is a "noop" analyzer: it accepts whatever text it is given and outputs exactly the same text as a single term. In other words, it does not tokenize at all, which makes it suitable for exact matching.
pattern analyzer:
The pattern analyzer uses a regular expression to split text into terms. It supports lowercasing and stop words.
language analyzers:
Elasticsearch provides many language-specific analyzers, such as english or french.
fingerprint analyzer:
The fingerprint analyzer is a specialist analyzer that creates a fingerprint of the input, which can be used for duplicate detection.
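
The _analyze API is a convenient way to see what each of these does. A minimal comparison (responses omitted): the whitespace analyzer returns The, QUICK, and brown-fox! untouched, while the standard analyzer lowercases everything and splits on the hyphen.

GET /_analyze
{
  "analyzer": "whitespace",
  "text": "The QUICK brown-fox!"
}

GET /_analyze
{
  "analyzer": "standard",
  "text": "The QUICK brown-fox!"
}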

Changing analyzer settings

Enable the English stop words token filter (here by configuring a standard analyzer with the _english_ stop word list):

PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "es_std": {
          "type": "standard",
          "stopwords": "_english_"
        }
      }
    }
  }
}
GET /my_index/_analyze
{
  "analyzer": "standard",
  "text": "a dog is in the house"
}
{
  "tokens" : [
    {
      "token" : "a",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "dog",
      "start_offset" : 2,
      "end_offset" : 5,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "is",
      "start_offset" : 6,
      "end_offset" : 8,
      "type" : "<ALPHANUM>",
      "position" : 2
    },
    {
      "token" : "in",
      "start_offset" : 9,
      "end_offset" : 11,
      "type" : "<ALPHANUM>",
      "position" : 3
    },
    {
      "token" : "the",
      "start_offset" : 12,
      "end_offset" : 15,
      "type" : "<ALPHANUM>",
      "position" : 4
    },
    {
      "token" : "house",
      "start_offset" : 16,
      "end_offset" : 21,
      "type" : "<ALPHANUM>",
      "position" : 5
    }
  ]
}
GET /my_index/_analyze
{
  "analyzer": "es_std",
  "text": "a dog is in the house"
}
{
  "tokens" : [
    {
      "token" : "dog",
      "start_offset" : 2,
      "end_offset" : 5,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "house",
      "start_offset" : 16,
      "end_offset" : 21,
      "type" : "<ALPHANUM>",
      "position" : 5
    }
  ]
}

Defining your own analyzer

PUT /test_index
{
  "settings": {
    "analysis": {
      "char_filter": {
        "&_to_and": {
          "type": "mapping",
          "mappings": ["&=>and"]
        }
      },
      "filter": {
        "my_stopwords": {
          "type": "stop",
          "stopwords": ["the", "a"]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "char_filter": ["html_strip", "&_to_and"],
          "tokenizer": "standard",
          "filter": ["lowercase", "my_stopwords"]
        }
      }
    }
  }
}
GET /test_index/_analyze
{
  "text": "tom&jerry are a friend in the house, <a>, HAHA!!",
  "analyzer": "my_analyzer"
}
{
  "tokens" : [
    {
      "token" : "tomandjerry",
      "start_offset" : 0,
      "end_offset" : 9,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "are",
      "start_offset" : 10,
      "end_offset" : 13,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "friend",
      "start_offset" : 16,
      "end_offset" : 22,
      "type" : "<ALPHANUM>",
      "position" : 3
    },
    {
      "token" : "in",
      "start_offset" : 23,
      "end_offset" : 25,
      "type" : "<ALPHANUM>",
      "position" : 4
    },
    {
      "token" : "house",
      "start_offset" : 30,
      "end_offset" : 35,
      "type" : "<ALPHANUM>",
      "position" : 6
    },
    {
      "token" : "haha",
      "start_offset" : 42,
      "end_offset" : 46,
      "type" : "<ALPHANUM>",
      "position" : 7
    }
  ]
}

Customizing your dynamic mapping strategy

The dynamic parameter

true: unknown fields are added to the mapping via dynamic mapping

false: unknown fields are ignored

strict: unknown fields cause an error

Example:

PUT /test_index
{
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "title": {
        "type": "text"
      },
      "address": {
        "type": "object",
        "dynamic": "true"
      }
    }
  }
}
PUT /test_index/_doc/1
{
  "title": "my article",
  "content": "this is my article",
  "address": {
    "province": "guangdong",
    "city": "guangzhou"
  }
}
{
  "error": {
    "root_cause": [
      {
        "type": "strict_dynamic_mapping_exception",
        "reason": "mapping set to strict, dynamic introduction of [content] within [_doc] is not allowed"
      }
    ],
    "type": "strict_dynamic_mapping_exception",
    "reason": "mapping set to strict, dynamic introduction of [content] within [_doc] is not allowed"
  },
  "status": 400
}
PUT /test_index/_doc/1
{
  "title": "my article",
  "address": {
    "province": "guangdong",
    "city": "guangzhou"
  }
}
{
  "_index" : "test_index",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}
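
Since address was declared with dynamic: true, its previously unknown subfields were added to the mapping on the fly; fetching the mapping back should now show province and city mapped under address (response omitted):

GET /test_index/_mapping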

date_detection

By default, Elasticsearch detects dates in certain formats, such as yyyy-MM-dd. If the first value to arrive for a field is something like 2017-01-01, the field is dynamically mapped as date; if a later value such as "hello world" arrives, indexing fails. The fix is to disable date_detection for the index and, where needed, explicitly map the relevant fields as date types yourself.

PUT /{index}
{
  "mappings": {
    "date_detection": false
  }
}
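
With detection off, any field that really holds dates should be mapped explicitly. A sketch, reusing the ctime field name from the earlier example and assuming a yyyy-MM-dd format:

PUT /my_index
{
  "mappings": {
    "date_detection": false,
    "properties": {
      "ctime": {
        "type": "date",
        "format": "yyyy-MM-dd"
      }
    }
  }
}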

dynamic template

"dynamic_templates": [
    {
        "my_template_name": {
            ... match conditions ...
            "mapping": {...}
        }
    }
]

Example:

PUT /test_index
{
  "mappings": {
    "dynamic_templates": [
      {
        "en": {
          "match": "*_en",
          "match_mapping_type": "string",
          "mapping": {
            "type": "text",
            "analyzer": "english"
          }
        }
      }  
    ]
  }
}
PUT /test_index/_doc/1
{
  "title": "this is my first article"
}
PUT /test_index/_doc/2
{
  "title_en": "this is my first article"
}
GET /test_index/_mapping
{
  "test_index" : {
    "mappings" : {
      "dynamic_templates" : [
        {
          "en" : {
            "match" : "*_en",
            "match_mapping_type" : "string",
            "mapping" : {
              "analyzer" : "english",
              "type" : "text"
            }
          }
        }
      ],
      "properties" : {
        "title" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "title_en" : {
          "type" : "text",
          "analyzer" : "english"
        }
      }
    }
  }
}
GET /test_index/_search?q=is
{
  "took" : 12,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.2876821,
    "hits" : [
      {
        "_index" : "test_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.2876821,
        "_source" : {
          "title" : "this is my first article"
        }
      }
    ]
  }
}

Here, title matched no dynamic template, so it got the default standard analyzer, which does not remove stop words; is therefore lands in the inverted index, and searching for is finds the document. title_en matched the dynamic template and uses the english analyzer, which removes stop words, so is is filtered out and searching for is in that field finds nothing.
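
This is easy to verify with a match query per field; given the two documents indexed above, the first query should return document 1 and the second should return no hits:

GET /test_index/_search
{
  "query": { "match": { "title": "is" } }
}

GET /test_index/_search
{
  "query": { "match": { "title_en": "is" } }
}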

Zero-downtime reindexing with scroll + bulk + index aliases

1. Rebuilding an index

A field's mapping, once set, cannot be modified. If you need to change a field, you have to create a new index with the new mapping, read the data out of the old index in batches, and write it into the new index with the bulk API. For the batch reads, use the scroll API, and reindex with multiple threads working concurrently: each scroll query fetches one slice of the data, for example one date range, and hands it to a single thread.
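
A date-sliced scroll query for one such worker thread might look like the sketch below. It assumes the documents carry a ctime date field (the toy index in this walkthrough does not):

GET /test_index/_search?scroll=1m
{
  "query": {
    "range": {
      "ctime": {
        "gte": "2017-01-01",
        "lt": "2017-02-01"
      }
    }
  },
  "sort": ["_doc"],
  "size": 1000
}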

(1) At first, relying on dynamic mapping, we insert data, but some values happen to arrive in a date format such as 2017-01-01, so the title field is automatically mapped as date when it really should be a string type.

PUT /test_index/_doc/1
{
  "title": "2017-01-01"
}

GET /test_index/_mapping

{
  "test_index" : {
    "mappings" : {
      "properties" : {
        "title" : {
          "type" : "date"
        }
      }
    }
  }
}

(2) When a string title value is indexed later, it fails:

PUT /test_index/_doc/2
{
  "title": "my first article"
}

{
  "error": {
    "root_cause": [
      {
        "type": "mapper_parsing_exception",
        "reason": "failed to parse field [title] of type [date] in document with id '2'"
      }
    ],
    "type": "mapper_parsing_exception",
    "reason": "failed to parse field [title] of type [date] in document with id '2'",
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "failed to parse date field [my first article] with format [strict_date_optional_time||epoch_millis]",
      "caused_by": {
        "type": "date_time_parse_exception",
        "reason": "Failed to parse with all enclosed parsers"
      }
    }
  },
  "status": 400
}

(3) Changing the type of title at this point is impossible:

PUT /test_index
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text"
      }
    }
  }
}
{
  "error": {
    "root_cause": [
      {
        "type": "resource_already_exists_exception",
        "reason": "index [test_index/mZALkQ8IQV67SjCVqkhq4g] already exists",
        "index_uuid": "mZALkQ8IQV67SjCVqkhq4g",
        "index": "test_index"
      }
    ],
    "type": "resource_already_exists_exception",
    "reason": "index [test_index/mZALkQ8IQV67SjCVqkhq4g] already exists",
    "index_uuid": "mZALkQ8IQV67SjCVqkhq4g",
    "index": "test_index"
  },
  "status": 400
}

(4) The only option now is to reindex: build a new index, query the data out of the old one, and import it into the new one.

(5) Say the old index is called old_index and the new one new_index. Client applications are already running against old_index. Would we really have to stop the application, change the index it uses to new_index, and restart it?

(6) No: use an alias instead. Give the application an alias to work with, pointing at the old index, and let the application keep using it. For now the alias still resolves to the old index.

Format:

POST /_aliases
{
  "actions": [
    {
      "add": {
        "index": "test",
        "alias": "alias1"
      }
    }
  ]
}
Applied to our index:

POST /_aliases
{
  "actions": [
    {
      "add": {
        "index": "test_index",
        "alias": "test_index_alias"
      }
    }
  ]
}

(7) Create a new index, with title adjusted to a string (text) field:

PUT /test_index_new
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text"
      }
    }
  }
}

(8) Use the scroll API to read the data out in batches:

GET /test_index/_search?scroll=1m
{
  "query": {
    "match_all": {}
  },
  "sort": [
    "_doc"
  ],
  "size": 1
}
{
  "_scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAACz3UWUC1iLVRFdnlRT3lsTXlFY01FaEFwUQ==",
  "took" : 8,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "test_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : null,
        "_source" : {
          "title" : "2017-01-01"
        },
        "sort" : [
          0
        ]
      }
    ]
  }
}
POST /_search/scroll
{
  "scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAC0GYWUC1iLVRFdnlRT3lsTXlFY01FaEFwUQ==",
  "scroll": "1m"
}
{
  "_scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAC0GYWUC1iLVRFdnlRT3lsTXlFY01FaEFwUQ==",
  "took" : 14,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}

(9) Use the bulk API to write each batch returned by scroll into the new index:

POST /_bulk
{"index": {"_index": "test_index_new", "_id": "1"}}
{"title": "2017-01-01"}

GET /test_index_new/_search
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "test_index_new",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "title" : "2017-01-01"
        }
      }
    ]
  }
}

(10) Loop over this: scroll out batch after batch, bulk-writing each into the new index.
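
As an aside, on Elasticsearch 5.x and later the built-in _reindex API can do this whole scroll-plus-bulk loop for you; shown here as an alternative to the manual approach above:

POST /_reindex
{
  "source": { "index": "test_index" },
  "dest": { "index": "test_index_new" }
}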

(11) Switch test_index_alias over to test_index_new. The application sees the new index's data through the alias right away, with no restart needed: zero downtime, high availability.

POST /_aliases
{
    "actions" : [
        { "remove" : { "index" : "test_index", "alias" : "test_index_alias" } },
        { "add" : { "index" : "test_index_new", "alias" : "test_index_alias" } }
    ]
}

GET /test_index_alias/_search

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "test_index_new",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "title" : "2017-01-01"
        }
      }
    ]
  }
}

2. Switching indices transparently to clients via aliases

Format:

POST /_aliases
{
    "actions" : [
        { "remove" : { "index" : "test1", "alias" : "alias1" } },
        { "add" : { "index" : "test2", "alias" : "alias1" } }
    ]
}

Note: the _aliases request body is ordinary JSON, so it may be formatted across multiple lines as above. It is the _bulk API's NDJSON body where line breaks matter: each action line and each document line must stay on a single line, or the request fails to parse.

