Elasticsearch/Python - Re-index data after changing the mapping?

4 followers

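I'm a bit confused about how to re-index data in Elasticsearch after a mapping or data type has changed.

According to the Elasticsearch documentation:

"Pull the documents in from your old index using a scrolled search, and index them into the new index using the bulk API. Many of the client APIs provide a reindex() method which will do all of this for you. Once you are done, you can delete the old index."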

This is my old mapping:

    "test-index2": {
        "mappings": {
            "business": {
                "properties": {
                    "address": {
                        "type": "nested",
                        "properties": {
                            "country": {
                                "type": "string"
                            },
                            "full_address": {
                                "type": "string"
                            }
                        }
                    }
                }
            }
        }
    }

This is the new index mapping, in which I'm renaming full_address to location_address:

    "test-index2": {
        "mappings": {
            "business": {
                "properties": {
                    "address": {
                        "type": "nested",
                        "properties": {
                            "country": {
                                "type": "string"
                            },
                            "location_address": {
                                "type": "string"
                            }
                        }
                    }
                }
            }
        }
    }

I'm using the Python client for Elasticsearch:

https://elasticsearch-py.readthedocs.org/en/master/helpers.html#elasticsearch.helpers.reindex

from elasticsearch import Elasticsearch
from elasticsearch.helpers import reindex
es = Elasticsearch(["es.node1"])
reindex(es, "source_index", "target_index")

However, this only transfers the data from one index to another.

How can I use it to change the mapping (data types, etc.) in the case above?

python
elasticsearch
wolfgang
Posted on 2015-08-29
3 answers
wolfgang
Posted on 2020-11-24
Accepted
0 upvotes

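This is straightforward if you use the scan & scroll and Bulk API helpers that are already implemented in the Elasticsearch Python client:

1. Fetch all the documents with the scan & scroll method.
2. Loop through them and make the necessary changes to each document.
3. Insert the modified documents into the new index with the Bulk API.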

from elasticsearch import Elasticsearch, helpers

# Pass your own hosts if needed, e.g. Elasticsearch(["es.node1"]) as in the question
es = Elasticsearch()

# Use the scan & scroll helper to fetch all documents from the old index
res = helpers.scan(es, query={
    "query": {
        "match_all": {}
    },
    "size": 1000
}, index="old_index")

new_insert_data = []

# Apply the mapping change (and any other change) to each document
for x in res:
    x['_index'] = 'new_index'
    # Rename "address" to "location_address"
    # (for the question's mapping, the key to rename is "full_address" inside "address")
    x['_source']['location_address'] = x['_source'].pop('address')
    # _score is not needed when indexing
    x.pop('_score', None)
    # Add the modified document to the list
    new_insert_data.append(x)

print(new_insert_data)

# Use the Bulk API to insert the modified documents into the new index
helpers.bulk(es, new_insert_data)

# Refresh the new index so the inserted documents become searchable
es.indices.refresh(index="new_index")
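
For a large index, collecting every document in a list before the bulk call can use a lot of memory. Below is a minimal sketch of a streaming variant that also renames the nested key from the question. The names rename_nested_field, build_actions and sample_hits are made up for illustration and are not part of the client. The sketch assumes a typeless index, so it does not carry over _type, and it uses hand-written sample hits instead of a live cluster.

```python
import copy


def rename_nested_field(source, parent, old, new):
    """Return a copy of `source` with key `old` renamed to `new` under `parent`."""
    doc = copy.deepcopy(source)
    node = doc.get(parent)
    # A nested field may hold a single object or a list of objects.
    items = node if isinstance(node, list) else [node]
    for item in items:
        if isinstance(item, dict) and old in item:
            item[new] = item.pop(old)
    return doc


def build_actions(hits, target_index, transform):
    """Lazily turn search hits into bulk index actions for `target_index`."""
    for hit in hits:
        yield {
            "_index": target_index,
            "_id": hit["_id"],
            "_source": transform(hit["_source"]),
        }


# Stand-in for the hits that helpers.scan(es, index="old_index") would yield.
sample_hits = [
    {
        "_index": "old_index",
        "_id": "1",
        "_score": None,
        "_source": {"address": {"country": "DE", "full_address": "Main St 1"}},
    },
]


def transform(src):
    # Rename address.full_address to address.location_address.
    return rename_nested_field(src, "address", "full_address", "location_address")


actions = list(build_actions(sample_hits, "new_index", transform))
print(actions)
```

Against a real cluster, the generator can be passed straight to the bulk helper, for example helpers.bulk(es, build_actions(helpers.scan(es, index="old_index"), "new_index", transform)), since helpers.bulk accepts any iterable of actions. The helpers only copy data and do not transfer mappings, so the target index should be created with the new mapping before the documents are inserted.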
    
bittusarkar
Posted on 2020-11-24
0 upvotes

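The reindex() API just "moves" documents from one index to another. It has no way to detect or infer that the field named full_address in the old index's documents should be location_address in the new index's documents. I doubt that any API provided by the standard Elasticsearch clients can do what you want. The only way I can think of is additional custom logic on the client side: maintain a dictionary that maps field names in the old index to field names in the new index, read the documents from the old index, and index each one into the new index using the new field names looked up in that dictionary.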
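
The field-name dictionary idea could be sketched roughly as follows. FIELD_RENAMES and rename_fields are hypothetical names chosen for this example, and the sketch assumes a key should be renamed wherever it appears in the document, at any depth, which may be too broad if the same key name is used in several places.

```python
# Hypothetical old-name -> new-name dictionary; extend it with your own fields.
FIELD_RENAMES = {"full_address": "location_address"}


def rename_fields(value, renames):
    """Recursively copy `value`, renaming every dict key found in `renames`."""
    if isinstance(value, dict):
        return {
            renames.get(key, key): rename_fields(item, renames)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [rename_fields(item, renames) for item in value]
    # Scalars (strings, numbers, None, ...) are returned unchanged.
    return value


# Example _source of a document read from the old index.
old_doc = {"address": [{"country": "DE", "full_address": "Main St 1"}]}
new_doc = rename_fields(old_doc, FIELD_RENAMES)
print(new_doc)
```

The renamed document would then be indexed into the new index, for example as the _source of a bulk action.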

Chitra
Posted on 2020-11-24
0 upvotes