Elasticsearch-py 2.3 API Documentation (Translated)

她渐渐地笑了

2019-02-20 22:04

Elasticsearch

API Documentation

Global options

1. Ignore

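An API call is considered successful (and will return a response) if Elasticsearch returns a 2XX response. Otherwise an instance of TransportError (or a more specific subclass) will be raised. See Exceptions for the other exceptions and error states. If you do not want an exception to be raised, you can always pass an ignore parameter with either a single status code that should be ignored or a list of them: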

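```python
from elasticsearch import Elasticsearch
es = Elasticsearch()

# ignore 400 caused by IndexAlreadyExistsException when creating an index
es.indices.create(index='test-index', ignore=400)

# ignore 404 and 400
es.indices.delete(index='test-index', ignore=[400, 404])
```

2. Timeout

A global timeout can be set when constructing the client (see the timeout parameter of Connection). A per-request timeout can be set by passing request_timeout (a float, in seconds) as part of any API call. This value is passed on to the perform_request method of the connection in use:

```python
# only wait for 1 second, regardless of the client's default
es.cluster.health(wait_for_status='yellow', request_timeout=1)
```

3. Response Filtering

The filter_path parameter is used to reduce the response returned by Elasticsearch. For example, to return only _id and _type, do:

```python
es.search(index='test-index', filter_path=['hits.hits._id', 'hits.hits._type'])
```

It also supports the * wildcard to match any field, or part of a field's name:

```python
es.search(index='test-index', filter_path=['hits.hits._*'])
```

Elasticsearch

class elasticsearch.Elasticsearch(hosts=None, transport_class=<class 'elasticsearch.transport.Transport'>, **kwargs)

Elasticsearch low-level client. It provides a straightforward mapping from Python to the ES REST endpoints.

The instance has the attributes cat, cluster, indices, nodes and snapshot. These provide access to instances of CatClient, ClusterClient, IndicesClient, NodesClient and SnapshotClient respectively. This is the preferred (and only supported) way to get access to those classes and their methods.

You can specify your own connection class by passing the connection_class parameter:

```python
# create connection to localhost using the ThriftConnection
es = Elasticsearch(connection_class=ThriftConnection)
```

If you want to turn on sniffing, you have several options (described in Transport):

```python
# create connection that will automatically inspect the cluster to get
# the list of active nodes. Start with nodes running on 'esnode1' and
# 'esnode2'
es = Elasticsearch(
    ['esnode1', 'esnode2'],
    # sniff before doing anything
    sniff_on_start=True,
    # refresh nodes after a node fails to respond
    sniff_on_connection_fail=True,
    # and also every 60 seconds
    sniffer_timeout=60
)
```

Different hosts can have different parameters. Use one dictionary per node to specify them:

```python
# connect to localhost directly and another node using SSL on port 443
# and an url_prefix. Note that ``port`` needs to be an int.
es = Elasticsearch([
    {'host': 'localhost'},
    {'host': 'othernode', 'port': 443, 'url_prefix': 'es', 'use_ssl': True},
])
```

If you use SSL, several parameters control how certificates are handled (see Urllib3HttpConnection for a detailed description of the options):

```python
es = Elasticsearch(
    ['localhost:443', 'other_host:443'],
    # turn on SSL
    use_ssl=True,
    # make sure we verify SSL certificates (off by default)
    verify_certs=True,
    # provide a path to CA certs on disk
    ca_certs='/path/to/CA_certs',
    # PEM formatted SSL client certificate
    client_cert='/path/to/clientcert.pem',
    # PEM formatted SSL client key
    client_key='/path/to/clientkey.pem'
)
```

Alternatively, you can use RFC-1738 formatted URLs, as long as they do not conflict with other options:

```python
es = Elasticsearch(
    [
        'http://user:secret@localhost:9200/',
        'https://user:secret@other_host:443/production'
    ],
    verify_certs=True
)
```

hosts: list of nodes to connect to. A node is either a dictionary ({"host": "localhost", "port": 9200}) or a string in the format host[:port]. A dictionary is passed in full to the Connection class as kwargs. A string is translated to a dictionary automatically. If no value is given, the Urllib3HttpConnection class defaults are used.
transport_class: the Transport subclass to use.
kwargs: any additional arguments are passed on to the Transport class and, subsequently, to the Connection instances.

bulk(*args, **kwargs)
Perform many index/delete operations in a single API call.

Exceptions

class elasticsearch.ElasticsearchException
Base class for all exceptions raised by this package's operations (does not apply to ImproperlyConfigured).

class elasticsearch.SerializationError(ElasticsearchException)
The data passed in could not be serialized properly by the Serializer in use.

class elasticsearch.TransportError(ElasticsearchException)
Exception raised when ES returns a non-OK (>= 400) HTTP status code. It is also raised when an actual connection error happens; in that case status_code is set to 'N/A'.

class elasticsearch.ConnectionTimeout(ConnectionError)
A network timeout. It does not cause a node retry by default.

class elasticsearch.SSLError(ConnectionError)
Error raised when an SSL error is encountered.

class elasticsearch.NotFoundError(TransportError)
Exception representing a 404 status code.

class elasticsearch.ConflictError(TransportError)
Exception representing a 409 status code.

class elasticsearch.RequestError(TransportError)
Exception representing a 400 status code.

class elasticsearch.ConnectionError(TransportError)
Error raised when there was an exception while talking to ES. The original exception from the underlying Connection implementation is available as .info.

Connection Layer API

Transport classes

These are the connection classes that can be used. Import the one you choose and pass it to the Elasticsearch constructor as connection_class. Note that RequestsHttpConnection requires requests to be installed. For example, to use the requests-based connection, import it and use it:

```python
from elasticsearch import Elasticsearch, RequestsHttpConnection
es = Elasticsearch(connection_class=RequestsHttpConnection)
```

Connection

class elasticsearch.connection.Connection(host='localhost', port=9200, url_prefix='', timeout=10, **kwargs)

Class responsible for maintaining a connection to an Elasticsearch node. It holds a persistent connection pool to that node, and its main interface (perform_request) is thread-safe. It is also responsible for logging.

Urllib3HttpConnection

class elasticsearch.connection.Urllib3HttpConnection(host='localhost', port=9200, http_auth=None, use_ssl=False, verify_certs=False, ca_certs=None, client_cert=None, client_key=None, ssl_version=None, ssl_assert_hostname=None, ssl_assert_fingerprint=None, maxsize=10, **kwargs)

Default connection class, using the urllib3 library and the http protocol.

ca_certs: optional path to a CA bundle. See http://urllib3.readthedocs.org/en/latest/security.html#using-certifi-with-urllib3 for instructions on how to get the default set.
client_cert: path to the file containing both the private key and the certificate, or only the certificate if client_key is used.
client_key: path to the file containing the private key, when separate certificate and key files are used (client_cert then contains only the certificate).
ssl_version: version of the SSL protocol to use. Choices are SSLv23 (default), SSLv2, SSLv3 and TLSv1 (see the PROTOCOL_* constants in the ssl module for the exact options available in your environment).
ssl_assert_hostname: use hostname verification if not False.
ssl_assert_fingerprint: verify the supplied certificate fingerprint if not None.
maxsize: the maximum number of connections that will be kept open to this host.

RequestsHttpConnection

class elasticsearch.connection.RequestsHttpConnection(host='localhost', port=9200, http_auth=None, use_ssl=False, verify_certs=False, ca_certs=None, client_cert=None, client_key=None, **kwargs)

Connection using the requests library.

Bulk helpers

There are several helpers for the bulk API, because its specific formatting requirements and other considerations make it cumbersome to use directly.

All bulk helpers accept an instance of the Elasticsearch class and an iterable of actions. The iterable can be of any kind, including a generator. A generator is ideal in most cases, because it lets you index large datasets without loading them into memory.

The items in the actions iterable should be the documents you want to index, in one of several formats. The most common format is the same as the one returned by search(), for example:

```python
{
    '_index': 'index-name',
    '_type': 'document',
    '_id': 42,
    '_parent': 5,
    '_ttl': '1d',
    '_source': {
        "title": "Hello World!",
        "body": "..."
    }
}
```

Alternatively, if _source is not present, the helper pops all metadata fields from the doc and uses the rest as the document data:

```python
{
    "_id": 42,
    "_parent": 5,
    "title": "Hello World!",
    "body": "..."
}
```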
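The two action formats above can be illustrated with a small, simplified sketch in plain Python. It shows how an action could be split into metadata and document body. This is not the library's own code: the function names `split_action` and `generate_actions` and the exact set of keys in `META_FIELDS` are hypothetical, chosen only for illustration.

```python
# Hypothetical set of metadata keys for this sketch only;
# the real helpers may recognise a different set.
META_FIELDS = {"_index", "_type", "_id", "_parent", "_ttl", "_routing"}


def split_action(action):
    """Split one action dict into (metadata, document body).

    If '_source' is present, its value is the body. Otherwise every
    metadata key is popped and whatever remains is the body.
    """
    doc = dict(action)  # copy so the caller's dict is not mutated
    metadata = {key: doc.pop(key) for key in list(doc) if key in META_FIELDS}
    body = doc.pop("_source", doc)
    return metadata, body


def generate_actions(rows):
    """Hypothetical generator: yields one action per row, lazily, so a
    large dataset never has to be held in memory all at once."""
    for i, row in enumerate(rows):
        yield {"_index": "index-name", "_type": "document", "_id": i, "_source": row}
```

With the first example above, `split_action` returns the five metadata keys and the `_source` dict. With the second, it returns `{"_id": 42, "_parent": 5}` as metadata and the remaining title/body fields as the document.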
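The bulk() api accepts index, create, delete and update actions. Use the _op_type field to specify an action (_op_type defaults to index):

```python
{
    '_op_type': 'delete',
    '_index': 'index-name',
    '_type': 'document',
    '_id': 42,
}
{
    '_op_type': 'update',
    '_index': 'index-name',
    '_type': 'document',
    '_id': 42,
    'doc': {'question': 'The life, universe and everything.'}
}
```

When reading raw JSON strings from a file, you can also pass them in directly, without decoding them to dicts first. In that case, however, you cannot specify anything (index, type, or even id) on a per-record basis. All documents are sent to Elasticsearch to be indexed as-is.

elasticsearch.helpers.streaming_bulk(client, actions, chunk_size=500, max_chunk_bytes=104857600, raise_on_error=True, expand_action_callback=<function expand_action>, raise_on_exception=True, **kwargs)

Streaming bulk consumes actions from the iterable passed in and yields the result of each action. For non-streaming use cases, use bulk(). It is a wrapper around streaming bulk that returns summary information about the bulk operation once the entire input has been consumed and sent.

elasticsearch.helpers.parallel_bulk(client, actions, thread_count=4, chunk_size=500, max_chunk_bytes=104857600, expand_action_callback=<function expand_action>, **kwargs)

Parallel version of the bulk helper, running in multiple threads at once.

elasticsearch.helpers.bulk() is a helper for the bulk() api that provides a more human-friendly interface. It consumes an iterator of actions and sends them to Elasticsearch in chunks. It returns a tuple with summary information: the number of successfully executed actions, and either a list of errors or, if stats_only is set to True, the number of errors. See streaming_bulk() for more accepted parameters.

Scan

elasticsearch.helpers.scan(client, query=None, scroll=u'5m', raise_on_error=True, preserve_order=False, **kwargs)

A simple abstraction on top of the scroll() api: an iterator that yields all hits returned by the underlying scroll requests. By default, scan does not return results in any predetermined order. To keep a standard order in the returned documents while scrolling (defined either by score or by an explicit sort), use preserve_order=True. This may be an expensive operation and will negate the performance benefits of using scan.

Reindex

elasticsearch.helpers.reindex(client, source_index, target_index, query=None, target_client=None, chunk_size=500, scroll=u'5m', scan_kwargs={}, bulk_kwargs={})

Reindex all documents that satisfy a given query from one index into another. If target_client is specified, the target index may be on a different cluster. If no query is specified, all documents are reindexed. This helper does not transfer mappings, only the data.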
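The bulk helpers send actions to Elasticsearch in chunks. Each chunk is bounded by `chunk_size` (a number of actions) and `max_chunk_bytes` (a payload size). The following is a simplified, hypothetical sketch of that batching idea in plain Python, not the library's implementation. It assumes each action has already been serialized to a single string, and it measures size as UTF-8 bytes plus one newline per action. The function name `chunk_actions` is made up for this illustration.

```python
def chunk_actions(serialized_actions, chunk_size=500, max_chunk_bytes=104857600):
    """Yield lists of serialized actions (hypothetical sketch).

    A chunk is closed when adding the next action would exceed either
    chunk_size actions or max_chunk_bytes bytes. A single action larger
    than max_chunk_bytes still gets a chunk of its own.
    """
    chunk, size = [], 0
    for action in serialized_actions:
        action_size = len(action.encode("utf-8")) + 1  # +1 for the trailing newline
        if chunk and (len(chunk) == chunk_size or size + action_size > max_chunk_bytes):
            yield chunk
            chunk, size = [], 0
        chunk.append(action)
        size += action_size
    if chunk:
        yield chunk
```

In real code you would not batch actions yourself. You would pass the iterable of actions to helpers.bulk(client, actions) or helpers.streaming_bulk(client, actions, chunk_size=..., max_chunk_bytes=...) and let the library do it.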