
Elasticsearch operations

程序员文章站 2022-07-09 18:52:13
...

Adding documents

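The type is employee, and it lives in the index megacorp. Each employee is indexed as one document that contains all of that employee's information (the data model is document-oriented). This employee's ID is 1.

The request needs an index, a type and an ID: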

curl -X PUT -H 'Content-Type: application/json' -i http://focuson1:9200/megacorp/employee/1 --data '{
   "first_name": "John",
   "last_name": "Smith",
   "age": 25,
   "about": "I love to go rock climbing",
   "interests": [
      "sports",
      "music"
   ]
}'

Add more employees:
curl -X PUT -H 'Content-Type: application/json' -i http://focuson1:9200/megacorp/employee/2 --data '{
    "first_name" :  "Jane",
    "last_name" :   "Smith",
    "age" :         32,
    "about" :       "I like to collect rock albums",
    "interests":  [ "music" ]
}'

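A newly created index gets 5 primary shards by default (in the 6.x versions used here; Elasticsearch 7.0 changed the default to 1). The number of primary shards is fixed when the index is created, while the number of replica shards can be changed at any time. For example, the request below creates an index with 3 primary shards and 1 replica. When a document is PUT into Elasticsearch, its ID is hashed, and the hash decides which shard stores the document.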

curl -X PUT -H 'Content-Type: application/json' -i http://focuson1:9200/megacorp2 --data '{
   "settings" : {
      "number_of_shards" : 3,
      "number_of_replicas" : 1
   }
}'
HTTP/1.1 200 OK
content-type: application/json; charset=UTF-8
content-length: 68

{"acknowledged":true,"shards_acknowledged":true,"index":"megacorp2"}
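To make the ID-hashing step concrete, here is a minimal shell sketch of the routing formula `shard = hash(_routing) % number_of_primary_shards`, where `_routing` defaults to the document ID. It rests on an explicit assumption: `cksum` stands in for the Murmur3 hash Elasticsearch actually uses, so the shard numbers will not match a real cluster. `route_to_shard` is a hypothetical helper.

```sh
# Sketch of document routing: shard = hash(_routing) % number_of_primary_shards,
# where _routing defaults to the document ID.
# ASSUMPTION: cksum (a CRC checksum) stands in for Elasticsearch's Murmur3 hash,
# so these shard numbers will NOT match a real cluster.
route_to_shard() {        # $1 = document ID, $2 = number of primary shards
  doc_id=$1
  primary_shards=$2
  hash=$(printf '%s' "$doc_id" | cksum | cut -d ' ' -f 1)
  echo $(( hash % primary_shards ))
}

for id in 1 2 3 100; do
  echo "id=$id -> shard $(route_to_shard "$id" 3) of 3"
done
```

Because the result depends on the number of primary shards, changing that number would send existing IDs to different shards, which is why it is fixed when the index is created.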

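You can also add a document without specifying an ID, and Elasticsearch then generates one automatically. This requires a POST request (note that the URL ends without an ID):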

curl -X POST -H 'Content-Type: application/json' -i http://focuson1:9200/megacorp/employee/ --data '{
   "first_name": "John2",
   "last_name": "Smith2",
   "age": 256,
   "about": "I love to go rock climbing",
   "interests": [
      "sports",
      "music"
   ]
}'
HTTP/1.1 201 Created
Location: /megacorp/employee/TfQ8ymMBtknNDl0i3mwi
content-type: application/json; charset=UTF-8
content-length: 179

{"_index":"megacorp","_type":"employee","_id":"TfQ8ymMBtknNDl0i3mwi","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":3,"_primary_term":1}

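Updating a document

An update is sent exactly like an add, and the response contains a new, incremented `_version`. Internally, Elasticsearch marks the old document as deleted and indexes a completely new one; the old document is purged later in the background, not immediately.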

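Creating a document (create only)

With `op_type=create`, the request only creates: if the document already exists, Elasticsearch returns 409 Conflict instead of overwriting it (without `op_type=create`, the same request would update the document). Appending `/_create` to the URL, e.g. `/megacorp/employee/1/_create`, has the same effect.
curl -X PUT -H 'Content-Type: application/json' -i 'http://focuson1:9200/megacorp/employee/1?op_type=create' --data '{
   "first_name": "John",
   "last_name": "Smith",
   "age": 25,
   "about": "I love to go rock climbing",
   "interests": [
      "sports",
      "music"
   ]
}'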
HTTP/1.1 409 Conflict
content-type: application/json; charset=UTF-8
content-length: 445

{"error":{"root_cause":[{"type":"version_conflict_engine_exception","reason":"[employee][1]: version conflict, document already exists (current version [4])","index_uuid":"hKhKh3YRT6yRiWQiBPSYuw","shard":"3","index":"megacorp"}],"type":"version_conflict_engine_exception","reason":"[employee][1]: version conflict, document already exists (current version [4])","index_uuid":"hKhKh3YRT6yRiWQiBPSYuw","shard":"3","index":"megacorp"},"status":409}

Retrieving documents:

  • Return a single document by its index, type and ID
curl -X GET -i http://focuson1:9200/megacorp/employee/1
HTTP/1.1 200 OK
content-type: application/json; charset=UTF-8
content-length: 249

{"_index":"megacorp","_type":"employee","_id":"1","_version":1,"found":true,"_source":{
   "first_name": "John",
   "last_name": "Smith",
   "age": 25,
   "about": "I love to go rock climbing",
   "interests": [
      "sports",
      "music"
   ]
}}
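  • pretty
Adding `pretty` to the query string makes the response easier to read. In the output below, `_source` is reformatted as well; without `pretty`, `_source` comes back exactly as it was indexed.
curl -X GET -i 'http://focuson1:9200/megacorp/employee/1?pretty'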
HTTP/1.1 200 OK
content-type: application/json; charset=UTF-8
content-length: 294

{
  "_index" : "megacorp",
  "_type" : "employee",
  "_id" : "1",
  "_version" : 3,
  "found" : true,
  "_source" : {
    "first_name" : "John",
    "last_name" : "Smith",
    "age" : 25,
    "about" : "I love to go rock climbing",
    "interests" : [
      "sports",
      "music"
    ]
  }
}
  • Returning selected fields
List the `_source` fields to return:
curl -X GET -i 'http://focuson1:9200/megacorp/employee/1?_source=first_name,last_name'
HTTP/1.1 200 OK
content-type: application/json; charset=UTF-8
content-length: 128

{"_index":"megacorp","_type":"employee","_id":"1","_version":3,"found":true,"_source":{"last_name":"Smith","first_name":"John"}}
  • Returning only the source
Return just the contents of `_source`, without the metadata:
curl -X GET -i http://focuson1:9200/megacorp/employee/1/_source
HTTP/1.1 200 OK
content-type: application/json; charset=UTF-8
content-length: 162

{
   "first_name": "John",
   "last_name": "Smith",
   "age": 25,
   "about": "I love to go rock climbing",
   "interests": [
      "sports",
      "music"
   ]
}

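  • Searching all documents (by default at most 10 hits are returned):
/_search
    Search all types in all indices
/gb/_search
    Search all types in the gb index
/gb,us/_search
    Search all documents in the gb and us indices
/g*,u*/_search
    Search all types in any index whose name starts with g or u
/gb/user/_search
    Search the user type in the gb index
/gb,us/user,tweet/_search
    Search the user and tweet types in the gb and us indices
/_all/user,tweet/_search
    Search the user and tweet types in all indices
The example below returns all documents of type employee in the megacorp index:
curl -X GET -i http://focuson1:9200/megacorp/employee/_search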
HTTP/1.1 200 OK
content-type: application/json; charset=UTF-8
content-length: 611

{"took":2,"timed_out":false,"_shards":{"total":5,"successful":5,"skipped":0,"failed":0},"hits":{"total":2,"max_score":1.0,"hits":[{"_index":"megacorp","_type":"employee","_id":"2","_score":1.0,"_source":{
    "first_name" :  "Jane",
    "last_name" :   "Smith",
    "age" :         32,
    "about" :       "I like to collect rock albums",
    "interests":  [ "music" ]
}},{"_index":"megacorp","_type":"employee","_id":"1","_score":1.0,"_source":{
   "first_name": "John",
   "last_name": "Smith",
   "age": 25,
   "about": "I love to go rock climbing",
   "interests": [
      "sports",
      "music"
   ]
}}]}}
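  • Pagination

GET /_search?size=5
GET /_search?size=5&from=5
GET /_search?size=5&from=10
`size` sets the number of hits to return and `from` the number of initial hits to skip (default 0). Each shard sorts its own results and returns its top `from + size` hits; the coordinating node then sorts all of those hits again and keeps only `size` of them. The cost therefore grows quickly with page depth, so avoid paging deep into the results.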
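A rough sketch of why deep paging gets expensive, following the description above: every shard builds its top `from + size` hits, and the coordinating node sorts `shards * (from + size)` entries just to keep `size` of them. `entries_to_sort` is a hypothetical helper that only counts those entries; it is an illustration, not a benchmark.

```sh
# Count the entries the coordinating node must sort for one from/size page:
# each of the `shards` shards returns its top (from + size) hits.
entries_to_sort() {       # $1 = primary shards, $2 = from, $3 = size
  shards=$1
  from=$2
  size=$3
  echo $(( shards * (from + size) ))
}

for from in 0 10 100 1000 10000; do
  printf 'from=%-6s size=10 shards=5 -> %s entries sorted\n' \
    "$from" "$(entries_to_sort 5 "$from" 10)"
done
```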

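Full-text search

Returns the documents related to the search terms, together with a relevance score (`_score`).

Option 1: a query string in the URL. Spaces and other special characters must be percent-encoded:

curl -X GET -i 'http://focuson1:9200/megacorp/employee/_search?q=about:like'
A `+` prefix means the condition must match, a `-` prefix means it must not match, and a condition without a prefix is optional (matching it only raises the score). In a URL, `+` itself must be written as `%2B`, because a literal `+` is decoded as a space. In the query below, `-about:to` excludes documents whose `about` field contains `to`, while `go` (after the encoded space) is a separate term without a prefix:
curl -X GET -i 'http://focuson1:9200/megacorp/employee/_search?q=-about:to%20go'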
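Because a literal `+` in a URL is decoded as a space, a query string that uses the `+`/`-` operators needs percent-encoding. The sketch below assumes a POSIX shell and ASCII input; `urlencode` is a hypothetical helper, not part of curl or Elasticsearch (curl's own `-G --data-urlencode` options are an alternative).

```sh
# Percent-encode a string for use in a URL query (ASCII input only).
# Unreserved characters are kept; everything else becomes %XX.
urlencode() {
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}              # everything after the first character
    c=${s%"$rest"}           # the first character
    s=$rest
    case $c in
      [A-Za-z0-9.~_-]) out=$out$c ;;
      *) out=$out$(printf '%%%02X' "'$c") ;;   # "'c" gives the character code
    esac
  done
  printf '%s\n' "$out"
}

q=$(urlencode '+about:rock -about:albums')
echo "http://focuson1:9200/megacorp/employee/_search?q=$q"
# With a reachable cluster:
# curl -X GET -i "http://focuson1:9200/megacorp/employee/_search?q=$q"
```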

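Option 2: a `match` query in the request body. It returns documents that contain any of the words; John's document contains both words and scores higher:

curl -X GET -i http://focuson1:9200/megacorp/employee/_search  -H 'Content-Type: application/json'  --data '{
    "query" : {
        "match" : {
            "about" : "rock climbing"
        }
    }
}'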
HTTP/1.1 200 OK
content-type: application/json; charset=UTF-8
content-length: 629

{"took":6,"timed_out":false,"_shards":{"total":5,"successful":5,"skipped":0,"failed":0},"hits":{"total":2,"max_score":0.5753642,"hits":[{"_index":"megacorp","_type":"employee","_id":"1","_score":0.5753642,"_source":{
   "first_name": "John",
   "last_name": "Smith",
   "age": 25,
   "about": "I love to go rock climbing",
   "interests": [
      "sports",
      "music"
   ]
}},{"_index":"megacorp","_type":"employee","_id":"2","_score":0.2876821,"_source":{
    "first_name" :  "Jane",
    "last_name" :   "Smith",
    "age" :         32,
    "about" :       "I like to collect rock albums",
    "interests":  [ "music" ]
}}]}}

Phrase search

`match_phrase` only matches documents that contain the exact phrase (the words adjacent and in the same order), so only John's document matches here:

curl -X GET "localhost:9200/megacorp/employee/_search" -H 'Content-Type: application/json' -d'
{
    "query" : {
        "match_phrase" : {
            "about" : "rock climbing"
        }
    }
}'
{"took":4,"timed_out":false,"_shards":{"total":5,"successful":5,"skipped":0,"failed":0},"hits":{"total":1,"max_score":0.5753642,"hits":[{"_index":"megacorp","_type":"employee","_id":"1","_score":0.5753642,"_source":{
   "first_name": "John",
   "last_name": "Smith",
   "age": 25,
   "about": "I love to go rock climbing",
   "interests": [
      "sports",
      "music"
   ]
}}]}}
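To see why `match` returned both employees while `match_phrase` returned only John, here is a deliberately simplified model: text is lower-cased and split on spaces, `match` succeeds if any query word occurs, and a phrase succeeds only if the words occur next to each other in order. Real Elasticsearch uses analyzers, term positions and scoring; `match_any` and `match_phrase` are hypothetical helpers.

```sh
# Simplified model of the two queries (not Lucene's real analysis or scoring).
lower() { printf '%s' "$1" | tr 'A-Z' 'a-z'; }

match_any() {        # $1 = field text, $2 = query; true if ANY word occurs
  text=" $(lower "$1") "
  for word in $(lower "$2"); do
    case $text in *" $word "*) return 0 ;; esac
  done
  return 1
}

match_phrase() {     # $1 = field text, $2 = query; true if words are adjacent, in order
  text=" $(lower "$1") "
  case $text in *" $(lower "$2") "*) return 0 ;; esac
  return 1
}

for about in 'I love to go rock climbing' 'I like to collect rock albums'; do
  any=no
  phrase=no
  if match_any "$about" 'rock climbing'; then any=yes; fi
  if match_phrase "$about" 'rock climbing'; then phrase=yes; fi
  echo "$about: match=$any match_phrase=$phrase"
done
```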

Highlighting

Highlighting tells the user why a document matched: the request contains a `highlight` section, and each hit in the response gets a `highlight` section in which the matching terms are wrapped in `<em>` tags.

curl -X GET "localhost:9200/megacorp/employee/_search" -H 'Content-Type: application/json' -d'
{
    "query" : {
        "match_phrase" : {
            "about" : "rock climbing"
        }
    },
    "highlight": {
        "fields" : {
            "about" : {}
        }
    }
}
'
{"took":6,"timed_out":false,"_shards":{"total":5,"successful":5,"skipped":0,"failed":0},"hits":{"total":1,"max_score":0.5753642,"hits":[{"_index":"megacorp","_type":"employee","_id":"1","_score":0.5753642,"_source":{
   "first_name": "John",
   "last_name": "Smith",
   "age": 25,
   "about": "I love to go rock climbing",
   "interests": [
      "sports",
      "music"
   ]
},"highlight":{"about":["I love to go <em>rock</em> <em>climbing</em>"]}}]}}

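Combining a query with a filter

Find employees whose last_name matches Smith and whose age is greater than 30 (`gt` stands for "greater than"). The `match` clause under `must` contributes to the score, while the `range` clause under `filter` only includes or excludes documents:

curl -X GET -i http://focuson1:9200/megacorp/employee/_search -H 'Content-Type: application/json' --data '{
    "query" : {
        "bool": {
            "must": {
                "match" : {
                    "last_name" : "smith"
                }
            },
            "filter": {
                "range" : {
                    "age" : { "gt" : 30 }
                }
            }
        }
    }
}'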
HTTP/1.1 200 OK
content-type: application/json; charset=UTF-8
content-length: 388

{"took":153,"timed_out":false,"_shards":{"total":5,"successful":5,"skipped":0,"failed":0},"hits":{"total":1,"max_score":0.2876821,"hits":[{"_index":"megacorp","_type":"employee","_id":"2","_score":0.2876821,"_source":{
    "first_name" :  "Jane",
    "last_name" :   "Smith",
    "age" :         32,
    "about" :       "I like to collect rock albums",
    "interests":  [ "music" ]
}}]}}

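Deleting documents and indices. A deleted document is not removed immediately: it is only marked as deleted and purged later in the background.

Delete a document: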
curl -X DELETE -i http://focuson1:9200/megacorp/employee/1
HTTP/1.1 200 OK
content-type: application/json; charset=UTF-8
content-length: 160

{"_index":"megacorp","_type":"employee","_id":"1","_version":5,"result":"deleted","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":5,"_primary_term":2}
Delete an index:
curl -X DELETE -i http://focuson1:9200/megacorp
HTTP/1.1 200 OK
content-type: application/json; charset=UTF-8
content-length: 21

{"acknowledged":true}

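Checking cluster health

This is a single-node cluster, so the status is `yellow`: all primary shards are active, but their replicas stay unassigned, because Elasticsearch never places a replica on the same node as its primary. That is why only 5 of 10 shards are active (`active_shards_percent_as_number` is 50.0).

curl http://focuson1:9200/_cluster/health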
{"cluster_name":"elasticsearch","status":"yellow","timed_out":false,"number_of_nodes":1,"number_of_data_nodes":1,"active_primary_shards":5,"active_shards":5,"relocating_shards":0,"initializing_shards":0,"unassigned_shards":5,"delayed_unassigned_shards":0,"number_of_pending_tasks":0,"number_of_in_flight_fetch":0,"task_max_waiting_in_queue_millis":0,"active_shards_percent_as_number":50.0}
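The status rules behind that response can be sketched as a small decision function. `health_status` is a hypothetical helper; the counts come from the response above, assuming (as the `yellow` status implies) that all 5 unassigned shards are replicas.

```sh
# red    = at least one primary shard is unassigned
# yellow = all primaries are active, but at least one replica is unassigned
# green  = every shard is assigned
health_status() {         # $1 = unassigned primaries, $2 = unassigned replicas
  if [ "$1" -gt 0 ]; then
    echo red
  elif [ "$2" -gt 0 ]; then
    echo yellow
  else
    echo green
  fi
}

active=5        # "active_shards" from the response above
unassigned=5    # "unassigned_shards", all of them replicas
echo "status: $(health_status 0 "$unassigned")"
echo "active shards: $(( 100 * active / (active + unassigned) ))%"
```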

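The lost-update problem

Databases prevent lost updates with pessimistic or optimistic locking. Pessimistic locking assumes that any update may be lost, so it locks the data as soon as it is read; no one else can operate on it until the lock is released. Optimistic locking assumes that conflicts are rare and takes no lock; instead, each record carries a version number. The version is read together with the data, and the update only succeeds if the stored version is still the one that was read; if someone else has updated the record in the meantime, the update is rejected. This also prevents lost updates, and because nothing is locked, optimistic locking is usually more efficient when conflicts are rare.

Elasticsearch naturally supports optimistic locking, because every document has a version number. For example, when a web page loads documents from Elasticsearch, it keeps each document's version; an update or delete then sends the version that was loaded, and it fails if the document's current version is different.

For example:

curl -X GET  http://focuson1:9200/megacorp/employee/2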
{"_index":"megacorp","_type":"employee","_id":"2","_version":2,"found":true,"_source":{  
    "first_name" :  "Jane",  
    "last_name" :   "Smith",  
    "age" :         32,  
    "about" :       "I like to collect rock albums",  
    "interests":  [ "music" ]  
}}

The document is at version 2, so the update is sent with `version=2`:

curl -X PUT -H 'Content-Type: application/json' 'http://focuson1:9200/megacorp/employee/2?version=2' --data '{
    "first_name" :  "Jane3",
    "last_name" :   "Smith3",
    "age" :         32,
    "about" :       "I like to collect rock albums",
    "interests":  [ "music" ]
}'
{"_index":"megacorp","_type":"employee","_id":"2","_version":3,"result":"updated","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":2,"_primary_term":1}

The version is now 3. Updating again with `version=2` fails with status 409:

curl -X PUT -H 'Content-Type: application/json' 'http://focuson1:9200/megacorp/employee/2?version=2' --data '{
    "first_name" :  "Jane3",  
    "last_name" :   "Smith3",  
    "age" :         32,  
    "about" :       "I like to collect rock albums",  
    "interests":  [ "music" ]  
}' 
{"error":{"root_cause":[{"type":"version_conflict_engine_exception","reason":"[employee][2]: version conflict, current version [3] is different than the one provided [2]","index_uuid":"FeUwsg9lTPuFTABIuT77BQ","shard":"2","index":"megacorp"}],"type":"version_conflict_engine_exception","reason":"[employee][2]: version conflict, current version [3] is different than the one provided [2]","index_uuid":"FeUwsg9lTPuFTABIuT77BQ","shard":"2","index":"megacorp"},"status":409}
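The version check above behaves like a compare-and-set. This toy model (a hypothetical `cas_update` helper with in-memory variables instead of a cluster) replays the two requests: the first write with version 2 is accepted and bumps the version to 3, and the second is rejected. Newer Elasticsearch releases (6.7 and later) use `if_seq_no` and `if_primary_term` instead of `version` for this purpose.

```sh
# Toy model of the internal version check behind PUT ...?version=N.
current_version=2
doc='Jane'

cas_update() {            # $1 = version the client read, $2 = new document
  if [ "$1" -ne "$current_version" ]; then
    echo "409: current version [$current_version] is different than the one provided [$1]"
    return 1
  fi
  doc=$2
  current_version=$(( current_version + 1 ))
  echo "updated: _version=$current_version"
}

cas_update 2 'Jane3'                                        # accepted, version becomes 3
cas_update 2 'Jane4' || echo "stale copy: re-read the document and retry"
```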

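Using external version numbers

With `version_type=external`, the version number is supplied by an external system (for example, another database that holds the same data).

If the document's current version is lower than 123, the write succeeds and the version becomes 123; if it is greater than or equal to 123, Elasticsearch returns 409.
curl -X PUT -H 'Content-Type: application/json' 'http://focuson1:9200/megacorp/employee/2?version=123&version_type=external' --data '{
    "first_name" :  "Jane",
    "last_name" :   "Smith",
    "age" :         32,
    "about" :       "I like to collect rock albums",
    "interests":  [ "music" ]
}'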
{"_index":"megacorp","_type":"employee","_id":"2","_version":123,"result":"updated","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":4,"_primary_term":1}
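External versioning uses a different rule from the internal check: the supplied version must be strictly greater than the stored one, and the stored version then becomes the supplied value. A minimal sketch with a hypothetical `external_write` helper, starting from version 3 as in the earlier example:

```sh
# Toy model of version_type=external (not Elasticsearch code).
stored_version=3

external_write() {        # $1 = version supplied by the external system
  if [ "$1" -gt "$stored_version" ]; then
    stored_version=$1
    echo "accepted: _version=$stored_version"
  else
    echo "409: version [$1] is not greater than current version [$stored_version]"
    return 1
  fi
}

external_write 123               # accepted, like the request above
external_write 123 || true       # same version again: conflict
external_write 100 || true       # older version: conflict
```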

Partial document updates

Put the fields to change inside `doc`: fields that already exist are updated, and fields that don't exist yet are added.
curl -X POST -H 'Content-Type: application/json' 'http://focuson1:9200/megacorp/employee/2/_update' --data '{
   "doc" : {
      "tags" : [ "testing" ],
      "views": 0
   }
}'
{"_index":"megacorp","_type":"employee","_id":"2","_version":125,"result":"updated","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":6,"_primary_term":1}
Partially update a document with a script, incrementing the `nimei` field by 1:
curl -X POST "http://focuson1:9200/megacorp/employee/2/_update" -H 'Content-Type: application/json' -d'
{
   "script" : "ctx._source.nimei+=1"
}
'
{"_index":"megacorp","_type":"employee","_id":"2","_version":127,"result":"updated","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":8,"_primary_term":1}
curl 'http://focuson1:9200/megacorp/employee/2?pretty'
{
  "_index" : "megacorp",
  "_type" : "employee",
  "_id" : "2",
  "_version" : 127,
  "found" : true,
  "_source" : {
    "doc" : {
      "tags" : [
        "testing"
      ],
      "views" : 0
    },
    "views" : 0,
    "tags" : [
      "testing"
    ],
    "nimei" : 1234567891
  }
}
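The `doc` merge can be pictured as "patch fields win, untouched fields stay". The sketch below models documents as `key=value` lines (an assumption made purely for illustration, with a hypothetical `merge_doc` helper) and merges only top-level fields, whereas Elasticsearch also merges nested objects. The `doc` object visible inside `_source` above is not something `_update` produces; it most likely comes from an earlier request that sent a `{"doc": ...}` body to the plain index endpoint.

```sh
# Merge a partial document into a stored one; both are key=value lines.
# Later lines (the patch) overwrite earlier ones; key order is first-seen order.
merge_doc() {             # $1 = stored source, $2 = partial doc
  printf '%s\n%s\n' "$1" "$2" | awk -F= '
    !($1 in val) { order[++n] = $1 }   # remember each key the first time
    { val[$1] = $2 }                   # the last value for a key wins
    END { for (i = 1; i <= n; i++) print order[i] "=" val[order[i]] }'
}

stored='first_name=Jane
age=32'
patch='tags=testing
views=0
age=33'
merge_doc "$stored" "$patch"
```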

`upsert`: if the document to update doesn't exist, it is created first.

curl -X POST -H 'Content-Type: application/json' 'http://focuson1:9200/megacorp/employee/100/_update' --data '{
   "doc" : {
      "tags" : [ "testing" ],
      "views": 0,
      "nimei":1234567890
   },
   "upsert": {}
}'
{"_index":"megacorp","_type":"employee","_id":"100","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1}
The result below shows that the missing document was created from the (empty) `upsert` body; the fields in `doc` were not applied:
curl http://focuson1:9200/megacorp/employee/100
{"_index":"megacorp","_type":"employee","_id":"100","_version":1,"found":true,"_source":{}}
Sending the same request again, now that the document exists, applies `doc`:
curl -X POST -H 'Content-Type: application/json' 'http://focuson1:9200/megacorp/employee/100/_update' --data '{
   "doc" : {
      "tags" : [ "testing" ],
      "views": 0,
      "nimei":1234567890
   },
   "upsert": {}
}'
{"_index":"megacorp","_type":"employee","_id":"100","_version":2,"result":"updated","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":1,"_primary_term":1}
curl http://focuson1:9200/megacorp/employee/100
{"_index":"megacorp","_type":"employee","_id":"100","_version":2,"found":true,"_source":{"nimei":1234567890,"views":0,"tags":["testing"]}}
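If the missing document should be created from the same fields, Elasticsearch's `doc_as_upsert` option (or a non-empty `upsert` object) avoids the empty document seen on the first call. A small sketch that builds such a body; `upsert_body` is a hypothetical helper, and the `curl` call is commented out because it needs a reachable cluster.

```sh
# Build an _update body that reuses `doc` as the upsert document.
upsert_body() {           # $1 = JSON object with the fields to set
  printf '{"doc": %s, "doc_as_upsert": true}\n' "$1"
}

body=$(upsert_body '{"tags": ["testing"], "views": 0, "nimei": 1234567890}')
echo "$body"
# With a reachable cluster:
# curl -X POST -H 'Content-Type: application/json' \
#   'http://focuson1:9200/megacorp/employee/100/_update' --data "$body"
```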

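Retrying updates

In application code we can use optimistic locking and pass the version with every request, so a concurrent change is detected (the stale request gets 409) instead of being silently overwritten. When no version is passed, the `_update` API internally retrieves the document, reads its version and reindexes it; if another process changes the document in between, the update can still fail with a version conflict. The `retry_on_conflict` parameter sets how many times Elasticsearch retries in that case; the default is 0.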

curl -X POST "localhost:9200/website/pageviews/1/_update?retry_on_conflict=5" -H 'Content-Type: application/json' -d'
{
   "script" : "ctx._source.views+=1",
   "upsert": {
       "views": 0
   }
}
'
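The retry behaviour can be sketched as a loop around a read-modify-write step. This toy model is not Elasticsearch code: `try_increment` and `update_with_retries` are hypothetical helpers, and the hypothetical counter `interfering_writes` makes another client win the race on the first attempts so that the retries become visible.

```sh
# Toy model of _update?retry_on_conflict=N.
version=1
views=0
interfering_writes=2      # ASSUMPTION: another client writes during the first 2 attempts

try_increment() {
  seen=$version                               # read the document and its version
  if [ "$interfering_writes" -gt 0 ]; then    # another client writes in between
    interfering_writes=$(( interfering_writes - 1 ))
    version=$(( version + 1 ))
  fi
  [ "$seen" -eq "$version" ] || return 1      # version changed: conflict
  views=$(( views + 1 ))                      # apply "ctx._source.views+=1"
  version=$(( version + 1 ))                  # reindexing bumps the version
}

update_with_retries() {   # $1 = retry_on_conflict
  attempt=0
  while [ "$attempt" -le "$1" ]; do
    if try_increment; then
      echo "updated after $attempt retries: views=$views, _version=$version"
      return 0
    fi
    attempt=$(( attempt + 1 ))
  done
  echo "409 conflict: gave up after $1 retries"
  return 1
}

update_with_retries 5
```

With the default of 0 retries, the first conflict would already be returned to the client as a 409.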

Retrieving multiple documents (`_mget`)

curl -X GET -H 'Content-Type: application/json' 'http://focuson1:9200/_mget' --data '{
   "docs" : [
      {
         "_index" : "megacorp",
         "_type" :  "employee",
         "_id" :    1
      },
      {
         "_index" : "megacorp",
         "_type" :  "employee",
         "_id" :    2,
         "_source": "first_name"
      }
   ]
}'
{"docs":[{"_index":"megacorp","_type":"employee","_id":"1","_version":1,"found":true,"_source":{  
   "first_name": "John",  
   "last_name": "Smith",  
   "age": 25,  
   "about": "I love to go rock climbing",  
   "interests": [  
      "sports",  
      "music"  
   ]  
}},{"_index":"megacorp","_type":"employee","_id":"2","_version":127,"found":true,"_source":{}}]}
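Document 2 comes back with an empty `_source` because, after the earlier updates, it no longer has a `first_name` field. If all documents are in the same index (or the same type), the index (or type) can go in the URL instead:
curl -X GET -H 'Content-Type: application/json' 'http://focuson1:9200/megacorp/employee/_mget' --data '{
   "docs" : [
      {
         "_id" :    1
      },
      {
         "_id" :    2,
         "_source": "first_name"
      }
   ]
}'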
{"docs":[{"_index":"megacorp","_type":"employee","_id":"1","_version":1,"found":true,"_source":{  
   "first_name": "John",  
   "last_name": "Smith",  
   "age": 25,  
   "about": "I love to go rock climbing",  
   "interests": [  
      "sports",  
      "music"  
   ]  
}},{"_index":"megacorp","_type":"employee","_id":"2","_version":127,"found":true,"_source":{}}]}
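When the index and type are in the URL, `_mget` also accepts a shorter `ids` array. A sketch that builds that body; `mget_ids_body` is a hypothetical helper, and the request is commented out because it needs a running cluster.

```sh
# Build an _mget body using the "ids" shorthand: {"ids": ["1", "2", ...]}
mget_ids_body() {
  ids=
  for id in "$@"; do
    ids="${ids:+$ids, }\"$id\""   # add ", " before every ID except the first
  done
  printf '{"ids": [%s]}\n' "$ids"
}

mget_ids_body 1 2
# With a reachable cluster:
# curl -X GET -H 'Content-Type: application/json' \
#   'http://focuson1:9200/megacorp/employee/_mget' --data "$(mget_ids_body 1 2)"
```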
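Bulk operations (`_bulk`)

The following actions are available: `create` (create a document; fails if it already exists), `index` (create a document or replace an existing one), `update` (partially update a document) and `delete` (delete a document). Each action goes on its own line; `create`, `index` and `update` are followed by a second line with the document (for `update`, the `doc` or `script`), while `delete` has no second line.

For example:

curl -X POST "http://focuson1:9200/_bulk" -H 'Content-Type: application/json' -d'
{ "delete": { "_index": "megacorp", "_type": "employee", "_id": "123" }}
{ "create": { "_index": "megacorp", "_type": "employee", "_id": "123" }}
{ "title":    "My first blog post" }
{ "index":  { "_index": "megacorp", "_type": "employee" }}
{ "title":    "My second blog post" }
{ "update": { "_index": "megacorp", "_type": "employee", "_id": "123", "_retry_on_conflict" : 3} }
{ "doc" : {"title" : "My updated blog post"} }
'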
{"took":87,"errors":false,"items":[{"delete":{"_index":"megacorp","_type":"employee","_id":"123","_version":1,"result":"not_found","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1,"status":404}},{"create":{"_index":"megacorp","_type":"employee","_id":"123","_version":2,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":1,"_primary_term":1,"status":201}},{"index":{"_index":"megacorp","_type":"employee","_id":"8iZmy2MBAdBddqEKxy1b","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":9,"_primary_term":1,"status":201}},{"update":{"_index":"megacorp","_type":"employee","_id":"123","_version":3,"result":"updated","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":2,"_primary_term":1,"status":200}}]}
Specifying the index and type in every action is tedious. Instead, the default index and type can go in the URL; each action uses them unless it specifies its own:
curl -X POST "http://focuson1:9200/megacorp/employee/_bulk" -H 'Content-Type: application/json' -d'
{ "delete": { "_id": "123" }}
{ "create": { "_id": "123" }}
{ "title":    "My first blog post" }
{ "index":  {}}
{ "title":    "My second blog post" }
{ "update": {"_id": "123", "_retry_on_conflict" : 3} }
{ "doc" : {"title" : "My updated blog post"} }
'
{"took":31,"errors":false,"items":[{"delete":{"_index":"megacorp","_type":"employee","_id":"123","_version":4,"result":"deleted","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":3,"_primary_term":1,"status":200}},{"create":{"_index":"megacorp","_type":"employee","_id":"123","_version":5,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":4,"_primary_term":1,"status":201}},{"index":{"_index":"megacorp","_type":"employee","_id":"8yZpy2MBAdBddqEKSi1M","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":10,"_primary_term":1,"status":201}},{"update":{"_index":"megacorp","_type":"employee","_id":"123","_version":6,"result":"updated","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":5,"_primary_term":1,"status":200}}]}
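For larger batches it is easier to keep the bulk body in a file. The sketch below writes the body from the last example to a temporary file; every line, including the last, must end with a newline. When sending the file, curl's `--data-binary` keeps those newlines, whereas `-d @file` would strip them. The `_bulk` endpoint also accepts the `application/x-ndjson` content type.

```sh
# Write the bulk body (one action line, then an optional source line) to a file.
bulk_file=$(mktemp)
printf '%s\n' \
  '{ "delete": { "_id": "123" }}' \
  '{ "create": { "_id": "123" }}' \
  '{ "title": "My first blog post" }' \
  '{ "index": {}}' \
  '{ "title": "My second blog post" }' \
  '{ "update": { "_id": "123", "_retry_on_conflict": 3 }}' \
  '{ "doc": { "title": "My updated blog post" }}' > "$bulk_file"

wc -l < "$bulk_file"      # counts newline-terminated lines
# With a reachable cluster:
# curl -X POST 'http://focuson1:9200/megacorp/employee/_bulk' \
#   -H 'Content-Type: application/x-ndjson' --data-binary "@$bulk_file"
# rm -f "$bulk_file"      # clean up when done
```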




相关标签: elasticsearch