欢迎您访问程序员文章站本站旨在为大家提供分享程序员计算机编程知识!
您现在的位置是: 首页

跟益达学Solr5之批量索引JSON数据

程序员文章站 2022-06-22 17:08:57
...

        假定你有这样一堆JSON数据,

 

[
  {"id":"1", "name":"Red Lobster", "city":"San Francisco, CA", "type":"Sit-down Chain", "state":"California", "tags":["sea food", "sit down"], "price":33.00},
  {"id":"2", "name":"Red Lobster", "city":"Atlanta, GA", "type":"Sit-down Chain", "state":"Georgia", "tags":["sea food", "sit-down"], "price":22.00},
  {"id":"3", "name":"Red Lobster", "city":"New York, NY", "type":"Sit-down Chain", "state":"New York", "tags":["sea food", "sit-down"], "price":29.00},
  {"id":"4", "name":"McDonalds", "city":"San Francisco, CA", "type":"Fast Food", "state":"California", "tags":["fast food", "hamburgers", "coffee", "wi-fi", "breakfast"], "price":9.00},
  {"id":"5", "name":"McDonalds", "city":"Atlanta, GA", "type":"Fast Food", "state":"Georgia", "tags":["fast food", "hamburgers", "coffee", "wi-fi", "breakfast"], "price":4.00},
  {"id":"6", "name":"McDonalds", "city":"New York, NY", "type":"Fast Food", "state":"New York", "tags":["fast food", "hamburgers", "coffee", "wi-fi", "breakfast"], "price":4.00},
  {"id":"7", "name":"McDonalds", "city":"Chicago, IL", "type":"Fast Food", "state":"Illinois", "tags":["fast food", "hamburgers", "coffee", "wi-fi", "breakfast"], "price":4.00},
  {"id":"8", "name":"McDonalds", "city":"Austin, TX", "type":"Fast Food", "state":"Texas", "tags":["fast food", "hamburgers", "coffee", "wi-fi", "breakfast"], "price":4.00},
  {"id":"9", "name":"Pizza Hut", "city":"Atlanta, GA", "type":"Sit-down Chain", "state":"Georgia", "tags":["pizza", "sit-down", "delivery"], "price":15.00},
  {"id":"10", "name":"Pizza Hut", "city":"New York, NY", "type":"Sit-down Chain", "state":"New York", "tags":["pizza", "sit-down", "delivery"], "price":24.00},
  {"id":"11", "name":"Pizza Hut", "city":"Austin, TX", "type":"Sit-down Chain", "state":"Texas", "tags":["pizza", "sit-down", "delivery"], "price":18.00},
  {"id":"12", "name":"Freddy's Pizza Shop", "city":"Los Angeles, CA", "type":"Local Sit-down", "state":"California", "tags":["pizza", "pasta", "sit-down"], "price":25.00},
  {"id":"13", "name":"The Iberian Pig", "city":"Atlanta, GA", "type":"Upscale", "state":"Georgia", "tags":["spanish", "tapas", "sit-down", "upscale"], "price":45.00},
  {"id":"14", "name":"Sprig", "city":"Atlanta, GA", "type":"Local Sit-down", "state":"Georgia", "tags":["sit-down", "gluten-free", "southern cuisine"], "price":15.00},
  {"id":"15", "name":"Starbucks", "city":"San Francisco, CA", "type":"Coffee Shop", "state":"California", "tags":["coffee", "breakfast"], "price":7.50},
  {"id":"16", "name":"Starbucks", "city":"Atlanta, GA", "type":"Coffee Shop", "state":"Georgia", "tags":["coffee", "breakfast"], "price":4.00},
  {"id":"17", "name":"Starbucks", "city":"New York, NY", "type":"Coffee Shop", "state":"New York", "tags":["coffee", "breakfast"], "price":6.50},
  {"id":"18", "name":"Starbucks", "city":"Chicago, IL", "type":"Coffee Shop", "state":"Illinois", "tags":["coffee", "breakfast"], "price":6.00},
  {"id":"19", "name":"Starbucks", "city":"Austin, TX", "type":"Coffee Shop", "state":"Texas", "tags":["coffee", "breakfast"], "price":5.00},
  {"id":"20", "name":"Starbucks", "city":"Greenville, SC", "type":"Coffee Shop", "state":"South Carolina", "tags":["coffee", "breakfast"], "price":3.00}
]

   你想导入到Solr中进行索引,怎么办?其实Solr的Web UI界面就可以操作,在左侧有个Documents菜单,表示导入Document(当然也支持Document更新)的意思,Document加个s即表示支持批量导入多个Document,如图:
跟益达学Solr5之批量索引JSON数据
            
    
    博客分类: Solr SolrJSONImport 
 Document Type即表示你的Document数据来源是什么,是来自于JSON,来自于XML,来自于CVS等等,

 

 Commit Within表示document提交必须在指定的毫秒数内完成,否则提交操作视为超时;

 Overwriter即表示是否覆盖索引目录下已有的索引数据,设置为false即表示不覆盖已有索引只在原来的基础上追加索引数据;

 Boost:表示设置Document的权重,默认值为1.0;

 如果你只是单个JSON对象需要导入,那直接选择Document Type为JSON即可,当你选择Document Type为JSON后,Document(s)文本框会提示一个示例,如图:
跟益达学Solr5之批量索引JSON数据
            
    
    博客分类: Solr SolrJSONImport 
 当然你也可以选择
Document Type为Solr Command(raw XML or JSON),只不过这时候JSON数据格式就有特殊要求了,你的JSON数据格式需要这样定义:

{
    "add": {
        "doc": {.......}
    },
    "add": {
        "doc": {.......}
    },
    "add": {
        "doc": {.......}
    },
    "add": {
        "doc": {.......}
    },
    "add": {
        "doc": {.......}
    },
   ............. and so on.
}

    其中{.........}部分就是你的Document对象,其余部分为固定格式。使用这种格式正好弥补了Document Type为JSON这种方式只能一条一条的导入,效率太低,当你需要批量导入多个Document时,采用这种格式支持批量导入多个Document。

 

    如果你需要导入XML数据,你需要选择Document Type为XML,如图:
跟益达学Solr5之批量索引JSON数据
            
    
    博客分类: Solr SolrJSONImport 
 <doc></doc>标签之间的就是你的XML数据,不过它跟Document Type选择为JSON有同样的弊端就是只支持单条导入,如果你需要批量导入XML数据,你同样可以选择Document Type为Solr Command(raw XML or JSON),只不过这时候,数据格式应该是类似这样的:

<add>
    <doc>
        <id>xxxx</id>
        <name>xxxxxxxx</name>
        <age>xxxxxxxx</age>
    </doc>
    
    <doc>
        <id>xxxx</id>
        <name>xxxxxxxx</name>
        <age>xxxxxxxx</age>
    </doc>

    <doc>
        <id>xxxx</id>
        <name>xxxxxxxx</name>
        <age>xxxxxxxx</age>
    </doc>
    
    ............ and so on
</add>

    如果你想更新Document,那就把<add>元素改成<update>即可,同理还有<delete>你懂的,先前在讲post.jar的时候我有提到过,具体请参阅《跟益达学Solr5之玩转post.jar》,OK,说了那么多,那现在我就以JSON数据为例进行一个操作示范,假定我有这样一个JSON数据,如图:
跟益达学Solr5之批量索引JSON数据
            
    
    博客分类: Solr SolrJSONImport 
     首先我们需要从JSON数据中提炼出Field域,并在我们的Schema.xml配置文件定义域,如图:
跟益达学Solr5之批量索引JSON数据
            
    
    博客分类: Solr SolrJSONImport 
   然后我们需要把传统的JSON数据转换成Solr能识别的格式,如图:
跟益达学Solr5之批量索引JSON数据
            
    
    博客分类: Solr SolrJSONImport 

{
	"add": {
		"doc": {"id":"1", "name":"Red Lobster", "city":"San Francisco, CA", "type":"Sit-down Chain", "state":"California", "tags":["sea food", "sit down"], "price":33.00}
	},
	"add": {
		"doc": {"id":"2", "name":"Red Lobster", "city":"Atlanta, GA", "type":"Sit-down Chain", "state":"Georgia", "tags":["sea food", "sit-down"], "price":22.00}
	},
	"add": {
		"doc": {"id":"3", "name":"Red Lobster", "city":"New York, NY", "type":"Sit-down Chain", "state":"New York", "tags":["sea food", "sit-down"], "price":29.00}
	},
	"add": {
		"doc": {"id":"4", "name":"McDonalds", "city":"San Francisco, CA", "type":"Fast Food", "state":"California", "tags":["fast food", "hamburgers", "coffee", "wi-fi", "breakfast"], "price":9.00}
	},
	"add": {
		"doc": {"id":"5", "name":"McDonalds", "city":"Atlanta, GA", "type":"Fast Food", "state":"Georgia", "tags":["fast food", "hamburgers", "coffee", "wi-fi", "breakfast"], "price":4.00}
	},
	"add": {
		"doc": {"id":"6", "name":"McDonalds", "city":"New York, NY", "type":"Fast Food", "state":"New York", "tags":["fast food", "hamburgers", "coffee", "wi-fi", "breakfast"], "price":4.00}
	},
	"add": {
		"doc": {"id":"7", "name":"McDonalds", "city":"Chicago, IL", "type":"Fast Food", "state":"Illinois", "tags":["fast food", "hamburgers", "coffee", "wi-fi", "breakfast"], "price":4.00}
	},
	"add": {
		"doc": {"id":"8", "name":"McDonalds", "city":"Austin, TX", "type":"Fast Food", "state":"Texas", "tags":["fast food", "hamburgers", "coffee", "wi-fi", "breakfast"], "price":4.00}
	},
	"add": {
		"doc": {"id":"9", "name":"Pizza Hut", "city":"Atlanta, GA", "type":"Sit-down Chain", "state":"Georgia", "tags":["pizza", "sit-down", "delivery"], "price":15.00}
	},
	"add": {
		"doc": {"id":"10", "name":"Pizza Hut", "city":"New York, NY", "type":"Sit-down Chain", "state":"New York", "tags":["pizza", "sit-down", "delivery"], "price":24.00}
	},
	"add": {
		"doc": {"id":"11", "name":"Pizza Hut", "city":"Austin, TX", "type":"Sit-down Chain", "state":"Texas", "tags":["pizza", "sit-down", "delivery"], "price":18.00}
	},
	"add": {
		"doc": {"id":"12", "name":"Freddy's Pizza Shop", "city":"Los Angeles, CA", "type":"Local Sit-down", "state":"California", "tags":["pizza", "pasta", "sit-down"], "price":25.00}
	},
	"add": {
		"doc": {"id":"13", "name":"The Iberian Pig", "city":"Atlanta, GA", "type":"Upscale", "state":"Georgia", "tags":["spanish", "tapas", "sit-down", "upscale"], "price":45.00}
	},
	"add": {
		"doc": {"id":"14", "name":"Sprig", "city":"Atlanta, GA", "type":"Local Sit-down", "state":"Georgia", "tags":["sit-down", "gluten-free", "southern cuisine"], "price":15.00}
	},
	"add": {
		"doc": {"id":"15", "name":"Starbucks", "city":"San Francisco, CA", "type":"Coffee Shop", "state":"California", "tags":["coffee", "breakfast"], "price":7.50}
	},
	"add": {
		"doc": {"id":"16", "name":"Starbucks", "city":"Atlanta, GA", "type":"Coffee Shop", "state":"Georgia", "tags":["coffee", "breakfast"], "price":4.00}
	},
	"add": {
		"doc": {"id":"17", "name":"Starbucks", "city":"New York, NY", "type":"Coffee Shop", "state":"New York", "tags":["coffee", "breakfast"], "price":6.50}
	},
	"add": {
		"doc": {"id":"18", "name":"Starbucks", "city":"Chicago, IL", "type":"Coffee Shop", "state":"Illinois", "tags":["coffee", "breakfast"], "price":6.00}
	},
	"add": {
		"doc": {"id":"19", "name":"Starbucks", "city":"Austin, TX", "type":"Coffee Shop", "state":"Texas", "tags":["coffee", "breakfast"], "price":5.00}
	},
	"add": {
		"doc": {"id":"20", "name":"Starbucks", "city":"Greenville, SC", "type":"Coffee Shop", "state":"South Carolina", "tags":["coffee", "breakfast"], "price":3.00}
	}
}

    然后启动你的Tomcat,然后如图操作:
跟益达学Solr5之批量索引JSON数据
            
    
    博客分类: Solr SolrJSONImport 
 

    提交后,执行查询,如图:
跟益达学Solr5之批量索引JSON数据
            
    
    博客分类: Solr SolrJSONImport 
 as

   请注意Document Type选择项,如果你选择为JSON,那你将会收到这样一个异常,如图: 
跟益达学Solr5之批量索引JSON数据
            
    
    博客分类: Solr SolrJSONImport 
    示例相关的配置以及测试数据,请看底下的附件,如果你们在操作过程中,遇到任何问题,请联系我,同时也欢迎各路Java高手加群一起交流学习,

   益达Q-Q:                7-3-6-0-3-1-3-0-5

 

   益达的Q-Q群:      1-0-5-0-9-8-8-0-6

 

 

   

 

 

   

  • 跟益达学Solr5之批量索引JSON数据
            
    
    博客分类: Solr SolrJSONImport 
  • 大小: 45.9 KB
  • 跟益达学Solr5之批量索引JSON数据
            
    
    博客分类: Solr SolrJSONImport 
  • 大小: 38.6 KB
  • 跟益达学Solr5之批量索引JSON数据
            
    
    博客分类: Solr SolrJSONImport 
  • 大小: 46.6 KB
  • 跟益达学Solr5之批量索引JSON数据
            
    
    博客分类: Solr SolrJSONImport 
  • 大小: 82.9 KB
  • 跟益达学Solr5之批量索引JSON数据
            
    
    博客分类: Solr SolrJSONImport 
  • 大小: 21.1 KB
  • 跟益达学Solr5之批量索引JSON数据
            
    
    博客分类: Solr SolrJSONImport 
  • 大小: 39.7 KB
  • 跟益达学Solr5之批量索引JSON数据
            
    
    博客分类: Solr SolrJSONImport 
  • 大小: 62.5 KB
  • 跟益达学Solr5之批量索引JSON数据
            
    
    博客分类: Solr SolrJSONImport 
  • 大小: 55.4 KB
  • 跟益达学Solr5之批量索引JSON数据
            
    
    博客分类: Solr SolrJSONImport 
  • 大小: 68.9 KB
相关标签: Solr JSON Import