
Creating a Collection


1. Generate the instance configuration directory:

solrctl instancedir --generate $HOME/collection2

This generates a set of configuration files under $HOME/collection2/conf. You can then edit schema.xml there to define the fields you want indexed.
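As an illustration, a field definition in schema.xml might look like the following; the field names and types here are hypothetical and must be adapted to your data:

<!-- hypothetical example fields; adjust names and types to your data -->
<field name="id"    type="string"       indexed="true" stored="true" required="true"/>
<field name="level" type="string"       indexed="true" stored="true"/>
<field name="msg"   type="text_general" indexed="true" stored="true"/>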

2. Register the collection2 instance directory and upload its configuration to ZooKeeper:

solrctl instancedir --create collection2 $HOME/collection2

To push later changes to an already registered instance directory, use --update instead:

solrctl instancedir --update collection2 $HOME/collection2

You can list the uploaded instance directories with:

solrctl instancedir --list

3. Once the configuration is in ZooKeeper, the other nodes can download it from there. Next, create the collection:

solrctl collection --create collection2 -s 2 -r 1

Here -s sets the number of shards to 2, and -r sets the number of replicas per shard to 1.
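You can then confirm that the collection exists:

solrctl collection --list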

After the steps above, the Solr collection is fully created.

4. Modifying a Collection

After a collection has been created, if you need to modify schema.xml to reconfigure the indexed fields, proceed as follows:

1. If you are changing existing fields in schema.xml and index data has already been written to Solr, you must first clear the indexed data set. This can be done through the Solr API (a curl sketch follows the commands below).

2. If you are only adding new index fields to schema.xml, you can skip step 1 and run directly:

solrctl instancedir --update collection2 $HOME/collection2

solrctl collection --reload collection2
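For step 1, a minimal sketch of clearing every document through the standard Solr update API, assuming a Solr node reachable at xhadoop3:8983 (the host used in the core commands later in this article):

curl 'http://xhadoop3:8983/solr/collection2/update?commit=true' \
  -H 'Content-Type: text/xml' \
  -d '<delete><query>*:*</query></delete>'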

5. Clearing the Index

In the Solr Admin UI, select the shard (core) and open the Documents tab.

Leave the Request-Handler at its default, /update.

Set Document Type to XML.

In Document(s), enter:

<delete><query>*:*</query></delete>
<commit/>

Click Submit Document.

Unloading and reloading a shard core:

solrctl --solr http://xhadoop3:8983/solr core --unload solrtest_shard2_replica1
solrctl --solr http://sit-hadoop3:8983/solr core --reload hotel_shard1_replica1
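The core name follows the pattern <collection>_shardN_replicaM; the exact names for your deployment can be read off the Core Admin page of the Solr Admin UI.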

======================== flume configuration file: begin =====================

# Please paste flume.conf here. Example:
# Sources, channels, and sinks are defined per agent name, in this case 'kafka2solr'.
# Name the source, channel, and sink
kafka2solr.sources = source_from_kafka
kafka2solr.channels = mem_channel
kafka2solr.sinks = solrSink
#kafka2solr.sinks    = sink1 

# Configure the source type as Kafka
kafka2solr.sources.source_from_kafka.type = org.apache.flume.source.kafka.KafkaSource
kafka2solr.sources.source_from_kafka.channels = mem_channel
kafka2solr.sources.source_from_kafka.batchSize = 100
kafka2solr.sources.source_from_kafka.kafka.bootstrap.servers= 172.16.6.11:9092,172.16.6.12:9092,172.16.6.13:9092
kafka2solr.sources.source_from_kafka.kafka.topics = logaiStatus
kafka2solr.sources.source_from_kafka.kafka.consumer.group.id = flume_solr_caller
kafka2solr.sources.source_from_kafka.kafka.consumer.auto.offset.reset=latest

# Channel type is memory; in production this is usually a file channel, or Kafka itself is used as the channel
kafka2solr.channels.mem_channel.type = memory
kafka2solr.channels.mem_channel.keep-alive = 60


# Other config values specific to each type of channel(sink or source)  
# can be defined as well  
# In this case, it specifies the capacity of the memory channel  
kafka2solr.channels.mem_channel.capacity = 1000
kafka2solr.channels.mem_channel.transactionCapacity = 1000  

#kafka2solr.sinks.sink1.type         = logger  
#kafka2solr.sinks.sink1.channel      = mem_channel
# Sink to Solr, transforming records with a morphline
kafka2solr.sinks.solrSink.type = org.apache.flume.sink.solr.morphline.MorphlineSolrSink
kafka2solr.sinks.solrSink.channel = mem_channel
#kafka2solr.sinks.solrSink.morphlineFile = /etc/flume-ng/conf/morphline.conf
kafka2solr.sinks.solrSink.morphlineFile = morphlines.conf
kafka2solr.sinks.solrSink.morphlineId=morphline1
kafka2solr.sinks.solrSink.isIgnoringRecoverableExceptions=true
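As a sketch, assuming the configuration above is saved as flume.conf, the agent can be started from the command line as follows (on CDH the Flume service is normally configured and restarted through Cloudera Manager instead):

flume-ng agent --name kafka2solr --conf /etc/flume-ng/conf --conf-file flume.conf -Dflume.root.logger=INFO,console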

=============================== flume configuration file: end ==================

############ morphline conf: begin #################

The morphline below uses grok regular expressions to extract the individual fields of each log line.

SOLR_COLLECTION : "collection2"
SOLR_COLLECTION : ${?ENV_SOLR_COLLECTION}

SOLR_LOCATOR : {
  # Name of solr collection
  collection : ${SOLR_COLLECTION}

  # ZooKeeper ensemble 
  # CDH-specific notation; not supported by the open-source version.
  zkHost : "$ZK_HOST"
 }

morphlines : [
  {
    # Name used to identify a morphline. E.g. used if there are multiple
    # morphlines in a morphline config file
    id : morphline1

    # Import all morphline commands in these java packages and their
    # subpackages. Other commands that may be present on the classpath are 
    # not visible to this morphline.
    importCommands : ["org.kitesdk.**", "org.apache.solr.**"]

    commands : [

      {
        # Parse input attachment and emit a record for each input line                
        readLine {
          charset : UTF-8
        }
      }

      {
      # Generate a UUID as the document id
        generateUUID {
          type : nonSecure
          field : id 
          preserveExisting : false
        }
      }
      # For debugging, lower the log level or replace logInfo with logWarn
      { logInfo { format : "output record with id: {}", args : ["@{}"] } }
      {
        grok {
          # Consume the output record of the previous command and pipe another record downstream.
          #
          # A grok-dictionary is a config file that contains prefabricated
          # regular expressions that can be referred to by name. grok patterns
          # specify such a regex name, plus an optional output field name.
          # The syntax is %{REGEX_NAME:OUTPUT_FIELD_NAME}
          # The input line is expected in the "message" input field.
          #2018-01-26 21:24:16,171 [INFO ] [sparkDriverActorSystem-akka.actor.default-dispatcher-3] - [akka.event.slf4j.Slf4jLogger$$anonfun$receive$1$$anonfun$applyOrElse$3.apply$mcV$sp(Slf4jLogger.scala:74)] Remote daemon shut down; proceeding with flushing remote transports.
          #dictionaryFiles : [src/test/resources/grok-dictionaries]
           expressions : {
            # message : """%{SYSLOGTIMESTAMP:timestamp} [%{LOGLEVEL:level}] [%{THREAD:thread}] - (?:\[%{POSINT:pid}\])?: %{GREEDYDATA:msg}"""
            message : """(?
############ morphline conf: end #################
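To smoke-test the whole pipeline, one option is to push a sample log line into the Kafka topic and check that a document appears in collection2. A sketch using the console producer shipped with Kafka (the class name in the sample line is made up; on a plain Apache Kafka install the script is kafka-console-producer.sh):

echo '2018-01-26 21:24:16,171 [INFO ] [main] - [com.example.Foo.bar(Foo.java:42)] test message' | kafka-console-producer --broker-list 172.16.6.11:9092 --topic logaiStatus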

[Figure: Cloudera Manager UI (screenshot not included)]
