solrctl instancedir --generate $HOME/collection2
Generating the configuration produces many files under collection2/conf. We can then edit the schema.xml file there as needed, following the usual schema.xml modification rules.
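For illustration, a couple of field definitions of the kind that might be added to schema.xml are sketched below; the field names and types are hypothetical and must be adapted to your actual data (they happen to match the grok output fields used later in this post):

    <!-- hypothetical example fields; adapt names and types to your data -->
    <field name="level"  type="string"       indexed="true" stored="true"/>
    <field name="thread" type="string"       indexed="true" stored="true"/>
    <field name="msg"    type="text_general" indexed="true" stored="true"/>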
solrctl instancedir --create collection2 $HOME/collection2
The uploaded instance directories can be listed with the following command:
solrctl instancedir --list
solrctl collection --create collection2 -s 2 -r 1
Here -s sets the number of shards to 2 and -r sets the number of replicas to 1.
Once the steps above are done, the Solr instance has been created.
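To verify that the collection came up with the expected shard and replica layout, the Collections API can be queried, for example as below; the host name is taken from the commands later in this post and may differ in your cluster:

    curl "http://xhadoop3:8983/solr/admin/collections?action=CLUSTERSTATUS&collection=collection2"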
After the collection has been created, if schema.xml needs to be modified to re-configure the indexed fields, proceed as follows:
1. If you are changing existing field definitions in schema.xml and index data has already been inserted into Solr, the existing index data set must be emptied first. This can be done through the Solr API; the Admin UI steps are listed below, and a command-line sketch follows the list.
2. If you are only adding new index fields to the existing schema.xml, you can skip step 1 and go straight to reloading the core (see the solrctl commands further below).
To empty the data set from the Solr Admin UI:
Select the shard, then open Documents.
RequestHandler defaults to /update.
For Document Type, select xml.
In Document(s), enter the delete-all query:
<delete><query>*:*</query></delete>
Click Submit Document.
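Equivalently, the index can be emptied from the command line through the Solr update API; a minimal sketch, assuming the collection is collection2 and reusing the host from the commands below:

    curl "http://xhadoop3:8983/solr/collection2/update?commit=true" \
      -H "Content-Type: text/xml" \
      --data-binary "<delete><query>*:*</query></delete>"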
To unload a core:
solrctl --solr http://xhadoop3:8983/solr core --unload solrtest_shard2_replica1
To reload a core so that it picks up the modified schema:
solrctl --solr http://sit-hadoop3:8983/solr core --reload hotel_shard1_replica1
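If the schema change was made in the local instance directory, the updated configuration also has to be pushed back to ZooKeeper before the reload takes effect. A sketch using the CDH solrctl options for this, with the names and paths used above:

    solrctl instancedir --update collection2 $HOME/collection2
    solrctl collection --reload collection2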
======================== flume configuration file (begin) =====================
# Please paste flume.conf here. Example:
# Sources, channels, and sinks are defined per agent name, in this case 'kafka2solr'.
# Name the sources, channels and sinks
kafka2solr.sources = source_from_kafka
kafka2solr.channels = mem_channel
kafka2solr.sinks = solrSink
#kafka2solr.sinks = sink1

# Configure the source type as Kafka
kafka2solr.sources.source_from_kafka.type = org.apache.flume.source.kafka.KafkaSource
kafka2solr.sources.source_from_kafka.channels = mem_channel
kafka2solr.sources.source_from_kafka.batchSize = 100
kafka2solr.sources.source_from_kafka.kafka.bootstrap.servers = 172.16.6.11:9092,172.16.6.12:9092,172.16.6.13:9092
kafka2solr.sources.source_from_kafka.kafka.topics = logaiStatus
kafka2solr.sources.source_from_kafka.kafka.consumer.group.id = flume_solr_caller
kafka2solr.sources.source_from_kafka.kafka.consumer.auto.offset.reset = latest

# Channel type is memory; in production this is usually set to file,
# or Kafka itself is used as the channel
kafka2solr.channels.mem_channel.type = memory
kafka2solr.channels.mem_channel.keep-alive = 60
# Other config values specific to each type of channel (sink or source)
# can be defined as well.
# In this case, it specifies the capacity of the memory channel.
kafka2solr.channels.mem_channel.capacity = 1000
kafka2solr.channels.mem_channel.transactionCapacity = 1000

#kafka2solr.sinks.sink1.type = logger
#kafka2solr.sinks.sink1.channel = mem_channel

# Configure the sink to Solr, using a morphline to transform the data
kafka2solr.sinks.solrSink.type = org.apache.flume.sink.solr.morphline.MorphlineSolrSink
kafka2solr.sinks.solrSink.channel = mem_channel
#kafka2solr.sinks.solrSink.morphlineFile = /etc/flume-ng/conf/morphline.conf
kafka2solr.sinks.solrSink.morphlineFile = morphlines.conf
kafka2solr.sinks.solrSink.morphlineId = morphline1
kafka2solr.sinks.solrSink.isIgnoringRecoverableExceptions = true
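On a plain (non-managed) installation, an agent with this configuration could be started as sketched below; the configuration file path is an assumption and depends on where the file is deployed. When Flume is managed by Cloudera Manager, the configuration is instead pasted into the agent's configuration text box in the CM UI, as the first comment suggests.

    flume-ng agent --name kafka2solr \
      --conf /etc/flume-ng/conf \
      --conf-file /etc/flume-ng/conf/flume.conf \
      -Dflume.root.logger=INFO,console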
=============================== flume configuration file (notes) ==================
The morphline below uses grok regular expressions to extract the individual fields of each log line.
SOLR_COLLECTION : "collection2"
SOLR_COLLECTION : ${?ENV_SOLR_COLLECTION}

SOLR_LOCATOR : {
  # Name of solr collection
  collection : ${SOLR_COLLECTION}

  # ZooKeeper ensemble
  # CDH-specific notation; the open-source version does not support it.
  zkHost : "$ZK_HOST"
}

morphlines : [
  {
    # Name used to identify a morphline. E.g. used if there are multiple
    # morphlines in a morphline config file
    id : morphline1

    # Import all morphline commands in these java packages and their
    # subpackages. Other commands that may be present on the classpath are
    # not visible to this morphline.
    importCommands : ["org.kitesdk.**", "org.apache.solr.**"]

    commands : [
      {
        # Parse input attachment and emit a record for each input line
        readLine {
          charset : UTF-8
        }
      }

      {
        # Generate a UUID as the document id
        generateUUID {
          type : nonSecure
          field : id
          preserveExisting : false
        }
      }

      # For debugging, lower the log level or replace logInfo with logWarn
      {
        logInfo {
          format : "output record with id: {}"
          args : ["@{}"]
        }
      }

      {
        grok {
          # Consume the output record of the previous command and pipe another
          # record downstream.
          #
          # A grok-dictionary is a config file that contains prefabricated
          # regular expressions that can be referred to by name. grok patterns
          # specify such a regex name, plus an optional output field name.
          # The syntax is %{REGEX_NAME:OUTPUT_FIELD_NAME}
          # The input line is expected in the "message" input field.
          # Sample input line:
          # 2018-01-26 21:24:16,171 [INFO ] [sparkDriverActorSystem-akka.actor.default-dispatcher-3] - [akka.event.slf4j.Slf4jLogger$$anonfun$receive$1$$anonfun$applyOrElse$3.apply$mcV$sp(Slf4jLogger.scala:74)] Remote daemon shut down; proceeding with flushing remote transports.
          #dictionaryFiles : [src/test/resources/grok-dictionaries]
          expressions : {
            # message : """%{SYSLOGTIMESTAMP:timestamp} [%{LOGLEVEL:level}] [%{THREAD:thread}] - (?:\[%{POSINT:pid}\])?: %{GREEDYDATA:msg}"""
            message : """(?
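The grok expression above is truncated in the source. For the sample log line shown in the comments, a hypothetical completion is sketched below: it assumes the standard grok dictionary patterns (TIMESTAMP_ISO8601, LOGLEVEL, DATA, GREEDYDATA), which require the dictionaryFiles line to be enabled, and it closes with the sanitizeUnknownSolrFields/loadSolr commands that a MorphlineSolrSink pipeline typically ends with:

          expressions : {
            # hypothetical pattern matching the sample line above
            message : """%{TIMESTAMP_ISO8601:timestamp} \[%{LOGLEVEL:level}\s*\] \[%{DATA:thread}\] - \[%{DATA:class}\] %{GREEDYDATA:msg}"""
          }
        }
      }
      {
        # Drop record fields that are not defined in the Solr schema
        sanitizeUnknownSolrFields {
          solrLocator : ${SOLR_LOCATOR}
        }
      }
      {
        # Load the record into Solr
        loadSolr {
          solrLocator : ${SOLR_LOCATOR}
        }
      }
    ]
  }
]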
(Figure: Cloudera Manager UI)