solrctl instancedir --generate $HOME/collection2
Generating the configuration produces many files under collection2/conf. We can then edit the schema.xml file there as needed, following the usual schema.xml modification rules.
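For illustration, a couple of field definitions of the kind that might be added to schema.xml are sketched below; the field names and types are hypothetical and must be adapted to your actual data (they happen to match the grok output fields used later in this post):

    <!-- hypothetical example fields; adapt names and types to your data -->
    <field name="level"  type="string"       indexed="true" stored="true"/>
    <field name="thread" type="string"       indexed="true" stored="true"/>
    <field name="msg"    type="text_general" indexed="true" stored="true"/>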
solrctl instancedir --create collection2 $HOME/collection2
The uploaded instance directories can be listed with the following command:
solrctl instancedir --list
solrctl collection --create collection2 -s 2 -r 1
Here -s sets the number of shards to 2 and -r sets the number of replicas to 1.
Once the steps above are done, the Solr instance has been created.
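To verify that the collection came up with the expected shard and replica layout, the Collections API can be queried, for example as below; the host name is taken from the commands later in this post and may differ in your cluster:

    curl "http://xhadoop3:8983/solr/admin/collections?action=CLUSTERSTATUS&collection=collection2"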
After the collection has been created, if schema.xml needs to be modified to re-configure the indexed fields, proceed as follows:
1. If you are changing existing field definitions in schema.xml and index data has already been inserted into Solr, the existing index data set must be emptied first. This can be done through the Solr API; the Admin UI steps are listed below, and a command-line sketch follows the list.
2. If you are only adding new index fields to the existing schema.xml, you can skip step 1 and go straight to reloading the core (see the solrctl commands further below).
To empty the data set from the Solr Admin UI:
Select the shard, then open Documents.
RequestHandler defaults to /update.
For Document Type, select xml.
In Document(s), enter the delete-all query:
<delete><query>*:*</query></delete>
Click Submit Document.
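Equivalently, the index can be emptied from the command line through the Solr update API; a minimal sketch, assuming the collection is collection2 and reusing the host from the commands below:

    curl "http://xhadoop3:8983/solr/collection2/update?commit=true" \
      -H "Content-Type: text/xml" \
      --data-binary "<delete><query>*:*</query></delete>"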
To unload a core:
solrctl --solr http://xhadoop3:8983/solr core --unload solrtest_shard2_replica1
To reload a core so that it picks up the modified schema:
solrctl --solr http://sit-hadoop3:8983/solr core --reload hotel_shard1_replica1
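If the schema change was made in the local instance directory, the updated configuration also has to be pushed back to ZooKeeper before the reload takes effect. A sketch using the CDH solrctl options for this, with the names and paths used above:

    solrctl instancedir --update collection2 $HOME/collection2
    solrctl collection --reload collection2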
======================== flume configuration file (begin) =====================
# Please paste flume.conf here. Example:
# Sources, channels, and sinks are defined per agent name, in this case 'kafka2solr'.
# Name the sources, channels and sinks
kafka2solr.sources = source_from_kafka
kafka2solr.channels = mem_channel
kafka2solr.sinks = solrSink
#kafka2solr.sinks = sink1

# Configure the source type as Kafka
kafka2solr.sources.source_from_kafka.type = org.apache.flume.source.kafka.KafkaSource
kafka2solr.sources.source_from_kafka.channels = mem_channel
kafka2solr.sources.source_from_kafka.batchSize = 100
kafka2solr.sources.source_from_kafka.kafka.bootstrap.servers = 172.16.6.11:9092,172.16.6.12:9092,172.16.6.13:9092
kafka2solr.sources.source_from_kafka.kafka.topics = logaiStatus
kafka2solr.sources.source_from_kafka.kafka.consumer.group.id = flume_solr_caller
kafka2solr.sources.source_from_kafka.kafka.consumer.auto.offset.reset = latest

# Channel type is memory; in production this is usually set to file,
# or Kafka itself is used as the channel
kafka2solr.channels.mem_channel.type = memory
kafka2solr.channels.mem_channel.keep-alive = 60
# Other config values specific to each type of channel (sink or source)
# can be defined as well.
# In this case, it specifies the capacity of the memory channel.
kafka2solr.channels.mem_channel.capacity = 1000
kafka2solr.channels.mem_channel.transactionCapacity = 1000

#kafka2solr.sinks.sink1.type = logger
#kafka2solr.sinks.sink1.channel = mem_channel

# Configure the sink to Solr, using a morphline to transform the data
kafka2solr.sinks.solrSink.type = org.apache.flume.sink.solr.morphline.MorphlineSolrSink
kafka2solr.sinks.solrSink.channel = mem_channel
#kafka2solr.sinks.solrSink.morphlineFile = /etc/flume-ng/conf/morphline.conf
kafka2solr.sinks.solrSink.morphlineFile = morphlines.conf
kafka2solr.sinks.solrSink.morphlineId = morphline1
kafka2solr.sinks.solrSink.isIgnoringRecoverableExceptions = true
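On a plain (non-managed) installation, an agent with this configuration could be started as sketched below; the configuration file path is an assumption and depends on where the file is deployed. When Flume is managed by Cloudera Manager, the configuration is instead pasted into the agent's configuration text box in the CM UI, as the first comment suggests.

    flume-ng agent --name kafka2solr \
      --conf /etc/flume-ng/conf \
      --conf-file /etc/flume-ng/conf/flume.conf \
      -Dflume.root.logger=INFO,console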
=============================== flume configuration file (notes) ==================
The morphline below uses grok regular expressions to extract the individual fields of each log line.
SOLR_COLLECTION : "collection2"
SOLR_COLLECTION : ${?ENV_SOLR_COLLECTION}

SOLR_LOCATOR : {
  # Name of solr collection
  collection : ${SOLR_COLLECTION}

  # ZooKeeper ensemble
  # CDH-specific notation; the open-source version does not support it.
  zkHost : "$ZK_HOST"
}

morphlines : [
  {
    # Name used to identify a morphline. E.g. used if there are multiple
    # morphlines in a morphline config file
    id : morphline1

    # Import all morphline commands in these java packages and their
    # subpackages. Other commands that may be present on the classpath are
    # not visible to this morphline.
    importCommands : ["org.kitesdk.**", "org.apache.solr.**"]

    commands : [
      {
        # Parse input attachment and emit a record for each input line
        readLine {
          charset : UTF-8
        }
      }

      {
        # Generate a UUID as the document id
        generateUUID {
          type : nonSecure
          field : id
          preserveExisting : false
        }
      }

      # For debugging, lower the log level or replace logInfo with logWarn
      {
        logInfo {
          format : "output record with id: {}"
          args : ["@{}"]
        }
      }

      {
        grok {
          # Consume the output record of the previous command and pipe another
          # record downstream.
          #
          # A grok-dictionary is a config file that contains prefabricated
          # regular expressions that can be referred to by name. grok patterns
          # specify such a regex name, plus an optional output field name.
          # The syntax is %{REGEX_NAME:OUTPUT_FIELD_NAME}
          # The input line is expected in the "message" input field.
          # Sample input line:
          # 2018-01-26 21:24:16,171 [INFO ] [sparkDriverActorSystem-akka.actor.default-dispatcher-3] - [akka.event.slf4j.Slf4jLogger$$anonfun$receive$1$$anonfun$applyOrElse$3.apply$mcV$sp(Slf4jLogger.scala:74)] Remote daemon shut down; proceeding with flushing remote transports.
          #dictionaryFiles : [src/test/resources/grok-dictionaries]
          expressions : {
            # message : """%{SYSLOGTIMESTAMP:timestamp} [%{LOGLEVEL:level}] [%{THREAD:thread}] - (?:\[%{POSINT:pid}\])?: %{GREEDYDATA:msg}"""
            message : """(?
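The grok expression above is truncated in the source. For the sample log line shown in the comments, a hypothetical completion is sketched below: it assumes the standard grok dictionary patterns (TIMESTAMP_ISO8601, LOGLEVEL, DATA, GREEDYDATA), which require the dictionaryFiles line to be enabled, and it closes with the sanitizeUnknownSolrFields/loadSolr commands that a MorphlineSolrSink pipeline typically ends with:

          expressions : {
            # hypothetical pattern matching the sample line above
            message : """%{TIMESTAMP_ISO8601:timestamp} \[%{LOGLEVEL:level}\s*\] \[%{DATA:thread}\] - \[%{DATA:class}\] %{GREEDYDATA:msg}"""
          }
        }
      }
      {
        # Drop record fields that are not defined in the Solr schema
        sanitizeUnknownSolrFields {
          solrLocator : ${SOLR_LOCATOR}
        }
      }
      {
        # Load the record into Solr
        loadSolr {
          solrLocator : ${SOLR_LOCATOR}
        }
      }
    ]
  }
]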
(Figure: Cloudera Manager UI)