
Hadoop 2.7.2 Cluster Setup

2016-09-05 09:52:21

1 Preparation before installation

Three VPS servers (two with 6 GB of RAM and a 150 GB disk, one with 4 GB of RAM and a 30 GB disk), all running CentOS 6.0, plus Hadoop 2.7.2.

2 Preparing the system environment

1 Install the JDK

 


 

Download on Linux:

wget --no-check-certificate --no-cookies --header "Cookie: oraclelicense=accept-securebackup-cookie" http://download.oracle.com/otn-pub/java/jdk/7u71-b14/jdk-7u71-linux-x64.rpm
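
Once the RPM has downloaded, installing and checking it takes one command each. This is only a sketch; the Oracle RPM installs to /usr/java/jdk1.7.0_71 by default, which matches the JAVA_HOME used later in hadoop-env.sh, but verify the path on your own machine:

# Install the Oracle JDK RPM (run as root)
rpm -ivh jdk-7u71-linux-x64.rpm
# Confirm the JDK works and the install path exists
/usr/java/jdk1.7.0_71/bin/java -version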

2 Download Hadoop 2.7.2

Download on Linux:

wget http://apache.fayea.com/hadoop/common/hadoop-2.7.2/hadoop-2.7.2.tar.gz

I downloaded the archive into a fixed directory, and Hadoop is kept under that same directory on all three VPS servers.
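
Unpacking the archive is a single tar call. A rough sketch, using /opt purely as an example install path (substitute the directory you actually use, and keep it the same on all three servers):

# Unpack Hadoop 2.7.2 into the chosen directory (example: /opt)
tar -xzf hadoop-2.7.2.tar.gz -C /opt
# Repeat on all three VPS servers so the path is identical everywhere
ls /opt/hadoop-2.7.2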

3 Editing the Hadoop configuration files

1 Directory structure after unpacking Hadoop

 


 

The configuration files live in etc/hadoop.

 


 

Only a few of these files need to be edited, not all of them. For this cluster deployment I configured six files: core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, hadoop-env.sh and slaves. Let's go through them one by one.

core-site.xml

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://(your NameNode server's IP):9000</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
</configuration>

For a basic setup these two properties are enough; they are the two the Hadoop documentation lists. More core-site.xml options are described at http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/core-default.xml (I am still new to Hadoop myself). This file is essentially the same as for a single-node Hadoop installation.
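
As a quick sanity check (a sketch, assuming $HADOOP_HOME/bin is on your PATH and the file has been edited on this node), you can ask Hadoop which values it actually reads from core-site.xml:

# Print the effective values picked up from core-site.xml
hdfs getconf -confKey fs.defaultFS
hdfs getconf -confKey io.file.buffer.size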

hdfs-site.xml

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.datanode.registration.ip-hostname-check</name>
    <value>false</value>
  </property>
</configuration>

mapred-site.xml文件

mapreduce.framework.name

yarn

mapreduce.map.memory.mb

1536

mapreduce.map.java.opts

-Xmx1024M

mapreduce.reduce.memory.mb

3072

mapreduce.reduce.java.opts

-Xmx2560M

mapreduce.task.io.sort.mb

512

mapreduce.task.io.sort.factor

100

mapreduce.reduce.shuffle.parallelcopies

50

yarn.app.mapreduce.am.resource.mb

1024

yarn.app.mapreduce.am.command-opts

-Xmx768m

mapreduce.jobtracker.address

(JobTracker的IP地址这个必填,填写yarn所在的IP也就是NameNode的IP)

yarn-site.xml

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>(the IP where YARN runs, i.e. the NameNode's IP)</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>${yarn.resourcemanager.hostname}:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>${yarn.resourcemanager.hostname}:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>${yarn.resourcemanager.hostname}:8031</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>5120</value>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1024</value>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>2.1</value>
  </property>
</configuration>
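
A quick bit of arithmetic shows these numbers are consistent with each other: each NodeManager offers 5120 MB to YARN, so one 3072 MB reduce container plus one 1536 MB map container (4608 MB in total) fit on a node; the -Xmx heaps in mapred-site.xml (1024M for maps, 2560M for reduces) are deliberately smaller than their containers to leave room for JVM overhead; and with vmem-pmem-ratio at 2.1, a 1536 MB container may use up to roughly 1536 × 2.1 ≈ 3226 MB of virtual memory before YARN kills it.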

hadoop-env.sh

# Licensed to the Apache Software Foundation (ASF) under one

# or more contributor license agreements. See the NOTICE file

# distributed with this work for additional information

# regarding copyright ownership. The ASF licenses this file

# to you under the Apache License, Version 2.0 (the

# "License"); you may not use this file except in compliance

# with the License. You may obtain a copy of the License at

#

# http://www.apache.org/licenses/LICENSE-2.0

#

# Unless required by applicable law or agreed to in writing, software

# distributed under the License is distributed on an "AS IS" BASIS,

# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

# See the License for the specific language governing permissions and

# limitations under the License.

# Set Hadoop-specific environment variables here.

# The only required environment variable is JAVA_HOME. All others are

# optional. When running a distributed configuration it is best to

# set JAVA_HOME in this file, so that it is correctly defined on

# remote nodes.

# The java implementation to use.

#export JAVA_HOME=${JAVA_HOME}

# Fill in your own JAVA_HOME path here. Using ${JAVA_HOME} directly threw an error for me; if it works for you, there is no need to change it.

export JAVA_HOME="/usr/java/jdk1.7.0_71"

# The jsvc implementation to use. Jsvc is required to run secure datanodes

# that bind to privileged ports to provide authentication of data transfer

# protocol. Jsvc is not required if SASL is configured for authentication of

# data transfer protocol using non-privileged ports.

#export JSVC_HOME=${JSVC_HOME}

export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/etc/hadoop"}

# Extra Java CLASSPATH elements. Automatically insert capacity-scheduler.

for f in $HADOOP_HOME/contrib/capacity-scheduler/*.jar; do
  if [ "$HADOOP_CLASSPATH" ]; then
    export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$f
  else
    export HADOOP_CLASSPATH=$f
  fi
done

# The maximum amount of heap to use, in MB. Default is 1000.

#export HADOOP_HEAPSIZE=

#export HADOOP_NAMENODE_INIT_HEAPSIZE=""

# Extra Java runtime options. Empty by default.

export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"

# Command specific options appended to HADOOP_OPTS when specified

export HADOOP_NAMENODE_OPTS="-Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_NAMENODE_OPTS"

export HADOOP_DATANODE_OPTS="-Dhadoop.security.logger=ERROR,RFAS $HADOOP_DATANODE_OPTS"

export HADOOP_SECONDARYNAMENODE_OPTS="-Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_SECONDARYNAMENODE_OPTS"

export HADOOP_NFS3_OPTS="$HADOOP_NFS3_OPTS"

export HADOOP_PORTMAP_OPTS="-Xmx512m $HADOOP_PORTMAP_OPTS"

# The following applies to multiple commands (fs, dfs, fsck, distcp etc)

export HADOOP_CLIENT_OPTS="-Xmx512m $HADOOP_CLIENT_OPTS"

#HADOOP_JAVA_PLATFORM_OPTS="-XX:-UsePerfData $HADOOP_JAVA_PLATFORM_OPTS"

# On secure datanodes, user to run the datanode as after dropping privileges.

# This **MUST** be uncommented to enable secure HDFS if using privileged ports

# to provide authentication of data transfer protocol. This **MUST NOT** be

# defined if SASL is configured for authentication of data transfer protocol

# using non-privileged ports.

export HADOOP_SECURE_DN_USER=${HADOOP_SECURE_DN_USER}

# Where log files are stored. $HADOOP_HOME/logs by default.

#export HADOOP_LOG_DIR=${HADOOP_LOG_DIR}/$USER

# Where log files are stored in the secure data environment.

export HADOOP_SECURE_DN_LOG_DIR=${HADOOP_LOG_DIR}/${HADOOP_HDFS_USER}

###

# HDFS Mover specific parameters

###

# Specify the JVM options to be used when starting the HDFS Mover.

# These options will be appended to the options specified as HADOOP_OPTS

# and therefore may override any similar flags set in HADOOP_OPTS

#

# export HADOOP_MOVER_OPTS=""

###

# Advanced Users Only!

###

# The directory where pid files are stored. /tmp by default.

# NOTE: this should be set to a directory that can only be written to by

# the user that will run the hadoop daemons. Otherwise there is the

# potential for a symlink attack.

export HADOOP_PID_DIR=${HADOOP_PID_DIR}

export HADOOP_SECURE_DN_PID_DIR=${HADOOP_PID_DIR}

# A string representing this instance of hadoop. $USER by default.

export HADOOP_IDENT_STRING=$USER

# My SSH port was changed to 26120; all three VPS servers use port 26120

export HADOOP_SSH_OPTS="-p 26120"

What you need to change: export JAVA_HOME="/usr/java/jdk1.7.0_71"

What you need to add: export HADOOP_SSH_OPTS="-p 26120" (only required if your SSH daemon does not listen on the default port 22)
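
The start scripts log in to every host listed in slaves over SSH and pass HADOOP_SSH_OPTS along, so it is worth confirming by hand that the custom port is reachable from the NameNode before starting anything. A rough check, using the example DataNode IPs from the slaves section below:

# Should print each DataNode's hostname without errors
ssh -p 26120 168.168.168.2 hostname
ssh -p 26120 168.168.168.3 hostname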

slaves

168.168.168.2

168.168.168.3

For example, suppose your three machines have these IPs:

168.168.168.1 (master, NameNode)

168.168.168.2 (slave, DataNode)

168.168.168.3 (slave, DataNode)

The slaves file simply lists the DataNodes.

If you also add 168.168.168.1 to it, that server will act not only as the NameNode but also as a DataNode.
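
Because all three VPS servers keep Hadoop under the same directory, one simple way to keep the six edited files identical everywhere is to copy the whole etc/hadoop directory from the NameNode to both DataNodes. A sketch, reusing the hypothetical /opt/hadoop-2.7.2 install path and the custom SSH port from above:

# Push the edited configuration from the NameNode to both DataNodes
scp -P 26120 -r /opt/hadoop-2.7.2/etc/hadoop 168.168.168.2:/opt/hadoop-2.7.2/etc/
scp -P 26120 -r /opt/hadoop-2.7.2/etc/hadoop 168.168.168.3:/opt/hadoop-2.7.2/etc/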

-------- Ran out of time; the rest will be added tomorrow. 2016-09-04 23:01:19
