High-Availability Cluster Setup
Creating the hadoop Account
- Create the hadoop account (note: all subsequent operations run as the hadoop user unless stated otherwise):

```shell
# useradd hadoop
# passwd hadoop
# su - hadoop
$ mkdir soft disk1 disk2
$ mkdir -p disk{1,2}/dfs/{dn,nn}
$ mkdir -p disk{1,2}/nodemgr/local
```
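A quick way to confirm the directory skeleton came out right (illustrative listing as the hadoop user; ordering may differ on your system):

```shell
$ find disk1 disk2 -type d
disk1
disk1/dfs
disk1/dfs/dn
disk1/dfs/nn
disk1/nodemgr
disk1/nodemgr/local
disk2
...
```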
- Upload hadoop-2.6.0-cdh5.5.0.tar.gz from the local machine to /home/hadoop/soft on the VM, then unpack it and rename the directory:
```shell
tar -xzvf hadoop-2.6.0-cdh5.5.0.tar.gz
mv hadoop-2.6.0-cdh5.5.0 hadoop
```
- Configure the Hadoop environment variables:
```shell
vim ~/.bashrc
export HADOOP_HOME=/home/hadoop/soft/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
source ~/.bashrc
```
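To confirm the variables took effect, a couple of sanity checks (expected results sketched in the comments):

```shell
source ~/.bashrc
hadoop version   # should report Hadoop 2.6.0-cdh5.5.0
which hdfs       # should resolve to /home/hadoop/soft/hadoop/bin/hdfs
```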
- Go to /home/hadoop/soft/hadoop/etc/hadoop and edit the configuration files: core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml, plus the environment scripts (text versions of all of them appear in the HA section below). Also add the Hadoop environment variables with `vim ~/.bashrc`.
- Format the NameNode:
```shell
hdfs namenode -format
```
- Start all services with start-all.sh, or start them separately (start-dfs.sh and start-yarn.sh, or per-daemon with hadoop-daemon.sh).
- After startup, use jps to list the running daemons and netstat -tnlp to check the listening ports (or, from Windows, browse to xxx:50070 and xxx:8088).
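For reference, on a single node running every service, jps output looks roughly like this (PIDs are arbitrary; on a real cluster each node shows only its own daemons):

```shell
$ jps
2481 NameNode
2602 DataNode
2733 SecondaryNameNode
2950 ResourceManager
3061 NodeManager
3310 Jps
$ netstat -tnlp | grep -E ':50070|:8088'   # NameNode and ResourceManager web UIs
```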
High-Availability Setup
- Stop all running services.
- Start ZooKeeper on the quorum nodes (installation and configuration are covered in a separate guide); a start-and-check sketch follows:
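A minimal sketch, assuming ZooKeeper is installed under $ZOOKEEPER_HOME (as added to ~/.bashrc later in this guide) on each of kslave5, kslave6, and kslave7:

```shell
# run on each of kslave5, kslave6, kslave7
zkServer.sh start
zkServer.sh status   # one node should report "leader", the other two "follower"
```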
- On kslave5, kslave6, and kslave7, run (or use the loop sketched below):
```shell
cd
mkdir disk1/dfs/jn
```
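If passwordless SSH between the nodes is already in place (a Hadoop cluster needs it anyway), the same directory can be created on all three JournalNode hosts from one shell; a sketch:

```shell
for h in kslave5 kslave6 kslave7; do
  ssh "$h" 'mkdir -p ~/disk1/dfs/jn'
done
```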
- Go to /home/hadoop/soft/hadoop/etc/hadoop and edit the configuration files.
Edit core-site.xml:
```xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://kcluster</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>kslave5:2181,kslave6:2181,kslave7:2181</value>
  </property>
</configuration>
```

Edit hdfs-site.xml:
```xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>7</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///home/hadoop/disk1/dfs/nn,file:///home/hadoop/disk2/dfs/nn</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///home/hadoop/disk1/dfs/dn,file:///home/hadoop/disk2/dfs/dn</value>
  </property>
  <property>
    <name>dfs.tmp.dir</name>
    <value>/home/hadoop/disk1/tmp</value>
  </property>
  <property>
    <name>dfs.nameservices</name>
    <value>kcluster</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.kcluster</name>
    <value>kma1,kma2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.kcluster.kma1</name>
    <value>kmaster1:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.kcluster.kma2</name>
    <value>kmaster2:8020</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.kcluster.kma1</name>
    <value>kmaster1:50070</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.kcluster.kma2</name>
    <value>kmaster2:50070</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://kslave5:8485;kslave6:8485;kslave7:8485/kcluster</value>
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/home/hadoop/disk1/dfs/jn</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.kcluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>shell(/bin/true)</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
</configuration>
```
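After saving the file, individual keys can be sanity-checked without starting any daemon, e.g.:

```shell
$ hdfs getconf -confKey dfs.nameservices
kcluster
$ hdfs getconf -namenodes
kmaster1 kmaster2
```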
Edit mapred-site.xml (note: the history-server web UI key is mapreduce.jobhistory.webapp.address):

```xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>kmaster1:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>kmaster1:19888</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.staging-dir</name>
    <value>/user</value>
  </property>
</configuration>
```

Edit yarn-site.xml:
```xml
<configuration>
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>yarn-ha</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>krm1,krm2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.krm1</name>
    <value>kmaster1</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.krm2</name>
    <value>kmaster2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
  </property>
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>kslave5:2181,kslave6:2181,kslave7:2181</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>file:///home/hadoop/disk1/nodemgr/local,file:///home/hadoop/disk2/nodemgr/local</value>
  </property>
  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>file:///home/hadoop/disk1/log/hadoop-yarn/containers</value>
  </property>
  <property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>file:///home/hadoop/disk1/log/hadoop-yarn/apps</value>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
</configuration>
```

Edit hadoop-env.sh. By default HADOOP_PID_DIR points to /tmp, but /tmp is cleaned out on a schedule and across reboots, so PID files kept there are unreliable; in a fully distributed environment it must point somewhere persistent:
```shell
export JAVA_HOME=/usr/java/default
export HADOOP_PID_DIR=/home/hadoop/disk1/tmp
```
Edit mapred-env.sh:
```shell
export MAPRED_PID_DIR=/home/hadoop/disk1/tmp
```
Edit yarn-env.sh:
```shell
export YARN_PID_DIR=/home/hadoop/disk1/tmp
```
Configure the Hadoop environment variables (the ZooKeeper and HBase variables the cluster will need later are added here at the same time):
```shell
vim ~/.bashrc
export HADOOP_HOME=/home/hadoop/soft/hadoop
export HADOOP_PREFIX=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_PREFIX
export HADOOP_CONF_DIR=$HADOOP_PREFIX/etc/hadoop
export HADOOP_HDFS_HOME=$HADOOP_PREFIX
export HADOOP_MAPRED_HOME=$HADOOP_PREFIX
export HADOOP_YARN_HOME=$HADOOP_PREFIX
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export ZOOKEEPER_HOME=/home/hadoop/soft/zk
export PATH=$PATH:$ZOOKEEPER_HOME/bin
export HBASE_HOME=/home/hadoop/soft/hbase
export PATH=$PATH:$HBASE_HOME/bin
```
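Every node must see the same configuration. Assuming passwordless SSH and an identical directory layout on each host (the host list below matches this cluster; adjust it to yours), one way to push ~/.bashrc and the etc/hadoop directory out from kmaster1:

```shell
for h in kmaster2 kslave5 kslave6 kslave7; do
  rsync -a ~/.bashrc "$h":                               # remote paths are relative to $HOME
  rsync -a ~/soft/hadoop/etc/hadoop/ "$h":soft/hadoop/etc/hadoop/
done
```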
- Start the JournalNode daemons (the edit-log servers):
```shell
hadoop-daemon.sh start journalnode
```
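Run that on each of kslave5, kslave6, and kslave7, then confirm each JournalNode came up and is listening on the RPC port configured in hdfs-site.xml:

```shell
jps | grep JournalNode      # one JournalNode process per host
netstat -tnlp | grep 8485   # the qjournal RPC port
```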
- Initialize the original (first) NameNode:
```shell
# initialize the shared edits directory
hdfs namenode -initializeSharedEdits
# restart the NameNode
hadoop-daemon.sh start namenode
# check the NameNode state
hdfs haadmin -getServiceState kma1
```
- Initialize the new (second) NameNode:
```shell
hdfs namenode -bootstrapStandby
# start the second NameNode
hadoop-daemon.sh start namenode
# check its state
hdfs haadmin -getServiceState kma2
```
- Format the ZooKeeper failover controller; run this on one NameNode only:
```shell
hdfs zkfc -formatZK
```
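Formatting registers the cluster in ZooKeeper under /hadoop-ha (the child znode is named after dfs.nameservices). It can be verified with the ZooKeeper CLI, roughly:

```shell
$ zkCli.sh -server kslave5:2181
[zk: kslave5:2181(CONNECTED) 0] ls /hadoop-ha
[kcluster]
```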
Check the NameNode states again:

```shell
$ hdfs haadmin -getServiceState kma1
active
$ hdfs haadmin -getServiceState kma2
standby
```
One NameNode now reports active and the other standby. Next, start the services on the remaining nodes and confirm that everything runs normally. With that, the HA cluster setup is complete.
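As a final check, automatic failover can be exercised by stopping the active NameNode and watching the standby take over (this assumes the ZKFC daemons are running on both masters; start-dfs.sh starts them, or run hadoop-daemon.sh start zkfc by hand):

```shell
# on kmaster1, currently active
hadoop-daemon.sh stop namenode

# a few seconds later, from either master
hdfs haadmin -getServiceState kma2   # should now report "active"

# restart the stopped NameNode; it rejoins as standby
hadoop-daemon.sh start namenode
```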