Notes on Setting Up an HDFS High Availability (HA) Cluster

What is HA?
HA stands for High Availability.
Although HDFS keeps multiple replicas of data, the NameNode is still a single point of failure. In a cluster with only one NameNode, the whole cluster is unusable from the moment that node fails until it is restarted.
With HDFS HA enabled, several NameNodes in Active/Standby roles run on different nodes; when the Active NameNode fails, a Standby NameNode can be switched to Active quickly. Only the Active NameNode serves read and write requests.

Environment:

  • CentOS 7.6.1810 Minimal
  • NAT network mode (virtual machines)
  • JDK 1.8
  • Hadoop 3.2.0
  • Zookeeper 3.4.13

Cluster plan (3 nodes):

主机名    NameNode  DataNode  ResourceManager  NodeManager  Zookeeper  JournalNode  ZKFC
master    ✓         ✓         -                ✓            ✓          ✓            ✓
master2   ✓         ✓         -                ✓            ✓          ✓            ✓
slave1    -         ✓         ✓                ✓            ✓          ✓            -

Incidental configuration

  • After installing CentOS 7 the console resolution felt too high for a small screen, so I changed it:
    vi /boot/grub2/grub.cfg (CentOS 7)
    To switch to 800x600x32, append vga=0x340 to the end of the linux16 /vmlinuz-x.xx.x line; it takes effect after a reboot. Using vga=ask instead prompts you to pick a display mode at boot.
    Do not touch the linux16 /vmlinuz-0-rescue line.

    Available display modes:

Basic system configuration

Since this is a Minimal installation, some of the commands used below may need to be installed manually.

Disable the firewall

  • Check firewall status:
    firewalld: service firewalld status or firewall-cmd --state
    iptables: service iptables status
  • Disable the firewall from starting at boot (see the sketch below):
    CentOS 6 (iptables): chkconfig iptables off
    CentOS 7 (firewalld): systemctl disable firewalld
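
A minimal sketch of the full sequence on CentOS 7 — stop the running service, disable autostart, then verify — run on all three nodes:

    systemctl stop firewalld       # stop the running service
    systemctl disable firewalld    # remove it from the boot targets
    firewall-cmd --state           # should now report "not running"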

Network

Assign static IPs.

Hostname IP
master 192.168.222.128
master2 192.168.222.129
slave1 192.168.222.130
  1. Bring up the network interface
    The interface does not come up automatically after installation.
    • List devices: ip addr
    • Bring it up: ifup ens33
      By default it obtains an IP via DHCP.
  2. Install net-tools
    A collection of basic network utilities, including the familiar ifconfig and netstat commands, handy for inspecting and configuring the network.
    yum install -y net-tools
  3. Enable the interface at boot and assign a static IP
    Edit the configuration file directly:
    vi /etc/sysconfig/network-scripts/ifcfg-ens33

    BOOTPROTO=static # which protocol to use: static
    ...
    ONBOOT=yes # bring the interface up at boot

    # IP address, netmask, gateway and DNS
    IPADDR=192.168.222.128
    NETMASK=255.255.255.0
    GATEWAY=192.168.222.2
    DNS1=8.8.8.8
    DNS2=4.4.4.4

    BOOTPROTO must be set to static, otherwise the manually configured static IP may not take effect. Restart the network service to apply the change (see the sketch below).
    master's network configuration
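
To apply the new settings, restarting the legacy network service should be enough on CentOS 7 (a sketch; ens33 is the interface name used in this setup):

    systemctl restart network   # re-read ifcfg-ens33
    ip addr show ens33          # confirm the static IP is assigned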

Hostname and hosts mapping

  • Set the hostname
    Set the three machines to master, master2 and slave1 respectively.
    e.g. hostnamectl set-hostname master (this change is permanent and requires no reboot)
  • Edit the hosts file
    I hit a problem here, since resolved: Zookeeper started, but checking its status reported it was probably not running (see "Problems encountered" below).
    vi /etc/hosts

    127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
    192.168.222.128 master
    192.168.222.129 master2
    192.168.222.130 slave1

Passwordless SSH

Run on every node (a quick check is sketched after the list):

  • Generate a key pair
    ssh-keygen -t rsa
  • Copy the public key to each node
    ssh-copy-id master
    ssh-copy-id master2
    ssh-copy-id slave1
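
A quick sanity check (a sketch): every node should reach the others without a password prompt.

    # run from master; repeat from master2 and slave1
    ssh master2 hostname   # should print "master2" without asking for a password
    ssh slave1 hostname    # should print "slave1"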

* Hadoop user

* Keeping a dedicated user for operating Hadoop can make management and maintenance easier, but it can also introduce a series of permission issues. Optional. (Run as root; create the user on all 3 machines. A quick permission check is sketched after this list.)

  • Create the hadoop group
    groupadd hadoop
  • Create the hadoop user and add it to the hadoop group
    useradd -g hadoop hadoop
  • Set the hadoop user's password
    passwd hadoop
  • Change the group owner of /opt to hadoop
    chgrp hadoop /opt
    * Add -R to also change its subdirectories and files
  • Grant the group write permission on /opt
    chmod g+w /opt
    * Add -R to also change its subdirectories and files
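
A quick check that ownership and permissions came out right (a sketch of the expected output):

    ls -ld /opt   # expect something like: drwxrwxr-x. ... root hadoop ... /opt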

Install the JDK

  • Use JDK 1.8 if possible; later releases changed things and need extra settings before HDFS will start successfully.

Copy the archive from the host to the VM

Upload with SecureCRT's SFTP: press Alt + P to open it, then drag the file in to start the transfer.

Extract the archive

Extract the JDK under /opt:
tar -xzvf ~/jdk-8u201-linux-x64.tar.gz -C /opt

Configure $PATH

  • Add to /etc/profile:

    # Java
    export JAVA_HOME=/opt/jdk1.8.0_201
    export CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
    export PATH=$PATH:$JAVA_HOME/bin
  • Log in again, or use source, to apply the change
    source /etc/profile

Verify the JDK installation
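
The usual check (a sketch; the version line should report 1.8.0_201 for this package):

    java -version
    javac -version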

Distribute files

  • /opt
    to master2: scp -r /opt master2:/
    to slave1: scp -r /opt slave1:/
  • /etc/profile and /etc/hosts
    scp /etc/profile master2:/etc
    scp /etc/hosts master2:/etc
    Send them to slave1 the same way (a compact loop is sketched below).
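
A minimal loop doing the same distribution in one go (a sketch; assumes root SSH access to both nodes):

    for host in master2 slave1; do
      scp -r /opt "$host":/
      scp /etc/profile /etc/hosts "$host":/etc
    done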

The steps below are performed as the hadoop user.
Open a new connection and log in as hadoop:

Set up passwordless SSH between the hadoop users on all 3 nodes.


Upload the Hadoop and Zookeeper archives

They can also be downloaded directly inside the VM with wget:

  • Hadoop 2.9.2 (Tsinghua mirror): wget http://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-2.9.2/hadoop-2.9.2.tar.gz
  • Zookeeper 3.4.14: Tsinghua mirror

Upload the files to the VM

Upload via SFTP:

  • Edit the hadoop user's profile /home/hadoop/.bashrc (or set everything in /etc/profile instead) and add:

    # -- HADOOP ENVIRONMENT VARIABLES START -- #
    ## Hadoop -v3.2.0
    export HADOOP_HOME=/opt/hadoop-3.2.0
    export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin

    # Zookeeper -v3.4.13
    export ZK_HOME=/opt/zookeeper-3.4.13
    export PATH=$PATH:$ZK_HOME/bin
    # -- HADOOP ENVIRONMENT VARIABLES FINISH -- #
  • Log in again or run source ~/.bashrc to apply (a quick check follows)
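
A quick check that the variables resolve (a sketch):

    echo $HADOOP_HOME $ZK_HOME
    hadoop version   # should report Hadoop 3.2.0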

Configure Hadoop

The 6 Hadoop configuration files (under $HADOOP_HOME/etc/hadoop):

Component Configuration files
HDFS hadoop-env.sh, core-site.xml, hdfs-site.xml, workers
MapReduce mapred-site.xml
Yarn yarn-site.xml

Zookeeper's configuration file: $ZK_HOME/conf/zoo.cfg

Reference:

Change to the configuration directory: cd $HADOOP_HOME/etc/hadoop

HDFS

  1. hadoop-env.sh
    Uncomment (delete the leading #) JAVA_HOME and HADOOP_HOME and fill in the corresponding paths.
    At minimum, JAVA_HOME must be set.
    In the 2.9.2 hadoop-env.sh the default is ${JAVA_HOME}; I left it as is, and after configuring the other files it worked without problems.

    # The java implementation to use.
    export JAVA_HOME=${JAVA_HOME}

    Below is the hadoop-3.2.0 hadoop-env.sh; it adds a HADOOP_HOME entry, which I filled in while I was at it.

    # The java implementation to use. By default, this environment
    # variable is REQUIRED on ALL platforms except OS X!
    export JAVA_HOME=/opt/jdk1.8.0_201

    # Location of Hadoop. By default, Hadoop will attempt to determine
    # this location based upon its execution path.
    export HADOOP_HOME=/opt/hadoop-3.2.0

    hadoop-3.2.0

  2. core-site.xml
    If you copy this directly, check whether any of the directory values need changing for your setup.

    <configuration>

    <!-- Nameservice for HDFS; the name is arbitrary -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://ha-cluster</value>
    </property>

    <!-- Base directory for Hadoop's temporary and metadata files -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/hadoop-3.2.0/tmp</value>
    </property>

    <!-- Buffer size for stream files, in bytes -->
    <property>
        <name>io.file.buffer.size</name>
        <value>4096</value>
    </property>

    <!-- Zookeeper quorum used for automatic failover -->
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>master:2181,master2:2181,slave1:2181</value>
    </property>

    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/home/hadoop/HA/data/journalnode</value>
    </property>

    <!-- Raise the IPC retry count to 100 and the retry interval to 10000 ms, so the
         NameNode does not fail to start just because the JournalNodes come up slowly -->
    <property>
        <name>ipc.client.connect.max.retries</name>
        <value>100</value>
    </property>

    <property>
        <name>ipc.client.connect.retry.interval</name>
        <value>10000</value>
    </property>

    </configuration>
  3. hdfs-site.xml

    <configuration>

    <!-- Nameservice for HDFS; must match core-site.xml -->
    <property>
        <name>dfs.nameservices</name>
        <value>ha-cluster</value>
    </property>

    <!-- NameNode IDs under ha-cluster (the names are arbitrary) -->
    <property>
        <name>dfs.ha.namenodes.ha-cluster</name>
        <value>nn1,nn2</value>
    </property>

    <!-- RPC addresses of the NameNodes -->
    <property>
        <name>dfs.namenode.rpc-address.ha-cluster.nn1</name>
        <value>master:8020</value>
    </property>

    <property>
        <name>dfs.namenode.rpc-address.ha-cluster.nn2</name>
        <value>master2:8020</value>
    </property>

    <!-- HTTP addresses of the NameNodes -->
    <property>
        <name>dfs.namenode.http-address.ha-cluster.nn1</name>
        <value>master:9870</value>
    </property>

    <property>
        <name>dfs.namenode.http-address.ha-cluster.nn2</name>
        <value>master2:9870</value>
    </property>

    <!-- Where the NameNode edit log is stored on the JournalNodes -->
    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://master:8485;master2:8485;slave1:8485/ha-cluster</value>
    </property>

    <!-- Java class HDFS clients use to contact the Active NameNode; also used during failover -->
    <property>
        <name>dfs.client.failover.proxy.provider.ha-cluster</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>

    <!-- List of scripts or Java classes used to fence the Active NameNode during a failover;
         multiple methods are separated by newlines -->
    <!-- sshfence SSHes into the Active NameNode and kills the process -->
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>sshfence</value>
    </property>

    <!-- Comma-separated list of SSH private key files; sshfence needs passwordless login
         to the other NameNode node -->
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/home/hadoop/.ssh/id_rsa</value>
    </property>

    <!-- Optional: SSH with a non-standard user or port -->
    <!--<property>
        <name>dfs.ha.fencing.methods</name>
        <value>sshfence(hadoop:22)</value>
    </property>-->

    <!-- Optional: SSH connection timeout, in milliseconds -->
    <property>
        <name>dfs.ha.fencing.ssh.connect-timeout</name>
        <value>30000</value>
    </property>

    <!-- Path where the JournalNode daemon stores its local state -->
    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/home/hadoop/HA/data/jn_local</value>
    </property>

    <!-- Enable automatic NameNode failover -->
    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>

    <!-- Replication factor of 2 -->
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>

    </configuration>
  4. workers (formerly slaves)
    The scripts in $HADOOP_HOME/sbin, and hdfs itself, use the hostnames listed in this file to start the corresponding processes on each node.

    master
    master2
    slave1

MapReduce

  • mapred-site.xml
    <configuration>

    <!-- Run MapReduce on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>

    <!-- Memory for map tasks; default 1 GB -->
    <property>
        <name>mapreduce.map.memory.mb</name>
        <value>230</value>
    </property>

    <!-- Memory for reduce tasks; default 1 GB -->
    <property>
        <name>mapreduce.reduce.memory.mb</name>
        <value>460</value>
    </property>

    <!-- JVM heap for map task processes; default -Xmx200M -->
    <property>
        <name>mapreduce.map.java.opts</name>
        <value>-Xmx184m</value>
    </property>

    <!-- JVM heap for reduce task processes; default -Xmx200M -->
    <property>
        <name>mapreduce.reduce.java.opts</name>
        <value>-Xmx368m</value>
    </property>

    <!-- Memory for the MR ApplicationMaster; default 1536 MB -->
    <property>
        <name>yarn.app.mapreduce.am.resource.mb</name>
        <value>460</value>
    </property>

    <!-- JVM heap for the MR ApplicationMaster; default -Xmx1024m -->
    <property>
        <name>yarn.app.mapreduce.am.command-opts</name>
        <value>-Xmx368m</value>
    </property>

    </configuration>

Yarn

  • yarn-site.xml
    <configuration>

    <!-- ResourceManager host -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>slave1</value>
    </property>

    <!-- Zookeeper ensemble address -->
    <property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>master:2181,master2:2181,slave1:2181</value>
    </property>

    <!-- Shuffle -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>

    <!-- Minimum container allocation in the RM; default 1 GB -->
    <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>230</value>
    </property>

    <!-- Maximum container allocation in the RM; default 8 GB -->
    <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>700</value>
    </property>

    <!-- Physical memory available to the NodeManager; default 8 GB -->
    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>700</value>
    </property>

    <!-- Whether the virtual memory check is enabled -->
    <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
    </property>

    </configuration>

Zookeeper

  1. Copy the sample configuration file and rename it to zoo.cfg
    cd $ZK_HOME/conf; cp zoo_sample.cfg zoo.cfg
  2. Edit the configuration file: vi zoo.cfg
    Change dataDir and dataLogDir, and add the server entries.

    dataDir=/home/hadoop/HA/data/zookeeper
    dataLogDir=/home/hadoop/HA/logs/zookeeper
    ...
    server.1=master:2888:3888
    server.2=master2:2888:3888
    server.3=slave1:2888:3888

  3. Create the dataDir from zoo.cfg, then create a myid file inside it. Its content on the three nodes is 1, 2 and 3 respectively, matching the server.X entries in zoo.cfg (the other two nodes are sketched after this step).
    For example, on master:
    mkdir -p /home/hadoop/HA/data/zookeeper
    echo 1 > /home/hadoop/HA/data/zookeeper/myid
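
For completeness, the corresponding commands on the other two nodes would be (a sketch following the same pattern):

    # on master2
    mkdir -p /home/hadoop/HA/data/zookeeper
    echo 2 > /home/hadoop/HA/data/zookeeper/myid

    # on slave1
    mkdir -p /home/hadoop/HA/data/zookeeper
    echo 3 > /home/hadoop/HA/data/zookeeper/myid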

Distribute the files

All configuration files are now in place; send the required files and directories to the other nodes.

  • Zookeeper (under /opt)
  • Hadoop (under /opt)
  • /home/hadoop/HA
  • /home/hadoop/.bash_profile (the hadoop user's profile)

For example:
scp -r /opt/zookeeper-3.4.13 master2:/opt
scp /etc/profile slave1:/etc

Starting the cluster

Follow the steps strictly in order.
In Hadoop 3, commands can be used instead of running the script files directly:

  • hdfs --workers --daemon => hadoop-daemons.sh
  • hdfs --daemon => hadoop-daemon.sh
  1. Start JournalNode on every node
    hadoop-daemons.sh start journalnode — note the plural script: -daemons.sh
    Hadoop 3 equivalent: hdfs --workers --daemon start journalnode
    * To start it on a single node, use the non-plural script, or: hdfs --daemon start journalnode
    Use jps to check whether a JournalNode process shows up on each node (if not, it probably failed to start; check the logs under $HADOOP_HOME/logs for the error):
  2. Format the Active NameNode (master)
    hdfs namenode -format

  3. Start the NameNode daemon (on the Active NameNode node, master)
    hadoop-daemon.sh start namenode — note the non-plural script: -daemon.sh
    Hadoop 3 equivalent: hdfs --daemon start namenode
  4. On the Standby NameNode (master2), copy the metadata from the Active NameNode (master)
    hdfs namenode -bootstrapStandby

    If it succeeds you should see output like this:
  5. Start the NameNode daemon on the Standby NameNode node
    hadoop-daemon.sh start namenode
  6. Start the Zookeeper service on every node
    zkServer.sh start (run once on each node)

    In a healthy state the Zookeeper ensemble consists of one Leader and several Followers.
    master
    master2
    slave1
    jps will show a QuorumPeerMain process
    jps
  7. Start the DataNode daemons
    hadoop-daemons.sh start datanode

    Hadoop 3 equivalent: hdfs --workers --daemon start datanode
  8. On either NameNode, format the Zookeeper Failover Controller
    hdfs zkfc -formatZK
  9. Start DFS on the Active NameNode
    start-dfs.sh
  10. Check the NameNode states (a sketch of the check follows this list)
    hdfs haadmin -getServiceState nn1
  11. Check each NameNode's status in a browser
    <ip>:<port> — the configured HTTP port is 9870, so visit 192.168.222.128:9870
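
A quick command-line check of both NameNodes (a sketch; which one is active may differ in your run):

    hdfs haadmin -getServiceState nn1   # e.g. active
    hdfs haadmin -getServiceState nn2   # e.g. standby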

Problems encountered

Zookeeper started, but checking its status reports it is probably not running

hadoop@master ~> zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/local/src/zookeeper-3.4.13/bin/../conf/zoo.cfg
Error contacting service. It is probably not running.


Check the log, zookeeper.out (this log file is created in whatever directory you were in when you ran the ZK script):

hadoop@master ~> tail -20 zookeeper.out
at java.net.Socket.connect(Socket.java:589)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:558)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:610)
at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:838)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:957)
2019-04-01 19:23:36,364 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumPeer$QuorumServer@184] - Resolved hostname: master2 to address: master2/192.168.222.129
2019-04-01 19:23:36,365 [myid:1] - WARN [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@584] - Cannot open channel to 3 at election address slave1/192.168.222.130:3888
java.net.ConnectException: 拒绝连接 (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:558)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:610)
at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:838)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:957)
2019-04-01 19:23:36,366 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumPeer$QuorumServer@184] - Resolved hostname: slave1 to address: slave1/192.168.222.130
2019-04-01 19:23:36,366 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@847] - Notification time out: 60000

The problem: connection refused.
The firewalls were checked and all disabled:

Searched Baidu for "Cannot open channel to 3 at election address slave1/192.168.222.130:3888 java.net.ConnectException: Connection refused" — no luck.
Searched Baidu for "Zookeeper connection refused" and found posts saying it is an /etc/hosts problem and that the 127.0.0.1 line should be commented out. The original hosts file contents:

Rather than commenting the line out as the posts suggested, I only removed each node's own hostname from the end of that line in its hosts file. After restarting Zookeeper everything was normal (a sketch of the change is below):
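
For illustration, on master the offending line might have looked like this before and after (a sketch; the exact original depends on how the hostname got appended):

    # before: the node's own hostname resolves to 127.0.0.1, which breaks the ZK election ports
    127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4 master

    # after: only the trailing hostname removed
    127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4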

Startup error: org.apache.hadoop.ipc.Client: Retrying connect to server

Symptoms:

  • With HA configured, the NameNode would not start properly after bringing the cluster up, and the NameNode process disappeared within a short time — it had presumably crashed.

Reference:

Three ways to resolve it:

  1. Adjust the IPC parameters in core-site.xml
    Increase how long the NameNode keeps trying to connect to the JournalNodes

    <!-- Raise the IPC retry count to 100 and the retry interval to 10000 ms, so the
         NameNode does not fail to start just because the JournalNodes come up slowly -->
    <property>
        <name>ipc.client.connect.max.retries</name>
        <value>100</value>
    </property>

    <property>
        <name>ipc.client.connect.retry.interval</name>
        <value>10000</value>
    </property>
  2. Start the JournalNodes first, then DFS
    hadoop-daemons.sh start journalnode
    start-dfs.sh

  3. Start the cluster directly and bring the NameNode back up by hand after it crashes
    start-dfs.sh
    After the NameNode has crashed:
    hadoop-daemon.sh start namenode

After start-dfs.sh, both NameNodes in the cluster are in Standby state

Reference: CSDN
Fix: the Zookeeper ensemble must be started before DFS.
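
A minimal sketch of the corrected order, using the same commands as earlier in this post:

    # on every node
    zkServer.sh start
    # then, on the Active NameNode
    start-dfs.sh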
logs:

2019-04-11 15:09:22,402 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 6002 ms (timeout=20000 ms) for a response for selectInputStreams. No responses yet.
2019-04-11 15:09:23,408 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 7008 ms (timeout=20000 ms) for a response for selectInputStreams. No responses yet.
2019-04-11 15:09:24,415 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 8015 ms (timeout=20000 ms) for a response for selectInputStreams. No responses yet.
2019-04-11 15:09:25,416 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 9016 ms (timeout=20000 ms) for a response for selectInputStreams. No responses yet.
2019-04-11 15:09:26,420 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 10021 ms (timeout=20000 ms) for a response for selectInputStreams. No responses yet.
2019-04-11 15:09:26,806 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.222.128:8485. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=100, sleepTime=10000 MILLISECONDS)
2019-04-11 15:09:26,826 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: slave1/192.168.222.130:8485. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=100, sleepTime=10000 MILLISECONDS)
2019-04-11 15:09:26,841 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master2/192.168.222.129:8485. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=100, sleepTime=10000 MILLISECONDS)
2019-04-11 15:09:27,429 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 11029 ms (timeout=20000 ms) for a response for selectInputStreams. No responses yet.
2019-04-11 15:09:28,430 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 12030 ms (timeout=20000 ms) for a response for selectInputStreams. No responses yet.
2019-04-11 15:09:29,433 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 13033 ms (timeout=20000 ms) for a response for selectInputStreams. No responses yet.
2019-04-11 15:09:30,443 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 14043 ms (timeout=20000 ms) for a response for selectInputStreams. No responses yet.
2019-04-11 15:09:31,449 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 15050 ms (timeout=20000 ms) for a response for selectInputStreams. No responses yet.
2019-04-11 15:09:32,460 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 16060 ms (timeout=20000 ms) for a response for selectInputStreams. No responses yet.
2019-04-11 15:09:33,463 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 17063 ms (timeout=20000 ms) for a response for selectInputStreams. No responses yet.
2019-04-11 15:09:34,469 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 18069 ms (timeout=20000 ms) for a response for selectInputStreams. No responses yet.
2019-04-11 15:09:35,471 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 19071 ms (timeout=20000 ms) for a response for selectInputStreams. No responses yet.
2019-04-11 15:09:36,403 WARN org.apache.hadoop.hdfs.server.namenode.FSEditLog: Unable to determine input streams from QJM to [192.168.222.128:8485, 192.168.222.129:8485, 192.168.222.130:8485]. Skipping.
java.io.IOException: Timed out waiting 20000ms for a quorum of nodes to respond.
at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:137)
at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectInputStreams(QuorumJournalManager.java:473)
at org.apache.hadoop.hdfs.server.namenode.JournalSet.selectInputStreams(JournalSet.java:278)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1590)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1614)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:700)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:322)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1052)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:681)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:666)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:728)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:953)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:932)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1673)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1741)
2019-04-11 15:09:36,406 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: No edit log streams selected.
2019-04-11 15:09:36,406 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Planning to load image: FSImageFile(file=/opt/hadoop-2.9.2/tmp/dfs/name/current/fsimage_0000000000000000179, cpktTxId=0000000000000000179)
2019-04-11 15:09:36,537 INFO org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode: Loading 2 INodes.
2019-04-11 15:09:36,657 INFO org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: Loaded FSImage in 0 seconds.
2019-04-11 15:09:36,657 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Loaded image for txid 179 from /opt/hadoop-2.9.2/tmp/dfs/name/current/fsimage_0000000000000000179
2019-04-11 15:09:36,678 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Need to save fs image? false (staleImage=true, haEnabled=true, isRollingUpgrade=false)
2019-04-11 15:09:36,678 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem write lock held for 21156 ms via
java.lang.Thread.getStackTrace(Thread.java:1559)
org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1021)
org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:261)
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1569)
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1081)
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:681)
org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:666)
org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:728)
org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:953)
org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:932)
org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1673)
org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1741)
Number of suppressed write-lock reports: 0
Longest write-lock held interval: 21156
2019-04-11 15:09:36,678 INFO org.apache.hadoop.hdfs.server.namenode.NameCache: initialized with 0 entries 0 lookups
2019-04-11 15:09:36,679 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Finished loading FSImage in 21157 msecs
2019-04-11 15:09:36,815 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.222.128:8485. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=100, sleepTime=10000 MILLISECONDS)
2019-04-11 15:09:36,861 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: slave1/192.168.222.130:8485. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=100, sleepTime=10000 MILLISECONDS)
2019-04-11 15:09:36,862 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master2/192.168.222.129:8485. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=100, sleepTime=10000 MILLISECONDS)

Notes

  • 2019-4-6
    What I did:
    booted the machines;
    started all the Zookeeper instances — fine;
    started HDFS with start-dfs.sh — also looked fine;
    NameNode states:
    nn1 (master): Standby
    nn2 (master2): Active
    nn2's NameNode log contained one warning;
    all other logs were normal.
    hadoop-hadoop-namenode-master2.log
      ...
    2019-04-06 17:12:37,856 WARN org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Edit log tailer interrupted
    java.lang.InterruptedException: sleep interrupted
    at java.lang.Thread.sleep(Native Method)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:469)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$300(EditLogTailer.java:399)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:416)
    at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:484)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:412)
    ...

For reference

  • nn1
    2019-04-02 08:12:53,523 WARN org.apache.hadoop.hdfs.server.namenode.FSEditLog: Unable to determine input streams from QJM to [192.168.222.128:8485, 192.168.222.129:8485, 192.168.222.130:8485]. Skipping.
    org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many exceptions to achieve quorum size 2/3. 3 exceptions thrown:
    192.168.222.129:8485: Call From master/192.168.222.128 to master2:8485 failed on connection exception: java.net.ConnectException: 拒绝连接; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
    192.168.222.130:8485: Call From master/192.168.222.128 to slave1:8485 failed on connection exception: java.net.ConnectException: 拒绝连接; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
    192.168.222.128:8485: Call From master/192.168.222.128 to master:8485 failed on connection exception: java.net.ConnectException: 拒绝连接; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
    at org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
    at org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:286)
    at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:142)
    at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectInputStreams(QuorumJournalManager.java:485)
    at org.apache.hadoop.hdfs.server.namenode.JournalSet.selectInputStreams(JournalSet.java:269)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1673)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1706)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1685)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:703)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:325)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1099)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:716)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:635)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:697)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:940)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:913)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1646)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1713)

nn2

2019-04-02 03:08:07,600 WARN org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Exception from remote name node RemoteNameNodeInfo [nnId=nn1, ipcAddress=master/192.168.222.128:8020, httpAddress=http://master:9870], try next.
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category JOURNAL is not supported in state standby. Visit https://s.apache.org/sbnn-error
at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:88)
at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1954)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1442)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:4716)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:1293)
at org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:148)
at org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:14726)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)

at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1511)
at org.apache.hadoop.ipc.Client.call(Client.java:1457)
at org.apache.hadoop.ipc.Client.call(Client.java:1367)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
at com.sun.proxy.$Proxy16.rollEditLog(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolTranslatorPB.rollEditLog(NamenodeProtocolTranslatorPB.java:152)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$2.doWork(EditLogTailer.java:365)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$2.doWork(EditLogTailer.java:362)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$MultipleNameNodeProxy.call(EditLogTailer.java:504)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

When formatting ZK

===============================================
The configured parent znode /hadoop-ha/ha-cluster already exists.
Are you sure you want to clear all failover information from
ZooKeeper?
WARNING: Before proceeding, ensure that all HDFS services and
failover controllers are stopped!
===============================================
Proceed formatting /hadoop-ha/ha-cluster? (Y or N) 2019-04-11 16:42:31,271 INFO ha.ActiveStandbyElector: Session connected.
y
2019-04-11 16:42:33,827 INFO ha.ActiveStandbyElector: Recursively deleting /hadoop-ha/ha-cluster from ZK...
2019-04-11 16:42:33,917 ERROR ha.ZKFailoverController: Unable to clear zk parent znode
java.io.IOException: Couldn't clear parent znode /hadoop-ha/ha-cluster
at org.apache.hadoop.ha.ActiveStandbyElector.clearParentZNode(ActiveStandbyElector.java:391)
at org.apache.hadoop.ha.ZKFailoverController.formatZK(ZKFailoverController.java:279)
at org.apache.hadoop.ha.ZKFailoverController.doRun(ZKFailoverController.java:216)
at org.apache.hadoop.ha.ZKFailoverController.access$000(ZKFailoverController.java:60)
at org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:175)
at org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:171)
at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:484)
at org.apache.hadoop.ha.ZKFailoverController.run(ZKFailoverController.java:171)
at org.apache.hadoop.hdfs.tools.DFSZKFailoverController.main(DFSZKFailoverController.java:195)
Caused by: org.apache.zookeeper.KeeperException$NotEmptyException: KeeperErrorCode = Directory not empty for /hadoop-ha/ha-cluster
at org.apache.zookeeper.KeeperException.create(KeeperException.java:128)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:882)
at org.apache.zookeeper.ZKUtil.deleteRecursive(ZKUtil.java:54)
at org.apache.hadoop.ha.ActiveStandbyElector$1.run(ActiveStandbyElector.java:386)
at org.apache.hadoop.ha.ActiveStandbyElector$1.run(ActiveStandbyElector.java:383)
at org.apache.hadoop.ha.ActiveStandbyElector.zkDoWithRetries(ActiveStandbyElector.java:1098)
at org.apache.hadoop.ha.ActiveStandbyElector.zkDoWithRetries(ActiveStandbyElector.java:1090)
at org.apache.hadoop.ha.ActiveStandbyElector.clearParentZNode(ActiveStandbyElector.java:383)
... 8 more
2019-04-11 16:42:33,930 INFO zookeeper.ZooKeeper: Session: 0x3000001d9cc0048 closed
2019-04-11 16:42:33,935 INFO zookeeper.ClientCnxn: EventThread shut down for session: 0x3000001d9cc0048
2019-04-11 16:42:33,942 INFO tools.DFSZKFailoverController: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DFSZKFailoverController at master2/192.168.222.129
************************************************************/
Author: Yout
Link: https://youthug.github.io/blog/2019/03/30/HDFS-High-Availability-With-QJM/
Copyright Notice: All articles in this blog are licensed under CC BY-NC-SA 4.0 unless stating additionally.