
mesos-master fails at startup: Recovery failed: Failed to recover registrar: Failed to perform fetch within 1mins

程序员文章站 2022-05-10 12:17:44

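Background

Each master node has Java 8, Mesos, and ZooKeeper installed. I wanted to start a Mesos master on every master node and ran into the error below. It took me several hours to sort out, so I am writing it down.

mesos-master nodes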

master-1 192.168.137.20
master-2 192.168.137.21
master-3 192.168.137.22

Versions

JDK 1.8
zookeeper 3.4.10
mesos 1.9.0

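Prerequisite: ZooKeeper is already running

Run the following commands from the bin directory of the ZooKeeper installation
Start: ./zkServer.sh start
Check status: ./zkServer.sh status

master-1 is the leader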

[root@master-1 bin]# ./zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /root/zookeeper-3.4.10/bin/../conf/zoo.cfg
Mode: leader

master-2 is a follower

[root@master-2 bin]# ./zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /root/zookeeper-3.4.10/bin/../conf/zoo.cfg
Mode: follower

master-3 is also a follower

[root@master-3 bin]# ./zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /root/zookeeper-3.4.10/bin/../conf/zoo.cfg
Mode: follower

PS: the ZooKeeper status can also be checked with echo stat | nc 192.168.137.22 2181
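
To check all three nodes in one pass, the stat output can be reduced to its Mode: line. This is only a minimal sketch, assuming nc and awk are installed and that the stat four-letter command is enabled (the default in ZooKeeper 3.4.x). The function names zk_mode and zk_modes are made up for this illustration.

```shell
# Read "stat" output on stdin and print only the value of the "Mode:" line.
zk_mode() {
  awk -F': ' '/^Mode:/ { print $2 }'
}

# Query every host passed as an argument on port 2181 and print "host mode".
# -w 2 gives up after 2 seconds so that a dead node does not hang the loop.
zk_modes() {
  for host in "$@"; do
    mode=$(echo stat | nc -w 2 "$host" 2181 | zk_mode)
    echo "$host ${mode:-unreachable}"
  done
}

# Usage:
#   zk_modes 192.168.137.20 192.168.137.21 192.168.137.22
```

A healthy three-node ensemble should report exactly one leader and two followers.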

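Starting Mesos in cluster mode

I installed Mesos by following the official documentation, so I will not go into detail here. The official site has a simple example that starts a single-node Mesos, and that started successfully. The failure appeared when I started Mesos in cluster mode with this command: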

/root/mesos-1.9.0/build/bin/mesos-master.sh --zk=zk://192.168.137.20:2181,192.168.137.21:2181,192.168.137.22:2181/mesos --ip=0.0.0.0 --log_dir=/var/log/mesos/master  --cluster=test-cluster --quorum=2 --work_dir=/var/lib/mesos/master 

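--zk: the address of the ZooKeeper ensemble
--log_dir: the directory where logs are stored
--quorum: the quorum size for the replicated-log based registry. The value must be more than half of the total number of masters; I have 3 master nodes, so I set it to 2
--work_dir: the working directory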
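
The quorum rule above (more than half of the masters) is just floor(N / 2) + 1. A small sketch of that arithmetic follows; quorum_for is a hypothetical helper name, not part of Mesos.

```shell
# Print the smallest value that is strictly more than half of N masters,
# i.e. floor(N / 2) + 1, using integer shell arithmetic.
quorum_for() {
  echo $(( $1 / 2 + 1 ))
}

quorum_for 3   # 3 masters, as in this setup
quorum_for 5   # 5 masters
```

For the 3 masters used here this gives 2, which matches the --quorum=2 in the command.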

The error log is as follows

F1211 21:35:52.226524 10404 master.cpp:1655] Recovery failed: Failed to recover registrar: Failed to perform fetch within 1mins
*** Check failure stack trace: ***
    @     0x7fb632ba9290  google::LogMessage::Fail()
    @     0x7fb632ba91ec  google::LogMessage::SendToLog()
    @     0x7fb632ba8ba8  google::LogMessage::Flush()
    @     0x7fb632babe90  google::LogMessageFatal::~LogMessageFatal()
    @     0x7fb6301bd0d5  mesos::internal::master::fail()
    @     0x7fb6302e5bec  _ZNSt5_BindIFPFvRKSsS1_EPKcSt12_PlaceholderILi1EEEE6__callIvIS1_EILm0ELm1EEEET_OSt5tupleIIDpT0_EESt12_Index_tupleIIXspT1_EEE
    @     0x7fb6302b8be8  std::_Bind<>::operator()<>()
    @     0x7fb63028139d  _ZZNK7process6FutureI7NothingE8onFailedISt5_BindIFPFvRKSsS6_EPKcSt12_PlaceholderILi1EEEEvEERKS2_OT_NS2_6PreferEENKUlOSE_S6_E_clESK_S6_
    @     0x7fb63035514e  _ZN5cpp176invokeIZNK7process6FutureI7NothingE8onFailedISt5_BindIFPFvRKSsS8_EPKcSt12_PlaceholderILi1EEEEvEERKS4_OT_NS4_6PreferEEUlOSG_S8_E_ISG_S8_EEEDTclcl7forwardIT_Efp_Espcl7forwardIT0_Efp0_EEEOSO_DpOSP_
    @     0x7fb63034a18f  _ZN6lambda8internal7PartialIZNK7process6FutureI7NothingE8onFailedISt5_BindIFPFvRKSsS9_EPKcSt12_PlaceholderILi1EEEEvEERKS5_OT_NS5_6PreferEEUlOSH_S9_E_JSH_SF_EE13invoke_expandISO_St5tupleIJSH_SF_EESR_IJS9_EEJLm0ELm1EEEEDTcl6invokecl7forwardISK_Efp_Espcl6expandcl3getIXT2_EEcl7forwardIT0_Efp0_EEcl7forwardIT1_Efp2_EEEESL_OSU_N5cpp1416integer_sequenceImJXspT2_EEEEOSV_
    @     0x7fb630343710  _ZNO6lambda8internal7PartialIZNK7process6FutureI7NothingE8onFailedISt5_BindIFPFvRKSsS9_EPKcSt12_PlaceholderILi1EEEEvEERKS5_OT_NS5_6PreferEEUlOSH_S9_E_JSH_SF_EEclIJS9_EEEDTcl13invoke_expandcl4movedtdefpT1fEcl4movedtdefpT10bound_argsEcvN5cpp1416integer_sequenceImJLm0ELm1EEEE_Ecl16forward_as_tuplespcl7forwardIT_Efp_EEEEDpOSU_
    @     0x7fb63033f18f  _ZN5cpp176invokeIN6lambda8internal7PartialIZNK7process6FutureI7NothingE8onFailedISt5_BindIFPFvRKSsSB_EPKcSt12_PlaceholderILi1EEEEvEERKS7_OT_NS7_6PreferEEUlOSJ_SB_E_ISJ_SH_EEEISB_EEEDTclcl7forwardIT_Efp_Espcl7forwardIT0_Efp0_EEEOSS_DpOST_
    @     0x7fb63033c6c1  _ZN6lambda8internal6InvokeIvEclINS0_7PartialIZNK7process6FutureI7NothingE8onFailedISt5_BindIFPFvRKSsSC_EPKcSt12_PlaceholderILi1EEEEvEERKS8_OT_NS8_6PreferEEUlOSK_SC_E_ISK_SI_EEEISC_EEEvOT_DpOT0_
    @     0x7fb63033971e  _ZNO6lambda12CallableOnceIFvRKSsEE10CallableFnINS_8internal7PartialIZNK7process6FutureI7NothingE8onFailedISt5_BindIFPFvS2_S2_EPKcSt12_PlaceholderILi1EEEEvEERKSB_OT_NSB_6PreferEEUlOSL_S2_E_ISL_SJ_EEEEclES2_
    @     0x55f1e95286e8  _ZNO6lambda12CallableOnceIFvRKSsEEclES2_
    @     0x55f1e9523747  process::internal::run<>()
    @     0x55f1e951cb66  process::Future<>::fail()
    @     0x7fb62fbc87f4  process::Promise<>::fail()
    @     0x7fb6302e0f9d  process::internal::thenf<>()
    @     0x7fb63034e385  _ZN5cpp176invokeIPFvON6lambda12CallableOnceIFN7process6FutureI7NothingEERKN5mesos8internal8RegistryEEEESt10unique_ptrINS3_7PromiseIS5_EESt14default_deleteISH_EERKNS4_IS9_EEEJSD_SK_SN_EEEDTclcl7forwardIT_Efp_Espcl7forwardIT0_Efp0_EEEOSQ_DpOSR_
    @     0x7fb630346c26  _ZN6lambda8internal7PartialIPFvONS_12CallableOnceIFN7process6FutureI7NothingEERKN5mesos8internal8RegistryEEEESt10unique_ptrINS3_7PromiseIS5_EESt14default_deleteISH_EERKNS4_IS9_EEEJSD_SK_St12_PlaceholderILi1EEEE13invoke_expandISP_St5tupleIJSD_SK_SR_EESU_IJSN_EEJLm0ELm1ELm2EEEEDTcl6invokecl7forwardIT_Efp_Espcl6expandcl3getIXT2_EEcl7forwardIT0_Efp0_EEcl7forwardIT1_Efp2_EEEEOSX_OSY_N5cpp1416integer_sequenceImJXspT2_EEEEOSZ_
    @     0x7fb6303412ba  _ZNO6lambda8internal7PartialIPFvONS_12CallableOnceIFN7process6FutureI7NothingEERKN5mesos8internal8RegistryEEEESt10unique_ptrINS3_7PromiseIS5_EESt14default_deleteISH_EERKNS4_IS9_EEEISD_SK_St12_PlaceholderILi1EEEEclIISN_EEEDTcl13invoke_expandcl4movedtdefpT1fEcl4movedtdefpT10bound_argsEcvN5cpp1416integer_sequenceImILm0ELm1ELm2EEEE_Ecl16forward_as_tuplespcl7forwardIT_Efp_EEEEDpOSX_
    @     0x7fb63033e2a9  _ZN5cpp176invokeIN6lambda8internal7PartialIPFvONS1_12CallableOnceIFN7process6FutureI7NothingEERKN5mesos8internal8RegistryEEEESt10unique_ptrINS5_7PromiseIS7_EESt14default_deleteISJ_EERKNS6_ISB_EEEISF_SM_St12_PlaceholderILi1EEEEEISP_EEEDTclcl7forwardIT_Efp_Espcl7forwardIT0_Efp0_EEEOSV_DpOSW_
    @     0x7fb63033b6cf  lambda::internal::Invoke<>::operator()<>()
    @     0x7fb630337a7a  _ZNO6lambda12CallableOnceIFvRKN7process6FutureIN5mesos8internal8RegistryEEEEE10CallableFnINS_8internal7PartialIPFvONS0_IFNS2_I7NothingEERKS5_EEESt10unique_ptrINS1_7PromiseISE_EESt14default_deleteISN_EES8_EISJ_SQ_St12_PlaceholderILi1EEEEEEclES8_
    @     0x7fb630306f00  _ZNO6lambda12CallableOnceIFvRKN7process6FutureIN5mesos8internal8RegistryEEEEEclES8_
    @     0x7fb630402572  process::internal::run<>()
    @     0x7fb6303f5fc1  process::Future<>::fail()
    @     0x7fb63042841f  std::_Mem_fn<>::operator()<>()
    @     0x7fb6304265f0  _ZNSt5_BindIFSt7_Mem_fnIMN7process6FutureIN5mesos8internal8RegistryEEEFbRKSsEES6_St12_PlaceholderILi1EEEE6__callIbIS8_EILm0ELm1EEEET_OSt5tupleIIDpT0_EESt12_Index_tupleIIXspT1_EEE
    @     0x7fb630422dc2  std::_Bind<>::operator()<>()
    @     0x7fb63041d411  _ZZNK7process6FutureIN5mesos8internal8RegistryEE8onFailedISt5_BindIFSt7_Mem_fnIMS4_FbRKSsEES4_St12_PlaceholderILi1EEEEbEERKS4_OT_NS4_6PreferEENKUlOSG_S9_E_clESM_S9_
Aborted
[root@master-3 master]# 

I then checked the log directory configured in the command, "/var/log/mesos/master", for any errors. The ERROR-level output was:

[root@master-3 master]# cat lt-mesos-master.master-3.root.log.ERROR.20191211-213552.10388
Log file created at: 2019/12/11 21:35:52
Running on machine: master-3
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
F1211 21:35:52.226524 10404 master.cpp:1655] Recovery failed: Failed to recover registrar: Failed to perform fetch within 1mins
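
The ERROR file holds only the final fatal line, while glog's INFO-level file records every severity, so filtering it by the line format shown above can surface warnings that came before the crash. A minimal sketch follows. glog_errors is a hypothetical helper name, and the path in the usage comment assumes the INFO file follows the same naming pattern as the ERROR file above.

```shell
# Print only WARNING (W), ERROR (E) and FATAL (F) lines from glog output.
# A glog line starts with the severity letter, then mmdd, then a space.
# With no file arguments the function reads from stdin.
glog_errors() {
  grep -E '^[WEF][0-9]{4} ' "$@"
}

# Usage (path pattern is an assumption):
#   glog_errors /var/log/mesos/master/*.log.INFO.*
```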

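Cause

Reading these logs, my first guess was a connection timeout: was the master timing out while connecting to ZooKeeper? Material I found online suggested setting the --ip parameter explicitly.

Solution

On master-1, run: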

/root/mesos-1.9.0/build/bin/mesos-master.sh  --ip=192.168.137.20 --zk=zk://192.168.137.20:2181,192.168.137.21:2181,192.168.137.22:2181/mesos --log_dir=/var/log/mesos/master  --cluster=test-cluster --quorum=2 --work_dir=/var/lib/mesos/master 

On master-2, run:

/root/mesos-1.9.0/build/bin/mesos-master.sh  --ip=192.168.137.21 --zk=zk://192.168.137.20:2181,192.168.137.21:2181,192.168.137.22:2181/mesos --log_dir=/var/log/mesos/master  --cluster=test-cluster --quorum=2 --work_dir=/var/lib/mesos/master 

On master-3, run:

/root/mesos-1.9.0/build/bin/mesos-master.sh  --ip=192.168.137.22 --zk=zk://192.168.137.20:2181,192.168.137.21:2181,192.168.137.22:2181/mesos --log_dir=/var/log/mesos/master  --cluster=test-cluster --quorum=2 --work_dir=/var/lib/mesos/master 

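In the end the only change that mattered was --ip: its value must be the node's own IP address (the failing command above passed --ip=0.0.0.0).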
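
Since the three commands differ only in --ip, they can be generated from a single template. The sketch below only prints the commands and does not start anything. mesos_master_cmd and ZK_URL are names invented for this illustration; the paths and flag values are copied from the commands above.

```shell
# ZooKeeper URL shared by all masters (same value as in the commands above).
ZK_URL="zk://192.168.137.20:2181,192.168.137.21:2181,192.168.137.22:2181/mesos"

# Print the mesos-master command line for the node whose IP is given as $1.
mesos_master_cmd() {
  printf '%s --ip=%s --zk=%s --log_dir=%s --cluster=%s --quorum=%s --work_dir=%s\n' \
    /root/mesos-1.9.0/build/bin/mesos-master.sh "$1" "$ZK_URL" \
    /var/log/mesos/master test-cluster 2 /var/lib/mesos/master
}

# Print one command per master; run each printed line on its own node.
for ip in 192.168.137.20 192.168.137.21 192.168.137.22; do
  mesos_master_cmd "$ip"
done
```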

To verify the startup, open port 5050 on a master node
Addendum: if this page shows a "failed to connect …" popup, a packet capture shows that the page's periodic requests are sent to my virtual machines' hostnames. Adding those hostnames to the local hosts file fixes it.
The hosts file is located in C:\Windows\System32\drivers\etc; add the following entries:

# VM configuration start
192.168.137.20 master-1
192.168.137.21 master-2
192.168.137.22 master-3
# VM configuration end