Failed to receive heartbeat from agent Cloudera

Adding a new host to a Cloudera Hadoop cluster fails with no heartbeat from the agent

Cloudera (http://www.cloudera.com/) is a Hadoop distribution that helps you handle the complexity of the Hadoop platform through a robust interface, Cloudera Manager. It makes it much easier to inspect services and hosts (which can easily grow to thousands on Hadoop clusters).

I have just started my Hadoop journey and have had my share of bumps along the way. Recently, I was trying to add a new host to my cluster and, to my surprise, it failed at the last step, after all of the packages had been installed through Cloudera Express, with this error message:

Failed to receive heartbeat from agent

The complete error message looked like this:

35.162.182.131   Retry | Details
Installation failed. Failed to receive heartbeat from agent.
  Ensure that the host's hostname is configured properly.
  Ensure that port 7182 is accessible on the Cloudera Manager Server (check firewall rules).
  Ensure that ports 9000 and 9001 are not in use on the host being added.
  Check agent logs in /var/log/cloudera-scm-agent/ on the host being added. (Some of the logs can be found in the installation details).
  If Use TLS Encryption for Agents is enabled in Cloudera Manager (Administration -> Settings -> Security), ensure that /etc/cloudera-scm-agent/config.ini has use_tls=1 on the host being added. Restart the corresponding agent and click the Retry link here.
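The checklist in the message points at /etc/cloudera-scm-agent/config.ini. As a rough sketch, a small helper like the one below can print the settings the agent uses to reach the server. The function name ini_value is made up for this illustration, and server_host and server_port are the keys I would expect next to use_tls in a stock config.ini, so adjust them if your file differs.

```shell
#!/usr/bin/env bash
# ini_value FILE KEY
# Prints the value of the first uncommented "KEY=value" line in FILE.
# (ini_value is a hypothetical helper name, not part of Cloudera's tooling.)
ini_value() {
  local file=$1 key=$2
  # Strip "KEY =" from matching lines, trim trailing whitespace, keep the first hit.
  sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*//p" "$file" \
    | sed 's/[[:space:]]*$//' \
    | head -n 1
}

# Assumed default location of the agent config on the host being added.
conf=/etc/cloudera-scm-agent/config.ini
if [ -r "$conf" ]; then
  for key in server_host server_port use_tls; do
    echo "$key = $(ini_value "$conf" "$key")"
  done
fi
```

If server_host does not point at your Cloudera Manager server, or use_tls comes back empty while TLS is enabled on the server, that is worth fixing before retrying.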

The error message is fairly descriptive and includes hints for troubleshooting the problem, but here are some more things to try if you keep getting it and still cannot add the host to the cluster.

Go through these steps one by one to troubleshoot the error:

1. If you are using a production server, there is probably a firewall blocking communication with the new IP/host. Add a firewall rule that whitelists communication with the new host and the ports it needs.

sudo chkconfig iptables --list

This command should report off for every runlevel.

Something like this:
iptables 0:off 1:off 2:off 3:off 4:off 5:off 6:off

On Ubuntu, disable ufw:

sudo ufw disable

RHEL / CentOS / Oracle Linux 6

chkconfig iptables off
/etc/init.d/iptables stop

RHEL / CentOS / Oracle Linux 7

systemctl disable firewalld
service firewalld stop

These commands come from the Hortonworks and Cloudera guides.
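If you would rather add a rule than switch the firewall off, something along these lines may be enough. This is only a sketch: the helper name open_port_cmds is my own, it prints the commands instead of running them so you can review them first, and 7182 is simply the port named in the error message.

```shell
#!/usr/bin/env bash
# open_port_cmds TOOL PORT
# Prints (does not run) the commands that would open TCP PORT.
# TOOL is "firewalld" (RHEL/CentOS 7) or "iptables" (RHEL/CentOS 6).
# (open_port_cmds is a hypothetical helper name used for illustration.)
open_port_cmds() {
  local tool=$1 port=$2
  case "$tool" in
    firewalld)
      echo "firewall-cmd --permanent --add-port=${port}/tcp"
      echo "firewall-cmd --reload"
      ;;
    iptables)
      echo "iptables -I INPUT -p tcp --dport ${port} -j ACCEPT"
      echo "service iptables save"
      ;;
    *)
      echo "unknown firewall tool: $tool" >&2
      return 1
      ;;
  esac
}

# Review the output, then run the commands as root on the Cloudera Manager server.
open_port_cmds firewalld 7182
```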

2. Turn the firewall/iptables off while you add the new host. (You can turn it back on later and check whether everything continues to work.)
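To see whether the firewall change made a difference, you can test the port directly. The sketch below assumes bash and the coreutils timeout command are installed; port_open is a name I made up, and it relies on bash's built-in /dev/tcp redirection rather than any Cloudera tool.

```shell
#!/usr/bin/env bash
# port_open HOST PORT
# Returns 0 if a TCP connection to HOST:PORT succeeds within 3 seconds.
# (port_open is a hypothetical helper name used for illustration.)
port_open() {
  local host=$1 port=$2
  # bash opens a TCP socket when a redirection targets /dev/tcp/HOST/PORT.
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}
```

From the host being added, port_open followed by your Cloudera Manager server's name and 7182 should succeed. On that same host, port_open 127.0.0.1 9000 and port_open 127.0.0.1 9001 should fail before the agent is started, since the error message asks for those ports to be free.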

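3. Make sure SELinux is off.

sudo setenforce 0

This switches SELinux to permissive mode until the next reboot. To make the change permanent, edit the config file:

sudo vi /etc/selinux/config

and set the following line so that SELinux stays disabled across reboots as well: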

SELINUX=disabled
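If you prefer not to edit the file by hand, a sed one-liner can make the same change. The wrapper below is a sketch only; set_selinux_mode is a name I chose for illustration, and it keeps a .bak copy of the file in case something goes wrong.

```shell
#!/usr/bin/env bash
# set_selinux_mode FILE MODE
# Rewrites the SELINUX= line in FILE (normally /etc/selinux/config) to MODE.
# (set_selinux_mode is a hypothetical helper name used for illustration.)
set_selinux_mode() {
  local file=$1 mode=$2
  case "$mode" in
    enforcing|permissive|disabled) ;;
    *) echo "invalid mode: $mode" >&2; return 1 ;;
  esac
  # -i.bak edits in place and leaves a backup copy next to the file.
  # The pattern stops at "=", so the SELINUXTYPE= line is left alone.
  sed -i.bak "s/^SELINUX=.*/SELINUX=${mode}/" "$file"
}
```

Run as root, set_selinux_mode /etc/selinux/config disabled makes the edit; getenforce shows the mode that is currently in effect.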

4. Make sure that DNS lookup and reverse DNS lookup work on both hosts.

Install bind-utils:

yum install bind-utils

and check that forward and reverse lookups both work:

host PublicDNS/FQDN_name

and

host IP

The first command should return the IP for the hostname, and the second should return the hostname for the IP.
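The forward and reverse checks can be combined into one round trip. Note that this sketch swaps host for getent hosts, which goes through the system resolver (so /etc/hosts entries count as well) and needs no extra package; the function names are my own.

```shell
#!/usr/bin/env bash
# forward_ip NAME: prints the first address NAME resolves to (empty if none).
forward_ip() {
  getent hosts "$1" 2>/dev/null | awk 'NR == 1 { print $1 }'
}

# reverse_names IP: prints every name IP maps back to, one per line.
reverse_names() {
  getent hosts "$1" 2>/dev/null | awk '{ for (i = 2; i <= NF; i++) print $i }'
}

# check_dns NAME: succeeds only if NAME resolves and its IP maps back to NAME.
# (All three function names are hypothetical, chosen for this illustration.)
check_dns() {
  local name=$1 ip
  ip=$(forward_ip "$name")
  if [ -z "$ip" ]; then
    echo "forward lookup failed for $name"
    return 1
  fi
  if reverse_names "$ip" | grep -qxF "$name"; then
    echo "OK: $name <-> $ip"
  else
    echo "reverse lookup for $ip does not return $name"
    return 1
  fi
}
```

Call check_dns with the FQDN of the other machine on each of the two hosts; both directions should report OK.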

5. Make sure your hostname is set properly.

Check that your FQDN is set using

hostname --fqdn

If it is not, set it with

hostname <fully.qualified.domain.name>
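A quick way to tell whether the name you get back is actually fully qualified is to look for a dot with something on both sides. The check below is deliberately rough, and is_fqdn is a name I made up for this sketch.

```shell
#!/usr/bin/env bash
# is_fqdn NAME
# Returns 0 if NAME looks fully qualified, i.e. it contains at least one dot
# and has no empty labels (a rough check, not full hostname validation).
# (is_fqdn is a hypothetical helper name used for illustration.)
is_fqdn() {
  case "$1" in
    .*|*.|*..*) return 1 ;;   # leading dot, trailing dot, or empty label
    *.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Fall back to the plain hostname if the FQDN lookup itself fails.
name=$(hostname -f 2>/dev/null || hostname 2>/dev/null || echo unknown)
if is_fqdn "$name"; then
  echo "hostname looks fully qualified: $name"
else
  echo "hostname is not fully qualified: $name"
fi
```

Keep in mind that a name set with the plain hostname command does not survive a reboot; on RHEL/CentOS 7, hostnamectl set-hostname followed by the FQDN should make it stick.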

6. Make sure each host can ping the other.

ping hostname

7. Add the new host to the list of known hosts and allow password-less SSH between the two.

Guide to configuring password-less ssh between hosts

I did all of this and still could not get the two hosts to communicate. If you have not been able to get it to work so far either, here is the magic bullet fix for Failed to receive heartbeat from agent in Cloudera Hadoop.

8. Synchronize the clocks on the two systems.

One of my systems had its time zone set to UTC and the other to IST.

I switched them both to IST.

  sudo cp /usr/share/zoneinfo/Asia/Kolkata /etc/localtime
  date
  # should show the time in IST

  sudo hwclock --systohc --localtime
  # writes the system time to the hardware clock as local time

  sudo hwclock --show
  # should show the time in IST

  sudo yum install ntp
  sudo ntpdate pool.ntp.org
  # sets the clock via NTP

  ntpstat

This part is important: ntpstat should show the server your host is synchronized with. If it says unsynchronised, wait a while and check again, and check whether ntpd is running. If it is not, restart it using

  sudo service ntpd restart

OR

  sudo /etc/init.d/ntpd start

Run this set of commands on both hosts and make sure that their clocks are synchronized.
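To confirm that the two clocks really agree, you can compare Unix timestamps from both machines. This is a minimal sketch: skew_seconds is a hypothetical helper, and how much drift is acceptable is something you would need to decide for your own cluster.

```shell
#!/usr/bin/env bash
# skew_seconds A B
# Prints the absolute difference between two Unix timestamps, in seconds.
# (skew_seconds is a hypothetical helper name used for illustration.)
skew_seconds() {
  local d=$(( $1 - $2 ))
  if [ "$d" -lt 0 ]; then
    d=$(( -d ))
  fi
  echo "$d"
}
```

With password-less SSH from step 7 in place, skew_seconds "$(date +%s)" "$(ssh newhost date +%s)" prints the difference between the two machines, where newhost stands for the host you are adding. Since date +%s counts seconds since the epoch in UTC, the result does not depend on the time zone configured on either machine.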

I hope this helped you save some time. 🙂

And oh, it’s my birthday today 😉 Signing off!

P.S.: The Internet is a thriving community built on generous contributions from people across the globe. You can help it thrive too. Please contribute. As a reader, you can contribute your valuable feedback.
Please drop a comment or share to let people know you were here. 🙂

One thought on “Failed to receive heartbeat from agent Cloudera”

  1. Tried all the above on CentOS 7 and fixed all the above but still fails on a single machine test installation of Cloudera Manager at the same place
