How to completely remove and uninstall HDP components (Hadoop uninstall) on Linux system

Hortonworks Data Platform, abbreviated as HDP, is a completely open source distribution of Hadoop. As much of a breeze as it is to provision, manage and monitor multi-node clusters through Ambari, it becomes a real pain as soon as any of the component installations through Ambari goes awry (which isn't so uncommon, actually). You are plain lucky if you get all your components up and running in the first go. I, for sure, could not, and that's how I learnt about most of the directory structures and hierarchies of the file system on Linux (especially since I was new to Linux). Ambari, for context, is a management tool with a web-based, user-friendly interface for provisioning Hadoop clusters.

Red Alert: There is no 'easy way' to uninstall and completely remove the HDP components installed through Ambari.

I have seen, met and heard of a lot of people who just end up doing a fresh OS install, simply because Ambari adds an insane number of files to your system, in a variety of places and with a lot of symlinks, so that it turns out to be ridiculously difficult to remove and uninstall the HDP components completely.

I believe I was a peculiar victim stuck in the vicious cycle of installing and uninstalling HDP components through Ambari: I was spending more time uninstalling these components and installing them again than I was learning about this amazing open source platform, Hadoop. So, now that I have mastered the art of removing and uninstalling HDP components completely (if not anything else), including all hosts, services and their components, I am recording it here for anyone's and everyone's reference.

How to remove and uninstall HDP components completely?

Here is a step by step tutorial to do just that: The HDP Uninstall!

    1. Stop all services in Ambari

    You can stop all your services in Ambari through the web UI as well as the command line:

    1. To stop your services from the Ambari web-based user interface:

    Go to the Ambari web UI, which runs on port 8080 by default. The URL to access the Ambari UI would be structured like this:

    HOST_FQDN:8080/

    (Screenshot: stopping all services in Ambari through the Web UI)

    Click the Actions button at the bottom of the services list and select Stop All to stop all services.

    2. To stop services from the command line using the Ambari REST API:
    You need to stop each of the services one after the other:

    curl -u ambari_USERNAME:ambari_PASSWORD -H "X-Requested-By: ambari" -X PUT -d '{"RequestInfo":{"context":"Stop Service"},"Body":{"ServiceInfo":{"state":"INSTALLED"}}}' http://AMBARI_SERVER_HOST:8080/api/v1/clusters/CLUSTER_NAME/services/SERVICE_NAME

    If you would like to know the names of each of those services, there are good writeups that list all the Ambari service names and explain deleting Ambari services as well as their host components; you can also query them directly from your cluster, as shown below.
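
    Here is a minimal sketch that pulls the service names straight from the Ambari REST API, using the same placeholder credentials, host and cluster name as above:

    # List every service name registered in the cluster
    curl -s -u ambari_USERNAME:ambari_PASSWORD http://AMBARI_SERVER_HOST:8080/api/v1/clusters/CLUSTER_NAME/services | grep '"service_name"'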

      2. Run this python script on every host in the cluster to clean up and remove a couple of folders created on your system that comprise the basic part of the distribution:

    python /usr/lib/python2.6/site-packages/ambari_agent/HostCleanup.py --silent --skip=users

    Note: Use the --skip=users argument if you would like the users created for each of the services not to be deleted. From what I have seen, it usually makes no difference and the users really need not be deleted.

      3. Remove and clean hadoop packages completely

    Remove all of the Hortonworks HDP components and packages by using the yum remove command:

    For example:

    1. Remove and uninstall hive completely,
    yum remove hive\*

    2. Remove and uninstall oozie completely,
    yum remove oozie\*

    3. Remove and uninstall pig completely,
    yum remove pig\*

    4. Remove and uninstall zookeeper completely,
    yum remove zookeeper\*

    5. Remove and uninstall tez completely,
    yum remove tez\*

    6. Remove and uninstall hbase completely,
    yum remove hbase\*

    7. Remove and uninstall ranger completely,
    yum remove ranger\*

    8. Remove and uninstall knox completely,
    yum remove knox\*

    9. Remove and uninstall storm completely,
    yum remove storm\*

    10. Remove and uninstall accumulo completely,
    yum remove accumulo\*

    11. Remove and uninstall hadoop completely,
    yum remove hadoop\*

    12. Remove and uninstall ambari-server

    ambari-server stop
    yum erase ambari-server

    13. Remove and uninstall ambari-agent

    ambari-agent stop
    yum erase ambari-agent
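
    If you prefer to script this part, here is a minimal sketch (assuming a yum-based system and the package-name prefixes listed above; trim the list to the services you actually installed):

    # Loop over the HDP package prefixes and remove everything matching them (run as root)
    for pkg in hive oozie pig zookeeper tez hbase ranger knox storm accumulo hadoop; do
        yum -y remove "${pkg}*"
    done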

    In case you are using some other distribution of Linux, use your distribution's package manager (for example, apt-get remove on Debian/Ubuntu or zypper remove on SUSE) to remove these packages on your system.

      4. Remove yum repositories meant for ambari and HDP components

    rm -rf /etc/yum.repos.d/ambari.repo /etc/yum.repos.d/HDP*
    yum clean all
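
    As a quick sanity check (an extra step, not strictly required), you can verify that no HDP or Ambari packages are still installed:

    # Should print nothing if the removal was complete
    yum list installed | grep -iE 'hdp|ambari'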

      5. Clean all folders

    1. Remove all log folders: (You could use the rm -rf dir_name command or delete the folders manually; a scripted sketch follows at the end of this step.)

    /var/log/ambari-metrics-monitor
    /var/log/hadoop
    /var/log/hbase
    /var/log/hadoop-yarn
    /var/log/hadoop-mapreduce
    /var/log/hive
    /var/log/oozie
    /var/log/zookeeper
    /var/log/flume
    /var/log/hive-hcatalog
    /var/log/falcon
    /var/log/knox
    /var/lib/hive
    /var/lib/oozie

    2. Remove all hadoop directories:
    /usr/hdp
    /usr/bin/hadoop
    /tmp/hadoop
    /var/hadoop
    /hadoop/*
    /local/opt/hadoop

    3. Remove all config directories:
    /etc/hadoop
    /etc/hbase
    /etc/oozie
    /etc/phoenix
    /etc/hive
    /etc/zookeeper
    /etc/flume
    /etc/hive-hcatalog
    /etc/tez
    /etc/falcon
    /etc/knox
    /etc/hive-webhcat
    /etc/mahout
    /etc/pig
    /etc/hadoop-httpfs

    4. Remove all process Id’s
    /var/run/hadoop
    /var/run/hbase
    /var/run/hadoop-yarn
    /var/run/hadoop-mapreduce
    /var/run/hive
    /var/run/oozie
    /var/run/zookeeper
    /var/run/flume
    /var/run/hive-hcatalog
    /var/run/falcon
    /var/run/webhcat
    /var/run/knox

    5. Remove all zookeeper db files
    /local/home/zookeeper/*

    6. Remove all library folders
    /usr/lib/flume
    /usr/lib/storm
    /var/lib/hadoop-hdfs
    /var/lib/hadoop-yarn
    /var/lib/hadoop-mapreduce
    /var/lib/flume
    /var/lib/knox

    7. Remove oozie-temp folder
    /var/tmp/oozie
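
    If you want to script this step as well, here is a minimal sketch covering a subset of the paths listed above (extend the list with the remaining directories from this step, and run it only once you are sure nothing in these paths is still needed):

    # Remove leftover HDP log, run, config, library and data directories (run as root)
    for dir in \
        /var/log/hadoop /var/log/hbase /var/log/hive /var/log/oozie /var/log/zookeeper \
        /var/run/hadoop /var/run/hbase /var/run/hive /var/run/oozie /var/run/zookeeper \
        /etc/hadoop /etc/hbase /etc/hive /etc/oozie /etc/zookeeper \
        /usr/hdp /var/lib/hadoop-hdfs /var/lib/hadoop-yarn /var/lib/hadoop-mapreduce /var/tmp/oozie
    do
        [ -e "$dir" ] && rm -rf "$dir"
    done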

      6. Remove database

        Depending on which database you used for Ambari, Hive and Oozie, go and drop the databases created for each of these, for example as sketched below.
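
        A minimal sketch, assuming the default database names ambari, hive and oozie (swap in your own database names, users and passwords):

        # PostgreSQL (what ambari-server uses by default)
        sudo -u postgres psql -c "DROP DATABASE IF EXISTS ambari;"

        # MySQL/MariaDB (commonly used for the Hive metastore and Oozie)
        mysql -u root -p -e "DROP DATABASE IF EXISTS hive; DROP DATABASE IF EXISTS oozie;"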

        That's a good deal of data clean up, isn't it?

        But hold on: when you start to reinstall all these components from Ambari, the components are still bound to fail, and it's pretty much guaranteed.

        Ask me, why?

        I mentioned a little something about symlinks at the very beginning of this post. That's where the whole problem lies. Symlinks on Linux systems are links to other files and directories.

        Now, what really happens is that we deleted all the files and folders but never removed these symlinks. When Ambari reinstalls these components, it finds the leftover symlinks, assumes the components are already in place, and skips installing some of the required pieces. That is what starts the whole game of bizarre problems when you thought the installations were complete and actually try to start the services in Ambari. That's pretty much the sad story with services and components installed through Ambari.

        Now, how do we fix this part?

        Solution: Ambari services and components are never completely uninstalled until these symlinks are removed as well.

        1. Go to /etc and look inside the conf directories of each HDP component (and check /usr/bin for command symlinks such as hadoop and hive).
        2. Delete each of the leftover HDP directories and symlinks; a sketch for finding them follows below.
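
        A minimal sketch to hunt down those leftover symlinks (the search paths are an assumption based on where HDP typically drops its links; review the output before deleting anything):

        # List symlinks under /etc and /usr/bin that still point into the HDP install tree
        find /etc /usr/bin -maxdepth 3 -type l -lname '*hdp*' -print

        # After reviewing the list, the same match can delete them
        find /etc /usr/bin -maxdepth 3 -type l -lname '*hdp*' -delete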

        and you are done! You can now reinstall Ambari and all its components without having to do a fresh OS install ;) . I hope that helps.

        P.S.: The Internet is a thriving community, built on the generous contributions of people from across the globe. You can help it thrive too. Please contribute. As a reader, you can contribute with your valuable feedback.
        Please drop a comment or share to let people know you were here. :) :)

  • 10 thoughts on “How to completely remove and uninstall HDP components (Hadoop uninstall) on Linux system”

    1. Wow! This was the most useful set of information on how to get rid of all the bits. I installed Ambari 2.1 and then realized it wasn’t 2.2. Disaster. Re-installing over it does not work. I didn’t get to the other components so don’t know how useful the description of the steps are. Very helpful information for removing all the Ambari bits. I haven’t tried re-installing Ambari 2.2 again.

      I did think about provisioning a new VM. I will probably provision a new VM next time.

      Oh, and SEO brought your post up in a pretty high ranking.

      Thank you!

    2. Hi, this is the best script I’ve found so far.

      I’m working on a mix of some procedures I found over the internet, like:

      https://gist.github.com/nsabharwal/f57bb9e607114833df9b
      https://pulsatingtaurus.wordpress.com/2015/02/15/clean-uninstall-hortonworks-hdp-2-2/
      https://mapredit.blogspot.com.br/2014/06/remove-hdp-and-ambari-completely.html (a good one)
      https://community.hortonworks.com/questions/1110/how-to-completely-remove-uninstall-ambari-and-hdp.html (only ambari stuff)
      https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_installing_manually_book/content/ch_uninstalling_hdp_chapter.html
      https://issues.apache.org/jira/browse/AMBARI-12581
      http://www.yourtechchick.com/hadoop/how-to-completely-remove-and-uninstall-hdp-components-hadoop-uninstall-on-linux-system/

      Based on that, I would like to suggest some changes in your document:

      a) Please check the “remove” steps 7, 9, 11 and 12, they’re duplicated or referencing different modules.

      b) About database cleanup, I would suggest to also remove PostgreSQL and mysql, specially PostgreSQL because when reinstalling you should get some issues when Ambari tries to connect to it.

      c) About symlinks, I used to see them in /usr/bin, and not on /etc/conf.

      d) At the end of the script, it’s good to doublecheck files and folders, by running “find / -name **” and removing any remaining files/folders.

      find / -name "*hdp*"
      find / -name "*hadoop*"
      find / -name "*hdfs*"
      find / -name "*yarn*"
      find / -name "*spark*"
      etc….

      e) A reboot at the end would be also useful.

      Best Regards,

      Marcos.

    3. Hi Marcos,
      Thank you so much for your valuable input. Sure I would add it to the post 🙂

      Regards,
      Simran Kaur

    4. Simran, very helpful post. I will add to the idea by also looking for newly created files across the system or in individual folders as follows. It may help as a final check to see if any remnants remain.
      find / -type f -newermt yyyy-mm-dd -ls
      For example, the following would list all files created on Nov 3rd, 2016:
      find / -type f -newermt 2016-11-03 -ls
      For new installs, creating a snapshot of the virtual machine before starting the HDP install saves time by reverting back to the original pre-install state. Production nodes are a different story altogether. I would also recommend taking a look at Chef/Puppet/Vagrant; all of those provide automation around configuration and are available for free up to a 25-node setup.

    5. Great article Simran.

      I did the full set of steps you've mentioned. There were still a few more files around that I found using

      find / -group hadoop

      You can probably add it at the end.

    6. @Nabeel: I am glad it helped. 🙂 Sure! Thank you for your contributions.
      Please share the article to support and let people know you were here 🙂

    7. Hi,

      As per the steps, I completely removed HDP and Ambari.

      I re-installed Ambari, but when I logged in to Ambari, I could see all the components again and there was no option to install HDP again.

      Please let me know where I made a mistake.

      Thanks
      Venkat
