How to Install Hadoop?

Hadoop is an open-source, Java-based framework built on the Java programming language and the Linux operating system. Hadoop is a tool used for big data processing, and many companies use Hadoop to maintain their large sets of data.

Hadoop is a project of the Apache Software Foundation and has undergone a number of changes since the day it was launched. Before getting into how to install Hadoop, let us learn more about Apache Hadoop.

What is Apache Hadoop?

Apache Hadoop is an open-source framework that runs on Java and is used to process and store large amounts of data on large clusters. The Hadoop framework is composed of four main components:

  • Hadoop Common
  • Hadoop Distributed File System (HDFS)
  • YARN
  • MapReduce

Want to learn more about Hadoop Framework? Read our previous blog Introduction to Apache Hadoop!

Hadoop Cluster Modes

Installing Hadoop is not an easy task, as a Hadoop cluster is complex to set up. The Hadoop software can be installed and operated in three modes:

  1. Standalone Mode
  2. Pseudo-Distributed Mode
  3. Fully Distributed Mode


  • Standalone Mode

It is the default mode in which Hadoop is configured on your system after downloading. In standalone mode, Hadoop runs on a single node as a single Java process. HDFS is not used in this mode; the local file system is used for input and output. This mode is useful for debugging purposes.

  • Pseudo-Distributed Mode

The pseudo-distributed mode is a distributed simulation of Hadoop on a single machine. In this mode, multiple DataNodes and TaskTrackers are set up on one machine, and each Hadoop daemon runs as a separate Java process on that single machine.

  • Fully Distributed Mode

In the fully distributed mode, the code runs on an actual Hadoop cluster. This mode allows you to run the code on thousands of servers, against very large inputs. The master daemons run on the master node, while the remaining daemons run on the worker nodes.

Install Hadoop Software

Now that we understand Hadoop, it is time to learn how to install Apache Hadoop. Let us consider the installation of Hadoop on a single node, i.e. in standalone mode. This will help you perform simple operations using MapReduce and HDFS. The following steps install Hadoop 2.4.1 on Linux.


  • Operating System: GNU/Linux is supported as a development and production platform, and Hadoop has been demonstrated on GNU/Linux clusters with 2000 nodes. If you have any other operating system, install VirtualBox software and run Linux inside it.
  • Java: As Hadoop is a Java-based framework, Java is the main prerequisite for installing Hadoop. To run Hadoop on Linux, Java must be installed. The recommended versions of Java for Hadoop are listed at Hadoop Java Versions.
  • SSH: ssh must be installed, and sshd must be running, in order to use the Hadoop scripts that start and stop the daemons. Installing pdsh is also recommended for better ssh resource management.
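The prerequisites above can be checked from a terminal. The following is a sketch for a Debian/Ubuntu-style system; exact commands and package names may differ on your distribution:

```shell
# Verify Java is installed and on the PATH
java -version

# Verify the ssh client and server are available
which ssh
which sshd

# Hadoop's scripts connect over ssh to localhost; this should succeed
# without a password (set up key-based authentication if it prompts)
ssh localhost exit
```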

Download Hadoop

Download the latest stable Hadoop release from the Apache Software Foundation. This is required in order to get the Hadoop distribution. Use the following commands to download and extract Apache Hadoop:
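For example, for release 2.4.1 (the version used in this guide), the download and extraction can look like this. The archive URL and the /usr/local/hadoop install location are assumptions; adjust them for your mirror, version, and preferred directory:

```shell
# Download the Hadoop 2.4.1 release tarball from the Apache archive
wget https://archive.apache.org/dist/hadoop/common/hadoop-2.4.1/hadoop-2.4.1.tar.gz

# Extract it and move it to a conventional install location
tar -xzf hadoop-2.4.1.tar.gz
sudo mv hadoop-2.4.1 /usr/local/hadoop
```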

Install Hadoop in Standalone Mode

Once you have downloaded and extracted the Hadoop software, you are ready to set up the Hadoop cluster. You can start using it in any of the three modes, but by default the Hadoop cluster is configured to run in non-distributed mode, i.e. standalone mode. It runs as a single Java process, which is useful for debugging.

Setting up Hadoop

    1. Set the Hadoop environment variables by appending the following lines to the ~/.bashrc file –
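A minimal sketch of those environment variables is shown below. The paths assume Hadoop was extracted to /usr/local/hadoop and that Java lives under /usr/lib/jvm/default-java; adjust both to match your system:

```shell
# Hadoop environment variables for ~/.bashrc
# (install paths below are example choices, not requirements)
export HADOOP_HOME=/usr/local/hadoop
export JAVA_HOME=/usr/lib/jvm/default-java
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
```

After editing the file, reload it with `source ~/.bashrc` so the variables take effect in the current shell.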


    2. Now, run the following command to ensure that Hadoop is running fine –
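The check itself is a single command, assuming the Hadoop bin directory is on your PATH:

```shell
# Prints the installed Hadoop version and build information
hadoop version
```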


    3. You will receive the following output if you have successfully set up Hadoop –
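For version 2.4.1 the output looks similar to the following; the revision and build details vary from build to build:

```
Hadoop 2.4.1
Subversion https://svn.apache.org/repos/asf/hadoop/common -r <revision>
Compiled by <user> on <build date>
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-2.4.1.jar
```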


Now, you are ready to work with Hadoop in standalone mode on a single machine.
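As a quick smoke test in standalone mode, you can run the grep example that ships with the distribution. This sketch follows the Apache single-node setup guide and assumes Hadoop lives in /usr/local/hadoop:

```shell
cd /usr/local/hadoop

# Use the bundled XML config files as sample input
mkdir input
cp etc/hadoop/*.xml input

# Run the example MapReduce job: find strings matching 'dfs[a-z.]+'
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar \
    grep input output 'dfs[a-z.]+'

# Results are written to the local file system (no HDFS in standalone mode)
cat output/*
```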

Bottom Line

Congratulations! You have successfully installed and set up Hadoop in standalone mode. It is time to start working with Hadoop and use it to store, analyze, and process Big Data. Hadoop, one of the best big data processing tools, has gained popularity due to significant features like fault tolerance, scalability, flexibility, and cost-effectiveness.

What are you waiting for? Start learning Hadoop now! Enter the world of Big Data and Hadoop and become a Hadoop professional.

Want to know more about Hadoop? Stay tuned and keep learning!
