Installing Hadoop onto a personal computer – Step-by-step process
In this article, we are going to learn how to install Hadoop on a personal computer. Hadoop is Linux-based software that runs on distributed computers. The best way to get good at Hadoop is to be hands-on with it, and for that you need access to it. One way to get access is to get it installed on your personal computer. Personal computers usually run either Windows or OS X, so how do we go about installing Linux-based software on them? The approach we are going to take works on any operating system, including Windows and OS X.
First, we will install virtual machine software on our personal computer, which can be either a laptop or a desktop. Once the software is installed, we will import a virtual machine in which Linux is running. Now we have Linux running on a virtual machine, and the virtual machine running on the personal computer. This takes the operating system of the personal computer out of the equation, since the virtual machine software is available for most operating systems, and within Linux we will have the Hadoop tools running. This way we achieve what we want, and a virtual machine also makes it much easier to run the software inside it: you can take snapshots and copy or clone machine versions easily, which makes management and maintenance very easy.
There are a number of virtual machine products that can be used, for example Oracle VirtualBox, VMware Server, etc. Comparing the two, Oracle VirtualBox is easier to manage, yet both are quite good options. For installing Linux and Hadoop we will take the easy way: rather than installing Linux on the virtual machine and then installing Hadoop on Linux, we simply download an appliance and import it into the virtual machine software.
Since we are using virtual machines, we can simply download a whole virtual machine on which somebody has already installed the complete Linux and Hadoop setup; it is downloaded as a file. For your information, the term appliance is used to refer to hardware and software together, and it is also used to refer to the soft copies of virtual machines that are downloaded as files, because when you import this file into the virtual machine software you end up with a computer that has software already installed on it; this is a virtual computer, of course. So the Hadoop services will be running on Linux, and Linux will be running on the virtual machine. On the personal computer we will open a browser and connect to the virtual machine, and that is how we will access Hadoop.
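Before opening the browser, it can help to confirm that the host can actually reach the virtual machine. The following is a minimal sketch in Python, not part of the appliance itself: the function name `is_port_open`, the address `127.0.0.1` and the port `8080` are placeholders chosen for illustration, since the real address and port depend on the appliance you import and on how its network is configured.

```python
import socket


def is_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out)
        # when the port cannot be reached.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Placeholder values: replace with the address and port that your
    # appliance documents for its web interface.
    vm_host = "127.0.0.1"
    vm_port = 8080
    if is_port_open(vm_host, vm_port):
        print("The virtual machine is reachable; try opening it in the browser.")
    else:
        print("No connection; check that the virtual machine is started.")
```

If the check fails, the virtual machine is most likely not started yet or its network settings do not expose the port to the host.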
In terms of hardware requirements, you will need 3-4 GB of RAM on your host computer. Personally, we would recommend 4 GB, because 2 GB will be occupied by the virtual machine, so only 1-2 GB will be left for the host computer. If you have less memory, you will have to downsize the virtual machine, which means telling it to use only 1 GB, since by default it is designed to use 2 GB. Not much hard disk space is required; generally, about 10 GB is enough, of which 5 GB will be used by the virtual machine itself, 2 GB is occupied by the downloaded Hadoop file, and the rest is extra space for other functions.
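The arithmetic above can be written down as a small Python sketch. The 2 GB default, the 1 GB reduced size and the 10/5/2 GB disk figures come from the paragraph above; the function names and the rule "keep at least 1 GB for the host" are assumptions made for illustration, based on the 1-2 GB the text leaves to the host.

```python
def pick_vm_memory_gb(host_ram_gb, default_vm_gb=2.0, reduced_vm_gb=1.0,
                      min_host_left_gb=1.0):
    """Choose how much RAM to give the virtual machine.

    min_host_left_gb is an assumed threshold: the article leaves
    1-2 GB to the host, so we require at least 1 GB here.
    """
    if host_ram_gb - default_vm_gb >= min_host_left_gb:
        return default_vm_gb
    # Not enough memory for the default size: downsize the machine.
    return reduced_vm_gb


def disk_headroom_gb(total_gb=10.0, vm_gb=5.0, download_gb=2.0):
    """Disk space left for other functions after the VM and the download."""
    return total_gb - vm_gb - download_gb


if __name__ == "__main__":
    for ram in (2, 3, 4):
        print(f"{ram} GB host -> give the VM {pick_vm_memory_gb(ram)} GB")
    print(f"Disk headroom: {disk_headroom_gb()} GB")
```

With these assumptions, a 3 GB or 4 GB host keeps the default 2 GB machine, while a 2 GB host falls back to the 1 GB setting.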
In short, here are the steps to take to install Hadoop onto one’s personal computer:
- Download VirtualBox
- Install VirtualBox
- Download Appliance
- Import Appliance
- Configure Virtual Machine
- Start Virtual Machine
- Test connection from host
Once all these steps are done, we are ready to work with Hadoop hands-on.
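The import, configure and start steps can also be done from the command line instead of the VirtualBox window. The sketch below assumes the `VBoxManage` tool that ships with VirtualBox is on your PATH; it only builds and prints the commands rather than running them. The file name `hadoop-appliance.ova` and the machine name `hadoop-vm` are hypothetical, since the real names depend on the appliance you download, and starting the machine in headless mode is a choice made here because we connect to it from the host's browser.

```python
import shlex


def vbox_commands(appliance_path, vm_name, memory_mb=2048):
    """Build VBoxManage commands for the import, configure and start steps.

    appliance_path and vm_name are placeholders supplied by the caller;
    the machine name is normally defined by the imported appliance.
    """
    return [
        # Import Appliance
        ["VBoxManage", "import", appliance_path],
        # Configure Virtual Machine: set its RAM in megabytes
        ["VBoxManage", "modifyvm", vm_name, "--memory", str(memory_mb)],
        # Start Virtual Machine without opening a console window
        ["VBoxManage", "startvm", vm_name, "--type", "headless"],
    ]


if __name__ == "__main__":
    # Hypothetical names; 1024 MB is the downsized setting for small hosts.
    for cmd in vbox_commands("hadoop-appliance.ova", "hadoop-vm", 1024):
        print(shlex.join(cmd))
```

Printing the commands first lets you review them and run each one by hand, which is safer than executing them blindly on a machine you have not set up before.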