Hadoop Developer Course Content



This course covers 100% of the Developer syllabus and 40% of the Administration syllabus.

Introduction to Big Data and Hadoop:-

 Big Data Introduction
 Hadoop Introduction
 What is Hadoop? Why Hadoop?
 Hadoop History
 Different Types of Components in Hadoop
 HDFS, MapReduce, PIG, Hive, SQOOP, HBASE, OOZIE, Flume, Zookeeper and so on…
 What is the scope of Hadoop?

Deep Dive into HDFS (for Storing the Data):-
 Introduction of HDFS
 HDFS Design
 HDFS role in Hadoop
 Features of HDFS
 Daemons of Hadoop and their functionality
o Name Node
o Secondary Name Node
o Job Tracker
o Data Node
o Task Tracker
 Anatomy of a File Write
 Anatomy of a File Read
 Network Topology
o Nodes
o Racks
o Data Center
 Parallel Copying using DistCp
 Basic Configuration for HDFS
 Data Organization
o Blocks
o Replication
 Rack Awareness
 Heartbeat Signal
 How to Store the Data into HDFS
 How to Read the Data from HDFS
 Accessing HDFS (Introduction of Basic UNIX commands)
 CLI commands
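The Blocks and Replication bullets above boil down to simple arithmetic: a file is chopped into fixed-size blocks, and each block is stored on several Data Nodes. A toy Python sketch of that footprint calculation (the 64 MB block size and replication factor of 3 assumed here are the classic Hadoop 1.x defaults, used only for illustration):

```python
# Toy sketch of HDFS data organization: a file is split into fixed-size
# blocks, and each block is replicated across several Data Nodes.
# Values below are the classic Hadoop 1.x defaults, assumed for illustration.
import math

BLOCK_SIZE_MB = 64   # dfs.block.size default in Hadoop 1.x
REPLICATION = 3      # dfs.replication default

def hdfs_footprint(file_size_mb):
    """Return (number of blocks, total raw storage used across the cluster)."""
    blocks = math.ceil(file_size_mb / BLOCK_SIZE_MB)
    total_storage_mb = file_size_mb * REPLICATION
    return blocks, total_storage_mb

blocks, storage = hdfs_footprint(200)   # a 200 MB file
print(blocks)    # 4 blocks (64 + 64 + 64 + 8 MB)
print(storage)   # 600 MB of raw cluster storage
```

Note that the last block of a file occupies only as much space as it needs; the block size is an upper bound, not a fixed allocation.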

MapReduce using Java (Processing the Data):-

 The introduction of MapReduce.
 MapReduce Architecture
 Data flow in MapReduce
o Splits
o Mapper
o Partitioning
o Sort and shuffle
o Combiner
o Reducer
 Understand Difference Between Block and InputSplit
 Role of RecordReader
 Basic Configuration of MapReduce
 MapReduce life cycle
o Driver Code
o Mapper
o Reducer
 How MapReduce Works
 Writing and Executing the Basic MapReduce Program using Java
 Submission & Initialization of MapReduce Job.
 File Input/Output Formats in MapReduce Jobs
o Text Input Format
o Key Value Input Format
o Sequence File Input Format
o NLine Input Format
 Joins
o Map-side Joins
o Reduce-side Joins
 Word Count Example
 Partition MapReduce Program
 Side Data Distribution
o Distributed Cache (with Program)
 Counters (with Program)
o Types of Counters
o Task Counters
o Job Counters
o User Defined Counters
o Propagation of Counters
 Job Scheduling
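The data flow listed above (splits → mapper → partitioning → sort and shuffle → reducer) can be imitated outside Hadoop. The following is a pure-Python sketch of the classic word-count job, not actual Hadoop API code, showing what each phase contributes:

```python
# Pure-Python imitation of the MapReduce word-count data flow.
# This is a sketch of the phases, not Hadoop API code.
from collections import defaultdict

def mapper(line):
    # Map phase: emit a (word, 1) pair for every word in the input line.
    for word in line.lower().split():
        yield (word, 1)

def shuffle(pairs):
    # Partition + sort + shuffle: group all emitted values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())   # keys arrive at reducers in sorted order

def reducer(key, values):
    # Reduce phase: sum the counts for one word.
    return (key, sum(values))

def word_count(lines):
    mapped = [pair for line in lines for pair in mapper(line)]
    return dict(reducer(k, vs) for k, vs in shuffle(mapped))

print(word_count(["hadoop stores data", "hadoop processes data"]))
# {'data': 2, 'hadoop': 2, 'processes': 1, 'stores': 1}
```

In real Hadoop, the same three roles are played by the Mapper, the framework's shuffle machinery, and the Reducer, with a Driver class wiring them into a Job.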


Apache PIG:-

 Introduction to Apache PIG
 Introduction to PIG Data Flow Engine
 MapReduce vs. PIG in detail
 When should PIG be used?
 Data Types in PIG
 Basic PIG programming
 Modes of Execution in PIG
o Local Mode and
o MapReduce Mode
 Execution Mechanisms
o Grunt Shell
o Script
o Embedded
 Operators/Transformations in PIG
 PIG UDF’s with Program
 Word Count Example in PIG
 The difference between MapReduce and PIG


SQOOP:-

 Introduction to SQOOP
 Use of SQOOP
 Connect to MySQL database
 SQOOP commands
o Import
o Export
o Eval
o Codegen etc…
 Joins in SQOOP
 Export to MySQL
 Export to HBase


HIVE:-

 Introduction to HIVE
 HIVE Meta Store
 HIVE Architecture
 Tables in HIVE
o Managed Tables
o External Tables
 Hive Data Types
o Primitive Types
o Complex Types
 Partition
 Joins in HIVE
 HIVE UDF’s and UDAF’s with Programs
 Word Count Example


HBASE:-

 Introduction to HBASE
 Basic Configurations of HBASE
 Fundamentals of HBase
 What is NoSQL?
 HBase Data Model
o Table and Row
o Column Family and Column Qualifier
o Cell and its Versioning
 Categories of NoSQL Databases
o Key-Value Database
o Document Database
o Column Family Database
 HBASE Architecture
o HMaster
o Region Servers
o Regions
o MemStore
o Store
 How HBase differs from RDBMS
 HDFS vs. HBase
 Client-side buffering or bulk uploads
 HBase Designing Tables
 HBase Operations
o Get
o Scan
o Put
o Delete


MongoDB:-

 What is MongoDB?
 Where to Use?
 Configuration on Windows
 Inserting data into MongoDB
 Reading MongoDB data

Cluster Setup:-

 Downloading and Installing Ubuntu 12.x
 Installing Java
 Installing Hadoop
 Creating Cluster
 Increasing/Decreasing the Cluster Size
 Monitoring the Cluster Health
 Starting and Stopping the Nodes


Zookeeper:-

 Introduction to Zookeeper
 Data Model
 Operations


OOZIE:-

 Introduction to OOZIE
 Use of OOZIE
 Where to use?


Flume:-

 Introduction to Flume
 Uses of Flume
 Flume Architecture
o Flume Master
o Flume Collectors
o Flume Agents

Project Explanation with Architecture


Various modern tools that come under the Hadoop ecosystem

Here we shall discuss a little about the different components of the Hadoop ecosystem. First, let us start with what is meant by the Hadoop ecosystem. HDFS and MapReduce are the core components of the Hadoop framework, on which Big Data is stored and processed in a distributed manner. The Hadoop ecosystem refers to the set of tools that help in the storage and processing of Big Data. Another important fact is that the membership of the Hadoop ecosystem is always increasing: tools based on distributed technology keep getting integrated with the Hadoop framework over time, and in this way the possible use cases increase substantially.


The first one that needs to be discussed here is Pig: it is a tool that uses simple scripting statements to process the data. Writing the equivalent as multiple MapReduce jobs in languages like Java or Python would take an enormous amount of time and effort. Pig is a relatively simple data flow language that cuts down on development time and effort. It was designed primarily for data scientists who have limited time and programming skills.


Hive provides an SQL-like language that runs on top of MapReduce. Pig and Hive were developed at different places (Pig by Yahoo and Hive by Facebook) but with the same idea in mind: both tools were designed to help data scientists with limited programming skills process the data. Observe that both Pig and Hive sit above the MapReduce layer; the code written in Pig or Hive gets converted into MapReduce jobs that are then run over the data stored in HDFS.


To facilitate the movement of data into and out of Hadoop, the tools Flume and Sqoop were created. Sqoop helps in moving data from a relational database, while Flume is used to ingest data as it is generated by an external source. Then there are tools like Impala, which is used for low-latency queries. HBase is another tool that provides real-time database features for retrieving data from HDFS, and there are many other similar tools that provide different functionalities as well.


The biggest problem with all these tools is that they have been developed independently and in parallel by various organizations. For example, Yahoo came up with Pig while Facebook came up with Hive, and both made their tools open source to be used by everybody. As a result, there are a lot of compatibility issues between the tools. This is where Cloudera and Hortonworks come into the picture, and where they score points: they package all the open-source components, add their own flavor, and release their own versions of the packages. These packages contain all the ecosystem components, and the components are compatible with one another. The business model is to keep the products open source and free of charge and to charge for the services.

Comments (29)

  • Chandra Sekhar Reddy

    I am looking for Hadoop training institutes.
    I want to learn the Hadoop course in evening batches. Is any evening batch running right now? Call me over the phone with all the details.

    December 26, 2014 at 3:55 pm
  • Raju

    I am looking for Hadoop Course Training Institute in Ameerpet.
    Please let me know fee details and timings, I want to learn Hadoop.


    December 28, 2014 at 3:04 pm
  • Hadoop Experts

    Most of our faculty do not take evening batches, because most of them work for USA or Europe clients.
    So if you want real-time working people, evening batches are still not possible due to time constraints; it is better to take online training without compromising the quality of teaching. You can try a free online demo, or talk with the faculty to look at other options.
    Our faculty do not take evening batches, only morning batches.

    December 28, 2014 at 3:26 pm
  • Hadoop Experts

    The next batch for classroom training starts on 5th Jan 2015 at 10 am. The demo is at the same timings.
    Course fee: 10,000/- until Jan 5th.

    December 28, 2014 at 3:30 pm
  • Teja

    Hi, I am looking for the best training institute for Hadoop in Hyderabad. I don’t know Java. Is it still possible to take classroom training? Let me know, so I can plan to attend the demo.

    December 28, 2014 at 6:28 pm
  • Rohith

    I would like to know the following:
    Hi, can you tell me how Hadoop is useful for fraud detection and future business analysis?

    January 22, 2015 at 2:28 pm
  • Hadoop Experts

    Hadoop is not for a specific use case, say fraud detection. You will have to build the algorithm yourself. However, if the data size is of Big Data scale, then you can use MapReduce to execute the algorithm. If the data size is small, you can execute it conventionally; there is no need for Hadoop.

    January 23, 2015 at 7:33 pm
  • ali mohammad

    Hi, I am a B.Com (computer applications) graduate with no work experience. I am willing to start my career now, and recently I came across this Hadoop course. So I want to know whether I can get good opportunities in Indian companies, and good pay, if I complete this course without any prior software experience. I hear your institute is a good one in Hyderabad.

    February 9, 2015 at 5:46 pm
  • parul

    I completed my MTech in computer science in 2012. Before this I was in the teaching field in my native place, where I did not have the option to work in a software firm. Now I have shifted to Hyderabad and want to join a software firm, so is it a good idea to join Hadoop training? I am a bit confused about this; please help me out, as my core Java is good. And can you tell me whether you provide job placement or not?
    A lot of reviews suggest that your institute is the best one in Hyderabad, so I want to join.

    February 10, 2015 at 8:14 am
  • siva

    Hi, I am Siva, a finance person working in an MNC for the last 4 years. I want to settle in IT. Is it fine for someone from a finance background to understand Hadoop?

    February 16, 2015 at 6:33 pm
  • Sandeep Reddy

    I am Selenium developer having around 5yrs of experience now I would like to learn Hadoop.

    1) what is Hadoop?
    2) how is the job market for experienced members

    3) duration and fee for classroom and online training
    4)do u provide any placements?

    Please drop me a mail with full details

    February 18, 2015 at 8:36 pm
  • Venkat


    I have been working in selenium since 5 Years.
    I Want to learn Hadoop. Could you please provide details for the following queries

    1. If a Selenium tester learns Hadoop, how can it be useful in future, and will any impediments come up?
    2. Training details.
    3. Course details.

    March 2, 2015 at 12:30 pm
  • Sivakumar

    I would like to attend the HADOOP big data classroom training from April 2nd week onwards. Request you to provide the timing and fee details.
    My details:
    6 years of IT experience
    experience in SQL and R language

    March 27, 2015 at 10:35 am
  • shiva alladi

    Can I know MapReduce Simple Interview Questions?

    March 27, 2015 at 7:04 pm
  • Chandrasekhar

    I am a fresher. I want to learn Hadoop. Please let me know what all the pre-requisite technologies are that I need to learn before going for Hadoop, or can I start Hadoop without prior knowledge of any technologies? I only know C++.


    March 28, 2015 at 1:54 pm
  • Hadoop Experts

    Main features of MapReduce?
    – Parallel Processing, Fault Tolerance

    Can we run MapReduce job without reducer?
    – yes

    How to set the number of reducers?
    – -D mapred.reduce.tasks=2

    While processing data, If task tracker fails what will happen?
    – Job tracker will assign the task to other task trackers

    While processing data, If job tracker fails what will happen?
    – We are not able to run any jobs

    What is combiner?
    – The Combiner is a ‘mini-reduce’ process which runs on the local node

    How many mappers for 1 GB file with Input split size 64 MB?
    – 16

    what is partitioner?
    – It distributes the map output over the reducers

    What is distributed cache?
    – It is a facility provided by the Map-Reduce framework to cache files and distribute to all nodes in Hadoop Cluster

    What are the basic parameters of a Mapper?
    – LongWritable and Text

    What are the phases b/w mapper and reducer?
    – Partition, Sorting, Shuffling

    What is shuffling in MapReduce?
    – The process by which the system performs the sort and transfers the map outputs to the reducers as their input is known as the shuffle

    How to kill a job?
    – hadoop job -kill jobId

    Difference between HDFS block and input split?
    – A logical division of the data is known as a split, while a physical division of the data is known as an HDFS block

    What are the methods in mapper and reducer?
    – Setup, Map/Reduce, CleanUp

    What is the purpose of RecordReader in Hadoop?
    – It actually loads the data from its source and converts it into (key, value) pairs suitable for reading by the Mapper

    What is JobTracker?
    – JobTracker is the service within Hadoop that runs MapReduce jobs on the cluster

    What is TaskTracker?
    – TaskTracker actually executes the tasks (map/reduce tasks)

    How did you debug your MapReduce code?
    – By using counters and web interface provided by Hadoop framework

    March 30, 2015 at 5:12 pm
  • Agniva Chatterjee


    I have overall 4.5 years of experience in Oracle Business Intelligence and I want to learn Hadoop. I don’t have any prior experience of Java. Can I still learn Hadoop and if so, will it be challenging for me to grab the Hadoop concepts?
    Let me know the schedule for your free demo session and contact of Mr. Praveen so that I can have a chat with him also.


    March 31, 2015 at 9:29 am
  • Roshan Singh

    Do you provide Big Data Hadoop training online? I am an MS fresher from the UK and willing to learn Hadoop Administration. Are there any upcoming batches for online training in May? My friend is also looking for Hadoop training in Hyderabad, Kukatpally. Do you have any branch there? Let me know with all details.

    April 13, 2015 at 5:31 am
  • moidul

    If one is running 100s of jobs per day and keeps the output of each job in HDFS, isn’t that too much? Is there an inbuilt mechanism to keep the history intact but not spoil the HDFS file system directory structure?

    What do you do when your local file system is full?

    May 5, 2015 at 1:09 pm
  • Hadoop Experts

    You will delete the files which you are not using, and the same applies to HDFS too. You need to manage the HDFS.

    May 12, 2015 at 2:49 am
  • renu

    Hi, I want to learn Hadoop through online training but which faculty will teach online? Is that praveen sir or else what is the profile of faculty?
    Also will you be teaching practicals online?

    May 14, 2015 at 12:26 pm
  • G. Naresh Kumar

    I want to learn the Hadoop tool, but I am working as a faculty member in a reputed organization. I would like to know if there is any possibility of evening batches. Please send me a text: what is the fee structure and what are you going to teach?

    May 16, 2015 at 7:59 am
  • Anand

    I am looking for Hadoop Course training Institute in Ameerpet, Hyderabad
    please let me know fee details and timings,

    May 20, 2015 at 12:12 pm
  • Ajaz Ahmad

    Hi, if we attend the Hadoop Developer training, will we be able to do our MCA project in Hadoop? Can you specify the applications where we will use Hadoop? Can you provide us support for developing a project? We are three students in a batch. Please let us know.

    July 9, 2015 at 9:26 am
  • Vivek

    I’m Vivek from Chennai. I’m looking to study the Hadoop classroom training course in Hyderabad, but I do not have much knowledge of Java. Is it possible for me to join the Hadoop training courses?

    Kindly reply to this as soon as possible.

    July 12, 2015 at 6:11 pm
  • Kishore

    I am a fresher. I want to learn Hadoop. Please let me know what all the pre-requisites are that I need to learn before going for Hadoop, or can I start Hadoop without prior knowledge of any technologies? I’m studying MBA, so am I suitable for the Hadoop course? Which one is good for me, the Data Scientist course or Big Data Hadoop training? Please respond to this mail.

    October 16, 2015 at 3:19 pm
  • Phani

    Are there any opportunities for freshers in Hadoop? I am looking for computer training institutes for Hadoop near Ameerpet. Which one is the best institute for Hadoop in Ameerpet, Hyderabad?

    October 20, 2015 at 4:57 pm
  • Rajapp.Y

    I want to do Hadoop, but I am a fresher. Are there any calls for freshers? I am interested in the Hadoop Developer training. Is knowing Java a prerequisite for this course? When is the next online course and what is the fee?

    January 2, 2016 at 11:54 am
  • Gaurav

    I want to do Hadoop online training. Please let me know when the next batch will start.
    Also I want full detail of Hadoop Developer course.

    August 5, 2016 at 1:37 pm
