Apache Spark and Scala Training

Spark and Scala Training Course Content

Module 1

Introduction to Scala

Learning Objectives – In this module, you will understand the basic concepts of Scala and the motivation for learning a new language, and get your setup ready.


1) Why Scala?

2) What is Scala?

3) Introducing Scala

4) Installing Scala

5) Journey – Java to Scala

6) First Dive – Interactive Scala

7) Writing Scala Scripts – Compiling Scala Programs

8) Scala Basics

9) Scala Basic Types

10) Defining Functions

11) IDE for Scala, Scala Community
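To give a feel for the topics above, here is a minimal, self-contained sketch of Scala basics – immutable val bindings, type inference, and two ways of defining a function. The names are illustrative only, not part of the course material:

```scala
object ScalaBasics {
  // 'val' declares an immutable binding; the Int type is inferred.
  val answer = 42

  // A method with an explicit parameter and return type.
  def square(x: Int): Int = x * x

  // A function literal (lambda) assigned to a val.
  val double: Int => Int = x => x * 2

  def main(args: Array[String]): Unit = {
    println(square(5))   // 25
    println(double(21))  // 42
  }
}
```

You can paste these definitions straight into the interactive Scala shell (the REPL) covered in topic 6 and experiment with them line by line.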

Module 2

Scala Essentials

Learning Objectives – In this module, you will learn the essentials of Scala that you need in order to work with it.


1) Immutability in Scala – Semicolons

2) Method Declaration, Literals

3) Lists

4) Tuples

5) Options

6) Maps

7) Reserved Words

8) Operators

9) Precedence Rules

10) If Statements

11) Scala For Comprehensions

12) While Loops

13) Do-While Loops

14) Conditional Operators

15) Pattern Matching

16) Enumerations
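Several of these essentials fit in a few lines. The following sketch (with illustrative names) shows a List, a Tuple, a Map, and an Option handled with pattern matching instead of null checks:

```scala
object ScalaEssentials {
  // List: an immutable linked sequence.
  val langs: List[String] = List("Scala", "Java", "Python")

  // Tuple: a fixed-size group of possibly different types.
  val pair: (String, Int) = ("answer", 42)

  // Map: immutable key/value pairs; get returns an Option.
  val ports: Map[String, Int] = Map("http" -> 80, "https" -> 443)

  // Option + pattern matching: the absent case is handled explicitly.
  def describe(key: String): String = ports.get(key) match {
    case Some(p) => s"$key -> $p"
    case None    => s"$key is unknown"
  }
}
```

Note how ports.get never returns null – missing keys surface as None, and the match expression forces you to handle that case.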

Module 3

Traits and OOPs in Scala

Learning Objectives – In this module, you will understand the implementation of OOP concepts in Scala and the use of traits as mixins.


1) Traits Intro – Traits as Mixins

2) Stackable Traits

3) Creating Traits, Basic OOP – Class and Object Basics

4) Scala Constructors

5) Nested Classes

6) Visibility Rules
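Traits as mixins, including the stackable-trait pattern listed above, can be sketched in a few lines. The Greeter example below is illustrative, not part of the course code:

```scala
abstract class Greeter { def greet(name: String): String }

class BasicGreeter extends Greeter {
  def greet(name: String): String = s"Hello, $name"
}

// Stackable modification: 'abstract override' lets a trait decorate
// whatever concrete implementation it is eventually mixed into.
trait Shouting extends Greeter {
  abstract override def greet(name: String): String = super.greet(name).toUpperCase
}

trait Excited extends Greeter {
  abstract override def greet(name: String): String = super.greet(name) + "!"
}

object Mixins {
  // Mixins apply right-to-left: Excited wraps Shouting wraps BasicGreeter,
  // so greet("Alice") yields "HELLO, ALICE!".
  val greeter: Greeter = new BasicGreeter with Shouting with Excited
}
```

Reordering the mixins changes the result – with Excited before Shouting the "!" would be added first and then uppercased, which here happens to look the same, but in general the linearization order matters.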

Module 4

Functional Programming in Scala

Learning Objectives – In this module, you will gain the functional programming know-how needed for Scala.


1) What is Functional Programming?

2) Functional Literals and Closures

3) Recursion

4) Tail Calls

5) Functional Data Structures

6) Implicit Function Parameters

7) Call by Name

8) Call by Value
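A short sketch of some of these ideas – a closure, a tail-recursive function, and call by value versus call by name (the names are illustrative):

```scala
import scala.annotation.tailrec

object FpSketch {
  // A closure: 'rate' is captured from the enclosing scope.
  val rate = 0.1
  val withTax: Double => Double = price => price * (1 + rate)

  // Tail recursion: the recursive call is the last action, so the
  // compiler rewrites it as a loop; @tailrec makes that a guarantee.
  @tailrec
  def factorial(n: Int, acc: BigInt = 1): BigInt =
    if (n <= 1) acc else factorial(n - 1, acc * n)

  // Call by value: 'x' is evaluated once, before the call.
  def twiceByValue(x: Int): Int = x + x

  // Call by name: 'x: => Int' is re-evaluated each time it is used.
  def twiceByName(x: => Int): Int = x + x
}
```

The difference between the last two only becomes visible when the argument has a side effect or is expensive: a by-name argument with a println in it would print twice in twiceByName but once in twiceByValue.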

Module 5

Introduction to Big Data and Spark

Learning Objectives – In this module, you will understand what Big Data is and its associated challenges, survey the various frameworks available, and get a first-hand introduction to Spark.


1) Introduction to Big Data

2) Challenges with Big Data

3) Batch vs. Real-Time Big Data Analytics

4) Batch Analytics – Hadoop Ecosystem Overview

5) Real-Time Analytics Options, Streaming Data – Storm

6) In-Memory Data – Spark

7) What is Spark?

8) Modes of Spark

9) Spark Installation Demo

10) Overview of Spark on a Cluster

11) Spark Standalone Cluster

Module 6

Spark Baby Steps

Learning Objectives – In this module, you will learn how to invoke the Spark shell and use it for various standard operations.


1) Invoking Spark Shell

2) Loading a File in Shell

3) Performing Some Basic Operations on Files in Spark Shell

4) Building a Spark Project with sbt, Building and Running a Spark Project with sbt

5) Caching Overview, Distributed Persistence

6) Spark Streaming Overview

7) Example: Streaming Word Count
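Even before you have a Spark shell available, the load-a-file-and-operate workflow can be previewed with plain Scala. In the Spark shell you would call sc.textFile(path) and get back an RDD; here Source.fromFile and a List stand in. This is only a local analogue, not Spark code, and the file contents are made up:

```scala
import java.nio.file.{Files, Path}
import scala.io.Source

object ShellPreview {
  // Load a file into a List[String] -- the local stand-in for
  // sc.textFile(path), which returns an RDD[String] in the Spark shell.
  def loadLines(path: Path): List[String] = {
    val src = Source.fromFile(path.toFile)
    try src.getLines().toList
    finally src.close()
  }

  def demo(): (Int, List[String]) = {
    // Create a small sample file to play with.
    val path = Files.createTempFile("sample", ".txt")
    Files.write(path, "spark scala\nspark shell\n".getBytes("UTF-8"))

    val lines = loadLines(path)
    // count and filter carry the same names over to RDDs.
    (lines.length, lines.filter(_.contains("shell")))
  }
}
```

The payoff of the real Spark shell is that the same count and filter calls then run distributed across a cluster instead of on one local List.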

Module 7

Playing with RDDs

Learning Objectives – In this module, you will learn about one of the building blocks of Spark – RDDs – and the related manipulations used to implement business logic.


1) RDDs

2) Transformations in RDD

3) Actions in RDD

4) Loading Data in RDD

5) Saving Data through RDD

6) Scala and Hadoop Integration Hands-on
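RDD transformations deliberately share their names with Scala collection methods, so the classic word count can be previewed on plain collections. Note that reduceByKey exists only on pair RDDs, so groupBy plus a per-group sum stands in for it below – this is a sketch of the shape of the computation, not Spark code:

```scala
object RddSketch {
  // Word count with the same transformation names an RDD pipeline uses.
  // On a real RDD you would write:
  //   sc.textFile(path).flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("\\s+"))   // transformation: one word per element
      .map(w => (w, 1))           // transformation: pair each word with 1
      .groupBy(_._1)              // local stand-in for reduceByKey's shuffle
      .map { case (w, pairs) => (w, pairs.map(_._2).sum) }
}
```

On an RDD, everything up to the final action is lazy: the transformations only describe the computation, and nothing runs until an action such as collect or saveAsTextFile is called.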

Module 8

Shark – When Spark meets Hive

Learning Objectives – In this module, you will see different offshoots of Spark, such as Shark, Spark SQL, and MLlib. This session is primarily interactive, discussing industrial use cases of Spark and the latest developments happening in this area.


1) Why Shark?

2) Installing Shark

3) Running Shark

4) Loading of Data

5) Hive Queries through Spark

6) Testing Tips in Scala

7) Performance Tuning Tips in Spark

8) Shared Variables: Broadcast Variables

9) Shared Variables: Accumulators
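The two kinds of shared variables above differ in direction: a broadcast variable ships a read-only value to every task, while an accumulator only lets tasks add to it, with the total read back on the driver. In real Spark they come from SparkContext methods (sc.broadcast and the accumulator factories); the plain-Scala sketch below, with illustrative names and data, merely shows the contract:

```scala
import java.util.concurrent.atomic.AtomicLong

object SharedVarsSketch {
  // Broadcast analogue: a read-only lookup table every task can consult.
  val codes: Map[String, Int] = Map("US" -> 1, "IN" -> 91)

  // Accumulator analogue: tasks may only add; the driver reads the total.
  val badRecords = new AtomicLong(0)

  // A "task": translate records via the broadcast table, counting
  // unmatched records in the accumulator instead of failing.
  def processPartition(records: Seq[String]): Seq[Int] =
    records.flatMap { r =>
      codes.get(r) match {
        case Some(code) => Some(code)
        case None       => badRecords.incrementAndGet(); None
      }
    }
}
```

The design point this illustrates is why Spark restricts accumulators to addition: the driver can merge per-task contributions in any order without coordination between tasks.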

What is Spark?

Apache Spark is a data analytics cluster computing framework. It is open-source software, originally developed in the AMPLab at UC Berkeley, and it fits into the Hadoop open-source community. It builds on top of the Hadoop Distributed File System (HDFS); however, Spark is not tied to the two-stage MapReduce paradigm, and it promises performance up to 100 times faster than Hadoop MapReduce for certain applications. Spark provides primitives for in-memory cluster computing, which allow user programs to load data into a cluster's memory and query it repeatedly. This makes Spark well suited to machine learning algorithms.

Spark became an Apache top-level project, having previously been an Apache Incubator project. It has received code contributions from large companies that use Spark, including Yahoo and Intel, and many individual developers representing different companies have contributed code as well. The software is written in Scala, Java, and Python. It is available for the Linux, Mac OS, and Windows operating systems. Spark is released under the Apache License 2.0, and the official website is spark.apache.org.

Tags: Apache Spark and Scala Training