Big Data and Hadoop

Level
All Levels
Duration
45 hours
Course Fee
₹15000
*Inclusive of GST   

This Big Data Analytics training course is curated by Hadoop industry experts and provides in-depth knowledge of Big Data and Hadoop ecosystem tools such as HDFS, YARN, MapReduce, Hive, Pig, HBase, Spark, Oozie, Flume and Sqoop.

Training Type
Classroom, Online, Corporate
Batch Timings

For the latest training schedule, please check the Schedules.

Weekdays
  • Early Morning
  • Morning
  • Afternoon
  • Evening
  • Fast Track
Weekends
  • Morning
  • Afternoon
  • Evening
  • Sat / Sun
  • Sunday Only

Training is available in small groups as well as on a one-to-one basis. Get in touch.

Course Introduction

This 45-day (or 6-week on weekends, Saturday/Sunday), 50-hour instructor-led Big Data course is designed to give you in-depth knowledge of the Big Data framework using Hadoop and Spark, including HDFS, YARN, and MapReduce. You will learn to use Pig, Hive, and Impala to process and analyze large datasets stored in HDFS, and to use Sqoop and Flume for data ingestion. The key to a high success rate lies in the following program objectives:

  • Course content is based on Big Data and Hadoop guidelines
  • Dedicated monitoring to evaluate and report candidates' progress
  • Extensive hands-on lab exercises
  • Regular evaluation
  • Industry-acclaimed, experienced, and certified instructors

Course Highlights

Big Data analytics is widely used to analyze large volumes of data. The growing need for professionals with knowledge of Big Data and Hadoop has increased opportunities for those who want to build a career in this field. The world is becoming increasingly digital, which means big data is here to stay, and the importance of big data and data analytics will continue to grow in the coming years. Knowing the basics of Big Data and Hadoop makes it easier for professionals to pursue advanced courses in the subject and acquire the skills to become experts in Big Data analytics. A career in big data and analytics might be just the role you have been looking for to meet your career expectations.

Course Objectives

Upon completing the course, students will be able to meet the following objectives:

  • Store, manage, and analyze unstructured data
  • Select the correct big data stores for disparate data sets
  • Process large data sets using Hadoop to extract value
  • Query large data sets in near real time with Pig and Hive
  • Build a complete understanding of Big Data analytics concepts
  • Understand the different data processing skills
  • Perform real-time analysis on large data sets
  • Plan and implement a big data strategy for your organization

Course Topics

Understanding Big Data and Hadoop:

  • Introduction to Big Data & Big Data Challenges
  • Limitations & Solutions of Big Data Architecture
  • Hadoop & its Features
  • Hadoop Ecosystem
  • Hadoop 2.x Core Components
  • Hadoop Storage: HDFS (Hadoop Distributed File System)
  • Hadoop Processing: MapReduce Framework
  • Different Hadoop Distributions

Hadoop Architecture and HDFS:

  • Hadoop 2.x Cluster Architecture
  • Federation and High Availability Architecture
  • Typical Production Hadoop Cluster
  • Hadoop Cluster Modes
  • Common Hadoop Shell Commands
  • Hadoop 2.x Configuration Files
  • Single Node Cluster & Multi-Node Cluster set up
  • Basic Hadoop Administration

Hadoop MapReduce Framework:

  • Traditional way vs MapReduce way
  • Why MapReduce
  • YARN Components
  • YARN Architecture
  • YARN MapReduce Application Execution Flow
  • YARN Workflow
  • Anatomy of MapReduce Program
  • Input Splits, Relation between Input Splits and HDFS Blocks
  • MapReduce: Combiner & Partitioner
  • Demo of Health Care Dataset
  • Demo of Weather Dataset
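
As a rough illustration of the mapper, combiner, partitioner, and reducer roles listed above, here is a minimal single-process sketch in plain Python. It is not Hadoop code and not the course's demo material: the function names (`map_phase`, `combine`, `partition`, `reduce_phase`, `run_job`) and the sample lines are assumptions made for illustration, and each input line simply stands in for one input split.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def combine(pairs):
    """Combiner: pre-aggregate one mapper's output locally before the shuffle."""
    local = defaultdict(int)
    for key, value in pairs:
        local[key] += value
    return list(local.items())


def partition(key, num_reducers):
    """Partitioner: pick a reducer for a key using a deterministic toy hash."""
    return sum(key.encode("utf-8")) % num_reducers


def reduce_phase(key, values):
    """Reducer: sum all counts received for one key."""
    return key, sum(values)


def run_job(lines, num_reducers=2):
    # One key -> [values] bucket per reducer, filled during the shuffle.
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for line in lines:  # each line stands in for one input split (assumption)
        for key, value in combine(map_phase(line)):
            buckets[partition(key, num_reducers)][key].append(value)

    result = {}
    for bucket in buckets:
        for key in sorted(bucket):  # each reducer processes its keys in sorted order
            word, total = reduce_phase(key, bucket[key])
            result[word] = total
    return result


if __name__ == "__main__":
    sample = ["big data big cluster", "data node data"]
    print(run_job(sample))
```

In this sketch the combiner shrinks each mapper's output before it is shuffled, and the partitioner guarantees that every occurrence of a key reaches the same reducer, which is why the final per-word totals are correct regardless of how many reducers are used.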

Advanced Hadoop and MapReduce:

  • Counters
  • Distributed Cache
  • MRUnit
  • Reduce Join
  • Custom Input Format
  • Sequence Input Format
  • XML file Parsing using MapReduce

Apache Pig:

  • Introduction to Apache Pig
  • MapReduce vs Pig
  • Pig Components & Pig Execution
  • Pig Data Types & Data Models in Pig
  • Pig Latin Programs
  • Shell and Utility Commands
  • Pig UDF & Pig Streaming
  • Testing Pig scripts with PigUnit
  • Aviation use case in Pig
  • Pig Demo of Healthcare Dataset

Apache Hive:

  • Introduction to Apache Hive
  • Hive vs Pig
  • Hive Architecture and Components
  • Hive Metastore
  • Limitations of Hive
  • Comparison with Traditional Database
  • Hive Data Types and Data Models
  • Hive Partition
  • Hive Bucketing
  • Hive Tables (Managed Tables and External Tables)
  • Importing Data
  • Querying Data & Managing Outputs
  • Hive Script & Hive UDF
  • Retail use case in Hive

Advanced Apache Hive and HBase:

  • HiveQL: Joining Tables, Dynamic Partitioning
  • Custom MapReduce Scripts
  • Hive Indexes and Views
  • Hive Query Optimizers
  • Hive Thrift Server
  • Hive UDF
  • Apache HBase: Introduction to NoSQL Databases and HBase
  • HBase vs RDBMS
  • HBase Components
  • HBase Architecture
  • HBase Run Modes
  • HBase Configuration
  • HBase Cluster Deployment

Advanced Apache HBase:

  • HBase Data Model
  • HBase Shell
  • HBase Client API
  • Hive Data Loading Techniques
  • Apache ZooKeeper Introduction
  • ZooKeeper Data Model
  • ZooKeeper Service
  • HBase Bulk Loading
  • Getting and Inserting Data
  • HBase Filters

Processing Distributed Data with Apache Spark:

  • What is Spark
  • Spark Ecosystem
  • Spark Components
  • What is Scala
  • Why Scala
  • SparkContext
  • Spark RDD

Oozie and Hadoop Project:

  • Oozie
  • Oozie Components
  • Oozie Workflow
  • Scheduling Jobs with Oozie Scheduler
  • Demo of Oozie Workflow
  • Oozie Coordinator
  • Oozie Commands
  • Oozie Web Console
  • Oozie for MapReduce
  • Combining flow of MapReduce Jobs
  • Hive in Oozie
  • Hadoop Project Demo
  • Hadoop Talend Integration

Lab Topics

Not Available


Virtual Classroom
  • Instructor-led online training is an ideal vehicle for delivering training to individuals anywhere in the world at any time.
  • This innovative approach presents live content, with an instructor delivering the training online.
  • Candidates perform labs remotely on our cloud-hosted labs in the presence of an online instructor.
  • RST Forum uses the Microsoft Lync engine to deliver instructor-led online training.
  • Advances in computer network technology, improvements in bandwidth, interactions, chat and conferencing, and real-time audio and video offer unparalleled training opportunities.
  • Instructor-led online training helps today's busy professionals perform their jobs and upgrade their knowledge by integrating self-paced, instructor-led online training into their daily routines.

Miscellaneous
  • The minimum batch size for this course is 10 participants.
  • The RST Forum reserves the right to cancel/postpone the class.
  • The course schedule will be provided before commencement of the course.
  • A certificate of participation will be awarded to participants with a minimum of 90% attendance.
  • All attendees are to observe copyright law on intellectual property such as software and courseware from the respective vendors.
  • The RST Forum reserves the right to include external participants in the program, either for the entire course or for individual courses.
  • The RST Forum reserves the right to change/alter the sequence of courses.
  • Books published by RST Forum are offered to forum students at a 50% discount.