Big Data Analytics with Apache Spark

  • Course level: Advanced  
There is currently no active semester schedule for this course.   Pre-register

Description

Apache Spark is an open-source parallel-processing framework used for large-scale data analytics on computer clusters. This course helps you develop and deploy large-scale data analytics projects.

COURSE OBJECTIVES

To understand Big Data Analytics and the Apache Spark ecosystem
To ingest data from popular data sources and formats, including HDFS, Hive, JSON, and S3
To learn how to work with DataFrames and Spark SQL
To build machine learning models with MLlib
To process streaming data with Spark Streaming
To learn how to perform graph computation with GraphX

COURSE SYLLABUS

Introduction to Apache Spark ecosystem
Install and set up Apache Spark
Work with DataFrames and Spark SQL
Cleaning and wrangling Big Data
Aggregating and summarizing data into useful reports
Preprocessing and Feature Engineering
Build Machine Learning applications with MLlib
Build a recommendation system
Optimizing the design of jobs by avoiding expensive operations
Build scalable fault-tolerant streaming applications with Spark Streaming
Graph-parallel computation with GraphX

Registration fee:
€ 790

Target audience

  • Data Engineers
  • Data Scientists
  • Database Administrators
  • IT staff
  • Computer Science and IT Students

Requirements

  • Basic understanding of SQL
  • Basic knowledge of Python programming and machine learning concepts