Big Data Analytics with Apache Spark

Course level: Advanced

There is currently no active semester schedule for this course. Pre-registration is available.

Description

Apache Spark is an open-source parallel-processing framework used for large-scale data analytics on computer clusters. This course helps you develop and deploy large-scale data analytics projects.

COURSE OBJECTIVES

•To understand Big Data Analytics and the Apache Spark ecosystem

•To ingest data from popular storage systems and data formats, including HDFS, Hive, JSON, and S3

•To learn how to work with DataFrames and Spark SQL

•To build machine learning models with MLlib

•To process streaming data with Spark Streaming

•To learn how to perform graph computation with GraphX

COURSE SYLLABUS

•Introduction to the Apache Spark ecosystem

•Install and set up Apache Spark

•Work with DataFrames and Spark SQL

•Cleaning and wrangling Big Data

•Aggregating and summarizing data into useful reports

•Preprocessing and Feature Engineering

•Build Machine Learning applications with MLlib

•Build a Recommendation system

•Optimizing the design of jobs by avoiding expensive operations

•Build scalable fault-tolerant streaming applications with Spark Streaming

•Graph-parallel computation with GraphX

Registration fee:

€ 790

Target audience

Data Engineers
Data Scientists
Database Administrators
IT staff
Computer Science and IT Students

Requirements

Basic understanding of SQL
Basic knowledge of Python programming and machine learning concepts