Big Data Analytics with Apache Spark
Description
Apache Spark is an open-source parallel-processing framework used for large-scale data analytics applications on computer clusters.
This course helps you develop and deploy large-scale data analytics projects.
COURSE OBJECTIVES
•To understand Big Data Analytics and the Apache Spark ecosystem
•To ingest data from popular storage systems and formats, including HDFS, Hive, S3, and JSON
•To learn how to work with DataFrames and Spark SQL
•To build machine learning models with MLlib
•To process streaming data with Spark Streaming
•To learn how to perform graph computation with GraphX
COURSE SYLLABUS
•Introduction to the Apache Spark ecosystem
•Installing and setting up Apache Spark
•Working with DataFrames and Spark SQL
•Cleaning and wrangling Big Data
•Aggregating and summarizing data into useful reports
•Preprocessing and Feature Engineering
•Building Machine Learning applications with MLlib
•Building a recommendation system
•Optimizing job design by avoiding expensive operations
•Building scalable, fault-tolerant streaming applications with Spark Streaming
•Graph-parallel computation with GraphX