Apache Spark Training (3 days)

  • Set up a 2-node Spark cluster on Linux
  • Ingest data from HDFS
  • Implement one batch-processing and one streaming use case in Java or Scala
  • Learn to use Spark SQL
  • Write a simple program using GraphX
  • Review monitoring and metrics

Machine Learning Lab (3 days)

  • Introduction to Spark MLlib
  • Deep-dive into k-means Clustering, Logistic Regression and Collaborative Filtering
  • Choice of mini project (using Spark):
    • Recommendations using Collaborative Filtering, or
    • Customer Churn Analysis using Logistic Regression

