Teaching

IST 718: Big Data Analytics (formerly Advanced Information Analytics)

Goal
This course is a broad introduction to modern techniques in data science, including elastic net regularized regression, random forests, gradient boosting, and deep learning. It emphasizes a statistical learning point of view and a careful examination of generalization error, model interpretability, feature engineering, and the bias-variance tradeoff.

Tools
The tool of choice is Apache Spark running on the Hadoop Distributed File System (HDFS). We use an environment based on Jupyter Notebook and Python, deployed with Kubernetes.