Course Materials
Unit 1: Overview
Reading
- Yarkoni and Westfall (2017) paper
- James et al. (2023) Chapter 2, pp 15 - 42
Slide decks
Videos
Lecture 1: An Introductory Framework ~ 9 mins
Lecture 3: Key Terminology in Context ~ 11 mins
Lab Materials
- None this week
Application Assignment
- No assignment this week
Quiz
- Submit the unit quiz by 8 pm on Wednesday, January 22nd
Unit 2: Exploratory Data Analysis
Reading
[NOTE: These are short chapters. You are reading to understand the framework of visualizing data in R. Don’t feel like you have to memorize the details. These are reference materials that you can turn back to when you need to write code!]
- Wickham, Çetinkaya-Rundel, and Grolemund (2023) Chapter 1, Data Visualization
- Wickham, Çetinkaya-Rundel, and Grolemund (2023) Chapter 9, Layers
- Wickham, Çetinkaya-Rundel, and Grolemund (2023) Chapter 10, Exploratory Data Analysis
Slide decks
Videos
Lecture 1: Stages of Data Analysis and Model Development ~ 10 mins
Lecture 2: Best Practices and Other Recommendations ~ 27 mins
Lecture 3: EDA for Data Cleaning ~ 41 mins
Lecture 4: EDA for Modeling - Univariate ~ 24 mins
Lecture 5: EDA for Modeling - Bivariate ~ 20 mins
Lecture 6: Working with Recipes ~ 13 mins
Lab Materials (Zihan - Jan 28th)
Application Assignment
cleaning EDA: qmd
modeling EDA: qmd
solutions: cleaning EDA; modeling EDA
Submit the application assignment by 8 pm on Wednesday, January 29th.
Quiz
- Submit the unit quiz by 8 pm on Wednesday, January 29th.
Unit 3: Introduction to Regression Models
Reading
- James et al. (2023) Chapter 3, pp 59 - 109
Slide decks
Videos
Lecture 1: Overview ~ 13 mins
Lecture 6: Extension to Interactions and Non-Linear Effects ~ 11 mins
Lecture 7: Introduction to KNN ~ 9 mins
Lecture 8: The hyperparameter k ~ 13 mins
Lecture 10: KNN with Ames ~ 12 mins
Lab Materials
Application Assignment
Submit the application assignment by 8 pm on Wednesday, February 5th.
Quiz
- Submit the unit quiz by 8 pm on Wednesday, February 5th.
Unit 4: Introduction to Classification Models
Reading
- James et al. (2023) Chapter 4, pp 129 - 164
Slide decks
Videos
Lecture 1: The Bayes Classifier ~ 9 mins
Lecture 2: Conceptual Overview of Logistic Regression ~ 19 mins
Lecture 3: EDA with the Cars Dataset ~ 12 mins
Lecture 5: KNN with Cars Dataset ~ 19 mins
Lecture 7: Comparisons among Classifiers ~ 11 mins
Lab Materials
Application Assignment
shells: cleaning EDA qmd; rda qmd; knn qmd
solution: modeling EDA; rda; knn
Submit the application assignment by 8 pm on Wednesday, February 12th.
Quiz
- Submit the unit quiz by 8 pm on Wednesday, February 12th.
Unit 5: Resampling Methods for Model Selection and Evaluation
Reading
- Kuhn and Johnson (2018) Chapter 4, pp 61 - 80
- Supplemental: James et al. (2023) Chapter 5, pp 197 - 208
Slide decks
Videos
Lecture 2: Introduction to Resampling ~ 11 mins
Lecture 7: Bootstrap Resampling ~ 11 mins
Lecture 8: Using Resampling to Select Best Model Configurations ~ 17 mins
Lecture 9: Resampling for Both Model Selection and Evaluation ~ 11 mins
Lecture 10: Nested Resampling ~ 14 mins
Lab Materials
Application Assignment
Submit the application assignment by 8 pm on Wednesday, February 19th.
Quiz
- Submit the unit quiz by 8 pm on Wednesday, February 19th.
Unit 6: Regularization and Penalized Models
Reading
- James et al. (2023) Chapter 6, pp 225 - 267
Slide decks
Videos
Lecture 1: An Introduction to Penalized/Regularized Algorithms ~ 15 mins
Lecture 2: Intuitions about Penalized Cost Functions and Regularization ~ 11 mins
Lecture 3: Ridge Regression ~ 9 mins
Lecture 4: LASSO ~ 8 mins
Lecture 5: The Elastic net ~ 4 mins
Lecture 6: Empirical Example - Many good predictors ~ 23 mins
Lecture 7: Empirical Example - Good and zero predictors ~ 9 mins
Lecture 8: Empirical Example - LASSO for covariate selection ~ 8 mins
Lab Materials
Application Assignment
Submit the application assignment by 8 pm on Wednesday, February 26th.
Quiz
- Submit the unit quiz by 8 pm on Wednesday, February 26th.
Unit 8: Advanced Performance Metrics
Reading
- Kuhn and Johnson (2018) Chapter 11, pp 247-266
- Kuhn and Johnson (2018) Chapter 16, pp 419-435
- Wyant et al. (in press)
Slide decks
Videos
Lecture 1: Unit Introduction ~ 15 mins
Lecture 4: The Receiver Operating Characteristic (ROC) Curve ~ 25 mins
Lecture 5: Selecting Model Configurations with Other Metrics ~ 10 mins
Lecture 6: Addressing Class Imbalance ~ 24 mins
Lab Materials
Application Assignment
Quiz
- Submit the unit quiz by 8 pm on Wednesday, March 12th.
Unit 9: Decision Trees, Bagging, and Random Forest
Reading
- James et al. (2023) Chapter 8, Tree Based Methods; pp 327 - 352
In addition, much of the content in this unit is drawn from four chapters of the book Hands-On Machine Learning with R. It is a great book, and I used it heavily (at times verbatim) because its coverage of these algorithms is quite clear. If you want more depth, you might read chapters 9-12 of this book as a supplement to this unit.
Slide decks - Lecture - Discussion
Videos
Lecture 1: Decision Trees ~ 30 mins
Lecture 2: Decision Trees in Ames ~ 20 mins
Lecture 3: Bagged Trees ~ 10 mins
Lecture 4: Bagged Trees in Ames ~ 6 mins
Lecture 5: Random Forest ~ 16 mins
Lab Materials
Application Assignment
Quiz
- Submit the unit quiz by 8 pm on Wednesday, March 19th.
Unit 10: Neural Networks
Reading
Slide decks
Videos
Lecture 1: But what is a Neural Network? ~ 19 mins
Lecture 2: Gradient descent, how neural networks learn ~ 21 mins
Lecture 6: Fitting neural networks in tidymodels with Keras ~ 18 mins
Lecture 10: Selecting model configurations and final remarks ~ 8 mins
Lab Materials
Application Assignment
Submit the application assignment here by noon on Friday, April 4th
Quiz
Complete the unit quiz by 8 pm on Wednesday, April 2nd
Unit 11: Explanatory Approaches
Reading
- Benavoli et al. (2017) paper: Read pages 1-9 that describe the correlated t-test and its limitations.
- Kruschke (2018) paper: Describes Bayesian estimation and the ROPE (generally, not in the context of machine learning and model comparisons)
And these chapters in the book Interpretable Machine Learning. They are all short!
- Molnar (2023) Chapter 2 - Interpretability
- Molnar (2023) Chapter 3 - Goals of Interpretability
- Molnar (2023) Chapter 4 - Methods Overview
- Molnar (2023) Chapter 17 - Shapley Values
- Molnar (2023) Chapter 18 - SHAP
- Molnar (2023) Chapter 19 - Partial Dependence Plots
- Molnar (2023) Chapter 20 - Accumulated Local Effects (ALE)
- Molnar (2023) Chapter 21 - Feature Interactions
- Molnar (2023) Chapter 23 - Permutation Feature Importance
Slide decks
Videos
Introduction to Model Comparisons ~ 6 mins
An Empirical Example of Feature Ablation ~ 13 mins
The Nadeau & Bengio Correlated t-test for Model Comparisons ~ 9 mins
Introduction to Feature Importance and the DALEX package ~ 11 mins
Permutation Feature Importance ~ 7 mins
SHAP Feature Importance ~ 14 mins
Visual Approaches to Understand Models ~ 11 mins
Note: this week's lab recording was mistakenly limited to the speaker view, rather than the screen, until about the 11-minute mark. However, the contents are all in the lab HTML. The Keras demo and early stopping usage start at around the 45-minute mark.
Lab Materials
Application Assignment
Submit the application assignment here by noon on Friday, April 11th
Quiz
Complete the unit quiz by 8 pm on Wednesday, April 9th
Unit 12: NLP
Reading
- Hvitfeldt and Silge (2022) Chapter 2: Tokenization
- Hvitfeldt and Silge (2022) Chapter 5: Word Embeddings
NOTES: Please read the above chapters more with an eye toward concepts and issues rather than code. I will demonstrate a minimum set of functions to accomplish the NLP modeling tasks for this unit.
Also know that the entire Hvitfeldt and Silge (2022) book is really mandatory reading. I would also strongly recommend the entire Silge and Robinson (2017) book. Both will be important references at a minimum.
Slide decks
Videos
Lecture 1: General Text (Pre-) Processing - the stringr package ~ 9 mins
Lecture 2: General Text (Pre-) Processing - regular expressions ~ 13 mins
Lecture 3: The IMDB Reviews Dataset ~ 6 mins
Lecture 4: Tokenization - Part 1 ~ 27 mins
Lecture 5: Tokenization - Part 2 ~ 13 mins
Lecture 6: Stopwords ~ 12 mins
Lecture 7: Stemming ~ 12 mins
Lecture 8: Bag of Words ~ 19 mins
Lecture 9: NLP in Action - Part 1 ~ 17 mins
Lecture 10: NLP in Action - Part 2 ~ 20 mins
Lab Materials
Application Assignment
Submit the application assignment here by noon on Friday, April 18th
Quiz
Complete the unit quiz by 8 pm on Wednesday, April 16th