Course Materials
Unit 1: Overview
Reading
- Yarkoni and Westfall (2017) paper
- James et al. (2023) Chapter 2, pp 15 - 42
Slide decks
Videos
Lecture 1: An Introductory Framework ~ 9 mins
Lecture 3: Key Terminology in Context ~ 11 mins
Lab Materials
- None this week
Quiz
- Submit the unit quiz by 8 pm on Wednesday, January 21st.
Application Assignment
- No assignment this week
Unit 2: Exploratory Data Analysis
Reading
[NOTE: These are short chapters. You are reading to understand the framework of visualizing data in R. Don’t feel like you have to memorize the details. These are reference materials that you can turn back to when you need to write code!]
- Wickham, Çetinkaya-Rundel, and Grolemund (2023) Chapter 1, Data Visualization
- Wickham, Çetinkaya-Rundel, and Grolemund (2023) Chapter 9, Layers
- Wickham, Çetinkaya-Rundel, and Grolemund (2023) Chapter 10, Exploratory Data Analysis
Slide decks
Videos
Lecture 1: Stages of Data Analysis and Model Development ~ 10 mins
Lecture 2: Best Practices and Other Recommendations ~ 27 mins
Lecture 3: EDA for Data Cleaning ~ 41 mins
Lecture 4: EDA for Modeling - Univariate ~ 24 mins
Lecture 5: EDA for Modeling - Bivariate ~ 20 mins
Lecture 6: Working with Recipes ~ 13 mins
Lab Materials
Quiz
- Submit the unit quiz by 8 pm on Wednesday, January 28th.
Application Assignment
cleaning EDA: qmd
modeling EDA: qmd
solutions: cleaning EDA; modeling EDA
Submit the application assignment by 8 pm on Friday, January 30th.
Unit 3: Introduction to Regression Models
Reading
- James et al. (2023) Chapter 3, pp 59 - 109
Slide decks
Videos
Lecture 1: Overview ~ 13 mins
Lecture 6: Extension to Interactions and Non-Linear Effects ~ 11 mins
Lecture 7: Introduction to KNN ~ 9 mins
Lecture 8: The hyperparameter k ~ 13 mins
Lecture 10: KNN with Ames ~ 12 mins
Lab Materials
Quiz
- Submit the unit quiz by 8 pm on Wednesday, February 4th.
Application Assignment
Submit the application assignment by 8 pm on Friday, February 6th.
Unit 4: Introduction to Classification Models
Reading
- James et al. (2023) Chapter 4, pp 129 - 164
Slide decks
Videos
Lecture 1: The Bayes Classifier ~ 9 mins
Lecture 2: Conceptual Overview of Logistic Regression ~ 19 mins
Lecture 3: EDA with the Cars Dataset ~ 12 mins
Lecture 5: KNN with Cars Dataset ~ 19 mins
Lecture 7: Comparisons among Classifiers ~ 11 mins
Lab Materials
Quiz
- Submit the unit quiz by 8 pm on Wednesday, February 11th.
Application Assignment
shells: cleaning EDA qmd; rda qmd; knn qmd
solution: modeling EDA; rda; knn
Submit the application assignment by 8 pm on Friday, February 13th.
Unit 5: Resampling Methods for Model Selection and Evaluation
Reading
Kuhn and Johnson (2018) Chapter 4, pp 61 - 80
Supplemental: James et al. (2023) Chapter 5, pp 197 - 208
Slide decks
Videos
Lecture 2: Introduction to Resampling ~ 11 mins
Lecture 7: Bootstrap Resampling ~ 11 mins
Lecture 8: Using Resampling to Select Best Model Configurations ~ 17 mins
Lecture 9: Resampling for Both Model Selection and Evaluation ~ 11 mins
Lecture 10: Nested Resampling ~ 14 mins
Lab Materials
Quiz
- Submit the unit quiz by 8 pm on Wednesday, February 18th.
Application Assignment
Submit the application assignment by 8 pm on Friday, February 20th.
Unit 6: Regularization and Penalized Models
Reading
- James et al. (2023) Chapter 6, pp 225 - 267
Slide decks
Videos
Lecture 1: An Introduction to Penalized/Regularized Algorithms ~ 15 mins
Lecture 2: Intuitions about Penalized Cost Functions and Regularization ~ 11 mins
Lecture 3: Ridge Regression ~ 9 mins
Lecture 4: LASSO ~ 8 mins
Lecture 5: The Elastic Net ~ 4 mins
Lecture 6: Empirical Example - Many good predictors ~ 23 mins
Lecture 7: Empirical Example - Good and zero predictors ~ 9 mins
Lecture 8: Empirical Example - LASSO for covariate selection ~ 8 mins
Lab Materials
Quiz
- Submit the unit quiz by 8 pm on Wednesday, February 25th.
Application Assignment
Submit the application assignment by 8 pm on Friday, February 27th.
Unit 8: Advanced Performance Metrics
Reading
- Kuhn and Johnson (2018) Chapter 11, pp 247-266
- Kuhn and Johnson (2018) Chapter 16, pp 419-435
- Wyant et al. (in press)
Slide decks
Videos
Lecture 1: Unit Introduction ~ 15 mins
Lecture 4: The Receiver Operating Characteristic (ROC) Curve ~ 25 mins
Lecture 5: Selecting Model Configurations with Other Metrics ~ 10 mins
Lecture 6: Addressing Class Imbalance ~ 24 mins
Lab Materials
Quiz
- Submit the unit quiz by 8 pm on Wednesday, March 11th.
Application Assignment
Unit 9: Decision Trees, Bagging, and Random Forest
Reading
- James et al. (2023) Chapter 8, Tree Based Methods; pp 327 - 352
In addition, much of the content from this unit has been drawn from four chapters in a book called Hands-On Machine Learning with R. It is a great book and I used it heavily (and at times verbatim) because it is quite clear in its coverage of these algorithms. If you want more depth, you might read chapters 9-12 from this book as a supplement to this unit in our course.
Slide decks - Lecture - Discussion
Videos
Lecture 1: Decision Trees ~ 30 mins
Lecture 2: Decision Trees in Ames ~ 20 mins
Lecture 3: Bagged Trees ~ 10 mins
Lecture 4: Bagged Trees in Ames ~ 6 mins
Lecture 5: Random Forest ~ 16 mins
Lab Materials
Quiz
- Submit the unit quiz by 8 pm on Wednesday, March 18th.
Application Assignment
Unit 10: Neural Networks
Reading
Slide decks
Videos
Lecture 1: But what is a Neural Network? ~ 19 mins
Lecture 2: Gradient descent, how neural networks learn ~ 21 mins
Lecture 6: Fitting neural networks in tidymodels with Keras ~ 18 mins
Lecture 10: Selecting model configurations and final remarks ~ 8 mins
Lab Materials
Quiz
Submit the unit quiz by 8 pm on Wednesday, March 25th
Application Assignment
Submit the application assignment by 8 pm on Friday, March 27th
Unit 11: Explanatory Approaches
Reading
- Benavoli et al. (2017) paper: Read pages 1-9 that describe the correlated t-test and its limitations.
- Kruschke (2018) paper: Describes Bayesian estimation and the ROPE (generally, not in the context of machine learning and model comparisons)
And these chapters in the book Interpretable Machine Learning. They are all short!
- Molnar (2023) Chapter 2 - Interpretability
- Molnar (2023) Chapter 3 - Goals of Interpretability
- Molnar (2023) Chapter 4 - Methods Overview
- Molnar (2023) Chapter 17 - Shapley Values
- Molnar (2023) Chapter 18 - SHAP
- Molnar (2023) Chapter 19 - Partial Dependence Plots
- Molnar (2023) Chapter 20 - Accumulated Local Effects (ALE)
- Molnar (2023) Chapter 21 - Feature Interactions
- Molnar (2023) Chapter 23 - Permutation Feature Importance
Slide decks
Videos
Introduction to Model Comparisons ~ 6 mins
An Empirical Example of Feature Ablation ~ 13 mins
The Nadeau & Bengio Correlated t-test for Model Comparisons ~ 9 mins
Introduction to Feature Importance and the DALEX package ~ 11 mins
Permutation Feature Importance ~ 7 mins
SHAP Feature Importance ~ 14 mins
Visual Approaches to Understand Models ~ 11 mins
Lab Materials
Quiz
Submit the unit quiz by 8 pm on Wednesday, April 8th
Application Assignment
Submit the application assignment by 8 pm on Friday, April 10th
Unit 12: NLP
Reading
- Hvitfeldt and Silge (2022) Chapter 2: Tokenization
- Hvitfeldt and Silge (2022) Chapter 5: Word Embeddings
NOTES: Please read the above chapters more with an eye toward concepts and issues rather than code. I will demonstrate a minimum set of functions to accomplish the NLP modeling tasks for this unit.
Also know that the entire Hvitfeldt and Silge (2022) book is really mandatory reading. I would also strongly recommend the entire Silge and Robinson (2017) book. Both will be important references at a minimum.
Slide decks
Videos
Lecture 1: General Text (Pre-) Processing - the stringr package ~ 9 mins
Lecture 2: General Text (Pre-) Processing - regular expressions ~ 13 mins
Lecture 3: The IMDB Reviews Dataset ~ 6 mins
Lecture 4: Tokenization - Part 1 ~ 27 mins
Lecture 5: Tokenization - Part 2 ~ 13 mins
Lecture 6: Stopwords ~ 12 mins
Lecture 7: Stemming ~ 12 mins
Lecture 8: Bag of Words ~ 19 mins
Lecture 9: NLP in Action - Part 1 ~ 17 mins
Lecture 10: NLP in Action - Part 2 ~ 20 mins
Lab Materials
Quiz
Submit the unit quiz by 8 pm on Wednesday, April 15th
Application Assignment
Submit the application assignment by 8 pm on Friday, April 17th
Unit 13: Applications
Reading
Slide decks
Videos
No lectures this week. Only lab and discussion section.
Lab Materials
Quiz
Submit the unit quiz by 8 pm on Wednesday, April 22nd!
Application Assignment
No assignment this week!
Unit 14: Ethics
Reading
The readings this week will come from O’Neil (2016). We will read the introduction, chapters 1, 3, and 5, and the conclusion and afterword sections. A pdf of the book will be shared directly with you.
We will also read this article on emerging methods and tools for assessing model fairness.
Slide decks
Videos
No lectures this week. Only discussion section.
Quiz
Submit the unit quiz by 8 pm on Wednesday, April 29th
Application Assignment
No assignment this week!