Midterm Concepts Exam Review
Unit 1
- Differences between association and prediction
- What is supervised vs unsupervised machine learning and examples of each
- What is regression vs. classification and examples of each
- What is reducible vs. irreducible error and what factors contribute to each
- What is the difference between predictors and features
- What is a model configuration and what are the components/dimensions across which model configurations vary
- What is bias, variance, and the bias-variance tradeoff (a worked decomposition is sketched after this list)
- What is overfitting
- What factors affect the bias and variance of a model
- What are pros/cons of model flexibility
- What are pros/cons of model interpretability
- Why do we evaluate models using error in a held-out (validation or test) set?
- How is p-hacking related to overfitting?
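A worked equation may help connect the reducible/irreducible error and bias-variance bullets above. This is the standard decomposition of expected test MSE at a point x_0; the notation is generic, not taken from the course materials:

```latex
% y_0 = f(x_0) + \epsilon, with E[\epsilon] = 0
\mathbb{E}\big[(y_0 - \hat{f}(x_0))^2\big]
  = \underbrace{\mathrm{Var}\big(\hat{f}(x_0)\big) + \big[\mathrm{Bias}\big(\hat{f}(x_0)\big)\big]^2}_{\text{reducible error}}
  + \underbrace{\mathrm{Var}(\epsilon)}_{\text{irreducible error}}
```

Flexible models tend to lower the bias term and raise the variance term; the irreducible error sets the floor that no model can beat.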
Unit 2
- What is exploratory data analysis (EDA) and why is it important?
- What are the stages of analysis
- What is data leakage, what are examples of it, and how do we prevent it?
- What can you do and not do with training, validation, and test sets to prevent data leakage (see the sketch after this list)
- What are typical visualizations for EDA depending on the measurement of the features/outcome
- What are typical summary statistics for EDA depending on the measurement of the features/outcome
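To make the leakage bullets concrete, here is a minimal scikit-learn sketch (Python/scikit-learn is my illustration choice, not necessarily the course's tooling, and the simulated data are hypothetical): all preprocessing statistics are learned from the training set only, so nothing from the held-out set leaks into fitting.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Simulated data standing in for a real dataset
X, y = make_regression(n_samples=300, n_features=10, noise=10.0, random_state=1)

# Split before any preprocessing decisions are learned from the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# The scaler is fit on the training set only (inside the pipeline); the test set is
# transformed with training-set means/SDs, so no test information enters model fitting.
model = make_pipeline(StandardScaler(), LinearRegression())
model.fit(X_train, y_train)
print("held-out R^2:", model.score(X_test, y_test))
```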
Unit 3
- What are examples of performance metrics that can be used for regression models?
- What is the general linear model?
- How does it work (how are parameters estimated)
- What assumptions does it make, and what are the consequences of violating those assumptions?
- What is it good for, what is it less good for?
- What transformations and other feature engineering steps are often useful for GLM
- How does KNN work
- What are its assumptions and requirements
- How does it make predictions?
- What does K affect and why would you use higher or lower values
- How do you calculate distance
- What transformations and other feature engineering steps are often useful for KNN
- Compare the strengths and weaknesses of GLM vs. KNN (a brief comparison is sketched below)
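A quick sketch of the GLM vs. KNN contrast, again in scikit-learn with simulated data (an illustrative assumption, not course code): the linear model is parametric and insensitive to feature scale, while KNN is nonparametric, needs standardized features so distances are meaningful, and trades bias for variance as k shrinks.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=500, n_features=5, noise=15.0, random_state=2)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=2)

# GLM-style linear regression: parameters estimated by least squares
lm = LinearRegression().fit(X_train, y_train)
lm_rmse = np.sqrt(mean_squared_error(y_val, lm.predict(X_val)))

# KNN regression: predictions are the mean outcome of the k nearest (standardized) neighbors.
# Smaller k -> more flexible (lower bias, higher variance); larger k -> smoother fit.
for k in (1, 5, 25):
    knn = make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=k)).fit(X_train, y_train)
    knn_rmse = np.sqrt(mean_squared_error(y_val, knn.predict(X_val)))
    print(f"k={k:>2}  KNN RMSE = {knn_rmse:6.1f}   linear model RMSE = {lm_rmse:6.1f}")
```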
Unit 4
- What is the Bayes classifier?
- How do we use probability to make class predictions
- What is the error rate of the Bayes classifier?
- What is probability, odds, and odds ratios in classification
- What is logistic regression?
- How does it make predictions?
- What decision boundaries does it support?
- How is KNN adapted for classification and how does it make predictions
- What are its assumptions and requirements
- What decision boundaries does it support
- What transformations and other feature engineering steps are often useful for KNN
- How does Linear discriminant analysis work
- What are its assumptions and requirements
- What decision boundaries does it support
- What transformations and other feature engineering steps are often useful for LDA
- How does Quadratic discriminant analysis work
- What are its assumptions and requirements
- What decision boundaries does it support
- What transformations and other feature engineering steps are often useful for QDA
- What are the relative costs and benefits of these different statistical algorithms (a brief comparison is sketched below)
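As a rough illustration of the Unit 4 algorithms side by side (a scikit-learn sketch on simulated data; the course itself may use different tools), note the decision boundaries each supports: logistic regression and LDA produce linear boundaries, QDA quadratic boundaries, and KNN fully flexible ones.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=600, n_features=4, n_informative=3,
                           n_redundant=0, random_state=3)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                    random_state=3, stratify=y)

models = {
    "logistic regression (linear boundary)": LogisticRegression(),
    "KNN, k=15 (flexible boundary)": make_pipeline(StandardScaler(),
                                                   KNeighborsClassifier(n_neighbors=15)),
    "LDA (linear boundary)": LinearDiscriminantAnalysis(),
    "QDA (quadratic boundary)": QuadraticDiscriminantAnalysis(),
}
for name, clf in models.items():
    acc = clf.fit(X_train, y_train).score(X_test, y_test)
    print(f"{name:40s} held-out accuracy = {acc:.3f}")
```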
Unit 5
What is bias vs. variance with respect to model performance estimates
- How is this different from bias vs. variance of the model itself
- What factors affect model bias/variance
- What factors affect bias and variance of performance estimate
Why do we need training, validation, and test sets, and what do we use each for?
What are the important/common types of resampling and how do you do each of them (see the sketch after this list)?
- Validation set approach
- Leave One Out CV
- K-Fold and Repeated K-Fold
- Bootstrap resampling
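Here is one way to run each resampling scheme, sketched in scikit-learn (an illustration choice, not the course's code; variable names like `boot_mses` are mine). The same model is evaluated under the validation set approach, LOOCV, k-fold, repeated k-fold, and the bootstrap.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import (KFold, LeaveOneOut, RepeatedKFold,
                                     cross_val_score, train_test_split)
from sklearn.utils import resample

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=4)
model = LinearRegression()

# 1) Validation set approach: a single random split (simple, but a noisy estimate)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.25, random_state=4)
val_mse = np.mean((model.fit(X_tr, y_tr).predict(X_va) - y_va) ** 2)

# 2) Leave-one-out CV: n folds of size 1 (nearly unbiased, computationally costly)
loo_mse = -cross_val_score(model, X, y, cv=LeaveOneOut(),
                           scoring="neg_mean_squared_error").mean()

# 3) K-fold and repeated k-fold CV (the usual compromise on bias/variance/cost)
kf = KFold(n_splits=10, shuffle=True, random_state=4)
kf_mse = -cross_val_score(model, X, y, cv=kf, scoring="neg_mean_squared_error").mean()
rkf = RepeatedKFold(n_splits=10, n_repeats=5, random_state=4)
rkf_mse = -cross_val_score(model, X, y, cv=rkf, scoring="neg_mean_squared_error").mean()

# 4) Bootstrap: resample rows with replacement, evaluate on the out-of-bag rows
idx = np.arange(len(y))
boot_mses = []
for b in range(100):
    boot = resample(idx, replace=True, n_samples=len(idx), random_state=b)
    oob = np.setdiff1d(idx, boot)
    fit = model.fit(X[boot], y[boot])
    boot_mses.append(np.mean((fit.predict(X[oob]) - y[oob]) ** 2))

print(val_mse, loo_mse, kf_mse, rkf_mse, np.mean(boot_mses))
```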
How do these procedures compare with respect to
- bias of performance estimate
- variance of performance estimate
- computational cost
When/why do you need to do grouped resampling (e.g. Grouped K-fold)
How does varying k in k-fold affect bias and variance of performance estimate?
What is optimization bias and how do we prevent it?
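One standard way to prevent optimization bias is nested cross-validation: tune in an inner loop, then estimate performance in an outer loop whose held-out folds were never used to pick the configuration. A minimal scikit-learn sketch (illustrative only; the tuned hyperparameter here is KNN's k):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=400, n_features=10, random_state=5)

# Inner loop: select k by 5-fold CV on the training portion of each outer fold
inner = GridSearchCV(
    make_pipeline(StandardScaler(), KNeighborsClassifier()),
    param_grid={"kneighborsclassifier__n_neighbors": [1, 5, 15, 25, 51]},
    cv=KFold(n_splits=5, shuffle=True, random_state=5),
)

# Outer loop: estimate the performance of the whole select-then-fit procedure on
# folds never used for selection, so the estimate is not inflated by picking the
# best-looking configuration (optimization bias)
outer_scores = cross_val_score(inner, X, y, cv=KFold(n_splits=5, shuffle=True, random_state=6))
print("nested-CV accuracy:", outer_scores.mean())
```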
Unit 6
What are the subset selection approaches: Forward, Backward, Best Subset (covered in reading only)
- What are their pros/cons and when can they not be used
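For the subset selection bullets, scikit-learn's greedy stepwise selector can stand in for forward and backward selection (a sketch under that assumption; it scores candidate subsets by CV rather than the RSS/Cp-style criteria in the reading, and best subset selection has no built-in implementation because it requires fitting all 2^p subsets):

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=300, n_features=20, n_informative=5, random_state=6)

# Forward selection: start empty, greedily add the feature that most improves CV score
forward = SequentialFeatureSelector(LinearRegression(), n_features_to_select=5,
                                    direction="forward", cv=5).fit(X, y)
# Backward selection: start with all features, greedily drop the least useful one
backward = SequentialFeatureSelector(LinearRegression(), n_features_to_select=5,
                                     direction="backward", cv=5).fit(X, y)

print("forward keeps features:", forward.get_support(indices=True))
print("backward keeps features:", backward.get_support(indices=True))
```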
Cost and Loss functions
- What are they and how are they used
- What are the specific formulas for the linear model, logistic regression, and variants of glmnet (ridge, LASSO, full elastic net)
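For reference, the cost functions in their common forms (constant scaling factors such as 1/n or 1/(2n) vary across texts and implementations, so treat these as the general shape rather than the exact glmnet parameterization):

```latex
% Linear model: least squares / residual sum of squares
L_{\text{OLS}}(\beta) = \sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p}\beta_j x_{ij}\Big)^2

% Logistic regression: negative log-likelihood (log loss),
% with p_i = 1 / (1 + e^{-(\beta_0 + \sum_j \beta_j x_{ij})})
L_{\text{logistic}}(\beta) = -\sum_{i=1}^{n}\big[\,y_i \log p_i + (1 - y_i)\log(1 - p_i)\,\big]

% glmnet-style penalized cost: loss above plus a penalty controlled by lambda and alpha
L_{\text{glmnet}}(\beta) = L(\beta)
  + \lambda\Big[\alpha \sum_{j=1}^{p}|\beta_j| + \tfrac{1-\alpha}{2}\sum_{j=1}^{p}\beta_j^2\Big]
% alpha = 1: LASSO (L1 only); alpha = 0: ridge (L2 only); 0 < alpha < 1: full elastic net
```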
What is regularization
- What are its benefits?
- What are its costs?
How does lambda affect the bias-variance trade-off in glmnet?
What does alpha do?
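A small sketch of lambda's effect, using scikit-learn's ElasticNet as a stand-in for glmnet (note the naming clash: scikit-learn's `alpha` plays the role of glmnet's lambda, and `l1_ratio` plays the role of glmnet's alpha; the simulated data are hypothetical):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=200, n_features=30, n_informative=5,
                       noise=20.0, random_state=7)

# Larger lambda -> stronger shrinkage: bias goes up, variance goes down;
# an intermediate value usually minimizes cross-validated error.
for lam in (0.01, 0.1, 1.0, 10.0):
    fit = make_pipeline(StandardScaler(),
                        ElasticNet(alpha=lam, l1_ratio=0.5, max_iter=10_000))
    cv_mse = -cross_val_score(fit, X, y, cv=10, scoring="neg_mean_squared_error").mean()
    print(f"lambda = {lam:<5}  10-fold CV MSE = {cv_mse:,.0f}")
```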
Feature engineering approaches for dimensionality reduction: PCA (covered in reading only; and see appendix)
Other algorithms that do feature selection/dimensionality reduction: PCR and PLS (covered in reading only)
Contrasts of PCA, PCR, PLS, and glmnet/LASSO for dimensionality reduction (covered in reading only)
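To contrast the dimensionality reduction routes, a final sketch (scikit-learn, simulated data, purely illustrative): PCR reduces dimensions without looking at the outcome, PLS chooses components that covary with the outcome, and LASSO keeps the original features but shrinks many coefficients to exactly zero.

```python
from sklearn.cross_decomposition import PLSRegression
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=200, n_features=50, n_informative=5,
                       noise=20.0, random_state=8)

models = {
    # PCR: unsupervised reduction (PCA ignores y), then least squares on the component scores
    "PCR, 5 components": make_pipeline(StandardScaler(), PCA(n_components=5), LinearRegression()),
    # PLS: supervised reduction; components are chosen to covary with y
    "PLS, 5 components": make_pipeline(StandardScaler(), PLSRegression(n_components=5)),
    # LASSO: no new components; original features with L1 shrinkage/selection
    "LASSO": make_pipeline(StandardScaler(), Lasso(alpha=1.0, max_iter=10_000)),
}
for name, m in models.items():
    mse = -cross_val_score(m, X, y, cv=10, scoring="neg_mean_squared_error").mean()
    print(f"{name:20s} 10-fold CV MSE = {mse:,.0f}")
```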