Review Final Concepts Exam

Unit 8: Advanced Performance Metrics

Understand costs and benefits of accuracy
What is a confusion matrix? How do you interpret its rows and columns?
Understand costs and benefits of other performance metrics based on confusion matrix
- sensitivity
- specificity
- positive predictive value
- negative predictive value
- balanced accuracy
- F score
- Kappa
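
As a study aid, the metrics listed above can be sketched from the four cells of a 2x2 confusion matrix. This is a minimal illustration, not course code: the function names, the toy labels, and the choice of 1 as the positive class are all assumptions, and the F score shown is the F1 variant (equal weight on positive predictive value and sensitivity).

```python
def confusion_counts(truth, pred, positive=1):
    """Count the four cells of a 2x2 confusion matrix."""
    pairs = list(zip(truth, pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fn, fp, tn


def confusion_metrics(tp, fn, fp, tn):
    """Compute common metrics; assumes no denominator below is zero."""
    n = tp + fn + fp + tn
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    ppv = tp / (tp + fp)           # positive predictive value
    npv = tn / (tn + fn)           # negative predictive value
    accuracy = (tp + tn) / n
    balanced_accuracy = (sensitivity + specificity) / 2
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    # Cohen's kappa: observed agreement corrected for chance agreement
    p_chance = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n ** 2
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "accuracy": accuracy,
        "balanced_accuracy": balanced_accuracy,
        "f1": f1,
        "kappa": kappa,
    }


# Toy labels (hypothetical): 4 positive cases, 6 negative cases
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

counts = confusion_counts(truth, pred)
results = confusion_metrics(*counts)
for name, value in results.items():
    print(f"{name}: {value:.3f}")
```

In this toy example accuracy is 0.80 while kappa is about 0.58, which shows how kappa discounts the agreement expected by chance.
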
What is an ROC curve, what does it tell you? What is the role of a threshold in the curve?
What is the area under the ROC curve? How can it be interpreted and used as a performance metric?
What is unbalanced data? What problems does it cause?
What are solutions to problems with unbalanced data and how do they work?
- Selection of performance metric
- Selection of classification threshold
- Sampling and resampling approaches
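
To make the ROC and threshold questions above concrete, here is a minimal sketch that sweeps the classification threshold over a set of predicted probabilities, records one (false positive rate, sensitivity) point per threshold, and integrates the curve with the trapezoid rule. The function names and the toy data are assumptions for illustration only.

```python
def roc_points(truth, prob):
    """Return (false positive rate, sensitivity) pairs, one per threshold.

    A case is classified positive when its probability is >= the threshold.
    Thresholds run from stricter to more lenient, so the curve moves from
    (0, 0) toward (1, 1).
    """
    thresholds = [float("inf")] + sorted(set(prob), reverse=True)
    n_pos = sum(1 for t in truth if t == 1)
    n_neg = len(truth) - n_pos
    points = []
    for thr in thresholds:
        tp = sum(1 for t, p in zip(truth, prob) if t == 1 and p >= thr)
        fp = sum(1 for t, p in zip(truth, prob) if t == 0 and p >= thr)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the curve by the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Toy data (hypothetical): 3 positive and 3 negative cases
truth = [1, 1, 0, 1, 0, 0]
prob = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_points(truth, prob)
area = auc(points)
print(points)
print(f"AUC = {area:.3f}")
```

Here 8 of the 9 possible positive/negative pairs are ranked correctly (the positive case has the higher probability), and the area is 8/9. That matches the usual interpretation of AUC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case.
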

How do we use feature ablation to statistically compare model configurations?
How do we use Bayesian estimation for model comparisons?
How are p-values and posterior probabilities different?
What is a Bayesian ROPE?
What are examples of model specific approaches for feature importance?
What are examples of model agnostic approaches for feature importance?
How does permutation feature importance work (how are scores calculated) and what does it tell us?
What are Shapley values? What are the benefits of this approach?
What is the difference between local and global feature importance?
What are Partial Dependence plots and Accumulated Local Effects (ALE) plots? What do they tell us? What are the advantages and disadvantages of each?
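
One way to picture how permutation feature importance scores are calculated is the sketch below: compute a baseline error, shuffle one feature column at a time, and record how much the error increases on average. Everything here is an illustrative assumption: the fixed `model` function stands in for a fitted model, the data are made up, and mean squared error is just one possible performance metric. In practice the scores are usually computed on held-out data.

```python
import random


def mse(model, X, y):
    """Mean squared error of model predictions."""
    return sum((model(row) - t) ** 2 for row, t in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Average increase in MSE after shuffling each feature column."""
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(model, X_perm, y) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical data: the outcome depends only on feature 0
X = [[float(i), float((i * 7) % 5)] for i in range(20)]
y = [2 * row[0] for row in X]


def model(row):
    """Stand-in for a fitted model; it uses feature 0 and ignores feature 1."""
    return 2 * row[0]


importances = permutation_importance(model, X, y)
print(importances)
```

Shuffling feature 0 makes predictions much worse, so its score is large. Shuffling feature 1 changes nothing because the model never uses it, so its score is exactly zero.
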

What are tokens, what types of tokens are used regularly, and how do you create them?
What are n-grams?
What is stemming and lemmatization? How do they work? What are the advantages and disadvantages of each?
What are stop-words? How do you identify them? What are the advantages and disadvantages of removing stop-words?
How can you use one-hot encoding in the bag of words approach to create features from words (or other tokens)?
What are the disadvantages of using one-hot encoding for words?
What is a word embedding? How do you create them using Word2Vec (skip-gram and CBOW)? What are the advantages and disadvantages of word embeddings?
What is sentiment analysis and how might you do it?
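
The tokenization, n-gram, stop-word, and bag-of-words questions above can be tied together with a small sketch. The regular expression, the four-word stop list, and the function names are assumptions chosen to keep the example short. Real stop-word lists are much longer and the tokenizer would normally come from a text-processing library.

```python
import re

# Tiny hypothetical stop-word list, for illustration only
STOP_WORDS = {"the", "a", "is", "and"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    """Return all runs of n adjacent tokens, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_words(docs, remove_stop_words=True):
    """Build a binary document-by-word indicator matrix."""
    tokenized = []
    for doc in docs:
        tokens = tokenize(doc)
        if remove_stop_words:
            tokens = [t for t in tokens if t not in STOP_WORDS]
        tokenized.append(set(tokens))
    vocab = sorted(set().union(*tokenized))
    matrix = [[1 if word in doc else 0 for word in vocab] for doc in tokenized]
    return vocab, matrix


docs = ["The movie is great", "The movie is not great"]

vocab, matrix = bag_of_words(docs)
bigrams = ngrams(tokenize(docs[1]), 2)

print(vocab)
print(matrix)
print(bigrams)
```

The two reviews differ only in the `not` column of the matrix, and word order is lost entirely. The bigram `not great` keeps the negation together, which is one reason n-grams can help with sentiment analysis.
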

What is error analysis?
- How do you do it?
- What is an eyeball sample?
- When is it useful?
- What are the risks of error analysis?
- How can you do error analysis without an eyeball sample?
- How can you do error analysis for regression?
What can you learn from comparing training error to validation error?
- Can you identify situations that indicate bias vs. variance vs. both sources of error?
- What is optimal error and how might you calculate it?
What are learning curves?
- How do you calculate them?
- What do they tell you?
- Can you use them to identify whether a larger N will improve performance?
- Can you use them to tell whether better features or a more complicated model will improve performance?
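
A learning curve can be sketched by fitting the same model on increasingly large subsets of the training data and recording training and validation error at each size. The example below is an assumption-laden illustration: the data are simulated from a straight line with noise (standard deviation 1), the model is a simple least-squares line, and all names are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Least-squares intercept and slope for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rmse(intercept, slope, xs, ys):
    """Root mean squared error of the fitted line on (xs, ys)."""
    squared = [(intercept + slope * x - y) ** 2 for x, y in zip(xs, ys)]
    return (sum(squared) / len(xs)) ** 0.5


def simulate(n, rng):
    """Simulated data: y = 3 + 2x + noise with standard deviation 1."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [3 + 2 * x + rng.gauss(0, 1) for x in xs]
    return xs, ys


rng = random.Random(0)
train_x, train_y = simulate(200, rng)
val_x, val_y = simulate(100, rng)

sizes = [5, 10, 20, 50, 100, 200]
curve = []
for n in sizes:
    b0, b1 = fit_line(train_x[:n], train_y[:n])
    train_err = rmse(b0, b1, train_x[:n], train_y[:n])
    val_err = rmse(b0, b1, val_x, val_y)
    curve.append((n, train_err, val_err))

for n, train_err, val_err in curve:
    print(f"N={n:3d}  training RMSE={train_err:.2f}  validation RMSE={val_err:.2f}")
```

When reading such a curve, a large gap between training and validation error that shrinks as N grows suggests variance, so more data may help. Two curves that have converged at an error well above the optimal error suggest bias, which points toward better features or a more flexible model rather than more data. Because the simulated noise has standard deviation 1, validation RMSE in this example should not be expected to fall much below 1.
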