Homework 3 - Bayesian Statistics, Logistic Regression, Generative Models, and K-Nearest Neighbors

Due Friday, April 4, 11:59 PM

Please submit your work as a single PDF file to D2L > Assessments > Dropbox. Multiple files, or files that are not in PDF format, are not accepted.
Any relevant code should be attached.
Problems marked (MSSC PhD) are required for CMPS PhD students and optional, for extra credit, for other students.
Read ISL Chapter 4.

Homework Questions

Required

ISL Sec. 4.8: 2
ISL Sec. 4.8: 3
ISL Sec. 4.8: 5
ISL Sec. 4.8: 13
MNIST Handwritten Digits Image
1. Load the prepared MNIST data mnist.csv. Print some images.
2. Use the first 1000 observations as the training data and the remaining observations (the second half) as the test data.
3. Train a KNN classifier and predict on the test data, using the best \(K\) selected on the training data.
  - Calculate the test error rate.
  - Generate the confusion matrix.
4. Train a multinomial logistic regression model and predict on the test data.
  - Calculate the test error rate.
  - Generate the confusion matrix.
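The KNN steps above can be sketched as follows. This is only a minimal illustration under stated assumptions, not a prescribed solution: the column layout of mnist.csv is not given here, so synthetic Gaussian clusters stand in for the digit data; \(K\) is chosen with a simple validation split carved from the training set (cross-validation would be another reasonable choice); and the helper names `pairwise_sq_dists`, `knn_predict`, and `confusion_matrix` are hypothetical.

```python
import numpy as np

def pairwise_sq_dists(A, B):
    """Squared Euclidean distances between rows of A and rows of B."""
    return (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T

def knn_predict(X_train, y_train, X_new, k, n_classes):
    """Majority vote among the k nearest training points (ties go to the smallest label)."""
    d2 = pairwise_sq_dists(X_new, X_train)
    nearest = np.argsort(d2, axis=1)[:, :k]            # indices of the k nearest neighbors
    votes = y_train[nearest]                           # shape (n_new, k)
    counts = np.apply_along_axis(np.bincount, 1, votes, minlength=n_classes)
    return counts.argmax(axis=1)

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows index the true class, columns the predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

# Synthetic stand-in for the image data (assumption: NOT the real mnist.csv).
rng = np.random.default_rng(0)
n_classes, n, p = 3, 600, 10
y = rng.integers(0, n_classes, size=n)
centers = rng.normal(scale=3.0, size=(n_classes, p))
X = centers[y] + rng.normal(size=(n, p))

# First part for training, the rest for testing.
X_train, y_train = X[:300], y[:300]
X_test, y_test = X[300:], y[300:]

# Select K using the training data only: fit on one part, validate on the other.
X_fit, y_fit = X_train[:200], y_train[:200]
X_val, y_val = X_train[200:], y_train[200:]
candidate_ks = [1, 3, 5, 7, 9, 15]
val_errors = [np.mean(knn_predict(X_fit, y_fit, X_val, k, n_classes) != y_val)
              for k in candidate_ks]
best_k = candidate_ks[int(np.argmin(val_errors))]

# Refit on the full training set with the selected K and evaluate on the test set.
y_pred = knn_predict(X_train, y_train, X_test, best_k, n_classes)
test_error = np.mean(y_pred != y_test)
cm = confusion_matrix(y_test, y_pred, n_classes)

print("best K:", best_k)
print("test error rate:", test_error)
print(cm)
```

The error rate and confusion matrix computations do not depend on the classifier, so the same two lines could be reused with predictions from a multinomial logistic regression.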
(MSSC PhD) KNN Curse of Dimensionality
1. Generate the covariates \(x_1, x_2, \dots, x_5\) for \(n = 1000\) observations from independent standard normal distributions. Then generate \(Y\) from \[Y = X_1 + 0.5 X_2 - X_3 + \epsilon,\] where \(\epsilon \sim N(0, 1)\).
2. Use the first 500 observations as the training data and the rest as the test data. Fit KNN regression, and report the test MSE of \(y\) with the optimal \(K\).
3. Add 95 additional noise predictors as follows.
  - Case 1: \(x_6, x_7, \dots, x_{100} \overset{\mathrm{iid}}{\sim} N(0, 1)\)
  - Case 2: the 95 columns of \(XA\), where \(X_{1000 \times 5} = [x_1 \cdots x_5]\) and \(A_{5 \times 95}\) has iid Uniform(0, 1) entries.
4. Fit KNN regression in both cases (with the total of 100 covariates) and select the best \(K\) value.
5. For both cases, what are the best \(K\) and the corresponding test mean squared error? Discuss the effect of adding 95 (unnecessary) covariates.
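The simulation described in steps 1 to 5 could be set up roughly as below. This is a sketch under assumptions rather than the intended solution: \(K\) is selected with a single validation split of the 500 training observations, the candidate grid for \(K\) is arbitrary, and the helper names `knn_regress` and `best_k_and_test_mse` are hypothetical.

```python
import numpy as np

def pairwise_sq_dists(A, B):
    """Squared Euclidean distances between rows of A and rows of B."""
    return (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T

def knn_regress(X_train, y_train, X_new, k):
    """Predict by averaging the responses of the k nearest training points."""
    d2 = pairwise_sq_dists(X_new, X_train)
    nearest = np.argpartition(d2, k - 1, axis=1)[:, :k]   # k smallest distances, unordered
    return y_train[nearest].mean(axis=1)

def best_k_and_test_mse(X, y, candidate_ks, n_train=500, n_fit=350):
    """Pick K on a validation split of the training data, then report the test MSE."""
    X_tr, y_tr = X[:n_train], y[:n_train]
    X_te, y_te = X[n_train:], y[n_train:]
    X_fit, y_fit = X_tr[:n_fit], y_tr[:n_fit]
    X_val, y_val = X_tr[n_fit:], y_tr[n_fit:]
    val_mse = [np.mean((knn_regress(X_fit, y_fit, X_val, k) - y_val) ** 2)
               for k in candidate_ks]
    k_best = candidate_ks[int(np.argmin(val_mse))]
    test_mse = float(np.mean((knn_regress(X_tr, y_tr, X_te, k_best) - y_te) ** 2))
    return k_best, test_mse

rng = np.random.default_rng(1)
n = 1000

# Step 1: five standard normal covariates and the response.
X = rng.standard_normal((n, 5))
y = X[:, 0] + 0.5 * X[:, 1] - X[:, 2] + rng.standard_normal(n)

# Step 3: 95 extra predictors.
noise_iid = rng.standard_normal((n, 95))        # Case 1: iid N(0, 1)
A = rng.uniform(0.0, 1.0, size=(5, 95))         # Case 2: A has iid Uniform(0, 1) entries
noise_lin = X @ A                               # columns of XA

X_case1 = np.hstack([X, noise_iid])
X_case2 = np.hstack([X, noise_lin])

# Steps 2, 4, 5: select K and compute the test MSE in each setting.
candidate_ks = [1, 3, 5, 10, 20, 40]            # arbitrary grid (assumption)
results = {
    "5 covariates": best_k_and_test_mse(X, y, candidate_ks),
    "case 1": best_k_and_test_mse(X_case1, y, candidate_ks),
    "case 2": best_k_and_test_mse(X_case2, y, candidate_ks),
}
for name, (k_best, mse) in results.items():
    print(f"{name}: best K = {k_best}, test MSE = {mse:.3f}")
```

Comparing the three printed lines gives the raw material for the discussion in step 5; the numbers depend on the random seed and on the grid of candidate \(K\) values.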

Do one of the following

Watch the talk All About that Bayes: Probability, Statistics, and the Quest to Quantify Uncertainty by Dr. Kristin Lennox. In 250 words, summarize your thoughts and what you learned from the talk.

In 250 words, summarize your thoughts and what you learned from the deep learning workshop.