Introduction to Machine Learning M146
Spring Quarter 2025
Homework #1
Due: 22nd April 2025, Tuesday, before 11:59 pm
Problem 1 (Perceptron)
Suppose we have a training set with 8 samples, where each sample has a feature vector in R^2:
| #  | 1     | 2     | 3     | 4       | 5      | 6     | 7     | 8     |
| X  | [4,0] | [1,1] | [0,1] | [-2,-2] | [-2,1] | [1,0] | [5,2] | [3,0] |
| y  | 1     | -1    | -1    | 1       | -1     | 1     | -1    | -1    |
We are going to implement the perceptron algorithm to train a linear classifier with a 2-dimensional weight vector w ∈ R^2 (no bias term). We initialize the weight vector to the first sample in our dataset, i.e. w1 = x1. Note that when wᵀx = 0, the algorithm predicts +1.
To simplify the calculation, you only need to test (and possibly update on) each sample once, in the given order. You may carry out the algorithm by hand or implement it in code.
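For reference, the single pass described above can be sketched in a few lines of NumPy. This is only an illustrative sketch under the stated conventions (no bias term, predict +1 when wᵀx = 0); the names X, y, and w are placeholders, not part of any provided starter code.

    import numpy as np

    # One pass of the perceptron over the samples in the given order.
    # Convention from the problem: predict +1 when w @ x >= 0.
    def perceptron_single_pass(X, y, w):
        w = w.copy()
        for x_n, y_n in zip(X, y):
            y_hat = 1 if w @ x_n >= 0 else -1   # tie (w @ x == 0) predicts +1
            if y_hat != y_n:                    # mistake: update the weights
                w = w + y_n * x_n
        return w

    # Example usage with the dataset from the table above; w is initialized to x1.
    X = np.array([[4, 0], [1, 1], [0, 1], [-2, -2], [-2, 1], [1, 0], [5, 2], [3, 0]], dtype=float)
    y = np.array([1, -1, -1, 1, -1, 1, -1, -1])
    w_final = perceptron_single_pass(X, y, w=X[0])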
(a) Is the data linearly separable? Will our algorithm converge if we run it several times over the same sequence? Explain.
(b) Regardless of whether the dataset is linearly separable or not, calculate the updates of the weight vector on this sequence for one round over the entire dataset. Follow the order of the index for the samples and show your calculations.
(c) Provide closed-form prediction functions for the perceptron, the voted perceptron, and the average perceptron, using the weight vector(s) derived in part (b).
(d) Using the functions derived in part (c), compare the errors of the perceptron, the voted perceptron, and the average perceptron predictors across the entire dataset: for each point in the dataset, report the label assigned by each classifier, and report the error over the dataset. (A generic prediction sketch for the voted and average variants follows this problem.)
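As referenced in part (d), here is a hedged sketch of how voted- and average-perceptron predictions are typically formed from per-update survival counts. The list `history` of (weight vector, count) pairs is a hypothetical structure you would record while running the updates in part (b); it is not defined by the problem.

    import numpy as np

    def voted_predict(history, x):
        """Voted perceptron: each surviving weight vector casts `count` votes."""
        votes = sum(c * (1 if w @ x >= 0 else -1) for w, c in history)
        return 1 if votes >= 0 else -1

    def average_predict(history, x):
        """Average perceptron: predict with the count-weighted average of the weights."""
        w_avg = sum(c * w for w, c in history)
        return 1 if w_avg @ x >= 0 else -1

    # `history` would hold (weight vector, number of samples it survived) pairs,
    # e.g. history = [(np.array([4., 0.]), 1), ...] collected during part (b).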
Problem 2 (Locally Weighted Linear Regression)
Consider a linear regression problem in which we want to “weight” different training instances differently because some of the instances are more important than others. Specifically, suppose we want to minimize
J(w0, w1) = Σ_{n=1}^{N} αn (yn − (w0 + w1 xn))^2   (1)
Here αn > 0. In class, we worked out what happens for the case where all the weights (the αn’s) are the same. In this problem, we will generalize some of those ideas to the weighted setting.
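To build intuition for the objective in Eq. (1), here is a minimal sketch that evaluates J(w0, w1) numerically and checks a candidate gradient by finite differences. The data arrays and helper names are hypothetical, introduced only for illustration.

    import numpy as np

    def weighted_loss(w0, w1, x, y, alpha):
        """J(w0, w1) = sum_n alpha_n * (y_n - (w0 + w1 * x_n))^2, as in Eq. (1)."""
        residual = y - (w0 + w1 * x)
        return np.sum(alpha * residual**2)

    def finite_difference_grad(w0, w1, x, y, alpha, eps=1e-6):
        """Numerically approximate (dJ/dw0, dJ/dw1); useful for checking part (a)."""
        dw0 = (weighted_loss(w0 + eps, w1, x, y, alpha) - weighted_loss(w0 - eps, w1, x, y, alpha)) / (2 * eps)
        dw1 = (weighted_loss(w0, w1 + eps, x, y, alpha) - weighted_loss(w0, w1 - eps, x, y, alpha)) / (2 * eps)
        return dw0, dw1

    # Hypothetical toy data to try the check:
    x = np.array([0.0, 1.0, 2.0])
    y = np.array([1.0, 2.0, 2.5])
    alpha = np.array([1.0, 2.0, 0.5])   # per-sample importance weights, all > 0
    print(weighted_loss(0.5, 1.0, x, y, alpha), finite_difference_grad(0.5, 1.0, x, y, alpha))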
(a) Calculate the gradient by computing the partial derivatives of J with respect to each of the parameters (w0, w1).
(b) Prove that Eq. (1) has a global optimal solution.
Problem 3 (Modified Logistic Regression with alternative labels)
In class, we have seen logistic regression when the labels are in {0, 1}. In this question, you will derive logistic regression when the labels are instead in {−1, 1}.
In class, we considered the dataset D = {(x1, y1), . . . , (xn, yn)} with n samples where xi ∈ Rd and labels yi ∈ {0, 1} for all i ∈ [n]. The prediction function hw(x) studied in class is given by
hw(x) = 1 / (1 + e^(−wᵀx))   (2)
Moreover, the objective studied in class to minimize for logistic regression was:
P2.1:   min_w  − Σ_{i=1}^{n} [ yi log hw(xi) + (1 − yi) log(1 − hw(xi)) ]   (3)
We want to modify this objective function so that tanhw(x) = tanh(wᵀx) is the activation function and the labels are in {−1, 1}.
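As a sanity check on the relationship explored in part (a) below, here is a small NumPy sketch comparing the sigmoid prediction function from Eq. (2) with tanh(wᵀx); the function names and the test points are illustrative only.

    import numpy as np

    def sigmoid(z):
        """h_w(x) from Eq. (2), written in terms of z = w^T x."""
        return 1.0 / (1.0 + np.exp(-z))

    def tanh_activation(z):
        """tanh_w(x), written in terms of z = w^T x."""
        return np.tanh(z)

    # Compare 2 * sigmoid(2z) - 1 with tanh(z) on a few test values of z = w^T x.
    z = np.linspace(-5, 5, 11)
    print(np.allclose(2 * sigmoid(2 * z) - 1, tanh_activation(z)))  # expected: True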
(a) Show that
tanhw(x) = 2hw′(x) − 1,   where w′ = 2w.
(b) What are the asymptotic values of the function tanhw(x) as wᵀx → ∞ and wᵀx → −∞? Roughly draw the graph of this function with respect to wᵀx. What decision criterion can you choose for predicting the labels −1 or 1?
(c) Using your answer in part (b), argue that we cannot directly replace hw(xi) with tanhw(xi) in the optimization problem P2.1 (3).
(d) When the labels are yi ∈ {−1, 1}, show, using your answer in part (a), that the optimization problem P2.1 is equivalent to:
P2.2:   min_w  Σ_{i=1}^{n} log(1 + e^(−yi wᵀxi))   (4)
(e) Compute the gradient of the loss function in P2.2 (4) for a single sample xi. Consider the two cases yi = 1 and yi = −1 separately.
Problem 4 (Programming Exercise: Binary Classification)
In this exercise, you will work through a family of binary classification methods. Our data consists of inputs xn ∈ R^(1×d) and labels yn ∈ {−1, 1} for n ∈ {1, ..., N}. We will work on a subset of the Fashion-MNIST dataset, focusing on classifying whether an image shows a Dress (y = 1) or a Shirt (y = −1). Your goal is to learn a classifier based on the linear predictor hw(x) = wᵀx. Let
X ∈ R^(N×d) be the matrix whose n-th row is xn, and y = (y1, ..., yN)ᵀ ∈ {−1, 1}^N.   (5)
The main file is the Jupyter notebook Notebook.ipynb.
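Several parts below ((b), (c), (h), (i)) ask for the percentage of misclassified points. A minimal sketch of that computation, assuming a weight vector w and data arrays X, y with the shapes above (the names are placeholders, not the starter code's API):

    import numpy as np

    def misclassification_percentage(w, X, y):
        """Percentage of points whose predicted sign disagrees with the label."""
        scores = X @ w                          # linear predictor h_w(x) = w^T x for every row
        preds = np.where(scores >= 0, 1, -1)    # ties broken toward +1
        return 100.0 * np.mean(preds != y)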
(a) (Visualization): Visualize a sample of the training data. What are the dimensions of X_train and X_test?
(b) (Perceptron): Implement the Perceptron algorithm to classify your training data. Let the maximum number of iterations of the algorithm be numiter = N (the number of training samples). At each iteration, compute the percentage of misclassified points in the training dataset and save it into a Loss_hist array. Plot the history of the loss function (Loss_hist). What are the final value of the loss function and the squared ℓ2 norm of the weight vector? Looking at the loss function, can you comment on whether the Perceptron algorithm converges?
(c) (Perceptron test error): Compute the percentage of misclassified points in the test data for the trained Perceptron.
(d) (Logistic Regression): In this part, we will implement logistic regression for binary classification. Recall that logistic regression attempts to minimize the objective function
J(w) = (1/N) Σ_{n=1}^{N} [ 1{yn = 1} log(1 + e^(−hw(xn))) + 1{yn = −1} log(1 + e^(hw(xn))) ]   (6)
where xn = (1, xn), and 1_A = 1 if A is true and 0 otherwise. Moreover, hw(xn) = wᵀxn. First, we will add an additional feature to each instance and set it to one; this is equivalent to adding an additional first column to X and setting it to all ones.
Modify get_features() in the Logistic.py file to create the matrix X for the logistic regression model.
(e) Complete predict() in the Logistic.py file to predict y from X and w.
(f) Complete the function loss_and_grad() to compute the loss function and its gradient with respect to w for a dataset X with labels y at the given weights w. Test your results by running the code in the main file Notebook.ipynb. If you implement everything correctly, you should get a loss of around 0.7 and a squared ℓ2 norm of the gradient of around 1.8 × 10^5.
(g) Complete the function train_LR() to train the logistic regression model with learning rate η = 10^−6, batch size 100, and number of iterations numiters = 5000. Plot the history of the loss function (Loss_hist). What are the final value of the loss function and the squared ℓ2 norm of the weight vector? (A generic mini-batch training sketch follows part (i) below.)
(h) (Logistic Regression test error): Compute the percentage of misclassified points in the test data for the trained Logistic Regression.
(i) (Logistic Regression and Batch Size): Train the logistic regression model with different batch sizes b ∈ {1, 50, 100, 200, 300}, learning rate η = 10^−5, and number of iterations numiter = 6000/b. Train each model 50 or 100 times and average the test error for each value of the batch size. Plot the test error as a function of the batch size. Which batch size gives the minimum test error?
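The sketch below shows one common way to structure the mini-batch gradient-descent loop used in parts (g) and (i). It assumes a loss_and_grad(w, X, y)-style helper returning the averaged loss and its gradient, roughly matching the interface described in part (f); everything else (names, argument order, shapes) is a placeholder rather than the actual starter code.

    import numpy as np

    def train_logistic_sgd(X, y, loss_and_grad, eta=1e-6, batch_size=100, num_iters=5000, seed=0):
        """Mini-batch gradient descent sketch: sample a batch, step along the negative gradient."""
        rng = np.random.default_rng(seed)
        N, d = X.shape
        w = np.zeros(d)
        loss_hist = []
        for _ in range(num_iters):
            idx = rng.choice(N, size=min(batch_size, N), replace=False)  # random mini-batch
            loss, grad = loss_and_grad(w, X[idx], y[idx])                # assumed signature
            w = w - eta * grad                                           # gradient step
            loss_hist.append(loss)
        return w, loss_hist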
Problem 5 (Programming Exercise: Polynomial Regression)
In this exercise, you will work through linear and polynomial regression. Our data consists of inputs xn ∈ R and outputs yn ∈ R, n ∈ {1,..., N}, which are related through a target function y = f(x). Your goal is to learn a linear predictor hw(x) that best approximates f(x).
Code and data
• code: regression.py, Notebook.ipynb
• data: regression_train.csv, regression_test.csv
Visualization
As we learned last week, it is often useful to understand the data through visualizations. For this data set, you can use a scatter plot to visualize the data since it has only two properties to plot (x and y).
(a) Visualize the training and test data using the plot_data(...) function. What do you observe? For example, can you make an educated guess about the effectiveness of linear regression in predicting the data?
Linear Regression
Recall that linear regression attempts to minimize the objective function
J(w) = Σ_{n=1}^{N} (hw(xn) − yn)^2.
In this problem, we will use the matrix-vector form, where
y = (y1, y2, ..., yN)ᵀ,   X = (x1, x2, ..., xN)ᵀ,
and each instance xn = (1, xn,1, ..., xn,D)ᵀ.
For this dataset, the number of input features is D = 1.
Rather than working with this fully generalized, multivariate case, let us start by considering a simple linear regression model:
hw(x) = wᵀx = w0 + w1 x1
regression.py contains the skeleton code for the class Regression. Objects of this class can be instantiated as model = Regression(m), where m is the degree of the polynomial feature vector, so that the feature vector for instance n is (1, xn,1, xn,1^2, ..., xn,1^m)ᵀ. Setting m = 1 instantiates an object where the feature vector for instance n is (1, xn,1)ᵀ.
(b) Note that to take into account the intercept term (w0), we can add an additional “feature” to each instance and set it to one, e.g. xi,0 = 1. This is equivalent to adding an additional first column to X and setting it to all ones. Modify get_poly_features() in Regression.py for the case m = 1 to create the matrix X for a simple linear model.
(c) Before tackling the harder problem of training the regression model, complete predict() in Regression.py to predict y from X and w.
(d) One way to solve linear regression is through gradient descent (GD).
Recall that the parameters of our model are the wj values. These are the values we will adjust to minimize J(w). In gradient descent, each iteration performs the update
wj ← wj − η (∂J(w)/∂wj)   (simultaneously for all j),
where η is the learning rate (step size). With each step of gradient descent, we expect our updated parameters wj to come closer to the parameters that achieve the lowest value of J(w). (A sketch of the gradient-descent loop appears after this part.)
• As we perform gradient descent, it is helpful to monitor the convergence by computing the loss, i.e., the value of the objective function J. Complete loss_and_grad() to calculate J(w) and its gradient. Test your results by running the code in the main file Notebook.ipynb. If you implement everything correctly, you should get a loss of around 4 and a gradient of approximately [-3.2, -10.5].
We will use the following specifications for the gradient descent algorithm:
– We run the algorithm for 10,000 iterations.
– We will use a fixed step size.
• So far, you have used a default learning rate (or step size) of η = 0.01. Try different values η = 10^−4, 10^−3, 10^−1, and make a table of the coefficients and the final value of the objective function. How do the coefficients compare?
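As referenced above, here is a minimal sketch of the gradient-descent loop under these specifications (fixed step size, loss recorded at every iteration). It assumes a loss_and_grad(w, X, y)-style helper like the one completed in this part; the names and argument order are illustrative, not the actual Regression API.

    import numpy as np

    def gradient_descent(loss_and_grad, X, y, d, eta=0.01, num_iters=10_000):
        """Fixed-step-size gradient descent; records the loss at every iteration."""
        w = np.zeros(d)
        loss_hist = []
        for _ in range(num_iters):
            loss, grad = loss_and_grad(w, X, y)  # assumed helper: returns J(w) and its gradient
            w = w - eta * grad                   # update every coordinate simultaneously
            loss_hist.append(loss)
        return w, loss_hist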
(e) In class, we learned that the closed-form solution to linear regression is
w = (XᵀX)^(−1) Xᵀy.
Using this formula, you will get an exact solution in one calculation: there is no “loop until convergence” as in gradient descent. (A sketch of this computation follows part (e).)
• Implement the closed-form solution closed_form().
• What is the closed-form solution? How do the coefficients and the cost compare to those obtained by GD? How quickly does the algorithm run compared to GD?
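As referenced above, a hedged sketch of the closed-form computation from part (e). In practice, solving the normal equations with np.linalg.solve (or np.linalg.lstsq) is preferred over forming an explicit inverse; X and y are placeholders with the shapes defined earlier.

    import numpy as np

    def closed_form_weights(X, y):
        """Solve (X^T X) w = X^T y, equivalent to w = (X^T X)^(-1) X^T y but more stable."""
        return np.linalg.solve(X.T @ X, X.T @ y)

    # Equivalent alternative using least squares (handles rank-deficient X as well):
    # w, *_ = np.linalg.lstsq(X, y, rcond=None)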
Polynomial Regression
Now let us consider the more complicated case of polynomial regression, where our hypothesis is
hw(x) = wᵀφ(x) = w0 + w1 x + w2 x^2 + ... + wm x^m.
(f) Recall that polynomial regression can be considered as an extension of linear regression in which we replace our input matrix X with
Φ = (φ(x1), φ(x2), ..., φ(xN))ᵀ,
where φ(x) is a function such that φj(x) = x^j for j = 0, ..., m.
Update gen_poly_features() for the case when m ≥ 2. (One way to build these features is sketched below.)
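As referenced above, a minimal sketch of generating the polynomial feature matrix for a 1-D input, assuming a NumPy array x of shape (N,). This is an illustration of the feature map φ, not the required get_poly_features()/gen_poly_features() implementation.

    import numpy as np

    def poly_features(x, m):
        """Return the N x (m+1) matrix whose n-th row is (1, x_n, x_n^2, ..., x_n^m)."""
        x = np.asarray(x, dtype=float)
        return np.vander(x, N=m + 1, increasing=True)   # columns are x^0, x^1, ..., x^m

    # Example: m = 3 polynomial features for three inputs.
    print(poly_features([0.0, 1.0, 2.0], m=3))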
(g) For m ∈ {0, ..., 10}, use the closed-form solver to determine the best-fit polynomial regression model on the training data, and with this model, calculate the loss on both the training data and the test data. Generate a plot depicting how the loss varies with model complexity (polynomial degree); generate a single plot with both training and test error, and include this plot in your writeup. Which degree polynomial would you say best fits the data? Was there evidence of under/overfitting the data? Use your plot to justify your answer.
Regularization
Finally, we will explore the role of regularization. For this problem, we will use ℓ2-regularization, so that our regularized objective function is
J(w) = Σ_{n=1}^{N} (hw(xn) − yn)^2 + λ ‖w‖^2,
again optimizing over the parameters w.
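A hedged sketch of how the ℓ2 penalty changes the loss and gradient computation in part (h), assuming the unregularized objective is the sum of squared errors written above; the function name and argument order are illustrative, not the Regression class API.

    import numpy as np

    def regularized_loss_and_grad(w, X, y, lam):
        """Sum-of-squares loss plus lam * ||w||^2, together with its gradient."""
        residual = X @ w - y
        loss = np.sum(residual**2) + lam * np.sum(w**2)
        grad = 2 * X.T @ residual + 2 * lam * w
        return loss, grad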
(h) Modify loss_and_grad() to incorporate ℓ2-regularization.
(i) Use your updated solver to find the coefficients that minimize the error for a tenth-degree polynomial (m = 10) given regularization factor λ ∈ {0, 10^−8, 10^−7, ..., 10^−1, 10^0}. Now use these coefficients to calculate the (unregularized) loss on both the training data and the test data as a function of λ. Generate a plot depicting how the loss varies with λ (for your x-axis, let x = [1, 2, ..., 10] correspond to λ = [0, 10^−8, 10^−7, ..., 10^0] so that λ is on a logarithmic scale, with regularization increasing as x increases). Which λ value appears to work best?