DEGREES OF MSc, MSci, MEng, BEng, BSc, MA and MA (Social Sciences)
COMPSCI5100
Machine Learning & Artificial Intelligence for Data Scientists
1. Consider linear regression on the Olympic data in Figure 1.
Figure 1: Olympic data
(a) We want to predict Olympic years from 100m winning times. What should be the target value and attribute? When solving this regression task with a polynomial regression model, how would you rescale the attributes? Why? [6 marks]
(b) Based on what you have learned from fitting linear regression models (with polynomial or RBF basis functions) to the relationship between years and winning times, predict which years might produce winning times of 9 s and 13 s, and explain why. [4 marks]
(c) The radial basis function (RBF):
is a popular basis function. The parameter μ_{d,k} is often set to a data point x_{i,d}, i = 1, …, N. Outline the strength and the risk of this choice of μ_{d,k}, and explain how you would mitigate the risk. [5 marks]
(d) In addition to polynomial functions and the RBF, linear regression can be generalized using other basis functions. One of the most widely used examples is Fourier analysis. Consider the following linear regression model:
What is the basis function of choice here? How would you deal with the unknown parameters A_j and θ_j? (Hint: you might find the following trigonometric identity useful: cos(a ± b) = cos(a)cos(b) ∓ sin(a)sin(b).) [5 marks]
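The idea behind the hint in part (d) can be sketched numerically. The model equation is not reproduced above, so the sketch below assumes a form y(x) = w0 + Σ_j A_j cos(j x + θ_j); the function names `fourier_design` and `fit_fourier` and the synthetic data are illustrative only. Under that assumption, each cosine term expands into a cos/sin pair whose coefficients enter linearly, so ordinary least squares applies, and amplitude and phase can be recovered afterwards.

```python
import numpy as np

# Assumed model (the exam's own equation is not reproduced here):
#   y(x) = w0 + sum_j A_j * cos(j * x + theta_j)
# With cos(a + b) = cos(a)cos(b) - sin(a)sin(b):
#   A_j cos(j x + theta_j) = a_j cos(j x) + b_j sin(j x),
#   where a_j = A_j cos(theta_j) and b_j = -A_j sin(theta_j).

def fourier_design(x, n_freq):
    """Design matrix with columns [1, cos(x), sin(x), cos(2x), sin(2x), ...]."""
    cols = [np.ones_like(x)]
    for j in range(1, n_freq + 1):
        cols.append(np.cos(j * x))
        cols.append(np.sin(j * x))
    return np.column_stack(cols)

def fit_fourier(x, y, n_freq):
    """Least-squares fit of the linear weights, then map back to (A_j, theta_j)."""
    Phi = fourier_design(x, n_freq)
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    a = w[1::2]  # cosine weights a_1, a_2, ...
    b = w[2::2]  # sine weights b_1, b_2, ...
    amplitude = np.hypot(a, b)
    phase = np.arctan2(-b, a)
    return w[0], amplitude, phase

# Synthetic, noise-free data with a single component at j = 2.
x = np.linspace(0.0, 2.0 * np.pi, 200)
true_A, true_theta = 1.5, 0.7
y = 0.3 + true_A * np.cos(2 * x + true_theta)

w0, A, theta = fit_fourier(x, y, n_freq=3)
print(w0, A, theta)
```

The nonlinear parameters never have to be optimised directly in this sketch: the fit is linear in a_j and b_j, and A_j and θ_j follow from them in closed form.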
2. Classification question
(a) The likelihood of logistic regression
where
Use an example of a few data points to explain how the likelihood function indicates how well the parameter W fits the data. [4 marks]
(b) The following matrix contains estimated parameter values from three types of logistic regression models. The model type is indicated by the columns. The parameter of each feature is placed in the corresponding row. Give your best estimate of what each model is and explain why.
[6 marks]
(c) Compare the effect on prediction of the three logistic models in (b). [4 marks]
(d) Let’s consider a binary classifier trained on a falsely labeled dataset. The issue is all
(i) What would be the AUC (computed with the correct labels) when the classifier is perfectly trained on the false data? And why? [2 marks]
(ii) Provide the range of possible values for the missing output (labeled ‘?’) that would be produced by the classifier in (i). Explain why. [2 marks]
(iii) What would be the AUC (computed with the correct labels) of a random classifier trained on the falsely labeled data? Why? [2 marks]
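As a numerical illustration of the kind of example part (a) of this question asks for, the sketch below evaluates a likelihood on four hand-made data points. The likelihood equation is not reproduced above, so the standard Bernoulli form with a sigmoid link is assumed; the data, the weight vectors `w_good` and `w_bad`, and the function names are all hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def likelihood(w, X, t):
    """Assumed form: product over n of p_n^t_n * (1 - p_n)^(1 - t_n),
    with p_n = sigmoid(w . x_n)."""
    p = sigmoid(X @ w)
    return np.prod(p ** t * (1.0 - p) ** (1 - t))

# Four 1-D points with a leading bias column; negatives on the left,
# positives on the right.
X = np.array([[1.0, -2.0],
              [1.0, -1.0],
              [1.0,  1.0],
              [1.0,  2.0]])
t = np.array([0, 0, 1, 1])

w_good = np.array([0.0, 2.0])   # slope agrees with the labels
w_bad = np.array([0.0, -2.0])   # slope contradicts the labels

L_good = likelihood(w_good, X, t)
L_bad = likelihood(w_bad, X, t)
print(L_good, L_bad)
```

Each factor in the product is the probability the model assigns to the label that was actually observed, so a weight vector that places points on the wrong side of the decision boundary contributes small factors and the product shrinks accordingly.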
3. Clustering question (Figures in this question were taken from the sklearn clustering tutorial:
https://scikit-learn.org/stable/modules/clustering.html)
(a) Describe the clustering results of K-means and the Gaussian mixture model in Figure 2. Hint: your answer should address parameter estimation, initial conditions and selecting the number of clusters.
(b) Suppose we want to avoid any data point from the inner ring being assigned to the same cluster as any data point from the outer ring. Outline two approaches to achieve this goal with the Gaussian mixture model. Hint: you don’t have to use just 2 clusters. [4 marks]
(c) Describe the clustering results of K-means and the Gaussian mixture model in Figure 3. Hint: your answer should address parameter estimation, initial conditions and selecting the number of clusters.
Figure 3: Clustering results of (A) K-means and (B) Gaussian mixture model.
[6 marks]
(d) Suppose Figure 3 (B) represents the results we want. Outline one approach to achieve this goal with K-means. Hint: sufficient detail of the approach is required to get full marks. [4 marks]