2024 Semester One (June 2024)

Examination Period

Faculty of Business and Economics

EXAM CODES:                        ETC3250

TITLE OF PAPER:                  Introduction to machine learning

EXAM DURATION:                2 hours 10 mins

Section A:

Section A. Please answer ALL questions.

Question 1

Which of the following categorical response variables matches the binary matrix coding below:

Select one:

a. (A, A, B, C, C, A)'

b. (A, B, C, B, C, A)'

c. (B, A, C, A, A, C)'

d. None of these because the coding is not binary

e. (C, B, C, A, A, C)'

Question 2

Which of these plots would be considered the model plotted in the data space?

A: The line of points is an SVM boundary

B: Convex hulls marking the results of a k-means clustering

C: Votes matrix from a random forest fit

Select one:

a. B

b. C

c. A and C

d. A

e. A and B

f. A and B and C

g. B and C

Question 3

The term _________ means the model overlaid on the data, with the primary purpose being to examine how well the model fits the main structures present in the data.

Select one:

a. model-in-the-data-space

b. biplot

c. principal component analysis

d. data-in-the-model-space

e. tours of linear projections

Question 4

Which of the following projection matrices match the axes for this projection:

Select one:

a.

X1 X2 var

1 0.324 0.03325 tr1

2 0.033 0.84597 tr2

3 -0.079 -0.50492 hed

4 0.689 -0.00038 ad1

5 0.176 0.07742 ad2

6 -0.618 0.14931 ad3

b.

none of them match

c.

X1 X2 var

1 0.47 0.049 tr1

2 0.18 0.783 tr2

3 0.11 -0.593 hed

4 0.68 -0.158 ad1

5 -0.13 0.080 ad2

6 -0.50 -0.046 ad3

d.

X1 X2 var

1 0.99673 0.00017 tr1

2 0.00017 0.99933 tr2

3 -0.00624 -0.03463 hed

4 0.05882 -0.00076 ad1

5 0.01496 0.00513 ad2

6 -0.05296 0.01092 ad3

e.

X1 X2 var

1 0.496 0.139 tr1

2 0.446 0.505 tr2

3 0.330 -0.606 hed

4 0.243 -0.349 ad1

5 -0.621 0.034 ad2

6 -0.024 -0.485 ad3

Question 5

When doing 5-fold cross-validation, with these splits of the data:

fold 1: 1, 3, 4

fold 2: 2, 10, 15

fold 3: 7, 8, 9

fold 4: 5, 11, 14

fold 5: 6, 12, 13

Which subset of observations would be used to train the model when working with fold 4?

Select one:

a. 1, 3, 4

b. 1, 2, 3, 4, 6, 7, 8, 9, 10, 12, 13, 15

c. 5, 11, 14

d. 7, 8, 9

e. 2, 10, 15
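In k-fold cross-validation, the fold being worked on is held out for testing and the model is trained on all remaining folds. A minimal Python sketch of that bookkeeping, using the fold memberships listed in the question (the helper name `training_ids` is an illustrative assumption, not part of the exam):

```python
# Fold memberships copied from the question.
folds = {
    1: [1, 3, 4],
    2: [2, 10, 15],
    3: [7, 8, 9],
    4: [5, 11, 14],
    5: [6, 12, 13],
}


def training_ids(folds, held_out):
    """Return the sorted observation ids in every fold except `held_out`."""
    return sorted(i for k, ids in folds.items() if k != held_out for i in ids)


train_fold4 = training_ids(folds, 4)
print(train_fold4)
```

The held-out observations (5, 11, 14) never appear in the training set for that fold.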

Question 6

From the following summary of a PCA, what proportion of the total variance would four principal components explain? (Note: The data was standardised prior to computing the PCA. If no values match exactly, pick the closest.)

> auswt20_pca$sdev

[1] 2.723 2.053 1.175 0.974 0.902 0.836 0.700 0.533

[9] 0.466 0.421 0.351 0.321 0.273 0.220 0.081 0.063

[17] 0.015

Select one:

a. 41%

b. 82%

c. 46%

d. 0.457

e. 0.82

f. 5.7%

g. 0.057

h. 0.407
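The proportion of variance explained by the first few principal components is the sum of their variances (squared standard deviations) divided by the total variance. A minimal Python sketch using the `sdev` values printed above; the variable names are illustrative assumptions:

```python
# Standard deviations of the 17 principal components, copied from the output.
sdev = [2.723, 2.053, 1.175, 0.974, 0.902, 0.836, 0.700, 0.533,
        0.466, 0.421, 0.351, 0.321, 0.273, 0.220, 0.081, 0.063,
        0.015]

variances = [s ** 2 for s in sdev]   # eigenvalues of the correlation matrix
total_variance = sum(variances)      # close to 17 because the data were standardised
prop_first4 = sum(variances[:4]) / total_variance

print(round(total_variance, 2), round(prop_first4, 3))
```

Because the data were standardised, the total variance is approximately the number of variables, so dividing by 17 gives essentially the same result.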

Question 7

For data having n = 92 and p = 5, how many parameters would need to be estimated to compute the variance-covariance matrix?

Select one:

a. 14

b. 4

c. 92

d. 91

e. 24

f. 25

g. 15
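A variance-covariance matrix for p variables is symmetric, so it has p variances on the diagonal and p(p-1)/2 distinct covariances off it. A small Python sketch, assuming the count covers only these distinct entries (the function name is hypothetical):

```python
def n_covariance_parameters(p):
    """Distinct entries of a symmetric p x p variance-covariance matrix."""
    n_variances = p
    n_covariances = p * (p - 1) // 2
    return n_variances + n_covariances


n_params = n_covariance_parameters(5)
print(n_params)
```

The sample size n does not enter the count; it only affects how precisely each entry is estimated.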

Question 8

The following output summarises the results from PCA on player statistics from women’s AFL matches in 2023. Statistics for each player have been averaged across the season. There are statistics on 508 players. PCA was computed on the correlation matrix.

a. (1pt) How many variables are in the data?

b. (1pt) How many PCs have eigenvalues higher than would be expected from purely uncorrelated data?

c. (3pts) Which variables significantly contribute to PC1? Explain your answer.

d. (1pt) Is it reasonable to assume that the variables were standardised when computing the PCA? Why?

e. (2pts) Interpret PC1, in a few sentences.

Question 9

The following output summarises the first four PCs from PCA on player statistics from women’s AFL matches in 2023. Statistics for each player have been averaged across the season. There are statistics on 508 players. PCA was computed on the correlation matrix.

Make a sketch that shows where the axis for behinds would be on a biplot of PC1 vs PC2.

(3pts: 1.5 for correct line segment, 1.5 for labelling axes, and adding scales.)

Question 10

The following output summarises the first four PCs from PCA on player statistics from women’s AFL matches in 2023. Statistics for each player have been averaged across the season. There are statistics on 508 players. PCA was computed on the correlation matrix.

Explain in a few sentences what type of player Randall is.

Question 11

The following plots are produced from player statistics from women’s AFL matches in 2023. Statistics for each player have been averaged across the season. There are statistics on 508 players. Both results were computed on standardised data.

Explain in a few sentences what would be learned from the UMAP representation of the AFLW statistics that might differ from that shown in the first two PCs.

Section B:

Section B. Please answer ALL questions.

Question 12

From the following plot of data, what would likely be the pooled variance-covariance matrix?

   | VC1       | VC2       | VC3     | VC4
   | x1   x2   | x1   x2   | x1  x2  | x1    x2
x1 | 5.6  -3.0 | 1.03 0.98 | 5.4 2.9 | 1.14  -0.98
x2 | -3.0 5.6  | 0.98 1.14 | 2.9 4.9 | -0.98 1.03

Select one:

a. VC1

b. VC4

c. VC3

d. None of these

e. VC2

Question 13

The following plot shows the data and three possible boundaries from an LDA fit, marked A, B and C.

If the model is fitted with group 1 having a higher prior probability, which is likely the boundary for that model?

Select one:

a. None of these is possible

b. C

c. A

d. B

e. Either A or C is possible

Question 14

In the derivation of different forms of the equations for a logistic regression model:

What is the explanation of going from step 3 to 4?

Select one:

a. take natural log of both sides

b. subtract 1 from both sides

c. multiply numerator and denominator of LHS by y

d. divide numerator and denominator by $e^{\beta_0 + \beta_1 x}$

e. invert both sides

Question 15

For two classes coded as 0 and 1, what would be the class prediction for the following logistic model fit?

Select one:

a. 0.0183

b. 1

c. 0

d. 0.881

e. 0.0180

Question 16

For the following random forest model fit, and votes matrix values for five observations, what would be the class prediction for observation 3?

> pebbles_rf

Call:

randomForest(formula = cl ~ ., data = pebbles)

Type of random forest: classification

Number of trees: 500

No. of variables tried at each split: 1

OOB estimate of error rate: 0.51%

Confusion matrix:

A B class.error

A 101 1 0.0098

B 0 94 0.0000

> pebbles_rf$votes[ids,]

A B

1 0.688 0.31

2 0.778 0.22

3 0.455 0.55

4 0.048 0.95

5 0.133 0.87

Select one:

a. A and B are equally plausible

b. 0.778

c. A

d. B

e. 0.55
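A random forest classifies an observation by majority vote: each row of the votes matrix gives the proportion of trees voting for each class, and the predicted class is the one with the larger proportion. A minimal Python sketch using the five rows shown above (the dictionary layout and helper name are illustrative assumptions):

```python
# Vote proportions copied from pebbles_rf$votes[ids,].
votes = {
    1: {"A": 0.688, "B": 0.31},
    2: {"A": 0.778, "B": 0.22},
    3: {"A": 0.455, "B": 0.55},
    4: {"A": 0.048, "B": 0.95},
    5: {"A": 0.133, "B": 0.87},
}


def predicted_class(row):
    """Return the class label with the highest vote proportion."""
    return max(row, key=row.get)


predictions = {obs: predicted_class(row) for obs, row in votes.items()}
print(predictions[3])
```

The prediction is a class label, not a vote proportion, so numeric options cannot be the answer to a class-prediction question.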

Question 17

The following values are the predictive probabilities of the test set for a random forest fitted to a data set with two classes. There are 196 observations, cl indicates true class. The values are sorted, and you can assume that rows 1-97 are all identical, and rows 108-196 are identical. If class B is the positive class, compute sensitivity for a cutoff of 0.7 (anything above 0.7 is predicted to be B).

id cl A B

1 A 1.00 0.00

...

97 A 1.00 0.00

98 A 0.80 0.20

99 A 0.78 0.22

100 A 0.75 0.25

101 A 0.69 0.31

102 A 0.45 0.55

103 B 0.22 0.78

104 B 0.13 0.87

105 B 0.08 0.92

106 B 0.06 0.94

107 B 0.05 0.95

108 B 0.00 1.00

...

196 B 0.00 1.00

Select one:

a. 0.98

b. 1

c. 0

d. 0.01

e. 0.02
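Sensitivity is the proportion of truly positive observations that are predicted positive, TP / (TP + FN). A minimal Python sketch that rebuilds the 196 rows from the table, assuming (as the question states) that rows 1-97 and rows 108-196 repeat the values shown; the variable names are illustrative:

```python
# Predicted probability of class B for each of the 196 test observations.
prob_b = (
    [0.00] * 97                         # rows 1-97, true class A
    + [0.20, 0.22, 0.25, 0.31, 0.55]    # rows 98-102, true class A
    + [0.78, 0.87, 0.92, 0.94, 0.95]    # rows 103-107, true class B
    + [1.00] * 89                       # rows 108-196, true class B
)
truth = ["A"] * 102 + ["B"] * 94

cutoff = 0.7  # anything above the cutoff is predicted to be B
true_pos = sum(1 for p, t in zip(prob_b, truth) if t == "B" and p > cutoff)
false_neg = sum(1 for p, t in zip(prob_b, truth) if t == "B" and p <= cutoff)

sensitivity = true_pos / (true_pos + false_neg)
print(true_pos, false_neg, sensitivity)
```

Changing `cutoff` shows how sensitivity trades off against specificity as the threshold moves.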

Question 18

The following is a diagram for a neural network model. If the number of observations in the data set were n = 46, how many observations would there be per parameter to be estimated in the model? (That is, n divided by the number of parameters.)

Select one:

a. 4.2

b. 23

c. 4

d. 2

e. 8.4

f. None of these

Question 19

From the following summaries:

answer the following questions:

a. (1pt) What is the data dimension?

b. (1pt) What is the pooled variance-covariance matrix?

c. (3pts) Compute and report the LDA rule to classify group A from group B, assuming equal prior probabilities.

Question 20

This summarises a tree fit to the last 25 years of Australian tourism data modeling the difference in patterns between Cairns and Melbourne. Only business travel is examined, and the four variables used are Q1, Q2, Q3, Q4 which are quarters in the year. Each series was standardised on itself, so values represent proportion of travel for business relative to other types of travel to the city in each quarter of each year. We are curious to determine whether business travel tends to be in different seasons in the two locations.

n= 50

node), split, n, loss, yval, (yprob)

* denotes terminal node

1) root 50 25 Cairns (0.50 0.50)

2) Q1< 0.19 25 2 Cairns (0.92 0.08)

4) Q2< 0.25 20 0 Cairns (1.00 0.00) *

5) Q2>=0.25 5 2 Cairns (0.60 0.40)

10) Q2>=0.27 2 0 Cairns (1.00 0.00) *

11) Q2< 0.27 3 1 Melbourne (0.33 0.67) *

3) Q1>=0.19 25 2 Melbourne (0.08 0.92)

6) Q3< 0.23 3 1 Cairns (0.67 0.33) *

7) Q3>=0.23 22 0 Melbourne (0.00 1.00) *

a. (1pt) How many terminal nodes are in the tree?

b. (1pt) How many of the four variables are used in the model?

c. (1pt) Which variable would be considered to be the most important?

d. (1pt) Which terminal nodes are pure nodes (having only one class)?

e. (1pt) How many observations are there at node 7?

f. (2pts) Based on this model, how would you describe the differences in business travel between Melbourne and Cairns?
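The printed tree can be read mechanically: a trailing `*` marks a terminal node, the class proportions sit in parentheses, and the split variable follows the node number. A rough Python sketch that extracts these from the node lines above; the parsing approach is an illustrative assumption and is not how the tree was produced:

```python
import re

# Node lines copied from the printed tree (header lines omitted).
tree_lines = [
    "1) root 50 25 Cairns (0.50 0.50)",
    "2) Q1< 0.19 25 2 Cairns (0.92 0.08)",
    "4) Q2< 0.25 20 0 Cairns (1.00 0.00) *",
    "5) Q2>=0.25 5 2 Cairns (0.60 0.40)",
    "10) Q2>=0.27 2 0 Cairns (1.00 0.00) *",
    "11) Q2< 0.27 3 1 Melbourne (0.33 0.67) *",
    "3) Q1>=0.19 25 2 Melbourne (0.08 0.92)",
    "6) Q3< 0.23 3 1 Cairns (0.67 0.33) *",
    "7) Q3>=0.23 22 0 Melbourne (0.00 1.00) *",
]

terminal_nodes, pure_nodes, split_vars = [], [], set()
for line in tree_lines:
    node_id = int(line.split(")")[0])
    probs = [float(p) for p in re.search(r"\(([\d.]+) ([\d.]+)\)", line).groups()]
    var = re.match(r"\d+\) (Q\d)", line)   # the root line has no split variable
    if var:
        split_vars.add(var.group(1))
    if line.endswith("*"):
        terminal_nodes.append(node_id)
        if max(probs) == 1.0:              # all observations in one class
            pure_nodes.append(node_id)

print(terminal_nodes, pure_nodes, sorted(split_vars))
```

The variable used for the first split below the root separates the most observations, which is one common informal reading of importance in a single tree.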

Question 21

This summarises a linear support vector machine fit to the last 25 years of Australian tourism data modeling the difference in patterns between Cairns and Melbourne. Only business travel is examined, and the four variables used are Q1, Q2, Q3, Q4 which are quarters in the year. Each series was standardised on itself, so values represent proportion of travel for business relative to other types of travel to the city in each quarter of each year. We are curious to determine whether business travel tends to be in different seasons in the two locations.

> melb_cairns_svm_b$fit@b

[1] 4.2

> melb_cairns_svm_b$fit@SVindex

[1] 2 3 5 6 16 18 19 21 22 26 27 30 34 40 41 48 49 50

> melb_cairns_svm_b$fit@coef

[[1]]

[1] -10.0 -10.0 -10.0 -10.0 -9.0 -10.0 -10.0 -10.0 -9.4 10.0

[11] 10.0 10.0 10.0 8.5 10.0 10.0 10.0 10.0

The top few rows of the data are:

> melb_cairns[,c(1,3,5,7,9)] |> slice_head(n=5)

# A tibble: 5 × 5

Region Q1 Q2 Q3 Q4

1 Cairns 0.259 0.220 0.629 0.300

2 Cairns 0.205 0.345 0.586 0.348

3 Cairns 0.272 0.475 0.500 0.275

4 Cairns 0.173 0.533 0.523 0.380

5 Cairns 0.360 0.498 0.565 0.374

The coefficients for the separating hyperplane are calculated to be:

> melb_cairns_betas_b

Q1 Q2 Q3 Q4

5.8 3.1 5.0 3.2

a. (1pt) How many support vectors are used to compute the coefficients for the separating hyperplane?

b. (1pt) Write down the equation of the separating hyperplane.

c. (1pt) Which variable(s) would be considered to be the most important to distinguish the difference between business trips to Cairns and Melbourne?

d. (2pts) Explain how you would use the quantities from the fitted model object to compute the coefficients.

e. (2pts) Was Melbourne or Cairns coded as -1? Why do you think so?
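For a linear SVM, the hyperplane coefficients are a weighted sum of the support vectors: each support vector (a row of the data picked out by `SVindex`) is multiplied by its entry in `coef`, and the products are summed variable by variable. The Python sketch below illustrates the arithmetic only. It assumes that `coef` holds the multiplier times the class label (+1 or -1) for each support vector, which is how kernlab documents it. The toy numbers are hypothetical, not the exam data.

```python
def hyperplane_coefficients(coefs, support_vectors):
    """Sum coef_i * x_i over support vectors, one total per variable."""
    n_vars = len(support_vectors[0])
    return [
        sum(c * sv[j] for c, sv in zip(coefs, support_vectors))
        for j in range(n_vars)
    ]


# Hypothetical toy example: two support vectors measured on two variables.
toy_coefs = [-1.0, 2.0]                 # stands in for fit@coef[[1]]
toy_svs = [[1.0, 2.0], [3.0, 4.0]]      # stands in for the data rows at fit@SVindex

toy_betas = hyperplane_coefficients(toy_coefs, toy_svs)
print(toy_betas)
```

In the fitted object above, this would mean taking the 18 rows listed in `SVindex`, keeping columns Q1 to Q4, and weighting them by the 18 values in `coef`. The sign of each `coef` entry then indicates which class that support vector was coded as.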




