
MAST90138 Assignment 3

Instructions:

• The assignment contains 3 problems worth a total of 100 points which will count towards
15% of the final mark for the course. If you LaTeX and knitr your assignment in a nice way, you can potentially earn up to an extra 0.75% towards the final mark for the course.

• Use tables, graphs and concise text explanations to support your answers. Unclear answers
may not be marked. All tables and graphs must be clearly commented and identified.
• No late submission is allowed.
Data: In the assignment you will analyze some rainfall data. The dataset is available in .txt
format on the LMS website. To load the data into R you can use the function read.table()
or any command of your choice. You may need to manipulate the data format (data frames
or matrices) depending on the task. The data are split into a training set and a test set.
The training set contains p = 365 explanatory variables X1, . . . , Xp and one class membership
(G = 0 or 1) for ntrain = 150 individuals. The test set contains p = 365 explanatory variables
X1, . . . , Xp and one class membership (G = 0 or 1) for ntest = 41 individuals.
In these data, for each individual, X1, . . . , Xp correspond to the amount of rainfall on each
of the p = 365 days in a year. Each individual in this case is a place in Australia, coming either
from the North (G = 0) or from the South (G = 1) of the country. Thus, the two classes (North
and South) are coded by 0 and 1.
You will use the training data to fit your models or train classifiers. Once you have fitted
your model or trained your classifiers with the training data, you will need to check how well
the fitted models/trained classifiers work on the test data.
The test and training data are all placed in different text files: XGtrainRain.txt, which
contains the training X data (values of the p explanatory X-variables) for ntrain = 150 indi-
viduals as well as their class (0 or 1) label, and XGtestRain.txt, which contains the test X
data (values of the p explanatory X-variables) for ntest = 41 individuals as well as their class (0 or 1)
label. The test class membership is provided to you ONLY TO COMPUTE THE ERROR OF
CLASSIFICATION of your classifier.
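For example, a minimal loading sketch; the object names Xtrain, Gtrain, Xtest and Gtest are illustrative and used in the later sketches, and header = TRUE assumes the files carry column names, which you should check first:

    # Load the training and test sets; the files are whitespace-separated text.
    XGtrain <- read.table("XGtrainRain.txt", header = TRUE)
    XGtest  <- read.table("XGtestRain.txt",  header = TRUE)

    # Split each into a predictor matrix and a class label vector, assuming
    # the class column is the last one; check names(XGtrain) to confirm.
    Xtrain <- as.matrix(XGtrain[, -ncol(XGtrain)])
    Gtrain <- XGtrain[, ncol(XGtrain)]
    Xtest  <- as.matrix(XGtest[, -ncol(XGtest)])
    Gtest  <- XGtest[, ncol(XGtest)]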
Please include all the R code necessary to answer the questions, but omit superfluous
code that is not relevant. Marks may be taken off for R code that is poorly
presented.
You may take classification error/test error to be the proportion/percentage out of the
41 test samples that are misclassified.
Problem 1 [60 marks]:
In this problem you will train quadratic discriminant analysis (QDA) and logistic regression
classifiers to predict the class labels (0 or 1) in the test set.
(a) Use standard functions in R to train the QDA classifier and the logistic classifier, using all
p predictors in the training set. What happens, and why? Do you recommend using these
two classifiers on the test set? (Hint: for the logistic classifier, use the summary function
to inspect the trained model object.) [10]
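As a starting point, a minimal sketch of part (a), assuming the objects Xtrain and Gtrain from the loading step above:

    library(MASS)  # provides qda()

    dat <- data.frame(Xtrain, G = factor(Gtrain))

    # QDA with all p = 365 predictors; with p larger than the class sizes,
    # expect qda() to fail because the within-class covariance matrices
    # cannot be estimated. try() keeps the script running.
    qda.fit <- try(qda(G ~ ., data = dat))

    # Logistic regression with all p predictors; glm() runs but the fit is
    # degenerate when p >= n. Inspect the warnings and summary(glm.fit).
    glm.fit <- glm(G ~ ., data = dat, family = binomial)
    summary(glm.fit)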
(b) Use the prcomp and plsr (package pls) functions to obtain, respectively, the PCA and
PLS (partial least squares) components of the explanatory variables in the training set.
Here, when considering the covariance maximisation problem of PLS, we maximise the
covariance between X = (X1, . . . , Xp)^T and Y = 1{G = 1}, the indicator variable that an
individual belongs to group 1. For each case, you will need to use the “projection matrix”
(i.e., Γ for PCA and Φ for PLS discussed in class) reported by the function to re-compute
the components “manually”, to check that you understand how the components are
obtained. [10]
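A minimal sketch of part (b), again with illustrative object names; the manual recomputations should match the components reported by the functions up to numerical error:

    library(pls)

    # PCA of the training predictors (prcomp centres the data by default).
    pca.fit  <- prcomp(Xtrain)
    Gamma    <- pca.fit$rotation          # the projection matrix Gamma
    Z.manual <- scale(Xtrain, center = pca.fit$center, scale = FALSE) %*% Gamma
    max(abs(Z.manual - pca.fit$x))        # should be numerically zero

    # PLS with Y the indicator that an individual belongs to group 1.
    Y       <- as.numeric(Gtrain == 1)
    pls.fit <- plsr(Y ~ Xtrain, ncomp = 50)
    Phi     <- pls.fit$projection         # the projection matrix Phi
    T.manual <- scale(Xtrain, center = pls.fit$Xmeans, scale = FALSE) %*% Phi
    max(abs(T.manual - pls.fit$scores))   # should be numerically zero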
(c) Train a QDA classifier with the PLS components, and another one with the PCA compo-
nents. In each case, pick the number of components to use based on leave-one-out cross-
validation (LOOCV); consider using up to 50 components. Plot the leave-one-out CV
error against the number of components considered, and report the final chosen number of
components. (Refer to the lab in Week 7 to get some ideas; a possible skeleton is also
sketched below.)
Do the same for the logistic classifier.
(If you want to pick your number of components based on a method other than LOOCV,
please explain your choice in a clear and concise manner.)
[20]
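One possible LOOCV skeleton for QDA on the PLS components, refitting the PLS directions inside each fold so the held-out observation never influences the projection; the PCA and logistic versions follow the same pattern, and object names are as in the earlier sketches:

    library(MASS); library(pls)

    n     <- nrow(Xtrain)
    kmax  <- 50
    cverr <- matrix(NA, n, kmax)

    for (i in 1:n) {
      # Refit PLS without observation i, then project the held-out row.
      fit.i <- plsr(Y[-i] ~ Xtrain[-i, ], ncomp = kmax)
      z.i   <- (Xtrain[i, ] - fit.i$Xmeans) %*% fit.i$projection
      for (k in 1:kmax) {
        qda.k <- qda(fit.i$scores[, 1:k, drop = FALSE],
                     grouping = factor(Gtrain[-i]))
        pred  <- predict(qda.k, z.i[, 1:k, drop = FALSE])$class
        cverr[i, k] <- as.numeric(pred != Gtrain[i])
      }
    }

    cv <- colMeans(cverr)
    plot(1:kmax, cv, type = "b",
         xlab = "number of PLS components", ylab = "LOO-CV error")
    which.min(cv)  # the chosen number of components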
(d) For each of the QDA and logistic classifiers, which version (PCA or PLS) do you prefer?
Why? (Answer this question without using the test-set results from part (e).) [5]
(e) Apply your trained classifiers in (c) to the test set, and report the resulting classification
error (test error). Be careful about how you should center the data in your test set to
produce your prediction. The lab in Week 7 may give you some ideas again. [15]
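For the test-set predictions in (e), the key point is to centre the test predictors with the training means before projecting. A sketch for the QDA + PLS classifier, where k.star is a hypothetical name for the number of components chosen in (c):

    # Project the test data onto the training PLS directions, centring
    # with the TRAINING means (pls.fit$Xmeans), not the test means.
    Ztest <- scale(Xtest, center = pls.fit$Xmeans, scale = FALSE) %*% Phi

    # Train the final QDA on the chosen components and compute the test error.
    qda.final <- qda(pls.fit$scores[, 1:k.star, drop = FALSE],
                     grouping = factor(Gtrain))
    pred.test <- predict(qda.final, Ztest[, 1:k.star, drop = FALSE])$class
    mean(pred.test != Gtest)  # test classification error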
Problem 2 [30 marks]:
In this problem you will train random forest (RF) classifiers to predict the class labels (0 or
1) in the test set.
(a) Using the randomForest package in R, construct a random forest classifier using all p
predictor variables in the training set. When training the classifier, use the default value
of m (the number of random candidate variables for each split), but justify your choice
for the number of trees B using the out-of-bag (OOB) classification error. Plot a graph
showing the OOB error against the number of trees used. [15]
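A minimal sketch of part (a), assuming Xtrain and Gtrain as before; the seed is only for reproducibility of the example:

    library(randomForest)

    set.seed(1)
    rf.fit <- randomForest(x = Xtrain, y = factor(Gtrain),
                           ntree = 1000, importance = TRUE)

    # OOB error as a function of the number of trees; the "OOB" column of
    # err.rate holds the overall OOB error after b trees, b = 1, ..., ntree.
    plot(1:1000, rf.fit$err.rate[, "OOB"], type = "l",
         xlab = "number of trees B", ylab = "OOB classification error")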
(b) Show two graphs that illustrate the importance of the Xj variables: one for the mean
decrease in OOB prediction accuracy and one for the mean decrease in node impurity
measured by the Gini index. Is there an explanation for why those particular Xj's are the
most important for classification in this rainfall example? [5]
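Both importance measures are available directly from the fitted forest, provided importance = TRUE was set at training time; for example:

    # Built-in plots of both importance measures.
    varImpPlot(rf.fit)

    # Alternatively, plot importance against the day of the year j = 1, ..., 365.
    imp <- importance(rf.fit)
    plot(1:365, imp[, "MeanDecreaseAccuracy"], type = "h",
         xlab = "day j", ylab = "mean decrease in accuracy")
    plot(1:365, imp[, "MeanDecreaseGini"], type = "h",
         xlab = "day j", ylab = "mean decrease in Gini index")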
(c) Apply the resulting trained classifier to the test data Xtest, and compute the resulting
classification error. Try training your RF multiple times. Do you always get the same
classification error? If yes, why? If not, why not, and what can you do to make the forest
more stable, and why? [10]
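A sketch of the test-set evaluation; retrain the forest without a fixed seed to see the variability the question asks about:

    # Test error of the trained forest.
    pred.rf <- predict(rf.fit, newdata = Xtest)
    mean(pred.rf != Gtest)  # proportion of the 41 test samples misclassified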
Problem 3 [10 marks]:
Compare the percentage of misclassification for each of the five classifiers (Logistic + PCA,
Logistic + PLS, QDA + PCA, QDA + PLS, RF) considered in the previous problems. Identify
the classifiers that worked best and those that worked worst, and comment on those results,
providing an explanation of the poorer/better performance of some of the classifiers.









