
MATH2010 Statistical Modelling I

SEMESTER 2 EXAMINATION 2020/21

1. [Total 32 marks] Consider the no-intercept model where

$$Y_i \sim N(\beta x_i, \sigma^2),$$

independently, for $i = 1, \ldots, n$.

(a) Find the least squares estimator, $\hat{\beta}_{LS}$, of $\beta$ and show that it can be written as a linear combination of $Y_1, \ldots, Y_n$. [6 marks]

(b) Show that $\hat{\beta}_{LS}$ is unbiased and find its sampling distribution. [8 marks]
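The least squares algebra in (a) and (b) can be checked numerically. The Python sketch below, using hypothetical data, computes $\hat\beta_{LS} = \sum x_i Y_i / \sum x_i^2$ as the linear combination $\sum c_i Y_i$ with weights $c_i = x_i / \sum_j x_j^2$, then uses those same weights to get the exact mean and variance. The function names and data values are illustrative only.

```python
import math


def ls_no_intercept(x, y):
    """Least squares slope for the no-intercept model Y_i = beta * x_i + e_i.

    Setting dS/dbeta = 0 for S(beta) = sum (y_i - beta * x_i)^2 gives
    beta_hat = sum(x_i * y_i) / sum(x_i^2) = sum(c_i * y_i),
    with weights c_i = x_i / sum(x_j^2).
    """
    sxx = math.fsum(xi * xi for xi in x)
    c = [xi / sxx for xi in x]  # weights of the linear combination
    beta_hat = math.fsum(ci * yi for ci, yi in zip(c, y))
    return beta_hat, c


def ls_moments(x, beta, sigma2):
    """Exact mean and variance of beta_hat when Y_i ~ N(beta * x_i, sigma2)."""
    sxx = math.fsum(xi * xi for xi in x)
    c = [xi / sxx for xi in x]
    mean = math.fsum(ci * beta * xi for ci, xi in zip(c, x))  # equals beta, as sum c_i x_i = 1
    var = sigma2 * math.fsum(ci * ci for ci in c)             # equals sigma2 / sum x_i^2
    return mean, var


# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]
beta_hat, weights = ls_no_intercept(x, y)
mean, var = ls_moments(x, beta=2.0, sigma2=1.0)
print(beta_hat, mean, var)
```

Here $\sum x_i Y_i = 59.7$ and $\sum x_i^2 = 30$, so $\hat\beta_{LS} = 1.99$. Because $\hat\beta_{LS}$ is a linear combination of independent normal variables, it is itself normal, with the mean and variance returned by `ls_moments`.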

Consider an alternative estimator given by

where $\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$ and $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$.

(d) Prove that $\mathrm{var}(\hat{\beta}_A) \geq \mathrm{var}(\hat{\beta}_{LS})$. [6 marks]

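Hint: The Cauchy-Schwarz inequality states that, for numbers $a_1, \ldots, a_n$ and $b_1, \ldots, b_n$,

$$\left(\sum_{i=1}^{n} a_i b_i\right)^2 \leq \left(\sum_{i=1}^{n} a_i^2\right)\left(\sum_{i=1}^{n} b_i^2\right).$$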
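The formula for $\hat\beta_A$ is missing from this copy of the paper. Assuming, hypothetically, that it is $\hat\beta_A = \bar{Y}/\bar{x}$ (an estimator built from the two means defined above), its variance is $\sigma^2/(n\bar{x}^2) = n\sigma^2/(\sum x_i)^2$, while $\mathrm{var}(\hat\beta_{LS}) = \sigma^2/\sum x_i^2$. Cauchy-Schwarz with $a_i = 1$ and $b_i = x_i$ gives $(\sum x_i)^2 \le n\sum x_i^2$, which orders the two variances. The Python sketch below compares them for illustrative covariate values.

```python
import math


def var_ls(x, sigma2):
    """var(beta_hat_LS) = sigma2 / sum(x_i^2)."""
    return sigma2 / math.fsum(xi * xi for xi in x)


def var_ybar_over_xbar(x, sigma2):
    """var(Ybar / xbar) = sigma2 / (n * xbar^2).

    ASSUMPTION: beta_hat_A = Ybar / xbar (its formula is missing from the paper).
    """
    n = len(x)
    xbar = math.fsum(x) / n
    return sigma2 / (n * xbar * xbar)


def cauchy_schwarz_holds(a, b):
    """Check (sum a_i b_i)^2 <= (sum a_i^2)(sum b_i^2), allowing for rounding."""
    lhs = math.fsum(ai * bi for ai, bi in zip(a, b)) ** 2
    rhs = math.fsum(ai * ai for ai in a) * math.fsum(bi * bi for bi in b)
    return lhs <= rhs * (1 + 1e-12)


x = [1.0, 2.0, 3.0, 4.0]  # hypothetical covariate values
print(var_ybar_over_xbar(x, 1.0), var_ls(x, 1.0))
print(cauchy_schwarz_holds([1.0] * len(x), x))  # a_i = 1, b_i = x_i
```

Equality holds only when all $x_i$ are equal, and in that case the two estimators coincide.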

Define the $j$th random fitted value to be $\hat{Y}_j = \hat{\beta}_{LS} x_j$, for $j = 1, \ldots, n$.

(e) Show that

and

for k = 1, . . . , n. [6 marks]

Hint: For random variables $U$ and $V$ and constants $a$ and $b$, $\mathrm{cov}(aU, bV) = ab\,\mathrm{cov}(U, V)$.
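As an illustration of the hint, since $\hat{Y}_j = x_j\hat\beta_{LS}$, it gives $\mathrm{cov}(\hat{Y}_j, \hat{Y}_k) = x_j x_k\,\mathrm{var}(\hat\beta_{LS}) = \sigma^2 x_j x_k / \sum_i x_i^2$, using the variance of $\hat\beta_{LS}$. The Python sketch below compares this exact expression with a Monte Carlo estimate. The covariate values, $\beta$, $\sigma$, the number of replications and the seed are arbitrary illustrative choices.

```python
import math
import random


def fitted_cov_exact(x, sigma2):
    """cov(Yhat_j, Yhat_k) = x_j * x_k * var(beta_hat) = sigma2 * x_j * x_k / sum(x_i^2)."""
    sxx = math.fsum(xi * xi for xi in x)
    return [[sigma2 * xj * xk / sxx for xk in x] for xj in x]


def fitted_cov_mc(x, beta, sigma, reps=100_000, seed=1):
    """Monte Carlo estimate of the covariance matrix of the fitted values."""
    rng = random.Random(seed)
    sxx = math.fsum(xi * xi for xi in x)
    n = len(x)
    fits = []
    for _ in range(reps):
        y = [beta * xi + rng.gauss(0.0, sigma) for xi in x]  # Y_i ~ N(beta x_i, sigma^2)
        b = math.fsum(xi * yi for xi, yi in zip(x, y)) / sxx  # least squares slope
        fits.append([b * xi for xi in x])                     # fitted values Yhat_j
    means = [math.fsum(f[j] for f in fits) / reps for j in range(n)]
    return [
        [math.fsum((f[j] - means[j]) * (f[k] - means[k]) for f in fits) / (reps - 1)
         for k in range(n)]
        for j in range(n)
    ]


x = [1.0, 2.0, 3.0]
exact = fitted_cov_exact(x, sigma2=1.0)
approx = fitted_cov_mc(x, beta=2.0, sigma=1.0)
print(exact[0][2], approx[0][2])
```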

2. [Total 28 marks]

(a) Consider a simple linear regression model for the relationship between a response Y and a single explanatory variable x.

(i) State three assumptions that underpin the simple linear regression model. [3 marks]

(ii) After fitting a simple linear regression model to data, describe two diagnostic plots that can be produced and state how the plots should appear if the assumptions underpinning the model are correct. [4 marks]

(iii) Consider predicting a future observation $Y_0$ with explanatory variable $x_0$. For fixed observed responses and explanatory variables, what value of $x_0$ will minimise the width of the $100(1 - \alpha)\%$ prediction interval for $Y_0$? Justify your answer. [3 marks]
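For simple linear regression, the prediction interval at $x_0$ is usually written $\hat{y}_0 \pm t\, s\sqrt{1 + 1/n + (x_0 - \bar{x})^2/S_{xx}}$, where $t$ is the appropriate quantile of the $t_{n-2}$ distribution, $s$ is the residual standard error and $S_{xx} = \sum_i (x_i - \bar{x})^2$. Only the last term under the square root depends on $x_0$. The Python sketch below evaluates the half-width on a grid of $x_0$ values for hypothetical summary statistics to illustrate where it is smallest.

```python
import math


def pi_half_width(x0, t_quantile, s, n, xbar, sxx):
    """Half-width of the prediction interval for a new response at x0.

    half-width = t * s * sqrt(1 + 1/n + (x0 - xbar)^2 / Sxx)
    """
    return t_quantile * s * math.sqrt(1.0 + 1.0 / n + (x0 - xbar) ** 2 / sxx)


# Hypothetical summary statistics, chosen only for illustration.
T, S, N, XBAR, SXX = 2.0, 1.5, 20, 10.0, 50.0

grid = [XBAR + 0.5 * step for step in range(-10, 11)]  # x0 values around xbar
widths = {x0: pi_half_width(x0, T, S, N, XBAR, SXX) for x0 in grid}
best_x0 = min(widths, key=widths.get)
print(best_x0)  # grid point with the narrowest interval
```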

(b) The R output below shows the result of fitting a simple linear regression model to data from the 2016 National Football League (NFL) season. For each of the $n = 32$ teams, the response is the number of games won (out of 16) in the 2016 season and the explanatory variable is the average number of points scored per game. Additionally, if $x_i$ is the average number of points per game for the $i$th team, then $\sum_{i=1}^{n} x_i = 728.6$ and $\sum_{i=1}^{n} x_i^2 = 17118.44$.

##
## Call:
## lm(formula = wins ~ points)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -4.3954 -1.9349  0.0711  2.0985  4.5602
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  -2.1849     2.6811  -0.815 0.421524
## points        0.4446     0.1159   3.835 0.000599 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.666 on 30 degrees of freedom
## Multiple R-squared:  0.329,  Adjusted R-squared:  0.3066
## F-statistic: 14.71 on 1 and 30 DF,  p-value: 0.0005992

(i) Under the fitted model, what is the estimated increase in the expected number of wins per season if average points per game increase by $\delta$? Estimate how many more points a team would need to score on average per game to increase their expected number of wins per season by one. [4 marks]
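Under the first fitted model the expected number of wins changes by $0.4446\,\delta$ when average points per game change by $\delta$, so one extra expected win needs $\delta = 1/0.4446$, about 2.25 extra points per game. The short Python sketch below does this arithmetic with the slope estimate from the R output; the names are illustrative.

```python
SLOPE = 0.4446  # estimated coefficient of `points` from the R output above


def extra_wins(delta):
    """Estimated change in expected wins when average points per game rise by delta."""
    return SLOPE * delta


points_for_one_win = 1.0 / SLOPE
print(round(points_for_one_win, 3))
```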

(ii) Suppose a team scores an average of 18 points per game. Calculate a 95% prediction interval for the number of games they will win in a season. [7 marks]

Hint: You may find some of the following R code and output useful:

qt(0.975, df = 30)
## [1] 2.042272
qt(0.95, df = 30)
## [1] 1.697261
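The prediction interval can be sketched in Python from the quantities given in the question: $\bar{x} = 728.6/32$, $S_{xx} = \sum x_i^2 - (\sum x_i)^2/n$, the coefficient estimates, $s = 2.666$ and the $t_{30}$ quantile from the hint. It also checks that $s/\sqrt{S_{xx}}$ reproduces the slope's standard error in the R output. This is a worked sketch, not an official solution.

```python
import math

# Summary statistics and estimates taken from the question and the first R output.
n = 32
sum_x, sum_x2 = 728.6, 17118.44
b0, b1 = -2.1849, 0.4446  # intercept and slope estimates
s = 2.666                 # residual standard error on 30 df
t975 = 2.042272           # qt(0.975, df = 30), from the hint

xbar = sum_x / n
sxx = sum_x2 - sum_x ** 2 / n  # Sxx = sum x_i^2 - (sum x_i)^2 / n

# Consistency check: the slope's standard error should be s / sqrt(Sxx).
se_slope = s / math.sqrt(sxx)

x0 = 18.0
y0_hat = b0 + b1 * x0
half_width = t975 * s * math.sqrt(1.0 + 1.0 / n + (x0 - xbar) ** 2 / sxx)
lower, upper = y0_hat - half_width, y0_hat + half_width
print(round(se_slope, 4), round(y0_hat, 4), (round(lower, 2), round(upper, 2)))
```

With these inputs the point prediction is about 5.82 wins and the interval is roughly (0.17, 11.46) games.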

The NFL is actually split into two conferences, the American Football Conference (AFC) and the National Football Conference (NFC), with each team belonging to exactly one of them. A dummy variable called $z$ is created where, for the $i$th team,

for i = 1, . . . , n.

The R output below shows the result of fitting a further linear regression model to the data from the 2016 National Football League (NFL) season, this time including the dummy variable.

##
## Call:
## lm(formula = wins ~ points + z + points * z)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -5.1241 -1.9322  0.2006  1.7199  4.4627
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  -6.5646     4.5121  -1.455  0.15682
## points        0.6519     0.1989   3.278  0.00279 **
## z             6.4157     5.6416   1.137  0.26509
## points:z     -0.3073     0.2454  -1.252  0.22087
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.671 on 28 degrees of freedom
## Multiple R-squared:  0.3717,  Adjusted R-squared:  0.3044
## F-statistic: 5.521 on 3 and 28 DF,  p-value: 0.00416

(iii) By considering the above fitted model, write down an expression giving the estimated expected number of wins per season for AFC teams who score $x$ points per game. Write down the corresponding expression for teams from the NFC. [7 marks]
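Because the model includes an interaction, the fitted line for teams with $z = 0$ has intercept $-6.5646$ and slope $0.6519$, while for teams with $z = 1$ both coefficients shift by the `z` and `points:z` estimates. The definition of $z$ (which conference is coded 1) is missing from this copy, so the Python sketch below reports the two lines by the value of $z$ rather than by conference name.

```python
# Estimates from the second R output: wins ~ points + z + points * z
INTERCEPT, POINTS, Z, POINTS_Z = -6.5646, 0.6519, 6.4157, -0.3073


def fitted_line(z):
    """Return (intercept, slope) of estimated expected wins against points, for z = 0 or 1."""
    return INTERCEPT + Z * z, POINTS + POINTS_Z * z


for z in (0, 1):
    a, b = fitted_line(z)
    print(f"z = {z}: expected wins = {a:.4f} + {b:.4f} x")
```

This gives $-6.5646 + 0.6519x$ for the $z = 0$ group and $-0.1489 + 0.3446x$ for the $z = 1$ group.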

3. [Total 40 marks]

(a) Suppose $n$ responses $Y = (Y_1, \ldots, Y_n)$ satisfy the following general linear regression model

$$Y = X_1\beta_1 + X_2\beta_2 + \varepsilon,$$

where $\varepsilon \sim N(0, \sigma^2 I_n)$.

However, the following general linear regression model is fitted:

$$Y = X_1\beta_1 + \varepsilon,$$

where $\varepsilon \sim N(0, \sigma^2 I_n)$.

Show that the bias of the least squares estimator of $\beta_1$ under the fitted model is equal to

$$\mathrm{bias}(\hat{\beta}_1) = E(\hat{\beta}_1) - \beta_1 = (X_1^T X_1)^{-1} X_1^T X_2 \beta_2.$$

[6 marks]
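One route to this result, sketched as a short derivation: under the fitted model the least squares estimator is $\hat\beta_1 = (X_1^T X_1)^{-1} X_1^T Y$, and taking expectations under the true model gives the bias directly.

```latex
\begin{align*}
\hat{\beta}_1 &= (X_1^T X_1)^{-1} X_1^T Y, \\
E(\hat{\beta}_1) &= (X_1^T X_1)^{-1} X_1^T E(Y)
                 = (X_1^T X_1)^{-1} X_1^T (X_1\beta_1 + X_2\beta_2) \\
                 &= \beta_1 + (X_1^T X_1)^{-1} X_1^T X_2 \beta_2, \\
\operatorname{bias}(\hat{\beta}_1) &= E(\hat{\beta}_1) - \beta_1
                 = (X_1^T X_1)^{-1} X_1^T X_2 \beta_2 .
\end{align*}
```

This uses $E(\varepsilon) = 0$ under the true model and assumes $X_1^T X_1$ is invertible. The bias vanishes when $X_1^T X_2 = 0$.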

(b) An experiment has been performed to investigate the relationship between the heat evolved in the setting of cement and its chemical composition. The response is the heat evolved and there are $m = 3$ explanatory variables ($x_1$, $x_2$ and $x_3$) giving the percentage weight in clinkers of three different chemicals. A total of $n = 13$ samples of cement were used in the experiment.

A series of eight linear regression models, labelled A, B, . . . , H, are fitted. The table below shows the residual sum of squares (RSS, to 3 decimal places) of each of these models, where a + or − in the columns headed $x_1$, $x_2$ and $x_3$ indicates whether the model includes (+) or excludes (−) the corresponding explanatory variable.

(i) For each model (A, B, . . . , H), write down the value of $k$, the number of explanatory variables. [1 mark]

(ii) Consider the use of the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) for model selection. Explain how both AIC and BIC balance goodness-of-fit and model complexity, remarking on the relative strength of the AIC and BIC penalties for model complexity. [4 marks]

(iii) For each model (A, B, . . . , H), calculate the value of $n \log(\mathrm{RSS}/n)$, where $\log(\cdot)$ refers to the natural logarithm. Hence calculate the value of AIC and BIC for each model. [4 marks]
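The RSS table is missing from this copy, but the ANOVA tables below give the RSS of six of the eight models. The Python sketch computes $n\log(\mathrm{RSS}/n)$, AIC and BIC for those six, assuming the common forms $\mathrm{AIC} = n\log(\mathrm{RSS}/n) + 2p$ and $\mathrm{BIC} = n\log(\mathrm{RSS}/n) + p\log n$ with $p = k + 1$ regression parameters. The course may count the penalty with $k$ instead, which shifts every model by the same constant and leaves the ranking unchanged. Since $\log 13 \approx 2.565 > 2$, BIC's per-parameter penalty is the heavier of the two here.

```python
import math

N = 13  # number of cement samples

# RSS values readable from the ANOVA tables (k = number of explanatory variables).
# Models A and D are omitted: their RSS is only in the RSS table, missing from this copy.
MODELS = {
    "B": (1, 1265.687),
    "C": (1, 906.336),
    "E": (2, 57.904),
    "F": (2, 1227.072),
    "G": (2, 415.443),
    "H": (3, 48.111),
}


def criteria(k, rss, n=N):
    """Return (n log(RSS/n), AIC, BIC), ASSUMING p = k + 1 parameters."""
    fit = n * math.log(rss / n)  # natural logarithm
    p = k + 1
    return fit, fit + 2 * p, fit + p * math.log(n)


for name, (k, rss) in MODELS.items():
    fit, aic, bic = criteria(k, rss)
    print(f"{name}: k={k}  n*log(RSS/n)={fit:8.3f}  AIC={aic:8.3f}  BIC={bic:8.3f}")
```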

(iv) For each model (A, B, . . . , H), calculate the value of the corrected Akaike information criterion, given by

[4 marks]

(v) Determine the final chosen model under each of AIC, BIC and CAIC. [3 marks]

Below are a series of analysis of variance (ANOVA) tables presenting the result of F-tests. All quantities are given to 3 decimal places.

• Comparison of Model E & Model H

Source	Df	Sum of Squares	Mean Squares	F Value	P Value
Difference	1	9.794	9.794	1.832	0.209
Model H	9	48.111	5.346
Model E	10	57.904

• Comparison of Model F & Model H

Source	Df	Sum of Squares	Mean Squares	F Value	P Value
Difference	1	1178.961	1178.961	220.547	< 0.001
Model H	9	48.111	5.346
Model F	10	1227.072

• Comparison of Model G & Model H

Source	Df	Sum of Squares	Mean Squares	F Value	P Value
Difference	1	367.332	367.332	68.716	< 0.001
Model H	9	48.111	5.346
Model G	10	415.443

• Comparison of Model B & Model E

Source	Df	Sum of Squares	Mean Squares	F Value	P Value
Difference	1	1207.782	1207.782	208.582	< 0.001
Model E	10	57.904	5.790
Model B	11	1265.687

• Comparison of Model C & Model E

Source	Df	Sum of Squares	Mean Squares	F Value	P Value
Difference	1	848.432	848.432	146.523	< 0.001
Model E	10	57.904	5.790
Model C	11	906.336

(vi) From the above ANOVA tables, outline the steps taken by a backwards selection procedure using F-tests. For each step, name the current model and the models that are compared to the current model. Clearly state the final chosen model. Use the 5% level of significance. [6 marks]
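The backwards procedure can be sketched in Python using only the p-values printed in the ANOVA tables: at each step, find the smaller model whose comparison has the largest p-value and move to it if that p-value exceeds 5%. Entries reported as "< 0.001" are stored as 0.001, an upper bound, and the dictionary layout is an illustrative choice.

```python
ALPHA = 0.05

# P-values from the ANOVA tables, keyed by current model, then smaller model.
# Entries reported as "< 0.001" are stored as 0.001 (an upper bound).
P_VALUES = {
    "H": {"E": 0.209, "F": 0.001, "G": 0.001},
    "E": {"B": 0.001, "C": 0.001},
}


def backward_selection(start="H", alpha=ALPHA):
    """Drop one variable at a time while the least significant F-test has p > alpha.

    The loop also stops if no tables are available for the current model
    (only H and E have tables in this paper).
    """
    current, path = start, [start]
    while current in P_VALUES:
        # Smaller model whose comparison has the largest p-value.
        candidate, p = max(P_VALUES[current].items(), key=lambda item: item[1])
        if p <= alpha:
            break  # every remaining variable is significant: stop
        current = candidate
        path.append(current)
    return current, path


final, steps = backward_selection()
print(" -> ".join(steps), "final:", final)
```

Starting from Model H, Model E is accepted (p = 0.209), and both reductions of Model E are rejected (p < 0.001), so the sketch stops at Model E.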

(vii) Perform a forwards selection procedure using F-tests. Use the 5% level of significance. [12 marks]

Hint: You will find the following R code and output useful:

qf(0.95, df1 = 1, df2 = 11)
## [1] 4.844336
qf(0.95, df1 = 1, df2 = 10)
## [1] 4.964603
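A forwards step compares the current model with each model that adds one variable, using $F = (\mathrm{RSS}_{\text{small}} - \mathrm{RSS}_{\text{big}}) / (\mathrm{RSS}_{\text{big}}/(n - k - 1))$ on 1 and $n - k - 1$ degrees of freedom, where $k$ counts the explanatory variables in the larger model. The df2 = 11 critical value therefore applies when the larger model has one explanatory variable, and df2 = 10 when it has two. The RSS values for models A and D are in the missing RSS table, so the Python sketch below only shows the test itself, checked against the Model B versus Model E comparison in the ANOVA tables.

```python
def f_statistic(rss_small, rss_big, k_big, n=13):
    """F statistic for adding one explanatory variable (1 numerator df).

    F = (RSS_small - RSS_big) / (RSS_big / (n - k_big - 1)),
    where k_big is the number of explanatory variables in the larger model.
    """
    df_big = n - k_big - 1
    return (rss_small - rss_big) / (rss_big / df_big)


F_CRIT = {11: 4.844336, 10: 4.964603}  # qf(0.95, df1 = 1, df2) from the hint

# Example with tabulated values: adding a variable to Model B to give Model E.
f_be = f_statistic(rss_small=1265.687, rss_big=57.904, k_big=2)
print(round(f_be, 2), f_be > F_CRIT[10])
```

The result, about 208.58, matches the tabulated F value of 208.582 up to rounding.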
