
MATH2010 Statistical Modelling I

SEMESTER 1 EXAMINATION 2021/22

1.    [25 marks] Consider the following simple linear regression model without intercept:

$$Y_i = \beta x_i + \varepsilon_i, \qquad (1)$$

$i = 1, \ldots, n$, where $\varepsilon_i \sim N(0, \sigma^2)$.

(a)  [5 marks] Show that the least-squares estimator for $\beta$ has the form

$$\hat{\beta} = \frac{\sum_{i=1}^{n} x_i Y_i}{\sum_{i=1}^{n} x_i^2}.$$

Show that $\hat{\beta}$ is unbiased, and compute its variance.

(b)  [2 marks] Given explanatory variables $x_1, \ldots, x_n$ and associated response variables $y_1, \ldots, y_n$, it is common to centre the response and explanatory variables, i.e. to compute

$$y_i' = y_i - \bar{y}, \qquad x_i' = x_i - \bar{x},$$

with $\bar{x}$ and $\bar{y}$ the sample means. Consider the alternative model:

$$Y_i' = \beta' x_i' + \varepsilon_i. \qquad (2)$$

Derive the least-squares estimator for $\beta'$ in terms of $x_i$ and $Y_i$.

(c) [2 marks] Explain the difference between an estimator and an estimate. Write down the estimate corresponding to β' from model (2).

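(d)  [4 marks] Recall that under the simple linear regression model

$$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i \qquad (3)$$

the estimate for $\beta_1$ is given by

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}.$$

Show that the estimate obtained in (c) is equal to $\hat{\beta}_1$.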
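The identity in part (d) can be checked numerically before proving it. The following is a minimal R sketch; the data are simulated purely for illustration (they are an assumption, not part of the exam), and the check does not replace the algebraic argument the question asks for.

```r
# Hypothetical data, simulated only to illustrate the identity in part (d).
set.seed(1)
x <- runif(10, 0, 100)
y <- 5 + 0.2 * x + rnorm(10)

# Centre both variables, as in model (2).
xc <- x - mean(x)
yc <- y - mean(y)

# Least-squares slope for the centred model without intercept.
beta_centred <- sum(xc * yc) / sum(xc^2)

# Slope from the simple linear regression with intercept, model (3).
beta1_hat <- unname(coef(lm(y ~ x))["x"])

# TRUE if the two slopes agree up to floating-point tolerance.
all.equal(beta_centred, beta1_hat)
```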

(e) The dataset in Table 1 gives the sales (yi; in £) and the advertising budget spent on newspaper advertisements (xi; in £) for a variety of products.

advertising.data = data.frame(

Newspaper=c(69.2, 45.1, 69.3, 58.5, 58.4),

Sales = c(22.1, 10.4, 9.3, 18.5, 12.9)

)

(i) [8 marks] Compute a 95% confidence interval for $\beta$ in model (1), and for $\beta'$ in model (2). Test the null hypothesis that $\beta = 0$, and the null hypothesis that $\beta' = 0$, at the 95% level. You may find the following outputs from R helpful:

qt(0.95, 4)

## [1] 2.13

qt(0.95, 3)

## [1] 2.35

qt(0.975, 4)

## [1] 2.78

qt(0.975, 3)

## [1] 3.18

(ii)  [4 marks] Consider the location $x_0 = 0$, and let $x_0' = x_0 - \bar{x}$. Let $\hat{Y}_0 = \beta x_0 + \varepsilon_0$ and $\hat{Y}_0' = \beta' x_0' + \varepsilon_0$. Write down the conditional means $E(\hat{Y}_0)$ and $E(\hat{Y}_0')$. Considering the computed conditional means, do you think model (1) or model (2) is more appropriate for this data? Considering the confidence intervals computed above, do you believe there is a significant relationship between budget spent on newspaper advertisements and sales?
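For part (e)(i), one possible way to organise the calculation for model (1) is sketched in R here. It assumes the usual variance estimate RSS/(n - 1) for a model with a single parameter, and compares the hand calculation with `confint()` applied to an `lm` fit without intercept. Model (2) is deliberately not worked: the same steps apply to the centred variables, but note that `lm()` on centred data reports n - 1 residual degrees of freedom, which may or may not be the convention intended by the question.

```r
advertising.data <- data.frame(
  Newspaper = c(69.2, 45.1, 69.3, 58.5, 58.4),
  Sales     = c(22.1, 10.4, 9.3, 18.5, 12.9)
)
x <- advertising.data$Newspaper
y <- advertising.data$Sales
n <- length(y)

# Model (1): regression through the origin.
beta_hat <- sum(x * y) / sum(x^2)
rss      <- sum((y - beta_hat * x)^2)
s2       <- rss / (n - 1)            # assumed estimate of sigma^2 (one parameter)
se_beta  <- sqrt(s2 / sum(x^2))

# 95% interval by hand, with the t quantile on n - 1 = 4 degrees of freedom.
ci_manual <- beta_hat + c(-1, 1) * qt(0.975, n - 1) * se_beta

# The same interval from lm(); "0 +" removes the intercept.
fit1  <- lm(Sales ~ 0 + Newspaper, data = advertising.data)
ci_lm <- unname(confint(fit1)["Newspaper", ])

# t statistic for the null hypothesis beta = 0.
t_stat <- beta_hat / se_beta
```

The interval and the test are linked: the null hypothesis is rejected at the corresponding level exactly when the interval excludes zero.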

2.    [25 marks] Consider the usual multiple linear regression model $Y = X\beta + \varepsilon$, with $Y$ an $n \times 1$ response vector, $X$ an $n \times (k + 1)$ design matrix, $\beta$ a $(k + 1)$-vector of unknown parameters and $\varepsilon \sim N(0, \sigma^2 I_n)$. Assume least squares will be used to estimate $\beta$.

(a)  [2 marks] Describe two diagnostic plots that are commonly used to check the assumptions that underpin a linear model, and discuss which assumptions they are intended to verify.

(b)  [2 marks] Consider the following two plots. Explain what evidence of deviation from the model assumptions is shown in each plot.

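(c)  [8 marks] The leverage $l_i$ of the $i$th observation is defined to be the derivative of the $i$th fitted value with respect to the $i$th observation, i.e.

$$l_i = \frac{\partial \hat{y}_i}{\partial y_i}.$$

Show that $l_i$ is given by

$$l_i = x_i^T (X^T X)^{-1} x_i,$$

where $x_i^T$ denotes the $i$th row of $X$.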
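The two descriptions of leverage in part (c), as a derivative and as a diagonal element of the hat matrix, can be compared numerically. This R sketch uses a small hypothetical data set (an assumption, chosen only for illustration) and the built-in `hatvalues()` function; because fitted values are linear in the responses, a finite-difference derivative should reproduce the leverage up to rounding error.

```r
# Hypothetical data for illustration only.
set.seed(2)
x <- c(1, 2, 4, 7, 11)
y <- 3 + 0.5 * x + rnorm(5)

# Hat matrix H = X (X'X)^{-1} X' for a model with intercept.
X <- cbind(1, x)
H <- X %*% solve(crossprod(X)) %*% t(X)
lev_matrix <- diag(H)

# Leverages as reported by R.
fit    <- lm(y ~ x)
lev_lm <- unname(hatvalues(fit))

# Derivative definition: perturb y_i and measure the change in fitted value i.
i   <- 5
eps <- 1e-3
y_pert    <- y
y_pert[i] <- y_pert[i] + eps
lev_deriv <- unname((fitted(lm(y_pert ~ x))[i] - fitted(fit)[i]) / eps)
```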

(d)  [2 marks] A data point $(x_i, y_i)$ is said to be influential if the regression line changes significantly when that point is excluded from the analysis. There can be more than one influential point for a regression model. Consider a scatter plot of leverages on the x-axis against standardised residuals on the y-axis such as the figure below.

Which of the labelled points on the figure above do you think could be influential? Explain why.

(e) Consider the “leave-one-out” procedure, in which observation $i$ is left out of the regression analysis. Let $\hat{\beta}_{(-i)}$ denote the parameter vector obtained when observation $i$ is left out, and recall that we have the following formula for the change in the parameter vector:

where $r_i$ is the $i$th residual, $r_i = y_i - \hat{y}_i$. Consider the simple linear regression problem $Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$. In this case, $\delta_i$ is a vector of length 2, $\delta_i = [\delta_{i;0}, \delta_{i;1}]$. The dataset in Table 2 gives 4 observations of the response $y_i$, with explanatory variable $x_i$, as well as the values of $l_i$ and $\delta_i$ for all but the 4th observation.

The data can be read into R with the command:

data = data.frame(

x = c(-1, -0.9, 1, 5),

y = c(0.7569, 0.6948, 1.2881, 2.2798)

)

(i)  [8 marks] Compute l4 , δ4;0 and δ4;1 . If you use R, or other software, to answer this question, please clearly state the commands you use.

(ii)  [3 marks] Do you think the fourth point is more influential than the first three  points? Explain why, or why not, particularly commenting on the values of l4 , δ4;0 and δ4;1 in your answer.
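For part (e)(i), one way to obtain the leverage and the leave-one-out change in R is sketched here. The sign convention is an assumption: the code takes the change to be the full-data estimate minus the estimate with observation 4 removed, and compares a direct refit with the standard closed form $(X^T X)^{-1} x_i r_i / (1 - l_i)$. If the course defines the change with the opposite sign, the result should be negated.

```r
data <- data.frame(
  x = c(-1, -0.9, 1, 5),
  y = c(0.7569, 0.6948, 1.2881, 2.2798)
)

# Fit on all four observations and extract the leverage of the fourth.
fit_all <- lm(y ~ x, data = data)
l4 <- unname(hatvalues(fit_all)[4])

# Refit without observation 4.
fit_drop4 <- lm(y ~ x, data = data[-4, ])

# Assumed convention: full-data estimate minus leave-one-out estimate.
delta4_refit <- unname(coef(fit_all) - coef(fit_drop4))

# Closed form using the design matrix, the residual and the leverage.
X  <- model.matrix(fit_all)
r4 <- unname(resid(fit_all)[4])
delta4_formula <- unname(drop(solve(crossprod(X)) %*% X[4, ]) * r4 / (1 - l4))

# Both routes should give the same vector (delta_{4;0}, delta_{4;1}).
all.equal(delta4_refit, delta4_formula)
```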

3.    [25 marks] In a particular data set, n = 20 observations were made on a response variable (y) and three explanatory variables (x1, x2 and x3).

(a)  [6 marks] Partial outputs from the summary command in R applied to lm objects regressing the response on each of the explanatory variables separately are given below. From this output, calculate AIC for each of the three models.

##
## Call:
## lm(formula = y ~ x1)
##
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  13.3004     0.5331   24.95  2.1e-15
## x1            0.0989     0.5857    0.17     0.87
##
## Residual standard error: 2.33 on 18 degrees of freedom
## F-statistic: 0.0285 on 1 and 18 DF, p-value: 0.868


##
## Call:
## lm(formula = y ~ x2)
##
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   13.334      0.262   50.96  < 2e-16
## x2             2.255      0.308    7.32  8.5e-07
##
## Residual standard error: 1.17 on 18 degrees of freedom
## F-statistic: 53.6 on 1 and 18 DF, p-value: 8.51e-07


##
## Call:
## lm(formula = y ~ x3)
##
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   13.302      0.250   53.18  < 2e-16
## x3             2.339      0.301    7.77  3.7e-07
##
## Residual standard error: 1.12 on 18 degrees of freedom
## F-statistic: 60.3 on 1 and 18 DF, p-value: 3.73e-07

(b)  [2 marks] Given that AIC for the null model (containing no explanatory variables) is 31.783, which is the first variable that you would include when building a model using forward selection and AIC?
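The AIC values in parts (a) and (b) can be recovered from the residual standard errors, since RSS equals the squared residual standard error times the residual degrees of freedom. The R sketch below assumes the form AIC = n log(RSS/n) + 2p, with p counting explanatory variables only. That is an assumption: AIC definitions differ between courses and software by additive constants and by how parameters are counted, so the course's own definition takes precedence. As a sanity check, with p = 0 and the sample variance 5.157 quoted in part (e), this form gives about 31.78 for the null model, close to the stated 31.783 (the inputs are rounded). The helper name `aic_from_rse` is hypothetical.

```r
# Hypothetical helper: AIC from a residual standard error.
# Assumed definition: AIC = n * log(RSS / n) + 2 * p.
aic_from_rse <- function(rse, df_resid, n, p) {
  rss <- rse^2 * df_resid
  n * log(rss / n) + 2 * p
}

n <- 20

# Residual standard errors from the three single-variable summaries.
rse <- c(x1 = 2.33, x2 = 1.17, x3 = 1.12)
aic_single <- aic_from_rse(rse, df_resid = 18, n = n, p = 1)

# Null model: residual variance is the sample variance of y on n - 1 df.
aic_null <- aic_from_rse(sqrt(5.157), df_resid = n - 1, n = n, p = 0)

# Forward selection adds the variable giving the smallest AIC,
# provided it improves on the null model.
best <- names(which.min(aic_single))
```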

(c)  [2 marks] A multiple linear regression model containing all three variables is now fitted using lm, and partial output from the summary command is given below.

##
## Call:
## lm(formula = y ~ x1 + x2 + x3)
##
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   13.186      0.244   54.09   <2e-16
## x1             0.515      0.281    1.83    0.086
## x2            -1.192      3.171   -0.38    0.712
## x3             3.660      3.227    1.13    0.273
##
## Residual standard error: 1.06 on 16 degrees of freedom

Using t-tests, which variables can be individually dropped from the model without adversely affecting the quality of the fit?

(d)  [5 marks] Give a possible explanation for the contradiction between your results from parts (b) and (c).

(e)  [4 marks] Given that the sample variance of the response is $s_y^2 = 5.157$, what is the adjusted $R^2$ value for the above multiple regression model? Is the model a good fit?
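The adjusted $R^2$ in part (e) depends only on the residual standard error and the sample variance of the response, because it compares the residual mean square with the total mean square. A short R sketch under that standard definition, using the rounded values quoted in the question (so the result is approximate):

```r
n        <- 20
s        <- 1.06    # residual standard error of the full model
df_resid <- 16      # residual degrees of freedom
s2_y     <- 5.157   # sample variance of the response

# Adjusted R^2 = 1 - (RSS / df_resid) / (TSS / (n - 1)) = 1 - s^2 / s_y^2.
adj_r2 <- 1 - s^2 / s2_y

# Ordinary R^2 for comparison, from the implied sums of squares.
rss <- df_resid * s^2
tss <- (n - 1) * s2_y
r2  <- 1 - rss / tss
```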

(f)  [6 marks] Test the significance of the multiple regression model for these data; that is, compare it to the null model.

You may find the following quantities from R useful:

qf(0.95, 4, 16)

## [1] 3.01

qf(0.95, 3, 16)

## [1] 3.24

qf(0.95, 3, 20)

## [1] 3.1
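The overall F-test in part (f) compares the full model with the null model through the reduction in residual sum of squares. The R sketch below rebuilds both sums of squares from the rounded summary values given in the question, so the statistic is approximate; choosing which tabulated quantile to use is part of the exercise, and the code uses 3 numerator and 16 denominator degrees of freedom on the assumption that three slopes are tested against 16 residual degrees of freedom.

```r
n    <- 20
k    <- 3                 # explanatory variables in the full model
s    <- 1.06              # residual standard error, full model
s2_y <- 5.157             # sample variance of the response

rss_full <- (n - k - 1) * s^2   # residual sum of squares, full model
rss_null <- (n - 1) * s2_y      # residual sum of squares, null model

# F statistic: mean reduction in RSS per added parameter over residual mean square.
f_stat <- ((rss_null - rss_full) / k) / (rss_full / (n - k - 1))

# Critical value and p-value at the 5% significance level.
f_crit  <- qf(0.95, k, n - k - 1)
p_value <- pf(f_stat, k, n - k - 1, lower.tail = FALSE)

reject_null <- f_stat > f_crit
```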

4.    [25 marks] Every year Forbes produces a data set of the top 2000 companies world-wide, ranked by quantities such as Sales and Assets, across different market sectors, countries etc.

A linear model was fitted to the 2017 data, regressing log(Sales) against log(Assets) (a quantitative variable) and Sector (a qualitative factor with 11 levels). The following analysis of variance table was obtained.

(a)  [6 marks] Write down each of the nested model comparisons summarised in this table.

(b)  [8 marks] Complete the ANOVA table. Give reasons for your choices for each entry in the “DF” column.

(c)  [1 mark] What proportion of variation is explained by the model?

(d)  [10 marks] Simplified output from the summary command from the fitted lm object in R for this model is given on the next page. The units of each of Sales and Assets are billions of dollars; in the data set, Apple had sales of 217.5 and assets of 331.1, both in billions of dollars, larger than any other “Information Technology” company.

What is the equation for the estimated mean response (log(Sales)) from a company in the “Information Technology” sector in terms of log(Assets)? What is it for the “Consumer Discretionary” Sector? Sketch a plot with these two equations. At what value of Assets is an information technology company predicted to have greater mean sales than a company in the consumer discretionary sector?

The company with largest sales in the data set is Wal-Mart (a consumer discretionary company), with sales of 485.3. According to the regression model, how large would Apple’s assets need to be to surpass Wal-Mart’s sales? What   caveats do you have about this prediction?
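The last calculation in part (d) amounts to inverting a fitted line on the log scale: if the estimated mean of log(Sales) is a + b log(Assets), then the assets needed to reach a target sales figure are exp((log(target) - a) / b). The R sketch below illustrates only this inversion. The coefficients `a` and `b` are hypothetical placeholders, not values from the exam's summary output, and the function name `assets_needed` is likewise made up for illustration.

```r
# Hypothetical coefficients, NOT taken from the exam output.
a <- 0.5   # intercept for the sector of interest, on the log(Sales) scale
b <- 0.8   # slope on log(Assets)

# Assets at which the fitted mean of log(Sales) reaches log(target_sales).
assets_needed <- function(target_sales, intercept, slope) {
  exp((log(target_sales) - intercept) / slope)
}

target <- 485.3                       # Wal-Mart's sales, in billions of dollars
A <- assets_needed(target, a, b)

# Round trip: plugging A back into the fitted line recovers the target.
sales_at_A <- exp(a + b * log(A))
```

Any such figure is an extrapolation when the required assets lie beyond the range observed for the sector, which is one caveat worth stating.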

Learning objectives:

LO1 Use the theory of linear models and matrix algebra to investigate standard and non-standard problems.

LO2 Interpret the output from an analysis including the meaning of interactions and terms based on qualitative factors.

LO3 Understand how to make a critical appraisal of a fitted model.

LO4 Carry out t-tests and calculate confidence intervals by hand and by computer.

LO5 Use a variety of procedures for variable selection.

LO6 Fit multiple regression models using the adopted software package.

LO7 Carry out simple linear regression by computer.

LO6 and LO7 are assessed via coursework.








