
Applied Stats and Data Analysis

EN.553.413-613, Spring 2024

Apr 3, 2024

Exam 2

Question 1 (22 points). Consider the linear model

Y = Xβ + ε,

where Y is an n-by-1 vector of response variables, X is an n-by-p design matrix, β is a p-by-1 vector of coefficients, and ε is multivariate normal, ε ∼ N_n(0, σ²I_n). Denote by b the least-squares estimate of β.

(a) TRUE or FALSE. Y ∼ N_n(Xb, σ²I)

(b) TRUE or FALSE. Cook’s distance measures the influence of one observation on all fitted values.

(c) TRUE or FALSE. The least-squares estimate of β can be expressed as (XᵀX)⁻¹XᵀY.

(d) TRUE or FALSE. Deleted residuals are the same as Externally Studentized Residuals.

(e) TRUE or FALSE. In multiple linear regression, predictors can be both continuous and categorical.

(f) TRUE or FALSE. If X has full rank, then the matrix XᵀX has rank n.

(g) TRUE or FALSE. SSTO = Yᵀ(I − (1/n)11ᵀ)Y, where 1 is an n-by-1 vector of 1’s.

(h) TRUE or FALSE. Internally studentized residuals are better than externally studentized residuals in determining outliers with respect to Y.

(i) TRUE or FALSE. Leverage can detect points that are outlying with respect to both X and Y.

(j) TRUE or FALSE. Outliers with large ESR (externally studentized residuals) are always influential.

(k) TRUE or FALSE. As we add more predictors, the coefficient of multiple determination R² may decrease.
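
For readers who want to experiment with the model in Question 1, here is a minimal simulation sketch in Python using only the standard library. It only draws data from Y = Xβ + ε as defined above. The function name simulate_linear_model and the small design matrix, β, and σ are made-up illustrations and are not part of the exam.

```python
import random


def simulate_linear_model(X, beta, sigma, seed=0):
    """Draw one response vector Y = X beta + eps, with eps_i iid N(0, sigma^2).

    X is a list of n rows, each a list of p numbers; beta is a list of p numbers.
    """
    rng = random.Random(seed)  # seeded so that repeated calls are reproducible
    Y = []
    for row in X:
        mean_i = sum(x * b for x, b in zip(row, beta))  # i-th entry of X beta
        Y.append(mean_i + rng.gauss(0.0, sigma))        # add N(0, sigma^2) noise
    return Y


# Hypothetical example: n = 5 observations, p = 2 columns (intercept, one predictor).
X = [[1.0, 0.5], [1.0, 1.0], [1.0, 1.5], [1.0, 2.0], [1.0, 2.5]]
beta = [2.0, 0.5]
Y = simulate_linear_model(X, beta, sigma=1.0)
print(Y)
```

Setting sigma to 0 returns Xβ exactly, which is a quick way to check that the design matrix is laid out as intended.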

Question 2 (16 points). In this problem we deal with the same multiple linear regression model as in Question 1.

(a) Write the least-squares objective function Q(β) that we need to minimize to fit a multiple linear regression.

(b) Write down the formula for the least-squares estimate for the regression vector b in terms of matrices X, Y. You don’t need to derive it.

(c) Using the formula above, find E(b) and Var(b). Show your work.

(d) This part can be solved independently of the other parts. Compute the covariance matrix Cov(e, Ŷ), where e is the vector of residuals and Ŷ is the vector of fitted values. Simplify as much as you can. What are the dimensions of the matrix Cov(e, Ŷ)?

Question 3 (16 points). Below is a sketch of the column space Col(X) of an n-by-2 design matrix X. The vector Y is an n-by-1 vector of values Yi, 1 is an n-by-1 vector of 1’s, and X1 is an n-by-1 vector of predictor values Xi1.

The dotted line (marked by a) depicts orthogonal projection on Col(X), the dashed line (marked by d) depicts orthogonal projection on the vector of 1’s.

Let β̂ be the least-squares estimate of β.

(a) Express vectors a, b, c, d in terms of Y, X, β̂, Ȳ. Here, Ȳ denotes the n-by-1 vector whose entries all equal the sample mean of the Yi.

(b) Express vectors a, b, c, d in terms of Y, H, I, J, where H is the hat matrix, I is an n-by-n identity matrix, J is an n-by-n matrix of 1’s.

(c) If we know the coefficient of multiple determination R² is very close to 1, what can you say about the vectors in this plot? Be as specific as you can.

(d) If we know that there is multicollinearity, what can you say about the vectors in this plot? Be as specific as you can.

Question 4 (20 points). Consider the following regression model:

Yi = β0 + β1X1i + β2X2i + εi

where the εi are iid N(0, σ²). The first few observations are listed below:

We use lm in R to fit a linear regression model, obtaining the output below:

             Estimate Std. Error t value Pr(>|t|)
(Intercept)    2.9427     XXXXXX   YYYYY   0.0044
x1             0.5319     0.2075   2.563   0.0374
x2             0.3204     0.1906   1.681   0.1366
---
Residual standard error: 0.9001 on 7 degrees of freedom
Multiple R-squared: 0.5584, Adjusted R-squared: 0.4322
F-statistic: 4.425 on 2 and 7 DF, p-value: 0.05724
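
As a reading aid for the lm output above, the sketch below checks how the printed columns and summary lines relate to one another, using the standard identities t value = Estimate / Std. Error, adjusted R² = 1 − (1 − R²)(n − 1)/(n − p), and F = (R²/(p − 1)) / ((1 − R²)/(n − p)). The helper names are made up for illustration, and n = 10 is inferred from the 7 residual degrees of freedom with p = 3 coefficients. Small mismatches in the last digit are expected because the printed values are rounded.

```python
def t_value(estimate, std_error):
    """t statistic shown in the 't value' column."""
    return estimate / std_error


def adjusted_r2(r2, n, p):
    """Adjusted R-squared for n observations and p coefficients (intercept included)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p)


def overall_f(r2, n, p):
    """Overall F statistic on (p - 1, n - p) degrees of freedom, computed from R-squared."""
    return (r2 / (p - 1)) / ((1.0 - r2) / (n - p))


# Assumption: 7 residual degrees of freedom with p = 3 coefficients gives n = 10.
n, p = 10, 3

t_x1 = t_value(0.5319, 0.2075)      # compare with the printed 2.563
t_x2 = t_value(0.3204, 0.1906)      # compare with the printed 1.681
adj = adjusted_r2(0.5584, n, p)     # compare with the printed 0.4322
f_stat = overall_f(0.5584, n, p)    # compare with the printed 4.425

print(t_x1, t_x2, adj, f_stat)
```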

(a) State the null and the alternative hypothesis for the model utility test for this model. What is your conclusion at the significance level α = 10%? Briefly explain.

(b) Is β1 significant at the significance level α = 0.10? Is β2 significant at the significance level α = 0.10? Briefly explain.

(c) Rewrite the model utility test from part (a) in the form of the General Linear Test:

H0 : Cβ = γ,         Ha : Cβ ≠ γ.

Identify C and γ and state their dimensions.

(d) Find the rejection region for the test in part (c) with significance level α = 0.05. Leave your answer in the form vᵀA⁻¹v > F(x, y, z). Provide numerical values for the vector v, the matrix A (you don’t need to invert it), and the parameters x, y, z of the quantile F(x, y, z).

(e) The standard error for the intercept β0 is hidden by XXXXXX. Recover it using the data provided above.

(f) The plot below sketches the rejection regions for the tests in parts (a), (b). The inside of the purple ellipse corresponds to the 90% confidence region of an F-test. Shade the areas where (0, 0) might lie in this plot.

Question 5 (20 points). Consider the grocery retailer example, where X1 denotes the number of cases shipped, X2 is the indirect cost of the total labor hours as a percentage (values are between 0 and 100), and X3 is 1 if the week has a holiday, and 0 otherwise. The observations Y are the total labor hours, and we choose to fit the multiple linear regression:

Yi = β0 + β1Xi1 + β2Xi2 + β3Xi3 + β4Xi1Xi3 + β5Xi2Xi3 + εi              (1)

where the εi are i.i.d. normal with mean 0 and variance σ². The lm output is shown:

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  4.271e+03  2.319e+02  18.415   <2e-16 ***
x1           7.928e-04  4.180e-04   1.897   0.0642 .
x2          -2.990e+01  2.669e+01  XXXXXX   0.2684
x3           1.642e+02  4.486e+02   0.366   0.7161
x1:x3       -2.470e-04  8.813e-04  -0.280   0.7805
x2:x3        7.078e+01  5.487e+01   1.290   0.2036
---
Residual standard error: 143.8 on 46 degrees of freedom
Multiple R-squared: 0.6992, Adjusted R-squared: 0.6665
F-statistic: 21.39 on 5 and 46 DF, p-value: 5.393e-11

The first 5 rows of the dataset are the following:

Y       X1     X2      X3
4264    300    7.17    0
4945    265    8.61    1
4496    328    6.20    0
4317    317    4.61    0
4292    366    7.02    0

(a) How many observations are in the dataset?

(b) Write only the first three rows of the design matrix X for this model. What are the dimensions of the full design matrix X for this problem?

(c) Do you notice any issues with this model? If you do, how would you fix the issue?

(d) Write the estimated mean of the total labour hours during a week that has a holiday. Express it as a function of all, or some, X1, X2, X3.

(e) We would like to compare the models

g1 : Yi = β0 + β1Xi1 + β2Xi2 + β3Xi3 + β4Xi1Xi3 + β5Xi2Xi3 + εi

g2 : Yi = β0 + β1Xi1 + β2Xi2 + β3Xi3 + β4Xi1Xi3 + εi

using the General Linear Test at the level α = 0.05. What is the value F* of the F-statistic of this test? You should be able to find it using the information given. Specify the numerical values c, d, e for the rejection region F* > F(c, d, e) for this test.

(f) Based on this output, find an interval that contains, with probability 90%, the total labour hours for a week without a holiday, with a 5% indirect cost of labour hours and 100 cases shipped. Leave your answer as A ± B · t(c, d). Specify numerical values for A, B, c, d. Identify as many values as you can. Leave values that you can’t identify as variables.

Question 6 (16 points). The table below contains residuals ei, internally studentized residuals ri, externally studentized residuals ti and the leverage hii for the model with p = 2 predictors and n = 10 observations.

Table 1: Values of hii, ei, ri, and ti.

(a) Which points seem to be outliers with respect to Y? Which residual (ei, ri, ti) would you use to identify them? Briefly justify.

(b) Which points seem to be outliers with respect to X? Briefly justify.

(c) Which points appear to be most influential? Briefly justify.

(d) What should be the decision rule to identify observations outlying with respect to Y at the significance level α = 0.10? Write it in the form

|A| ≥ t(B, C)

Identify A, B, C. Be as specific as you can.






