AMS2320 Business Regression Analysis
Semester 2, 2024-2025
ASSIGNMENT 2
There are two questions. Please answer all of them and show your steps clearly. Round your numerical answers to 4 decimal places. You may use SAS and/or R for the calculations. Please include your program code in your answer script. Combine all materials into a single PDF file named AMS2320 Assignment 2 - [Student ID].pdf (e.g. AMS2320 Assignment 2 - s123456.pdf) and submit it via Moodle.
Question 1. Is there a structural break in the price level data?
It has been suggested that the price level has a structural break: the period before 2007 forms one regime, and the period from 2007 onward forms another. Consider the model
CPI = β0 + β1(Year) + β2D + β3(D)(Year) + Error,
where D is a dummy variable equal to 0 before the break (years before 2007) and 1 after it (2007 and later). We want to test whether the so-called structural break is significant. We have the following computer output:
Full model
| Source | d.f. | SS | MS | F | Sig. |
|---|---|---|---|---|---|
| Regression | 3 | 3047.5 | 1015.833333 | 4353.571429 | 2.11752E-10 |
| Residual | 6 | 1.4 | 0.233333333 | | |
| Total | 9 | 3048.9 | | | |
Reduced model
| Source | d.f. | SS | MS | F | Sig. |
|---|---|---|---|---|---|
| Regression | 1 | 3012.148485 | 3012.148485 | 655.6787599 | 5.80084E-09 |
| Residual | 8 | 36.75151515 | 4.593939394 | | |
| Total | 9 | 3048.9 | | | |
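Writing the model out for each value of D can make the test easier to see. From 2007 on, both the intercept and the slope may shift, so "no structural break" corresponds to β2 = β3 = 0. The reduced model above has one regression d.f., which is consistent with regressing CPI on Year alone. That is an inference from the output, not something the output states. A common way to compare two nested fits is the extra-sum-of-squares (partial) F test. It is sketched below in standard notation; SSE_R, SSE_F, df_R and df_F are labels introduced only for this sketch.

```latex
\begin{aligned}
D = 0 \ (\text{before 2007}):&\quad E(\mathrm{CPI}) = \beta_0 + \beta_1\,\mathrm{Year}\\
D = 1 \ (\text{2007 and after}):&\quad E(\mathrm{CPI}) = (\beta_0 + \beta_2) + (\beta_1 + \beta_3)\,\mathrm{Year}\\
H_0:&\quad \beta_2 = \beta_3 = 0 \qquad H_1:\ \beta_2 \neq 0 \text{ or } \beta_3 \neq 0\\
F =&\ \frac{(SSE_R - SSE_F)/(df_R - df_F)}{SSE_F/df_F} \sim F_{df_R - df_F,\; df_F} \quad \text{under } H_0
\end{aligned}
```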
Test whether the structural break is significant. [20 marks]
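The partial F statistic, F = [(SSE_R - SSE_F)/(df_R - df_F)] / (SSE_F/df_F), can be computed directly from the Residual rows of the two tables. Below is a minimal Python sketch, meant only as a cross-check on the SAS/R work the assignment asks for. It assumes the reduced model is CPI on Year only. It also relies on a property of the F distribution: when the numerator has 2 d.f., the upper tail has the closed form P(F > f) = (d2/(d2 + 2f))^(d2/2). Because of this, no statistics library is needed.

```python
# Partial F test for H0: beta2 = beta3 = 0 (no structural break).
# Residual SS and d.f. are copied from the two ANOVA tables.
sse_full, df_full = 1.4, 6                 # full model: Year, D, D*Year
sse_reduced, df_reduced = 36.75151515, 8   # reduced model (assumed: Year only)

q = df_reduced - df_full                   # number of restrictions tested
mse_full = sse_full / df_full
f_stat = ((sse_reduced - sse_full) / q) / mse_full

# Closed-form F(2, d2) upper tail, valid only because q == 2 here:
#   P(F > f) = (d2 / (d2 + 2 f)) ** (d2 / 2)
assert q == 2, "the closed form below needs a numerator d.f. of 2"
p_value = (df_full / (df_full + 2 * f_stat)) ** (df_full / 2)

# Inverting the same formula gives the 5% critical value of F(2, 6).
alpha = 0.05
f_crit = df_full / 2 * (alpha ** (-2 / df_full) - 1)

print(f"F = {f_stat:.4f} on ({q}, {df_full}) d.f.")
print(f"5% critical value = {f_crit:.4f}, p-value = {p_value:.4e}")
```

H0 is rejected at the 5% level when F exceeds the critical value, or equivalently when the p-value is below 0.05. In R, anova(reduced_fit, full_fit) on two nested lm fits reports the same statistic; reduced_fit and full_fit are hypothetical object names.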
Question 2. We consider the data (prostate.txt) from the study of Stamey (1989). The study examined 97 men with prostate cancer who were about to receive a radical prostatectomy (an operation).
The relationship between the level of prostate-specific antigen and a number of clinical measures was studied.
| Variable | Name | Meaning |
|---|---|---|
| X1 | lcavol | log(cancer volume) |
| X2 | lweight | log(prostate weight) |
| X3 | age | age |
| X4 | lbph | log(benign prostatic hyperplasia amount) |
| X5 | svi | seminal vesicle invasion |
| X6 | lcp | log(capsular penetration) |
| X7 | gleason | Gleason score |
| X8 | pgg45 | percentage of Gleason scores 4 or 5 |
| Y | lpsa | log(prostate-specific antigen) |
(a) [20 marks] Consider the full model
Y = β0 + β1X1 + ... + β8X8 + Error.
Here, the error terms are assumed to be independent and identically distributed N(0, σ²) random variables. Suppose that the predictor values of a new patient are as follows:
X1 = 1.1474025, X2 = 3.4194, X3 = 59, X4 = -1.386294, X5 = 0, X6 = -1.38629, X7 = 6, X8 = 0
You are interested in predicting Y, the logarithm of the amount of prostate-specific antigen. Give the predicted value of Y and express the variance of the prediction error in terms of σ².
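Part (a) relies on two standard least-squares results. The first is ŷ0 = z0'β̂. The second is Var(Y0 - ŷ0) = σ²(1 + z0'(X'X)⁻¹z0), where z0 = (1, X1, ..., X8) holds the new patient's values and X is the design matrix with an intercept column. The dependency-free Python sketch below carries out that computation. It makes two assumptions. First, the data have already been read into `rows` (one [X1, ..., X8] list per patient) and `y` (the lpsa values). Second, the layout of prostate.txt is not described here, so loading is left out. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]

# Predictor values of the new patient from part (a), in the order X1, ..., X8.
NEW_PATIENT = [1.1474025, 3.4194, 59, -1.386294, 0, -1.38629, 6, 0]


def invert(a: Matrix) -> Matrix:
    """Gauss-Jordan inverse with partial pivoting; adequate for a small X'X."""
    n = len(a)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X'X is (numerically) singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                factor = aug[r][col]
                aug[r] = [rv - factor * cv for rv, cv in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def predict_with_variance_factor(rows: Sequence[Sequence[float]],
                                 y: Sequence[float],
                                 x0: Sequence[float]) -> Tuple[float, float]:
    """Fit y = b0 + b1*x1 + ... by least squares and predict at x0.

    Returns (y_hat0, c) with Var(Y0 - y_hat0) = c * sigma^2,
    where c = 1 + z0' (X'X)^{-1} z0 and z0 = (1, x0).
    """
    X = [[1.0] + [float(v) for v in r] for r in rows]   # prepend intercept
    z0 = [1.0] + [float(v) for v in x0]
    p = len(z0)
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    xtx_inv = invert(xtx)
    beta = [sum(xtx_inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    y_hat0 = sum(b * z for b, z in zip(beta, z0))
    quad = sum(z0[i] * xtx_inv[i][j] * z0[j] for i in range(p) for j in range(p))
    return y_hat0, 1.0 + quad


# Usage, once rows and y hold the prostate data:
# y_hat0, c = predict_with_variance_factor(rows, y, NEW_PATIENT)
```

The same function also applies to part (d). Pass only the selected columns of `rows` and the matching entries of NEW_PATIENT.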
(b) [20 marks] Now suppose you do not want a model with eight predictors, because its prediction error is not satisfactory, and you prefer a model with only five predictors. Select a model using the BACKWARD selection approach. Report the t statistic of each remaining variable at each step.
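Below is a plain-Python sketch of the elimination loop, intended as a cross-check rather than a replacement for SAS/R output. It assumes the rule is "drop the remaining predictor with the smallest |t| until five are left". SAS's PROC REG with SELECTION=BACKWARD instead stops according to a significance level (SLSTAY=), so the stopping rule used here is an assumption. The names `rows`, `y`, `t_statistics` and `backward_select` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def invert(a: List[List[float]]) -> List[List[float]]:
    """Gauss-Jordan inverse with partial pivoting; adequate for a small X'X."""
    n = len(a)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X'X is (numerically) singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                factor = aug[r][col]
                aug[r] = [rv - factor * cv for rv, cv in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def t_statistics(rows: Sequence[Sequence[float]], y: Sequence[float],
                 cols: List[int]) -> Dict[int, float]:
    """OLS of y on an intercept plus columns `cols` of `rows`; return {col: t}."""
    X = [[1.0] + [float(r[c]) for c in cols] for r in rows]
    n, p = len(X), len(cols) + 1
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    sse = sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2
              for row, yi in zip(X, y))
    mse = sse / (n - p)  # unbiased estimate of sigma^2
    return {c: beta[k + 1] / math.sqrt(mse * inv[k + 1][k + 1])
            for k, c in enumerate(cols)}


def backward_select(rows: Sequence[Sequence[float]], y: Sequence[float],
                    names: Sequence[str],
                    target: int) -> Tuple[List[str], List[Dict[str, float]]]:
    """Drop the predictor with the smallest |t| until `target` remain.

    Returns the kept names and, for every step, the t statistics of the
    variables still in the model at that step.
    """
    cols = list(range(len(names)))
    history: List[Dict[str, float]] = []
    while True:
        t = t_statistics(rows, y, cols)
        history.append({names[c]: t[c] for c in cols})
        if len(cols) <= target:
            return [names[c] for c in cols], history
        cols.remove(min(cols, key=lambda c: abs(t[c])))


# Usage with the prostate data (rows, y assumed already loaded):
# kept, steps = backward_select(rows, y, ["lcavol", "lweight", "age", "lbph",
#                                         "svi", "lcp", "gleason", "pgg45"], 5)
```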
(c) [20 marks] Now you prefer a model with only four predictors. Select a model using the FORWARD selection approach. Report the t statistic of each variable not yet selected at each step.
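A matching plain-Python sketch of forward selection follows. It assumes the rule is as follows: at each step, refit the current model plus each unselected candidate in turn, then add the candidate whose own t statistic has the largest |t|. Stop once four predictors are in. SAS's SELECTION=FORWARD uses an entry significance level (SLENTRY=) instead, so this fixed-size stopping rule is an assumption. The helper names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def invert(a: List[List[float]]) -> List[List[float]]:
    """Gauss-Jordan inverse with partial pivoting; adequate for a small X'X."""
    n = len(a)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X'X is (numerically) singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                factor = aug[r][col]
                aug[r] = [rv - factor * cv for rv, cv in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def t_statistics(rows: Sequence[Sequence[float]], y: Sequence[float],
                 cols: List[int]) -> Dict[int, float]:
    """OLS of y on an intercept plus columns `cols` of `rows`; return {col: t}."""
    X = [[1.0] + [float(r[c]) for c in cols] for r in rows]
    n, p = len(X), len(cols) + 1
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    sse = sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2
              for row, yi in zip(X, y))
    mse = sse / (n - p)  # unbiased estimate of sigma^2
    return {c: beta[k + 1] / math.sqrt(mse * inv[k + 1][k + 1])
            for k, c in enumerate(cols)}


def forward_select(rows: Sequence[Sequence[float]], y: Sequence[float],
                   names: Sequence[str],
                   target: int) -> Tuple[List[str], List[Dict[str, float]]]:
    """Add the candidate with the largest |t| until `target` are selected.

    history[k] holds, for step k, the t statistic each unselected variable
    would have if it were added to the current model.
    """
    selected: List[int] = []
    remaining = list(range(len(names)))
    history: List[Dict[str, float]] = []
    while len(selected) < target and remaining:
        cand_t = {c: t_statistics(rows, y, selected + [c])[c] for c in remaining}
        history.append({names[c]: t for c, t in cand_t.items()})
        best = max(remaining, key=lambda c: abs(cand_t[c]))
        selected.append(best)
        remaining.remove(best)
    return [names[c] for c in selected], history
```

For a single added variable the partial F statistic equals t², so picking the largest |t| picks the same candidate as picking the largest partial F.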
(d) [20 marks] Consider the reduced model obtained in part (c). You are interested in predicting Y, the logarithm of the amount of prostate-specific antigen, for the new patient whose predictor values are given in (a). Give the predicted value of Y and express the variance of the prediction error in terms of σ². Comparing (a) and (d), which model tends to give the smaller estimation error?
DUE DATE: 11 April 2025 (Friday).