STATS 763 - 2022 - Midterm test
6 April 2022, 10:00-11:30 NZST
Question 1 [21 marks total]
Wilms’ tumour is a rare childhood cancer of the kidney. Treatment is successful for the majority of patients, but a minority do relapse. An important risk factor for relapse is disease stage (how far it has spread).
The following data were collected on all U.S. paediatric Wilms’ tumour patients diagnosed between 1980 and 1994 inclusive:
• Year: Year of diagnosis, from 1980 to 1994
• Stage: Disease stage (I [least advanced], II, III, IV [most advanced])
• rel5: Relapse within 5 years (0 [No], 1 [Yes]), hereafter called “relapse”.
We fit a relative risk model of rel5 on Year*Stage, to capture any secular trend in relapses by disease stage, and obtain the following results:
Call:
glm(formula = rel5 ~ Year * Stage, family = binomial(link = "log"), data = wilms)
Coefficients:
                Estimate Std. Error z value Pr(>|z|)
(Intercept)     52.59226   38.93026   1.351   0.1767
Year            -0.02767    0.01960  -1.412   0.1581
StageII       -107.15715   52.12848  -2.056   0.0398 *
StageIII        40.62429   50.38844   0.806   0.4201
StageIV        -29.17571   53.37873  -0.547   0.5847
Year:StageII     0.05422    0.02623   2.067   0.0387 *
Year:StageIII   -0.02008    0.02537  -0.792   0.4286
Year:StageIV     0.01525    0.02687   0.568   0.5703
Selected rows and columns from the estimated variance matrix of the coefficient estimates are given below:
               (Intercept)         Year      StageIV Year:StageIV
(Intercept)         1515.6       -0.763      -1515.6        0.763
Year                -0.763     0.000384        0.763    -0.000384
StageIV            -1515.6        0.763       2849.3       -1.434
Year:StageIV         0.763    -0.000384       -1.434     0.000722
(a) [12 marks total]
We replace Year in the model by Year1980 <- Year-1980.
i. [6 marks] What are the values of the new estimates for (Intercept), Year1980, StageIV and Year1980:StageIV?
ii. [6 marks] What are the standard errors of the new estimates for Year1980, StageIV and Year1980:StageIV?
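The arithmetic behind part (a) can be sketched in Python as an illustration (not an official solution; all variable names are hypothetical). Writing Year = Year1980 + 1980 in the linear predictor moves 1980 times each Year slope into the intercept and into the matching Stage main effect, while the slope terms and their standard errors stay the same. The new StageIV variance uses $\operatorname{Var}(a + cb) = \operatorname{Var}(a) + c^2\operatorname{Var}(b) + 2c\operatorname{Cov}(a,b)$ with the rounded matrix entries printed above; because large terms nearly cancel, that standard error is only approximate.

```python
import math

# Values copied from the printed glm output and variance matrix above.
b_int, b_year = 52.59226, -0.02767
b_iv, b_year_iv = -29.17571, 0.01525
se_year, se_year_iv = 0.01960, 0.02687
var_iv, var_year_iv, cov_iv_year_iv = 2849.3, 0.000722, -1.434

SHIFT = 1980  # Year = Year1980 + SHIFT

# Shifting Year adds SHIFT * slope to the intercept and to the StageIV effect.
new_int = b_int + SHIFT * b_year
new_year = b_year            # Year1980 slope equals the old Year slope
new_iv = b_iv + SHIFT * b_year_iv
new_year_iv = b_year_iv      # interaction is unchanged

# Standard errors: the slopes keep theirs; StageIV needs the variance of a
# linear combination, Var(a + c*b) = Var(a) + c^2 Var(b) + 2c Cov(a, b).
new_se_year = se_year
new_se_year_iv = se_year_iv
var_new_iv = var_iv + SHIFT**2 * var_year_iv + 2 * SHIFT * cov_iv_year_iv
new_se_iv = math.sqrt(var_new_iv)  # approximate: inputs are heavily rounded

for name, est in [("(Intercept)", new_int), ("Year1980", new_year),
                  ("StageIV", new_iv), ("Year1980:StageIV", new_year_iv)]:
    print(f"{name:<17}{est:10.5f}")
print(f"SEs: Year1980 {new_se_year}, StageIV {new_se_iv:.3f}, "
      f"Year1980:StageIV {new_se_year_iv}")
```

With these inputs the sketch gives about -2.194 for the new intercept and about 1.019 for StageIV (standard error near 1.09), while the Year1980 and Year1980:StageIV estimates and standard errors match the original Year terms.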
(b) [5 marks]
According to the model, what is the estimated relative risk of relapse for a patient at Stage IV in 1990 compared with a patient at Stage III in 1980?
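Because the model uses a log link, the linear predictor is the log-risk of relapse, so a relative risk is the exponential of a difference of two linear predictors. A minimal Python sketch of that calculation, with a hypothetical helper `log_risk` built from the printed coefficients. Those coefficients are rounded to five decimals and then multiplied by years near 2000, so only the first two or three digits of the result are reliable.

```python
import math

# Coefficients from the printed glm output; Stage I is the reference level.
coef = {
    "(Intercept)": 52.59226, "Year": -0.02767,
    "StageII": -107.15715, "StageIII": 40.62429, "StageIV": -29.17571,
    "Year:StageII": 0.05422, "Year:StageIII": -0.02008, "Year:StageIV": 0.01525,
}

def log_risk(stage: str, year: int) -> float:
    """Linear predictor log P(relapse) for stage "I", "II", "III" or "IV"."""
    eta = coef["(Intercept)"] + coef["Year"] * year
    if stage != "I":  # non-reference stages add a main effect and a slope change
        eta += coef["Stage" + stage] + coef["Year:Stage" + stage] * year
    return eta

log_rr = log_risk("IV", 1990) - log_risk("III", 1980)
rr = math.exp(log_rr)
print(f"log RR = {log_rr:.4f}, RR = {rr:.3f}")
```

Here the intercept cancels in the difference; the sketch gives a log relative risk of about 0.029, i.e. a relative risk of about 1.03.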
(c) [4 marks]
According to the model, what is the estimated difference in log-risk of relapse corresponding to a 5-year increase in the year of diagnosis for a patient in Stage I, and what is the standard error of this estimate?
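For Stage I (the reference level) the log-risk depends on year only through the Year coefficient, so a 5-year increase shifts the log-risk by $5\hat\beta_{\text{Year}}$, with standard error $5\,\mathrm{SE}(\hat\beta_{\text{Year}})$. A short Python sketch of that arithmetic (variable names are illustrative):

```python
# Stage I is the reference level, so only the Year coefficient is involved.
b_year, se_year = -0.02767, 0.01960  # from the printed glm output
delta_years = 5

diff = delta_years * b_year            # change in log-risk of relapse
se_diff = abs(delta_years) * se_year   # SE(c * b) = |c| * SE(b)
print(f"difference = {diff:.5f}, SE = {se_diff:.3f}")
```

This gives a difference of -0.13835 with a standard error of 0.098; neither depends on whether Year or Year1980 is used.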
Question 2 [16 marks total]
We fit a GLM by solving the score equation $\sum_{i=1}^{n} x_i^{T} w_i (Y_i - \mu_i) = \mathbf{0}$, where $x_i$ is the $i$th $1 \times p$ covariate vector, $\mu_i = g^{-1}(x_i\beta)$ and $\mathbf{0}$ is the $p \times 1$ vector of all zeros. The GLM involves a dispersion parameter $\phi > 0$.
(a) [4 marks]
Explain why $w_i = 1$ when the canonical link is used.
(b) [4 marks total]
What is wi in the following settings?
i. [2 marks] Variance function $V(\mu) = \mu^2$, $\mu \in \mathbb{R}^+$, and link function $g(\mu) = \log(\mu)$.
ii. [2 marks] Poisson(μ) family and identity link.
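Assuming the weight in the score equation has the standard GLM form $w_i = \frac{\partial\mu_i/\partial\eta_i}{V(\mu_i)} = \frac{1}{g'(\mu_i)\,V(\mu_i)}$ (the dispersion $\phi$ cancels from the equation), the settings in (a) and (b) can be checked numerically. In this Python sketch the helper `working_weight` and the test value of $\mu$ are hypothetical; each setting is described only by its variance function and the derivative of its link.

```python
from typing import Callable

def working_weight(mu: float,
                   variance: Callable[[float], float],
                   link_deriv: Callable[[float], float]) -> float:
    """Score-equation weight w = 1 / (g'(mu) * V(mu))."""
    return 1.0 / (link_deriv(mu) * variance(mu))

mu = 2.5  # arbitrary illustrative mean

# Canonical links satisfy g'(mu) = 1 / V(mu), so the weight is 1.
w_poisson_log = working_weight(mu, lambda m: m, lambda m: 1.0 / m)
w_binomial_logit = working_weight(0.3, lambda m: m * (1 - m),
                                  lambda m: 1.0 / (m * (1 - m)))

# (b) i: V(mu) = mu^2 with the log link, g'(mu) = 1 / mu.
w_mu2_log = working_weight(mu, lambda m: m ** 2, lambda m: 1.0 / m)
# (b) ii: Poisson variance V(mu) = mu with the identity link, g'(mu) = 1.
w_poisson_identity = working_weight(mu, lambda m: m, lambda m: 1.0)

print(w_poisson_log, w_binomial_logit, w_mu2_log, w_poisson_identity)
```

Under this assumed form, both settings in (b) give $w_i = 1/\mu_i$ (0.4 at $\mu = 2.5$ in the sketch).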
(c) [4 marks]
How do we usually estimate φ? Write an expression for the estimator.
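A common choice, and the one R's `summary.glm` uses for families whose dispersion is not fixed, is the Pearson estimator $\hat\phi = \frac{1}{n-p}\sum_{i=1}^{n} \frac{(Y_i - \hat\mu_i)^2}{V(\hat\mu_i)}$. A minimal Python sketch; the function name and the toy numbers are made up for illustration:

```python
from typing import Callable, Sequence

def pearson_dispersion(y: Sequence[float], mu_hat: Sequence[float],
                       variance: Callable[[float], float], p: int) -> float:
    """Pearson estimate: sum((y - mu)^2 / V(mu)) / (n - p)."""
    n = len(y)
    if n != len(mu_hat) or n <= p:
        raise ValueError("need len(y) == len(mu_hat) > p")
    chi2 = sum((yi - mi) ** 2 / variance(mi) for yi, mi in zip(y, mu_hat))
    return chi2 / (n - p)

# Made-up toy numbers (not from any real fit), with a Poisson-type V(mu) = mu.
y = [2.0, 0.0, 5.0, 3.0]
mu_hat = [1.0, 1.0, 4.0, 4.0]
phi_hat = pearson_dispersion(y, mu_hat, variance=lambda m: m, p=2)
print(phi_hat)
```

For these toy values the Pearson statistic is 2.5 on 2 residual degrees of freedom, so $\hat\phi = 1.25$.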
(d) [4 marks]
For a certain value of $\beta_0$, you are given the observed values
(this last subscript means “evaluated at $\beta = \beta_0$”).
Write down a test statistic that you can approximately compare to a $\chi^2_p$ quantile to test $H_0: \beta = \beta_0$ vs $H_1: \beta \neq \beta_0$.
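Assuming the observed values in part (d) are the score vector $U(\beta_0)$ and the Fisher information $\mathcal{I}(\beta_0)$ (an assumption about what is provided), a standard choice is the score statistic $U(\beta_0)^T \mathcal{I}(\beta_0)^{-1} U(\beta_0)$, approximately $\chi^2_p$ under $H_0$. A Python sketch with hypothetical helper names and made-up inputs; it solves a linear system rather than forming the inverse explicitly.

```python
from typing import List, Sequence

def solve(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def score_statistic(u: Sequence[float], info: Sequence[Sequence[float]]) -> float:
    """U^T I^{-1} U; approximately chi-squared with len(u) df under H0."""
    return sum(ui * xi for ui, xi in zip(u, solve(info, u)))

# Hypothetical inputs with p = 2 (not the values given in the exam).
u0 = [3.0, -1.0]
info0 = [[4.0, 1.0], [1.0, 2.0]]
stat = score_statistic(u0, info0)
print(stat)
```

With these made-up inputs the statistic is 4.0, which would then be compared with an upper quantile of $\chi^2_2$.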
Question 3 [8 marks total + 4 bonus marks]
Answer the following questions:
(a) [4 marks] You have written a scientific paper containing results from a linear regression model $E[Y \mid X = x] = x\beta$ fitted to independent count data. The data set was large and you estimated the variance of $\hat{\beta}$ using a sandwich estimator.
A reviewer writes that count data are not normally distributed, so your Wald confidence intervals are incorrect because 1) $\hat{\beta}$ is therefore not normally distributed either, and 2) the variances are wrong because they were estimated under the wrong model. How do you respond?
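A point relevant to the reviewer's second objection is that the sandwich variance $(X^TX)^{-1}\big[\sum_i e_i^2\, x_i^T x_i\big](X^TX)^{-1}$ (the HC0 form) is built from the fitted mean and the squared residuals, with no normality or constant-variance assumption. A minimal Python sketch for one covariate plus an intercept; the helper names and the count-like data are made up for illustration.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]

def inv2(m: Matrix) -> Matrix:
    """Inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul2(p: Matrix, q: Matrix) -> Matrix:
    """Product of two 2 x 2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def ols_sandwich(x: Sequence[float], y: Sequence[float]) -> Tuple[List[float], Matrix]:
    """OLS fit of y = b0 + b1 * x and its HC0 sandwich covariance."""
    rows = [(1.0, float(xi)) for xi in x]  # design rows (1, x_i)
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(2)] for i in range(2)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(2)]
    bread = inv2(xtx)
    beta = [sum(bread[i][j] * xty[j] for j in range(2)) for i in range(2)]
    resid = [yi - (beta[0] * r[0] + beta[1] * r[1]) for r, yi in zip(rows, y)]
    # "Meat": sum of e_i^2 x_i^T x_i, squared residuals instead of a model variance.
    meat = [[sum(e * e * r[i] * r[j] for r, e in zip(rows, resid)) for j in range(2)]
            for i in range(2)]
    return beta, matmul2(matmul2(bread, meat), bread)

# Made-up count-like data, for illustration only.
x_obs = [0, 1, 2, 3]
y_obs = [1, 3, 2, 6]
beta_hat, v_hc0 = ols_sandwich(x_obs, y_obs)
print("beta:", beta_hat)
print("sandwich SE of slope:", v_hc0[1][1] ** 0.5)
```

For this toy data the fitted slope is 1.4 with a sandwich variance of 0.1076.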
(b) [2 marks] Describe one situation in which a quasi-likelihood model may fail to produce reliable standard errors.
(c) [2 marks] True or False: Data sampled according to the outcome will yield biased regression estimates unless it is appropriately weighted.
(d) [4 marks] (bonus) A regression coefficient vector estimated from a parametric generalised linear model has covariance matrix $\operatorname{Cov}(\hat{\beta}) = \phi (X^{T} X)^{-1}$. Find two combinations of family and link that yield this covariance.
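For a GLM the usual large-sample result is $\operatorname{Cov}(\hat\beta) \approx \phi\,(X^T W X)^{-1}$ with $W = \operatorname{diag}\{(\partial\mu_i/\partial\eta_i)^2 / V(\mu_i)\}$, so the stated covariance requires that weight to equal 1 at every $\mu$. A quick Python check of a few family and link pairs; the helper name and the grid of $\mu$ values are illustrative, and the Poisson/log pair is included only for contrast.

```python
def glm_weight(mu, variance, dmu_deta):
    """Fisher-information weight W = (d mu / d eta)^2 / V(mu)."""
    return dmu_deta(mu) ** 2 / variance(mu)

# Each entry: (variance function V(mu), d mu / d eta written as a function of mu)
pairs = {
    "gaussian, identity": (lambda m: 1.0, lambda m: 1.0),     # mu = eta
    "Gamma, log": (lambda m: m ** 2, lambda m: m),            # mu = exp(eta)
    "poisson, log": (lambda m: m, lambda m: m),               # for contrast
}

weights = {name: [glm_weight(mu, v, d) for mu in (0.5, 1.0, 4.0)]
           for name, (v, d) in pairs.items()}
for name, ws in weights.items():
    print(f"{name:<20} {ws}")
```

The Gaussian/identity and Gamma/log pairs give a weight of 1 at every $\mu$, whereas the Poisson/log weight equals $\mu$ itself.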