Programme | MSc Applied Statistics (Biostatistics Pathway)
Module Title | Generalized Linear Model
Module Code | APH420
Assignment Title | Coursework 1
Assignment 1: Generalized Linear Regression Models
Problem 1. (30 marks) Consider the following simple linear regression model
$$y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \quad i = 1, 2, \ldots, n,$$
where the $\epsilon_i$ are independent with common distribution $N(0, \sigma^2)$. Assume that not all $x_i$ are equal.
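1. Justify that the least squares estimator of $(\beta_0, \beta_1)^\top$ is given by
$$\hat\beta_1 = \frac{\sum_{i=1}^n (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^n (x_i - \bar x)^2}, \qquad \hat\beta_0 = \bar y - \hat\beta_1 \bar x, \tag{1}$$
where
$$\bar x = \frac{1}{n} \sum_{i=1}^n x_i, \qquad \bar y = \frac{1}{n} \sum_{i=1}^n y_i.$$
(5 marks)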
2. Justify that the maximum likelihood estimator of $(\beta_0, \beta_1)^\top$ is given by (1). (5 marks)
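3. Let $\mu_i = \beta_0 + \beta_1 x_i$. Suppose that the fitted values $\hat\mu = (\hat\mu_1, \ldots, \hat\mu_n)^\top$ are given by
$$\hat\mu_i = \hat\beta_0 + \hat\beta_1 x_i, \quad i = 1, \ldots, n,$$
with $\hat\beta_0, \hat\beta_1$ given by (1). Show that $\hat\mu$ is normally distributed with mean $\mu = (\mu_1, \ldots, \mu_n)^\top$, i.e., $\hat\mu$ is an unbiased estimator of $\mu$. (5 marks)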
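4. Show that an unbiased estimator of $\sigma^2$ is given by $\mathrm{SSE}/(n - 2)$, and that $\mathrm{SSE}/\sigma^2 \sim \chi^2(n - 2)$, with
$$\mathrm{SSE} = \sum_{i=1}^n (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2.$$
(5 marks)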
5. Construct an F-test for the null hypothesis $H_0: \beta = \beta^{0}$, with $\beta^{0}$ known, at significance level $\alpha$. (10 marks)
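As a numerical sanity check for parts 1 and 4, the sketch below computes the closed-form least squares estimates and SSE/(n − 2) on simulated data. The function name `fit_simple_ols` and all simulation settings (true coefficients, noise level, design points) are illustrative assumptions, not part of the assignment.

```python
import random


def fit_simple_ols(x, y):
    """Closed-form least squares fit of y = b0 + b1 * x.

    Returns (b0, b1, sigma2_hat), where sigma2_hat = SSE / (n - 2).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sxx > 0 because not all x values are equal.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # Residual sum of squares around the fitted line.
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return b0, b1, sse / (n - 2)


# Hypothetical simulation settings, chosen only for illustration.
random.seed(0)
beta0, beta1, sigma = 1.0, 2.0, 0.5
x = [i / 10 for i in range(200)]
y = [beta0 + beta1 * xi + random.gauss(0.0, sigma) for xi in x]

b0, b1, s2 = fit_simple_ols(x, y)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, sigma2_hat = {s2:.3f}")
```

With a sample of this size the estimates should land close to the simulated values 1, 2 and 0.25; repeating the simulation over many seeds gives an empirical view of unbiasedness.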
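Problem 2. (20 marks) Consider an inverse Gaussian distributed risk $Y$, which has probability density function
$$f(y; \mu, \psi) = \frac{1}{\sqrt{2\pi \psi y^3}} \exp\left\{ -\frac{(y - \mu)^2}{2 \psi \mu^2 y} \right\}, \quad y > 0,$$
where $\mu > 0$ and $\psi > 0$.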
1. Show that the inverse Gaussian distribution is an exponential dispersion distribution by identifying the canonical parameter $\theta$, $b(\theta)$ and dispersion $\psi$. (6 marks)
2. Show that the variance-mean function is $\nu(\mu) = \mu^3$. (4 marks)
3. Determine the canonical link function. (4 marks)
4. Deduce the unit deviance. (6 marks)
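A quick numerical check can help confirm the moment claims before working through the algebra. The sketch below assumes the common mean-dispersion form of the inverse Gaussian density, f(y) = (2πψy³)^(−1/2) exp{−(y − µ)²/(2ψµ²y)} for y > 0, and integrates it with a midpoint rule. The function names, the values µ = 1 and ψ = 0.5, and the truncation point are illustrative assumptions.

```python
import math


def inverse_gaussian_pdf(y, mu, psi):
    """Assumed inverse Gaussian density with mean mu and dispersion psi (y > 0)."""
    exponent = -((y - mu) ** 2) / (2.0 * psi * mu ** 2 * y)
    return math.exp(exponent) / math.sqrt(2.0 * math.pi * psi * y ** 3)


def numeric_moments(mu, psi, upper=50.0, steps=100_000):
    """Midpoint-rule approximations of total mass, mean and variance on (0, upper)."""
    h = upper / steps
    total = first = second = 0.0
    for k in range(steps):
        y = (k + 0.5) * h
        w = inverse_gaussian_pdf(y, mu, psi) * h
        total += w
        first += y * w
        second += y * y * w
    return total, first, second - first ** 2


# Hypothetical parameter values, chosen only for illustration.
mu, psi = 1.0, 0.5
total, mean, variance = numeric_moments(mu, psi)
print(f"mass = {total:.4f}, mean = {mean:.4f}, variance = {variance:.4f}")
print(f"psi * mu^3 = {psi * mu ** 3:.4f}")
```

Under this assumed parameterization the numerical variance should agree with ψµ³, which is consistent with the variance function stated in part 2.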
Problem 3. (25 marks) Suppose we are studying the effectiveness of a new medication for treating a particular illness. Let $Y$ be a Bernoulli random variable taking the value 1 if the new medication is effective and 0 otherwise. Denote by $Y_1, \ldots, Y_n$ the observed effectiveness outcomes for a sample of $n$ patients. Let
$$\pi_i = P(Y_i = 1) \in (0, 1),$$
a probability that may be related to potential explanatory variables, such as the patient’s age, severity of the condition, or other health indicators.
1. Suppose the potential explanatory variables $x = (x_1, x_2)^\top$ include the patient’s age and the severity of the condition (0: severe, 1: not severe). Build a logistic regression model to show the relationship between $\pi_i$ and $x_i$. (5 marks)
2. Write down the log-likelihood function of $\beta_0, \beta_1, \beta_2$ explicitly. Derive the normal (regular) equations that the maximum likelihood estimates of $\beta = (\beta_0, \beta_1, \beta_2)^\top$ must satisfy. Find conditions under which the maximum likelihood estimate of $\beta$ exists and is unique. (10 marks)
3. Consider an alternative model
$$\pi(x_i) = 1 - \exp[-\exp(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2})].$$
Show that this is a generalized linear model with a specific link function. Give an interpretation of $\beta_1$. (10 marks)
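To make the two models in this problem concrete, the sketch below evaluates the Bernoulli log-likelihood under the logistic response and under the alternative response 1 − exp[−exp(η)], where η = β0 + β1·age + β2·severity. The patient records, the coefficient values and all function names are hypothetical and serve only as an illustration.

```python
import math


def pi_logistic(eta):
    """Logistic response: success probability for linear predictor eta."""
    return 1.0 / (1.0 + math.exp(-eta))


def pi_alternative(eta):
    """Alternative response from part 3: 1 - exp(-exp(eta))."""
    return 1.0 - math.exp(-math.exp(eta))


def linear_predictor(beta, age, not_severe):
    b0, b1, b2 = beta
    return b0 + b1 * age + b2 * not_severe


def log_likelihood(beta, data, response):
    """Bernoulli log-likelihood: sum of y*log(pi) + (1 - y)*log(1 - pi)."""
    total = 0.0
    for age, not_severe, y in data:
        p = response(linear_predictor(beta, age, not_severe))
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


# Hypothetical records: (age in years, severity indicator, outcome y).
patients = [(35, 1, 1), (62, 0, 0), (48, 1, 1), (71, 0, 1), (55, 1, 0)]
beta = (1.0, -0.02, 0.5)  # hypothetical coefficient values

ll_logit = log_likelihood(beta, patients, pi_logistic)
ll_alt = log_likelihood(beta, patients, pi_alternative)
print(f"logistic: {ll_logit:.4f}, alternative: {ll_alt:.4f}")
```

Maximizing either function over β (for example with a Newton-type iteration) would give the corresponding maximum likelihood estimates; the sketch stops at evaluation to keep the comparison between the two response functions visible.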
Problem 4. (25 marks) In a study of the relationship between hypertension and sleep apnoea-hypopnoea (breathing difficulties while sleeping), a logistic regression model was fitted. The dependent variable was the presence of hypertension. The independent variables were dichotomized as follows.
• Age: 0 for 10 years or under, and 1 otherwise.
• Male: 0 for females, and 1 for males.
• BMI: 0 if body mass index (kg/m²) is in the normal range (18.5, 25), and 1 otherwise.
• Apnoea-hypopnoea index: 0 if fewer than ten events per hour of sleep, and 1 otherwise.
Age, Male and BMI (body mass index) are extraneous variables. The fitted model is summarized in Table 1.
Table 1: The logistic regression model fitted to data relating hypertension to sleep apnoea-hypopnoea.
1. Write down the fitted model. (5 marks)
2. Use a Wald test to test whether $\beta_j = 0$ for each independent variable at significance level $\alpha = 0.05$. (5 marks)
3. Find 95% confidence intervals for each regression parameter. (5 marks)
4. Compute and interpret the odds ratios for each independent variable. (5 marks)
5. Predict the mean probability of observing hypertension in 30-year-old males with a BMI of 25.5 kg/m² who have an apnoea-hypopnoea index value of 5. (5 marks)
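The calculations in parts 2 to 5 follow a common pattern, sketched below for a single coefficient and a single covariate profile. The coefficient values and standard error used here are hypothetical placeholders and are not the entries of Table 1; the function names are illustrative, and the BMI range (18.5, 25) is treated as an open interval by assumption.

```python
import math


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def wald_summary(estimate, std_error, z_crit=1.959964):
    """Wald z statistic, two-sided p-value, 95% CI and odds ratio with its CI."""
    z = estimate / std_error
    p_value = 2.0 * (1.0 - normal_cdf(abs(z)))
    lower = estimate - z_crit * std_error
    upper = estimate + z_crit * std_error
    return {
        "z": z,
        "p_value": p_value,
        "ci": (lower, upper),
        "odds_ratio": math.exp(estimate),
        "odds_ratio_ci": (math.exp(lower), math.exp(upper)),
    }


def encode(age_years, is_male, bmi, ahi):
    """Dichotomize covariates as described in the bullets; leading 1 is the intercept."""
    return [
        1,
        int(age_years > 10),
        int(is_male),
        int(not (18.5 < bmi < 25)),  # assumption: normal range is an open interval
        int(ahi >= 10),
    ]


def predict_probability(coefs, covariates):
    """Fitted probability from a logistic model: 1 / (1 + exp(-eta))."""
    eta = sum(b * x for b, x in zip(coefs, covariates))
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical values only; substitute the estimates from Table 1.
summary = wald_summary(estimate=1.0, std_error=0.5)
hypothetical_coefs = [-2.0, 0.8, 0.2, 0.6, 1.1]  # intercept, age, male, BMI, AHI
profile = encode(age_years=30, is_male=True, bmi=25.5, ahi=5)
prob = predict_probability(hypothetical_coefs, profile)
print(summary)
print(profile, round(prob, 4))
```

Applying `wald_summary` to each row of the fitted model table gives the test statistics, confidence intervals and odds ratios in one pass, while `encode` makes the 0/1 coding of the part 5 profile explicit before prediction.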