
STAT 314 Assignment 5 2024

Question One: Optimising a Metropolis algorithm with a random walk jumping distribution (8 marks)

Computer game developers are trialling a new game. For this project, they recruited 80 participants, 40 female (sex = 1) and 40 male (sex = 0), each of whom was graded on their level of gaming experience (level: 1-4, representing minimal to high). The participants were provided with the game, and the number of times y they played the game was recorded. You are asked by the researchers to fit generalised linear models to the data, provided to you on LEARN as gaming.csv, with sex as a categorical predictor and level as a continuous predictor.

Note, in generalised linear models, rather than estimating effects from the response data directly, we model through a link function, η(θ), and assume η(θ)_i = x_i′β. The link function can be determined by re-arranging the likelihood of interest into the exponential family format,

p(y|θ) = h(y) exp(η(θ) T(y) − A(θ)).

As the response variable is a count, the game developers suggest fitting a Poisson regression. The Poisson pmf is

p(y|λ) = λ^y e^(−λ) / y!,  y = 0, 1, 2, …

The Poisson probability mass function can be re-arranged into the exponential family format as follows,

p(y|λ) = (1/y!) exp(y log(λ) − λ),

which indicates the link function is η(λ) = log(λ). Hence when fitting the Bayesian Poisson regression, assume that log(λ_i) = x_i′β.

Fit a Bayesian Poisson regression using Metropolis sampling. Assume the prior for β is N(0, 100I_p), i.e. independent components with zero mean and variance 100.

To get certain quantities for the proposal distribution and predictor, fit the glm shown below:

gaming <- read.csv("gaming.csv", header = TRUE)
mod <- glm(y ~ as.factor(sex) + level, data = gaming, family = "poisson")
X <- model.matrix(mod)
Sigma <- vcov(mod); Sigma

From the glm, extract the design matrix X. For the proposal distribution, use a Normal distribution with mean β^(t−1) (the previous draw) and variance-covariance matrix c²Σ̂, where Σ̂ is the variance-covariance matrix from the glm fit. Consider three candidates for c: 1.6/√p, 2.4/√p and 3.2/√p, where p is the number of parameters estimated. Run the Metropolis algorithm for 10000 iterations, and discard the first 5000. Report the following:
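The algorithm described above could be sketched as follows. This is a minimal illustration under stated assumptions, not a model answer: since gaming.csv lives on LEARN, the block simulates stand-in data with the same structure (the regression coefficients used to simulate are invented), and only one of the three candidate values of c is shown.

```r
library(MASS)  # for mvrnorm

set.seed(314)
# Simulated stand-in for gaming.csv (the real file is on LEARN)
n <- 80
gaming <- data.frame(sex = rep(c(0, 1), each = 40),
                     level = sample(1:4, 80, replace = TRUE))
gaming$y <- rpois(n, exp(1 + 0.3 * gaming$sex + 0.2 * gaming$level))

mod <- glm(y ~ as.factor(sex) + level, data = gaming, family = "poisson")
X <- model.matrix(mod)
Sigma <- vcov(mod)
y <- gaming$y
p <- ncol(X)

# Log-posterior: Poisson log-likelihood plus the N(0, 100 I_p) log-prior
log_post <- function(beta) {
  eta <- as.vector(X %*% beta)
  sum(y * eta - exp(eta)) - sum(beta^2) / 200
}

metropolis <- function(c_tune, n_iter = 10000, burn = 5000) {
  draws <- matrix(NA, n_iter, p)
  beta <- coef(mod)  # start at the glm estimates
  n_accept <- 0
  for (t in 1:n_iter) {
    cand <- mvrnorm(1, beta, c_tune^2 * Sigma)  # random walk proposal
    # Symmetric proposal, so accept with probability min(1, posterior ratio)
    if (log(runif(1)) < log_post(cand) - log_post(beta)) {
      beta <- cand
      n_accept <- n_accept + 1
    }
    draws[t, ] <- beta
  }
  list(draws = draws[-(1:burn), ], accept_rate = n_accept / n_iter)
}

fit <- metropolis(2.4 / sqrt(p))
fit$accept_rate  # proportion of accepted candidate draws
```

Because the proposal is symmetric, the Metropolis ratio reduces to the posterior ratio, which is computed on the log scale for numerical stability.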

• Check that each resulting chain converges to the same distribution. To do this, you may find the R package coda, together with graphical summaries, helpful. (2 marks)

• The proportion of candidate draws that were accepted (1 mark).

• The effective sample size for each chain. (1 mark)

• Based on your results for the previous bullet points, what do you think is the best choice for c? Does it approximately match the results stated in class on efficiency and optimal acceptance rates? (2 marks)

There are 2 marks available for correctly writing the code implementing Metropolis sampling for this problem.
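The convergence and effective-sample-size checks can be run with coda along the following lines. A sketch only: `chains` here is simulated stand-in data so the snippet is self-contained; in practice it would hold the post-burn-in draw matrices from your three runs.

```r
library(coda)

set.seed(1)
# Stand-in for three post-burn-in chains (5000 draws x 3 parameters each)
chains <- replicate(3, matrix(rnorm(5000 * 3), ncol = 3), simplify = FALSE)

mcmc_chains <- mcmc.list(lapply(chains, mcmc))
# plot(mcmc_chains)        # trace and density plots, one panel per parameter
gelman.diag(mcmc_chains)   # scale reduction factors near 1 suggest convergence
ess <- effectiveSize(mcmc_chains)  # effective sample size per parameter
ess
```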

Question Two: Sequential analysis (7 marks)

The Bayesian way of thinking readily lends itself to combining information from sequential experiments. To demonstrate, consider the following data extracted from the HealthIron study.

Serum ferritin levels were measured for two samples of women, one of C282Y homozygotes (n = 88) and the other of women with neither of the key mutations (C282Y and H63D) in the HFE gene, so-called HFE 'wildtypes' (n = 242). The information available is

• idnum: Participant id.

• homc282y: Indicator whether individual is Homozygote (1) or Wildtype (0).

• time: Time since onset of menopause, measured in years.

• logsf: The natural logarithm of the serum ferritin in µg/L.

The data required to answer this question is Hiron2021.csv, which can be downloaded from LEARN.

a) Fit a standard linear regression,

E(logsf) = β0 + β1time

with responses restricted to only those who are homozygote (homc282y = 1). This can be done using the lm function in R. Report the estimated coefficients β̂, the estimated error variance σ̂_e², and (X′X)⁻¹. State what priors would need to be assumed for β and τ = (σ_e²)⁻¹ such that the estimates reported here are the parameters of the posterior distribution of β, τ. [1 mark]
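The quantities asked for in a) could be extracted as below. The data frame is a hypothetical simulated stand-in for the homozygote subset of Hiron2021.csv (only the column names are taken from the question); on the real data you would first subset to homc282y == 1.

```r
set.seed(314)
# Hypothetical stand-in for the homozygote subset of Hiron2021.csv
hiron_homo <- data.frame(time = runif(88, 0, 30))
hiron_homo$logsf <- 2 + 0.05 * hiron_homo$time + rnorm(88, sd = 0.8)
# Real data: hiron_homo <- subset(read.csv("Hiron2021.csv"), homc282y == 1)

fit_a <- lm(logsf ~ time, data = hiron_homo)
beta_hat   <- coef(fit_a)              # estimated coefficients beta-hat
sigma2_hat <- summary(fit_a)$sigma^2   # estimated error variance sigma-hat_e^2
X1 <- model.matrix(fit_a)
XtX_inv <- solve(crossprod(X1))        # (X'X)^{-1}
```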

b) The sequential analysis. For this, you will fit a Bayesian regression using a Gibbs sampler to only the wildtype (homc282y=0) data. However, you must use priors for β and τ such that your results will be equivalent to fitting a linear model to all the data available. These priors, and resulting conditional posteriors are shown below.

The conditional posteriors implied by using priors taken from the output in a) are very similar to the case in the later part of the Week 11-12 lectures, where the prior for β is N(β0, σ_β²K). As a reminder, this material is on slides 16-20 of the Week 11-12 lecture, which are roughly reproduced below:

Assume the general prior for β, β ∼ N(β0, σ_β²K), where K is an arbitrary variance-covariance matrix. Then we can deduce the result for various special cases.

• To make derivations easier, we will work with inverse variances (τ = (σ²)⁻¹) rather than variances, and assume inverse variance components are a priori distributed Ga(α_i, γ_i). Further, we will assume that K is known and does not need to be estimated.

• The likelihood p(y|β, τ_e, X) is

p(y|β, τ_e, X) ∝ τ_e^(n/2) exp(−(τ_e/2)(y − Xβ)′(y − Xβ)).

• The priors are

p(β|β0, τ_β, K) ∝ τ_β^(p/2) exp(−(τ_β/2)(β − β0)′K⁻¹(β − β0)),
p(τ_β|α_β, γ_β) ∝ τ_β^(α_β − 1) exp(−γ_β τ_β),
p(τ_e|α_e, γ_e) ∝ τ_e^(α_e − 1) exp(−γ_e τ_e).

• The joint distribution p(y, β, τ_e, τ_β|X) = p(y|β, τ_e, X) p(β|β0, τ_β, K) p(τ_β|α_β, γ_β) × p(τ_e|α_e, γ_e) is

p(y, β, τ_e, τ_β|X) ∝ τ_e^(n/2 + α_e − 1) τ_β^(p/2 + α_β − 1) exp(−(τ_e/2)(y − Xβ)′(y − Xβ) − (τ_β/2)(β − β0)′K⁻¹(β − β0) − γ_e τ_e − γ_β τ_β).

• As the posterior distribution is proportional to the joint distribution, the task of determining conditional posteriors is equivalent to determining the distribution kernel for the parameter(s) of interest.

• The component of the joint distribution that is a function of τ_e is,

τ_e^(α_e + n/2 − 1) exp(−τ_e(γ_e + (y − Xβ)′(y − Xβ)/2)),

which corresponds to a gamma kernel, such that

p(τ_e|y, β, X) = Ga(α_e + n/2, γ_e + (y − Xβ)′(y − Xβ)/2).

• The component of the joint distribution that is a function of τ_β is,

τ_β^(α_β + p/2 − 1) exp(−τ_β(γ_β + (β − β0)′K⁻¹(β − β0)/2)),

which corresponds to a gamma kernel, such that

p(τ_β|y, β, K) = Ga(α_β + p/2, γ_β + (β − β0)′K⁻¹(β − β0)/2).

• The component of the joint distribution that is a function of β is,

exp(−(1/2)[τ_e(y − Xβ)′(y − Xβ) + τ_β(β − β0)′K⁻¹(β − β0)]),

which corresponds to a normal kernel, such that p(β|y, X, β0, K, τ_e, τ_β) is multivariate normal with mean (τ_e X′X + τ_β K⁻¹)⁻¹(τ_e X′y + τ_β K⁻¹β0) and variance-covariance matrix (τ_e X′X + τ_β K⁻¹)⁻¹.

Note: If x is multivariate normal N(µ, Σ), the kernel is exp(−(1/2)(x − µ)′Σ⁻¹(x − µ)) ∝ exp(−(1/2)x′Σ⁻¹x + µ′Σ⁻¹x).

In the specific case of analysing the HealthIron study sub-group by sub-group, β0 = β̂1 from part a), σ_β² is fixed and K = σ²(X1′X1)⁻¹, which in turn means τ_e = τ_β. Note that by writing the prior in this form, we have made β conditional on σ², or equivalently τ = (σ²)⁻¹. Here y1, X1, β̂1, σ̂1² refer to results from part a); the same quantities from the subset where homc282y = 0 are not sub-scripted, and τ = (σ²)⁻¹.

The conditional posterior for τ is a little more complicated than that shown in the lecture notes, as we have made the prior for β conditional on τ; however, the prior for τ is still Ga(α, γ), with α = (n1 − p)/2 and γ = (n1 − p)σ̂1²/2. To determine the posterior for τ, consider the modified joint distribution,

p(y, β, τ|X) ∝ τ^(α − 1) e^(−γτ) × τ^(n/2) exp(−(τ/2)(y − Xβ)′(y − Xβ)) × τ^(p/2) exp(−(τ/2)(β − β0)′X1′X1(β − β0)),

from which we can deduce the conditional posterior for τ is,

p(τ|y, β) = Ga(α + (n + p)/2, γ + [(y − Xβ)′(y − Xβ) + (β − β0)′X1′X1(β − β0)]/2).

• For the Gibbs sampler, run three chains for 10000 iterations. Discard the first 1000 iterations as burn-in, and then remove every second remaining iteration to reduce auto-correlation. Check that the chains have converged. When storing results, convert τ back to σ². Report posterior means, standard deviations and 95% central credible intervals (including interpretation) for β0, β1, σ², combining results for the three chains. If you want to check your coding is correct, compare your results to fitting a linear model to all the data in the HealthIron study.
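One chain of such a Gibbs sampler might be sketched as below. Everything here is illustrative under stated assumptions: the data are simulated stand-ins for the wildtype subset of Hiron2021.csv, and beta0, X1tX1, n1 and sigma2_1 are hypothetical placeholders for the part-a) output. In both conditionals τ factors through, matching the case where the β prior is conditional on τ.

```r
set.seed(314)
# Simulated stand-in for the wildtype (homc282y == 0) subset
n <- 242
wild <- data.frame(time = runif(n, 0, 30))
wild$logsf <- 2 + 0.05 * wild$time + rnorm(n, sd = 0.8)
y <- wild$logsf
X <- model.matrix(~ time, data = wild)
p <- ncol(X)

# Hypothetical placeholders for the part-a) output
beta0    <- c(2, 0.05)   # beta-hat from a)
X1tX1    <- diag(p)      # stands in for X1'X1 from a)
n1       <- 88
sigma2_1 <- 0.6          # sigma-hat_1^2 from a)
alpha0 <- (n1 - p) / 2
gamma0 <- (n1 - p) * sigma2_1 / 2

gibbs_chain <- function(n_iter = 10000, burn = 1000) {
  out <- matrix(NA, n_iter, p + 1)
  tau <- 1 / sigma2_1
  XtX <- crossprod(X); Xty <- crossprod(X, y)
  for (t in 1:n_iter) {
    # beta | tau: normal; likelihood and prior precisions both scale with tau
    V <- solve(tau * (XtX + X1tX1))
    m <- V %*% (tau * (Xty + X1tX1 %*% beta0))
    beta <- as.vector(m + t(chol(V)) %*% rnorm(p))
    # tau | beta: gamma, with likelihood and beta-prior contributions
    r <- as.vector(y - X %*% beta); d <- beta - beta0
    rate <- gamma0 + (sum(r^2) + sum(d * (X1tX1 %*% d))) / 2
    tau <- rgamma(1, shape = alpha0 + (n + p) / 2, rate = rate)
    out[t, ] <- c(beta, 1 / tau)   # store sigma^2 rather than tau
  }
  out[seq(burn + 2, n_iter, by = 2), ]  # drop burn-in, thin every 2nd draw
}

draws <- gibbs_chain()  # run three of these for the full analysis
colMeans(draws)
```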

Marks are as follows.

• 2 marks for coding the Gibbs sampler correctly.

• 1 mark for posterior means, standard deviation, credible intervals.

• 1 mark for interpreting credible intervals correctly.

• 2 marks for checking convergence.




