STA303 Winter 2025 Practice Exam

Wednesday 26 March 2025

1 Short answer

For each of the statistical problems below (itemized with letters), assign a phrase from the second list (itemized with numbers). The second (numbered) list is longer than the first, so some items in the second list do not have a partner in the first.

Problems:

a.  How do you evaluate the conditional density and likelihood of a generalized linear mixed model?

b. Why do matched case control studies work?

c.  How can I fit a GLM?

d.  I want a CI for a parameter when the MLE is on a boundary.  A score test can be used here. How does a score test work?

e. Why are spline functions  (B-splines, cubic splines or thin plate splines) used in generalized additive models?

Solutions:

1.  The true parameter should be close to the MLE, so the derivative evaluated at the true parameter should be close to zero.

2.  Conditioning on the total within a stratum, independent Poissons conditional on their sum are multinomial, and normalizing the Poisson means makes some unwanted terms cancel.

3.  Taking a second order Taylor series expansion evaluated at the maximum, first derivative is zero, log density is quadratic, treat things as Gaussian.

4.  It is a method for finding patterns in data and using them to make predictions or decisions without being explicitly programmed for each specific task.

5.  Fitting a model to your data, then repeatedly simulating new datasets from that model to estimate the variability or uncertainty of your estimates.

6.  Set model parameters so that the expected moments calculated from the model are the same as the average moments from the dataset

7.  A smooth curve or integrated Wiener process can be approximated by a linear combination of basis functions.

8. Writing a function to evaluate the likelihood, using automatic differentiation  (i.e. with TMB) to get derivatives, use a numerical optimizer to find maximum

2 Roads

Figure 1:  Monthly deaths of passengers on motorcycles in the UK

Below are data on road accidents in the UK from

https://data.gov.uk/dataset/road-accidents-safety-data,

which contains 10.5 million individual records. In this analysis, cases are defined as passengers on motorcycles (not the drivers) killed in a traffic accident. Figure 1 shows the number of monthly deaths over time, separately for males (M) and females (F). It is hypothesized that most motorcycle passengers are being driven by their partner or spouse, who is in most cases someone of the opposite sex (male passengers were riding behind female drivers and vice versa). Deaths have been falling, which is believed to be because of better safety technology in road vehicles.

The bikes object is a data.frame with the numbers of deaths for Males and Females. The date column is the month (or, more technically, the first day of the month) corresponding to the death count.

bikes[1:4, ]

date Male Female

1 1979-01-01   0     0

2 1979-02-01    4     1

3 1979-03-01   5     1

4 1979-04-01  10     3

Here’s a bunch of code to format time variables. bikes$date is a Date object, which stores dates as number of days since 1 January 1970.

bikes$Ndays = Hmisc::monthDays(bikes$date)

bikes$logDays = log(bikes$Ndays)

bikes$timeNumeric = as.numeric(bikes$date)

bikes$timeScaled = (2 * pi/365.25) * bikes$timeNumeric

bikes$sin12 = sin(bikes$timeScaled)

bikes$cos12 = cos(bikes$timeScaled)

bikes$sin6 = sin(2 * bikes$timeScaled)

bikes$cos6 = cos(2 * bikes$timeScaled)

bikes[1:3, ]

date Male Female Ndays  logDays timeNumeric timeScaled        sin12

1 1979-01-01    0     0   31 3.433987       3287  56.54437 -0.004300593

2 1979-02-01    4     1    28 3.332205       3318  57.07764 0.504648295

3 1979-03-01    5     1   31 3.433987       3346  57.55931 0.847173337

cos12        sin6      cos6

1 0.9999908 -0.008601106 0.9999630

2 0.8633250  0.871351004  0.4906602

3 0.5313166  0.900234525 -0.4354053
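As an illustration of the time variables above, the following base R sketch (not part of the original analysis) checks that sin12 and cos12 repeat every 365.25 days, while sin6 and cos6 repeat every half year. The names `shifted` and `half` are introduced here for illustration only.

```r
# Base R only; no packages required.
d <- as.Date(c("1979-01-01", "1979-02-01", "1979-03-01"))
timeNumeric <- as.numeric(d)                    # days since 1 January 1970
timeScaled <- (2 * pi / 365.25) * timeNumeric   # one year maps to 2 * pi

sin12 <- sin(timeScaled)       # 12-month harmonic
sin6 <- sin(2 * timeScaled)    # 6-month harmonic

# Shifting by one year (365.25 days) leaves both harmonics unchanged.
shifted <- (2 * pi / 365.25) * (timeNumeric + 365.25)
all.equal(sin(shifted), sin12)
all.equal(sin(2 * shifted), sin6)

# Shifting by half a year leaves the 6-month harmonic unchanged
# but flips the sign of the 12-month harmonic.
half <- (2 * pi / 365.25) * (timeNumeric + 365.25 / 2)
all.equal(sin(2 * half), sin6)
all.equal(sin(half), -sin12)
```

The first element of `timeNumeric` is 3287, matching the first row of the output above.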

The following code has been used to fit a model to the data.

library('mgcv')

resM = gam(Male ~ cos12 + sin12 + cos6 + sin6 + offset(logDays) +
    s(timeNumeric, k=100),
    data=bikes, family=nb(link='log'))

resF = gam(Female ~ cos12 + sin12 + cos6 + sin6 + offset(logDays) +
    s(timeNumeric, k=100),
    data=bikes, family=nb(link='log'))

1. Write down a statistical model corresponding to the model fit and explain the terms and variables in the model. Write in Human language and mathematics, not R.

2. Write down figure sub-captions (from a to f) and a caption for Figure 2, using (where appropriate) mathematical notation from your model description.

3. Write a paragraph summarizing what you have learned about trends in motorcycle accidents from this analysis. Use non-technical language and refer to figures and tables as appropriate.

Code for Figure 2

bsMat = predict(resM, type = "lpmatrix")

simCoefM = mvtnorm::rmvnorm(20, coef(resM), vcov(resM))

seasonVar = grep("^(sin|cos)", colnames(simCoefM))

simCoefF = mvtnorm::rmvnorm(20, coef(resF), vcov(resF))

matplot(bikes$date, exp(tcrossprod(bsMat, simCoefM)), type = "l",

lty = 1, log = "y")

matplot(bikes$date, exp(tcrossprod(bsMat, simCoefF)), type = "l",

lty = 1, log = "y")

matplot(bikes$date, exp(tcrossprod(bsMat[, seasonVar],

simCoefM[, seasonVar])), type = "l", lty = 1, log = "y")

matplot(bikes$date, exp(tcrossprod(bsMat[, seasonVar],

simCoefF[, seasonVar])), type = "l", lty = 1, log = "y")

matplot(bikes$date, exp(tcrossprod(bsMat[, seasonVar], simCoefM[,

seasonVar])), xlim = as.Date(c("2009/12/15", "2011/1/15")),

type = "l", lty = 1, log = "y")

matplot(bikes$date, exp(tcrossprod(bsMat[, seasonVar], simCoefF[,

seasonVar])), xlim = as.Date(c("2009/12/15", "2011/1/15")),

type = "l", lty = 1, log = "y")

Figure 2:  Stuff
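The matrix algebra in the code for Figure 2 can be sketched on a toy example. The following is a minimal illustration, not the exam's analysis: it uses a hypothetical Poisson glm on simulated data in place of the gam fits, and draws coefficients with a Cholesky factor in base R as a stand-in for mvtnorm::rmvnorm. The names `toy`, `fit`, `X`, `z` and `simMean` are assumptions made for this sketch.

```r
set.seed(1)
n <- 50
toy <- data.frame(x = seq(0, 1, length.out = n))   # hypothetical covariate
toy$y <- rpois(n, exp(1 + toy$x))                  # simulated counts
fit <- glm(y ~ x, data = toy, family = poisson())

X <- model.matrix(fit)       # n x p, plays the role of the lpmatrix
p <- length(coef(fit))

# Draw 20 coefficient vectors from N(coef(fit), vcov(fit)).
R <- chol(vcov(fit))                           # t(R) %*% R equals vcov(fit)
z <- matrix(rnorm(20 * p), nrow = 20, ncol = p)
simCoef <- sweep(z %*% R, 2, coef(fit), "+")   # 20 x p, one draw per row

# tcrossprod(X, simCoef) is X %*% t(simCoef): one column per simulated curve.
simMean <- exp(tcrossprod(X, simCoef))         # n x 20, on the natural scale
dim(simMean)

matplot(toy$x, simMean, type = "l", lty = 1, log = "y")
```

Selecting a subset of columns from both `X` and `simCoef` before the product, as the Figure 2 code does with seasonVar, keeps only the contribution of those terms to the linear predictor.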

3 Smoking

The 2014 American National Youth Tobacco Survey was analyzed to explore how urban/rural differences, age and sex affect smoking tobacco from a pipe.  The research hypotheses to be investigated using this survey are as follows.

1.  The group of high school students most likely to smoke tobacco from a hookah or waterpipe is older white boys in urban areas; and

2.  Urban girls of any age are much less likely to have smoked from a waterpipe than urban boys of a comparable age.

Your task is to fill in the “methods” and “results” sections of a short report on smoking waterpipes in high school. I asked Deepseek to write a one paragraph introduction and it said

The use of waterpipes (also known as hookahs) for smoking tobacco has become an increasing public health concern, particularly among adolescents. While cigarette smoking has declined in some youth populations, waterpipe use remains prevalent, potentially due to misconceptions about its safety and social acceptability. Understanding demographic patterns in waterpipe smoking, such as differences by urban/rural location, age, and sex, is critical for targeted prevention efforts.

This report analyzes data from the 2014 National Youth Tobacco Survey (NYTS) to investigate two key hypotheses:  (1) that older white boys in urban areas are the most likely subgroup to smoke tobacco from a waterpipe, and (2) that urban girls are significantly less likely to engage in this behavior compared to urban boys of the same age.  By examining these trends, the report aims to inform policies and interventions aimed at reducing youth waterpipe use.

Your task is to write a “methods” and “results” section as follows:

•  a couple of paragraphs on ‘methods’ giving the statistical models used (in mathematical notation, not R syntax) and explaining why they are appropriate (or why they are not if you believe that to be the case); and

•  a ‘results’ paragraph or two where the results are described and interpreted. You can refer to imaginary tables and figures; if you do, write captions for them after your text.

I’ll ask Deepseek to write a short conclusion, then send your report off to the US Department of Education with an invoice for a $250k consulting fee. I’ll give you 10% of whatever they end up paying me.

The report will be assessed in terms of:

•  clarity of presentation,

•  demonstration of an understanding of the statistical models used, and

•  drawing conclusions which are consistent with the analysis.

You may refer to the output below, if you deem it to be useful.  The variable ever_hookah_or_waterpipe is coded as true if the respondent had ever smoked tobacco from a hookah or water pipe.  The baseline race is white.

smokeSub = as.data.frame(smoke[smoke$Age > 10 & !is.na(smoke$Race),

])

smokeSub$ageFac = factor(smokeSub$Age)

smokeSub$Sex = relevel(smokeSub$Sex, "F")

smokeSub[1:3, c("Sex", "ageFac", "Race", "RuralUrban", "ever_hookah_or_waterpipe")]

Sex ageFac     Race RuralUrban ever_hookah_or_waterpipe

1   M    16 hispanic      Rural                  FALSE

2   M    14 hispanic      Rural                  FALSE

3   M    14 hispanic      Rural                  FALSE

library(glmmTMB)

resSmoke = glmmTMB(ever_hookah_or_waterpipe ~ ageFac * Sex +

Sex * RuralUrban + Race + (1 | school), data = smokeSub,

family = binomial())

knitr::kable(confint(resSmoke, full = TRUE), digits = 2)

Use the predict function to get the probability, in percent, that a 15 year old has smoked a hookah by sex/urban group for the baseline race. The argument re.form=NA specifies that random effects are not included in the predictions.

toPredictR = na.omit(expand.grid(Sex = unique(smokeSub$Sex), RuralUrban = unique(smokeSub$RuralUrban), ageFac = "15",

Race = "white"))

smokePredR = predict(resSmoke, toPredictR, se.fit = TRUE,
    re.form = NA)

smokePredR = cbind(toPredictR[, c("Sex", "RuralUrban")],

100 * exp(data.frame(est = smokePredR$fit, lower = smokePredR$fit -

2 * smokePredR$se.fit, upper = smokePredR$fit +

2 * smokePredR$se.fit)))

knitr::kable(smokePredR, digits = 2)
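A note on scales, offered as a hedged sketch: assuming the predictions above are on the logit (link) scale, which is the usual default for a binomial model with a logit link, exp() of a prediction gives odds, while plogis() gives a probability. The two are close when the probability is small. The values in `eta` below are hypothetical and are not taken from the fitted model.

```r
# Hypothetical logit-scale estimate and interval limits (not model output).
eta <- c(est = -2.5, lower = -2.9, upper = -2.1)

odds <- exp(eta)       # odds = p / (1 - p)
prob <- plogis(eta)    # p = exp(eta) / (1 + exp(eta))

# Compare the two scales, in percent.
round(100 * cbind(odds, prob), 2)
```

Which scale to report depends on the quantity the report claims to describe, so it is worth stating explicitly whether a table shows odds or probabilities.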

Estimate and plot probability of using a pipe as a function of age.

toPredictAge = na.omit(expand.grid(Sex = unique(smokeSub$Sex), ageFac = unique(smokeSub$ageFac), RuralUrban = "Urban",

Race = "white"))

smokePredAge = predict(resSmoke, toPredictAge, se.fit = TRUE,
    re.form = NA)

smokePredAge = data.frame(age = toPredictAge$ageFac, sex = toPredictAge$Sex,

est = smokePredAge$fit, lower = smokePredAge$fit - 2 *

smokePredAge$se.fit, upper = smokePredAge$fit +

2 * smokePredAge$se.fit)

The smokePredAge data frame is reformatted and 100*exp(smokePredAge) is plotted with matplot to produce the graph below.

A likelihood ratio test

resSmoke2 = glmmTMB(ever_hookah_or_waterpipe ~ ageFac *

Sex + Sex + RuralUrban + Race + (1 | school), data = smokeSub,

family = binomial())

lmtest::lrtest(resSmoke, resSmoke2)

Likelihood ratio test

Model 1: ever_hookah_or_waterpipe ~ ageFac * Sex + Sex * RuralUrban + Race + (1 | school)

Model 2: ever_hookah_or_waterpipe ~ ageFac * Sex + Sex + RuralUrban +

Race + (1 | school)

#Df  LogLik Df  Chisq Pr(>Chisq)

1 26 -4187.0

2  25 -4187.1 -1 0.0094    0.9226
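The p-value in the output above can be checked by hand. The following base R sketch takes the reported test statistic and the difference in the number of parameters from the table, and compares the statistic to a chi-squared distribution. The names `chisq`, `dfDiff` and `pValue` are introduced here for illustration.

```r
# Values copied from the likelihood ratio test output above.
chisq <- 0.0094      # test statistic: twice the difference in log-likelihoods
dfDiff <- 26 - 25    # difference in number of parameters between the models

# Upper tail probability of a chi-squared distribution with dfDiff degrees of freedom.
pValue <- pchisq(chisq, df = dfDiff, lower.tail = FALSE)
pValue
```

Because the table reports a rounded statistic, the result agrees with the printed Pr(>Chisq) only to about three decimal places.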

