
STA475 Assignment #2 (Fall 2024)


Due date: Friday October 11th at 11:59pm

•  No-questions-asked grace period until Monday October 14th at 5pm

•  No late submissions will be accepted after this time

Where to submit:

•  Crowdmark: Submit your answer to each question in the correct space on Crowdmark

•  You can submit as many times as you like before the deadline

•  Email submissions will NOT be accepted

Other notes:

•  For some questions, you will need to use R; please include all code and relevant output in the PDF submission (make sure the code is visible in the PDF and doesn’t run off the margins). If the question asks you to answer a question based on R output, make sure your answer is easy for your TA to find (i.e. start with your answer, then include the code and output afterwards for reference). Your TA should be able to understand your answer without looking at your code and output (as appropriate), but may refer to these to better understand what you did.

•  While you may discuss questions with your classmates, you MUST submit independent work that you did yourself. Students submitting identical solutions (e.g. identical sentences, derivation steps, or chunks of code) will be investigated for violations of academic integrity.

•  If you believe you’ve found a typo or error in the assignment, please email sta475@utoronto.ca so I can look into it and get back to the class as quickly as possible.

Question 1 [8 points]

In this question, you will simulate time-to-event data with one binary predictor, where the log-rank test and Wilcoxon test yield different results at the 5% significance level.

(a) Simulate survival data for two groups (e.g. treatment and control), with at least n = 100 in each group, where the log-rank test and Wilcoxon test yield different results at the 5% significance level. Provide the R code you used to simulate the data (make sure to include set.seed() at the beginning to ensure reproducibility).

# Set random seed for reproducibility

set.seed(475) # You can change the integer, but leave this line here

# Simulate survival data

(b) Use R to conduct a log-rank test and report the p-value for the difference in survival distributions between groups for the data you simulated in (a). Write 1-2 sentences summarizing your findings.

# Perform log-rank test

(c) Use R to conduct a Wilcoxon test and report the p-value for the difference in survival distributions between groups for the data you simulated in (a). Write 1-2 sentences summarizing your findings.

# Perform Wilcoxon (Breslow) test

(d) Plot the K-M estimates for the two groups in the data you simulated in (a).

(e) Write 2-3 sentences commenting on why the log-rank and Wilcoxon tests yield different results for the data you simulated in (a), making specific reference to the plot you produced in (d).

Question 2 [6 points]

The time to recurrence, in months, for patients on an existing or a novel treatment for lung cancer is compared using the following model:

Y = log(T) = 1 + 2Z + 0.3W,

where W has a standard logistic distribution and Z = I(novel treatment is administered).

(a) Derive an expression for P(T > t) in terms of t and z, based on the expression above and the distribution of W.

(b) Use your answer from part (a) to estimate the probability of remaining in remission for more than 1 year under both the new and standard treatments.

(c) Use your answer from (a) to estimate the probability of remaining in remission for more than 2 years under both the new and standard treatments.

(d) Write 1-3 sentences carefully interpreting the effect of the new treatment on the median time to recurrence.

Question 3 [12 points]

A software company is analyzing the time-to-failure of their operating system under different conditions. They fit an AFT model to study the effects of two predictors on the time until the first critical error occurs:  (1) system load (high/normal) and (2) a new optimization algorithm (present/absent). They obtain the following results:

1.  The acceleration factor for high system load compared to normal system load is 1/2.25, controlling for presence or absence of the optimization algorithm.

2.  The acceleration factor for systems using the new optimization algorithm compared to those without it is 1/0.8, controlling for the system load (high or normal).

3.  The median time-to-failure for systems under normal load without the optimization algorithm is 1000 hours.

Note: The corrections in red relate to the clarification notes covered on Tues Oct 8th. These values should lead to answers that make more intuitive sense, in terms of the impact of high load vs normal load, and of using an optimizing algorithm vs no optimizing algorithm. The original values essentially had the wrong "baseline" group. If you leaned on your intuition, your answers may have been correct even with the original values.

(a) What is the response of interest in this problem?

(b) In this context, are longer values of the response variable desirable (i.e. good) or undesirable (i.e. bad)? Explain your reasoning.

(c) Carefully interpret the effect of high system load on the time to first critical error in 2-3 sentences.  Make sure your answer includes a comment about whether or not high system load accelerates or delays the occurrence of critical errors.

(d) Carefully interpret the effect of the new optimization algorithm on the time to first critical error in 2-3 sentences. Make sure your answer includes a comment about whether or not the new optimization algorithm accelerates or delays the occurrence of critical errors.

(e) Calculate the median time-to-failure for systems under high load without the optimization algorithm.

(f) If 90% of systems under normal load without optimization survive for 500 hours without a critical error, how long do 90% of systems under normal load with optimization survive?

(g) What is the median time-to-failure for a system under high load with the new optimization algorithm?  Justify your answer by showing your calculations.

(h) The company wants to achieve a median time-to-failure of at least 800 hours for systems under high load.  Can this be achieved with the new optimization algorithm? Explain your reasoning.

Question 4 [9 points]

In this question, you’ll consider the diabetic retinopathy data we looked at in class.  In class, we fit several parametric models to these data, to model the time to blindness in the right eyes of diabetes patients, and here you will use several approaches to determine which model is most appropriate.

eyes <- diabetic |> filter(eye == "right")

(a) Fit Weibull, exponential, lognormal, and log-logistic models to these data, with no predictors.

(b) Create one plot with the Kaplan-Meier estimate of the time to blindness in right eyes for these patients, as well as the parametric estimates for the models you fit in part (a).  Write 2-3 sentences summarizing what you learn from this plot.

Hint: You can modify the code from class to create this plot, and you can use the helper functions from class as well (you’ll find them on Quercus).

(c) Create a probability-probability (P-P) plot for each parametric model you fit in (a). Write 2-3 sentences summarizing what you learn from this plot.

Hint: You can modify the code from class to create this plot.

(d) What is one advantage of using the visualization approach from part (b) to choose a parametric model?

(e) What is one advantage of using the visualization approach from part (c) to choose a parametric model?

Question 5 [7 points]




hepatoCellular %>%
  select(Number, Age, Gender, OS, Death, RFS, Recurrence) %>%
  glimpse()


Rows: 227
Columns: 7
$ Number     1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, ~
$ Age        57, 58, 65, 54, 71, 32, 79, 27, 55, 35, 34, 60, 38, 72, 41,~
$ Gender     0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1,~
$ OS         83, 81, 79, 76, 7, 13, 38, 19, 4, 76, 3, 78, 8, 72, 63, 33,~
$ Death      0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1,~
$ RFS        13, 81, 79, 76, 3, 3, 30, 19, 4, 76, 1, 78, 1, 72, 35, 33, ~
$ Recurrence 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1,~

(a) Fit a Weibull model to estimate the survival curve for overall survival of patients with hepatocellular carcinoma based on the hepatoCellular data. Print a summary of your fitted model.

(b) Fit an exponential model to estimate the survival curve for overall survival of patients with hepatocellular carcinoma based on the hepatoCellular data. Print a summary of your fitted model.

(c) Conduct a likelihood ratio test to determine whether the exponential model is suitable to represent these data, compared to the Weibull model. State your null and alternative hypotheses (be sure to define any parameters you refer to), write the form of the test statistic, calculate the test statistic and p-value using R, and write 1-2 sentences summarizing your findings.

(d) In the likelihood ratio test you conducted above, how many degrees of freedom did you use for the chi-squared distribution? Explain why in 1-2 sentences.

Question 6 [10 points]

Using the pharmacoSmoking data from the asaur package, you will investigate the effect of triple therapy (grp == combination) vs patch (grp == patchOnly) treatments for smoking cessation. The response of interest is time (in days) until relapse of smoking. We will exclude from this analysis individuals who did not quit smoking for at least one day.



Rows: 125
Columns: 14
$ id             21, 113, 39, 80, 87, 29, 16, 35, 54, 70, 84, 85, 25, 47~
$ ttr            182, 14, 5, 16, 0, 182, 14, 77, 2, 0, 12, 182, 21, 3, 1~
$ relapse        0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1~
$ grp            patchOnly, patchOnly, combination, combination, combina~
$ age            36, 41, 25, 54, 45, 43, 66, 78, 40, 38, 64, 51, 37, 65,~
$ gender         Male, Male, Female, Male, Male, Male, Male, Female, Fem~
$ race           white, white, white, white, white, hispanic, black, bla~
$ employment     ft, other, other, ft, other, ft, pt, other, ft, ft, oth~
$ yearsSmoking   26, 27, 12, 39, 30, 30, 54, 56, 25, 23, 30, 35, 23, 50,~
$ levelSmoking   heavy, heavy, heavy, heavy, heavy, heavy, heavy, light,~
$ ageGroup2      21-49, 21-49, 21-49, 50+, 21-49, 21-49, 50+, 50+, 21-49~
$ ageGroup4      35-49, 35-49, 21-34, 50-64, 35-49, 35-49, 65+, 65+, 35-~
$ priorAttempts  0, 3, 3, 0, 0, 2, 0, 10, 4, 10, 12, 1, 5, 6, 5, 2, 1, 1~
$ longestNoSmoke 0, 90, 21, 0, 0, 1825, 0, 15, 7, 90, 365, 7, 1095, 180,~

pharmacoSmoking <- pharmacoSmoking %>%
  filter(ttr >= 1)

(a) You will first fit Weibull and lognormal models to these data (overall, not separated by treatment) and produce one or more plots of your choice to identify which of the two parametric models is more suitable for these data. Explain your answer in 1-2 sentences, making specific reference to the plot(s) you produced.

(b) Using the parametric model you chose in part (a), fit two separate models, one for each treatment group. What are the estimated values of a and τ for each group?

(c) Conduct hypothesis test(s) to determine whether the two groups have different survival distributions.  Be sure to formally state your null and alternative hypotheses, report relevant test statistics and p-values, and interpret your conclusions.


