
Department of Statistics

STATS 779: Professional Skills for Statisticians

Test: May 29, 2018

2:00 pm–6:00 pm.

INSTRUCTIONS

* Total marks = 90.

* Attempt all questions.

* Note: Some questions are open-ended and it may not be clear how extensive your answer should be. Do not write long answers to these questions. You should be able to answer any question of this type in a few paragraphs at most, or within half a page.

1 The National Identity Card (NIC) number of an individual in Sri Lanka has ten characters. Positions 1–9 are numeric and position 10 is an alphabetic character. The following numbering system is used to define the first five characters:

• Positions 1–2: the year of birth. For example, 81 indicates that the birth year is 1981.

• Positions 3–5: the number of the day in the year on which the person’s birth date falls. A male is assigned a number from 1 to 366 and a female a number from 501 to 866. For example, a male born on 5 January is represented by 005; a female born on the same day is represented by 505.

Example: The first five characters of the NIC for a male born on 5 January 1981 would be 81005; a female born on that same date would be 81505.
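As an illustration of the numbering scheme only (not the Excel formulas asked for below), the first five characters can be decoded in R roughly as follows. The function name decode_nic is hypothetical, and the sketch assumes birth years in the 1900s, as in the examples above.

```r
# Hypothetical helper (not part of the exam): decode the first five NIC characters.
decode_nic <- function(nic) {
  yy   <- as.integer(substr(nic, 1, 2))  # positions 1-2: two-digit birth year
  code <- as.integer(substr(nic, 3, 5))  # positions 3-5: day number, plus 500 for females
  female <- code > 500L
  list(
    year   = 1900 + yy,                         # assumption: 20th-century birth
    day    = if (female) code - 500L else code, # day of the year, 1 to 366
    gender = if (female) "FEMALE" else "MALE"
  )
}

decode_nic("81505")
```

Here decode_nic("81505") gives year 1981, day 5 and gender FEMALE, matching the example above.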

Note: Column C shows the number of the day in the year on which the person’s birth date falls, that is, a number between 1 and 366.

Write down the Excel worksheet formula(s) to be entered in:

a cell B2 that extracts the birth year of the individual from the given NIC number. For example, the output in cell B2 should be 1999.

b cells D2 and E2 to obtain the birth month and day, respectively.

c cell F2 to obtain the date of birth. The output in cell F2 should follow the dd/mm/yyyy format.

d cell G2 to obtain the gender (i.e., FEMALE vs MALE).

General Tips: You will need to use the following Excel functions:

The LEFT and MID functions extract one or more characters from a string, starting from the left-hand side or from the middle of the string, respectively. The syntaxes of the functions are:

LEFT(text, [num_chars])

MID(text, start_num, num_chars)

text Required. The text string that contains the characters you want to extract from.

start_num Required. The position of the first character you want to extract.

num_chars Optional for the LEFT function. Specifies the number of characters you want.

The VALUE function is used to convert a text string that represents a number to a number. The syntax of the function is VALUE(text) where:

text Required. Text enclosed in quotation marks or a reference cell containing the text you want to convert.

MONTH and DAY functions can be used to find the birth month and day of the individual. The syntaxes of the functions are MONTH(serial) and DAY(serial) where:

serial Required. A number in the date-time code.

CONCATENATE is used to join several text strings into one text string. The syntax of the function is CONCATENATE(text1, [text2], ...) where:

text1 Required. text1, text2, ... are 1 to 255 text strings to be joined into a single text string and can be text strings, numbers, or single-cell references.

[10 marks]

2 Amanda learned in her second year about the non-technical interpretation of the 95% confidence interval of the mean.

If we compute a 95% confidence interval of the mean for each sample taken from the population, then 95% of the intervals will capture the unknown population mean.

Amanda wants to visualize this as in Figure 1. You have been asked to help her with writing appropriate R code. Partial code is shown in Figure 2.

Use the given variable names to write R commands:

a In lines 20 and 23, to compute the upper and lower confidence limits, respectively, of each sample generated.

Hint: The 95% confidence interval (assuming a Gaussian distribution) is given by

$$\bar{x} \pm t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}$$

where $\bar{x}$ and $s$ are the sample mean and standard deviation, respectively, of $n$ observations, $\alpha$ is the significance level, and $t_{\alpha/2,\,n-1}$ is the t critical value from the t-distribution with $n - 1$ degrees of freedom.
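A minimal sketch of the interval in the hint, computed for a single simulated sample. The seed, sample size, mean and standard deviation are arbitrary illustrative choices, not the values used in Figure 2.

```r
set.seed(779)                      # arbitrary seed, for reproducibility only
x <- rnorm(25, mean = 10, sd = 2)  # one sample of n = 25 (illustrative values)
n <- length(x)
alpha <- 0.05                      # significance level for a 95% interval

# t critical value with n - 1 degrees of freedom, times the standard error
half_width <- qt(1 - alpha / 2, df = n - 1) * sd(x) / sqrt(n)
ci <- c(lower = mean(x) - half_width, upper = mean(x) + half_width)
ci
```

The same limits are returned by t.test(x)$conf.int, which is a quick way to check the calculation.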

b In line 26 to plot a blue vertical line for the population mean.

c In line 29 to annotate the line drawn in part 2b.

Hint: The mtext function is useful.

d In lines 32–40 to draw the confidence interval for each sample. Set col = "gray" if the confidence interval captures the unknown population mean and set col = "red" otherwise.

[11 marks]

Figure 1: Non-technical explanation of the 95% confidence interval.

Figure 2: Partial R code.

3 A general system of m linear equations with n unknowns can be written in matrix notation as

Ax = b

where A is an m × n matrix of coefficients, x is an n × 1 vector of unknowns and b is an m × 1 vector of constants.

If the matrix A is square (i.e., m = n) and has full rank (i.e., determinant of the matrix A is non-zero), then the system has a unique solution given by

x = A^{-1} b.
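For a concrete case, the sketch below solves a 2 × 2 system with base R. The matrix and vector are made-up values, and solve(A, b) is used in place of forming the inverse explicitly; both approaches give the same x here.

```r
# Made-up 2 x 2 system: 2*x1 + x2 = 3 and x1 + 3*x2 = 5
A <- matrix(c(2, 1,
              1, 3), nrow = 2, byrow = TRUE)
b <- c(3, 5)

det(A)            # non-zero, so A has full rank and the solution is unique
x <- solve(A, b)  # solves A x = b directly
x
A %*% x           # multiplying back reproduces b
```

The solution is x = (0.8, 1.4), and solve(A) %*% b returns the same values as a one-column matrix.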

An incomplete R function is given in Figure 3. Fill the appropriate R commands in lines 4, 8, 12, and 16.

Hint: You can use the det function to find the determinant of matrix A.

[5 marks]

4 Tom and Jerry have been tasked to count the number of times the word “as” appears in a given .txt file. Tom found that there are 31 matches, but is not willing to show his regex pattern. Jerry found 72 matches by setting pattern = "[aA]s(\\s|$)" in the gregexpr function.

The lecturer also said that Tom’s answer is correct.
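To see how match counts can differ between patterns, the following sketch counts matches on two made-up lines of text (the exam's .txt file is not reproduced here). It contrasts Jerry's pattern with a word-boundary pattern; the latter is one possible approach and is not necessarily the pattern Tom used. The helper count_matches is a hypothetical name.

```r
# Made-up text; the variable is named as in part (a) for convenience
lines <- c("As far as it was", "He has no gas")

# Count every match of `pattern` across all elements of `text`
count_matches <- function(pattern, text) {
  hits <- regmatches(text, gregexpr(pattern, text, perl = TRUE))
  sum(lengths(hits))
}

jerry   <- count_matches("[aA]s(\\s|$)", lines)  # also matches the tail of longer words
bounded <- count_matches("\\b[aA]s\\b", lines)   # matches only the standalone word
c(jerry = jerry, bounded = bounded)
```

On this text Jerry's pattern finds 5 matches and the word-boundary pattern finds 2.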

a Write R code which uses a regular expression to find the correct number of occurrences of the word “as”. Assume that contents of the .txt file have been read into a character vector called lines.

     1  # Amat: matrix of coefficients
     2  # Bmat: vector of constants
     3  leqDir <- function(Amat, Bmat) {
     4    if() {
     5      stop("Dimensions of A and b don't match")
     6    }
     7
     8    if() {
     9      stop("A should be a square matrix")
    10    }
    11
    12    if() {
    13      stop("A is a rank deficient matrix")
    14    }
    15
    16    x <-
    17    return(x)
    18  }

Figure 3: An incomplete R function to solve a system of linear equations.

b Explain to Jerry why his regex pattern did not work (write a maximum of 3 lines). Suggest a few possible mismatches which could have occurred.

c Extend Jerry’s regex pattern to extract all 72 words that Jerry obtained.

[7 marks]

5 Write complete LaTeX code to produce the following slides with overlays in beamer. You will have to use \clubsuit (♣), \spadesuit (♠), \heartsuit (♥), and \diamondsuit (♦), which are mathematical symbols.

Figure 4: Beamer slides with overlays.

[7 marks]

6 a In Microsoft Word, styles can be paragraph styles or character styles (plus some other types of style).

What additional features of a paragraph are specified by a paragraph style, other than those specified by a character style? State as many as you can.

b Give at least three examples of where field codes can be used in a Word document.

c Describe 3 ways in which you can produce the symbol Ω in a Word document.

[6 marks]

7 Give four examples of features in RStudio which provide support for editing R code, and explain why they are useful.

[3 marks]

8 a Write LaTeX code to produce the following paragraph including the equation and reference. Equation 1 is an example of a commonly occurring format.

b Write LaTeX code to produce the following sentence:

During the period of the global financial crisis, 2008–2010, the change in the Dow-Jones index was around −2000 points—a change of some 20%.

c Write LaTeX commands to produce Table 1 including the caption.

Table 1: Table Example

d Write BibTeX entries to be included in a .bib file to produce the following bibliography:

References

D. Bates, M. Mächler, B. Bolker, and S. Walker. Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67:1–48, 2015.

B. Manly. Stage-structured Populations: Sampling, Analysis and Simulation. Chapman & Hall, New York, 1990.

[16 marks]

9 Suppose you have a data frame called Titanic1 giving details of adult passengers and crew who sailed on the Titanic.

The columns in the data frame are shown in the following output:

> str(Titanic1)
'data.frame': 16 obs. of 4 variables:
 $ Class   : Factor w/ 4 levels "1st","2nd","3rd",..: 1 2 3 4 1 2 3 4 1 2 ...
 $ Sex     : Factor w/ 2 levels "Male","Female": 1 1 1 1 2 2 2 2 1 1 ...
 $ Survived: Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 2 2 ...
 $ Freq    : num 118 154 387 670 4 13 89 3 57 14 ...

Write R code to produce Figure 5 using ggplot2.

[8 marks]

Figure 5: Survival on the Titanic

10 Suppose that in a .Rnw file you have created an R object using xtable called xtbl as shown below:

> class(xtbl)

[1] "xtable" "data.frame"

> str(xtbl)

Classes 'xtable' and 'data.frame': 5 obs. of 1 variable:

$ x: int 34 40 15 10 1

- attr(*, "caption")= chr "xtable example"

- attr(*, "label")= chr "tab:xtbl"

- attr(*, "align")= chr "r" "r"

- attr(*, "digits")= num 0 2

- attr(*, "display")= chr "s" "d"

What will be the effect of the following snippets of text when the .Rnw file is processed using knitr and pdfLaTeX:

a <>=

xtbl

b <>=

xtbl

NOTE: You may wish to examine the help pages for the package xtable before answering this question.

[6 marks]

11 The results of this year’s Giro d’Italia cycle tour race are in the file GiroResults.csv, which has the form shown in Figure 6.

Figure 6: Top of GiroResults.csv

Rider names are not more than 30 characters long, and team names are not more than 50 characters long. In the column headed Time the first entry (for Chris Froome) gives the total time taken to ride the 21 stages of the tour, in hours, minutes and seconds. The other figures in that column are the additional times that the various riders took to complete the tour. So, for example, George Bennett of New Zealand took an additional 13 minutes and 17 seconds compared to Froome; that is, his total riding time was 89 hours, 15 minutes and 56 seconds.

a Write MySQL code to create a table called giro for this data set. Do not create an automatically incremented variable as the primary key for the data. Instead specify the rider name as the primary key.

b Write MySQL code to read the data from GiroResults.csv into the table giro.

c Alter the table giro by adding a TIME variable called Difference.

d Update the column by first setting Difference equal to the Time column, and then update the first element of the Difference column (the entry for Froome) to take the value '00:00:00'.

If this has been done correctly then the Difference column will contain all the time differences from Froome’s time.

e Write MySQL code to produce a table showing the average time difference by team, in minutes rounded to 2 decimal places, ordered from smallest to largest.

NOTE: To carry out calculations involving times, first convert times to seconds by applying the function TIME_TO_SEC.
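The arithmetic in the George Bennett example can be checked by converting each time to seconds, which is the same idea as TIME_TO_SEC. The sketch below is written in R rather than MySQL, and the helper to_seconds is a hypothetical name.

```r
# Hypothetical helper: convert an "hh:mm:ss" string to a number of seconds
to_seconds <- function(hms) {
  parts <- as.integer(strsplit(hms, ":", fixed = TRUE)[[1]])
  sum(parts * c(3600L, 60L, 1L))
}

bennett_total <- to_seconds("89:15:56")  # Bennett's total riding time
gap           <- to_seconds("00:13:17")  # his additional time over Froome
froome_total  <- bennett_total - gap     # Froome's total, in seconds

# Format Froome's total back into hh:mm:ss
froome_hms <- sprintf("%02d:%02d:%02d",
                      froome_total %/% 3600L,
                      (froome_total %% 3600L) %/% 60L,
                      froome_total %% 60L)
froome_hms

gap_minutes <- round(gap / 60, 2)        # time difference in minutes, 2 decimal places
gap_minutes
```

This gives Froome's total as 89:02:39 and Bennett's gap as 13.28 minutes.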

[10 marks]
