代写7SSGN110 Environmental Data Analysis | Practical 6 | Time Series Analysis

代写7SSGN110 Environmental Data Analysis | Practical 6 | Time Series Analysis – Time Domain 调试R程序

7SSGN110 Environmental Data Analysis | Practical 6 | Time Series Analysis – Time Domain

1. Introduction and data processing

The aim of this practical is to introduce time series decomposition and autocorrelation.

This case study data are monthly average atmospheric CO2 concentrations measured at the Mauna Loa Observatory, Hawaii, USA, a brief overview of which is provided by Keeling (2008).

Data from Mauna Loa Observatory are available from the National Oceanic and Atmospheric Administration (NOAA) Earth System Research Laboratory Global Monitoring Division website: http://www.esrl.noaa.gov/gmd/obop/mlo/. We have downloaded the relevant data for you and posted this as an Excel file on KEATS. Download and open this file.

As always, the first thing we should do when working with a new data set is to quickly plot the data to familiarise ourselves with it and check for any inconsistencies or potential problems:

· Create a Single Line Plot that displays the value column. To show years on the x-axis, right-click on your graph → Select Data → Edit the horizontal axis → Column B (year)

· Change the scale of the vertical axis so that you can clearly see the full variation of the time series.

You should be able to immediately notice both an increasing trend and a seasonal pattern.

· What do you notice about the pattern early in the time series?

Hopefully, you notice that the early values in the time series do not fit the seasonal pattern as well as the remainder of the time series. Look at the data to see why this is.

The data we downloaded has values for 1969 and 1970 before skipping to 1976 (check you can see this in your raw data). To make our data easier to handle we will work here with data that are for the complete years 1977-2020:

· Remove the rows of data for 1969, 1970 and 1976 (select rows, right-click, delete).

You should now have data for all months for 44 years.

· How could you quickly check you really do have data for all months in these years? [Hint, for 44 years how many data values would you expect? How can you count values quickly to check?]

2. Time series decomposition

2.1. Time series decomposition in Excel

Time series can be understood as being composed as shown by Eq. 1:

Eq. 1

In the following we will see how we can decompose a time series into these components in Excel to help us describe and better understand the data.

a) Trend: Running Mean

One way of estimating a trend in a time series is to use a running mean. This approach calculates the average value of a variable across a sliding window of a specific width (duration). We usually aim for this window to be of equal size above and below the original data point, which is straight-forward for an odd-numbered size window but requires some additional thought when the window is even in size.

For example, for a window of size five the running mean m at time t calculated from values x would be given by Eq. 2:

Eq. 2

Thus, the running mean is calculated from the current value, the two data points before, and the two values after. To ensure an even-sized window is symmetrical we use a slightly different approach. For example, for a window of size six, the running mean would be given by Eq. 3:

Eq. 2

Thus, the running mean is calculated as before except we account for only half the value of the data points at the edges of the window. To create a running mean for the Mauna Loa data we will use a window size of 12. To do this:

I. In column E add a sensible column title for a running mean with window size 12

II. Assuming data values are in column D enter the following formula in cell E8: =SUM(D2/2, D3:D13, D14/2)/12

III. Copy this formula down to cell E523

Make sure you understand what the formula you’ve implemented in Excel is doing.

· Why have we used a window of size 12? How would you change the formula to use window size 13? Why have we stopped at row 523?

To check what your running means looks like, add the data you have created in column E to your existing quick plot of the data (re-create this if you deleted it!):

i. Right-click on the chart

ii. Click Select Data

iii. In Legend Entries (series) click the Add button

iv. Provide series name Running Mean (12)

v. For series values, select the entirety of column E

Your plot should look something like Figure 1.

· Note how the running mean line does not extend to the length of the empirical data. Why is this?

Figure 1. Example plot of CO2 data with running mean.

Check your running mean is centred on the data, with equal number of missing values at start and end of the time series.

b) Periodicity

We now calculate the periodic signal in the data, using the trend data. First, for each observation we calculate the difference between the observed value and the trend (running mean) value. Calculate this difference in column F for all the running mean values (and provide a relevant column header label, e.g. “diff”).

After you have done this, we determine the average difference for each of the 12 months using a pivot table:

I. Select the entire data block (all columns and rows, including the headers)

II. Go to menu: Insert – Pivot Table – From Table/Range, on a New Worksheet

III. On this new PivotTable worksheet on the right-hand side:

· drag the ‘month’ field from the top into the ‘Rows’ panel below

· drag the ‘diff’ field from the top into the ‘Values’ panel at the bottom-right

· change the Value Field Setting to calculate an average instead of a sum

IV. On the left you should now see the average difference for each of the 12 months

Look at the seasonal effect values you have just calculated.

· You should notice some are positive and some are negative. Why is this?

c) Residuals

Finally, we will calculate the residuals at each time t once the trend and seasonal effect have been removed.

First, populate column G with repeat copies of the seasonal cycle we compiled in the previous step. You may have to “hard copy” the values from the pivot table first.

The residuals are then the remainder after subtracting the trend and the periodic signal from the original value (column D minus column E and minus Column G). A plot of these residuals should look something like this.

Figure 2. Residuals from time series decomposition of Mauna Loa data.

These residuals are now de-trended and should be ‘stationary’ such that the probabilistic character of the series does not change over time (i.e. any section of the time series is “typical” for every other section; but note there are different ‘degrees’ of stationarity). Such stationary data can then be subjected to auto-regressive and/or moving average modelling which can be used for forecasting (e.g. see Von Storch and Zwiers 2001). These techniques are beyond the level of this practical but you may be interested in pursuing them further elsewhere.

2.2. Time series decomposition in R

We will now briefly introduce you to a powerful R function (‘DECOMPOSE’) which takes a time-series and splits it into the components like we did above.

Before you start the data analysis in R, ensure that you have:

• Created a folder on your computer for this week’s practical

• Set the working directory for the practical work in R Studio

• Loaded the required packages. We will be using the package timeSeries

• Read the csv file with the CO₂ data into R

Hint: Refer to Practical 2, section 4.1 for guidance on how to do these things.

We start by cleaning the data just as we did in Excel. First plot the data to visualise and notice the oddities prior to 1976. Remove the odd data from the dataframe.

plot(mlo$value, type = "l")

mlo <- mlo[mlo$year > 1976,]

Now we can create a timeseries object using the data we have loaded.

mlo.ts <- ts(mlo$value, frequency=12, start=c(1977,1))

A timeseries object has some build in functionalities that we can use. For example, we can use plot for a timeseries object and it will automatically select the type “line” and add labels to the x-axis.

plot(mlo.ts)

Now we can decompose the time-series into a decomposition dataframe. and plot all of the decomposed time-series at the same time:

DEC <- decompose(mlo.ts, ”additive”)

plot(DEC)

If you want to practice with R, you could try and do the decomposition yourself, like we did in Excel. The code is provided below, see if you understand what each line is doing.

yrs <- max(mlo$year) - min(mlo$year) + 1

weights <- c(.5, rep(1,11),.5)/12

trend <- filter(mlo.ts, filter=weights, sides=2)

plot(mlo.ts)

lines(trend, col="red")

trend.adj <- mlo.ts - trend

month <- factor(rep(1:12, yrs))

seasonality <- tapply(trend.adj, month, mean, na.rm=TRUE)

plot(seasonality, type="h", xlab="Month")

title("Seasonal Effects for Mauna Loa Data")

abline(h=0, col="grey")

residual <- mlo.ts - trend - rep(seasonality, yrs)

plot(residual, main="Estimated Residual Variability")

3. Autocorrelation

Another aspect of time series we will consider here is the characteristic of auto-correlation, or the correlation of the data with itself at different ‘lags’. A ‘lag’ in this sense is a difference between points in time (a time delay), so for example the difference between a value at time t and at time t+3 would be a lag of +3.

3.1. Autocorrelation in Excel

To calculate autocorrelation in Excel we can use the CORREL function. First, we duplicate our data with vertical off-sets:

I. Create a new blank worksheet in your Excel workbook and copy the original core data to this sheet

II. Insert a few empty rows above the dataset to give space for doing statistics later

III. Select the values in column D and ctrl+c (copy) this block

IV. In a next column, move to a cell one below the top and paste the data (ctrl+v)

V. Repeat this process so that you have a ‘cascading’ sequence of data columns, where the data in each column is off-set by one row from the previous one:

We can now calculate the auto-correlation between the time-series and its own copy with a lag of +1, by doing the following calculation in column F above the dataset. In the space above the column with your lag +1 off-set, calculate the autocorrelation as:

=CORREL($D6:$D1000, F6:F1000)

By using a data range in the formula extending to row 1000 we don’t have to worry about the bottom end of the dataset, and by using the $ sign in front of the column letter we keep the first array in the formula always to be the original time-series in column D. We can now copy this formula to the other lagged columns to the right.

Extend this system to lag +24.

After this you should have a horizontal row with 24 auto-correlations, from lag +1 to lag +24. Select these 24 values and plot them as a simple bar-chart:

· What do you notice about the correlogram? For what lags is autocorrelation high? For what lags is autocorrelation low?

In Part 2 above it was quite obvious that the seasonal effect would be for a 12-month cycle and so we used a moving window of size 12 to estimate our running mean.

· The correlogram you have just produced justifies the decision to use a moving window of 12 – why is this?

Note that the way we have calculated to autocorrelation in Excel, by copying and pasting columns of data and shifting them, is not only time-consuming but also not the most mathematically rigorous way of calculating autocorrelation. If we were to do this in R (see below) it would not only be quicker, but also yield more accurate results (for the reasons see: http://stats.stackexchange.com/questions/10947/formula-for-autocorrelation-in-r-vs-excel).

3.2. Autocorrelation in R

The same analysis that you just did in Excel, can also be done in R. Continuing from the script. that we write in section 2.2, we can now add the autocorrelation calculation. The following R code performs the same calculations that we just did in Excel. See if you understand what each line is doing. (Remember: typing ?functionname in the console will open the page with a description of the function.)

We can create plots of the lags in the timeseries:

lag.plot(mlo.ts, lags = 1, set.lags = c(3), do.lines=F, pch=20, main="Lag Plot, k=3")

lag.plot(mlo.ts, lags = 9, set.lags = seq(0,24,3), do.lines=F, pch=20, main="Lag Plot")

We can use the Auto- and Cross- Covariance and -Correlation Function Estimation to calculate the autocorrelation:

acf(mlo.ts, lag.max = 24, plot=T)

acf(mlo.ts, lag.max = 24, plot=F)

We can use the same function to calculate the autocorrelation for the separate timeseries of the trend, seasonality and residuals:

trend.notna <- na.omit(trend)

acf(trend.notna)

acf(seasonality)

residual.notna <- na.omit(residual)

acf(residual.notna)

4. More advanced time series analysis in R

The analysis above can be done much quicker in R than in Excel. Furthermore, many of the advanced methods highlighted in the lecture are either impossible or very difficult to implement in Excel. Good examples of time series analysis in R, including the advanced methods discussed in lecture, are provided in Dettling (2014). The timeSeries package has useful time series functionality.

References

Dettling, M. (2014) Applied Time Series Analysis. Zurich: Swiss Federal Institute of Technology. Available from: https://ethz.ch/content/dam/ethz/special-interest/math/statistics/sfs/Education/Advanced%20Studies%20in%20Applied%20Statistics/course-material-1921/Zeitreihen/ATSA_Script_v200504.pdf [Accessed 24 October 2024]

Keeling, R.F. (2008). Recording Earth's Vital Signs. Science, 319(5871), 1771-1772.

NOAA ESRL GMD (2016) Mauna Loa Carbon Dioxide Flask Monthly Average Data [Online] Available from: ftp://aftp.cmdl.noaa.gov/data/trace_gases/co2/flask/surface/co2_mlo_surface-flask_1_ccgg_month.txt [Accessed 13 November 2016]

Von Storch, H., & Zwiers, F.W. (2001). Statistical analysis in climate research. Cambridge: Cambridge University Press.

课程名

mktg2509 csci 2600 38170 lng302 csse3010 phas3226 77938 arch1162 engn4536/engn6536 acx5903 comp151101 phl245 cse12 comp9312 stat3016/6016 phas0038 comp2140 6qqmb312 xjco3011 rest0005 ematm0051 5qqmn219 lubs5062m eee8155 cege0100 eap033 artd1109 mat246 etc3430 ecmm462 mis102 inft6800 ddes9903 comp6521 comp9517 comp3331/9331 comp4337 comp6008 comp9414 bu.231.790.81 man00150m csb352h math1041 eengm4100 isys1002 08 6057cem mktg3504 mthm036 mtrx1701 mth3241 eeee3086 cmp-7038b cmp-7000a ints4010 econ2151 infs5710 fins5516 fin3309 fins5510 gsoe9340 math2007 math2036 soee5010 mark3088 infs3605 elec9714 comp2271 ma214 comp2211 infs3604 600426 sit254 acct3091 bbt405 msin0116 com107/com113 mark5826 sit120 comp9021 eco2101 eeen40700 cs253 ece3114 ecmm447 chns3000 math377 itd102 comp9444 comp(2041|9044) econ0060 econ7230 mgt001371 ecs-323 cs6250 mgdi60012 mdia2012 comm221001 comm5000 ma1008 engl642 econ241 com333 math367 mis201 nbs-7041x meek16104 econ2003 comm1190 mbas902 comp-1027 dpst1091 comp7315 eppd1033 m06 ee3025 msci231 bb113/bbs1063 fc709 comp3425 comp9417 econ42915 cb9101 math1102e chme0017 fc307 mkt60104 5522usst litr1-uc6201.200 ee1102 cosc2803 math39512 omp9727 int2067/int5051 bsb151 mgt253 fc021 babs2202 mis2002s phya21 18-213 cege0012 mdia1002 math38032 mech5125 07 cisc102 mgx3110 cs240 11175 fin3020s eco3420 ictten622 comp9727 cpt111 de114102d mgm320h5s bafi1019 math21112 efim20036 mn-3503 fins5568 110.807 bcpm000028 info6030 bma0092 bcpm0054 math20212 ce335 cs365 cenv6141 ftec5580 math2010 ec3450 comm1170 ecmt1010 csci-ua.0480-003 econ12-200 ib3960 ectb60h3f cs247—assignment tk3163 ics3u ib3j80 comp20008 comp9334 eppd1063 acct2343 cct109 isys1055/3412 math350-real math2014 eec180 stat141b econ2101 msinm014/msing014/msing014b fit2004 comp643 bu1002 cm2030

联系我们

EMail: 99515681@qq.com

QQ: 99515681

留学生作业帮-留学生的知心伴侣！

工作时间：08:00-21:00

微信客服：codinghelp

热门主题

课程名