代做ECON235 Report Assignment PART 2帮做R编程

ECON235

Report Assignment PART 2 (50 marks)

Due date: 5 pm, October 18, 2024. Upload your answers and R script. in the Assessment 2 section on Moodle.

Instructions:  This is an individual assignment. Use the starter R script. form. Moodle to create the dataset for your assigned city (London, Paris or Rome). The answer the questions below and submit your answers along with the full R script. Include your name, student number and tutorial group.

CHECKLIST: your submission must include the following

(1)      Report Part 2 in PDF format (Word files will not be accepted)

(2) R script. with your code (a file with an .R extension which can be run in RStudio)

Submissions that do not satisfy these two conditions will be returned for amendment and will have the late penalty applied on the per day basis.

Questions

In this section of the report, you will model the relationship between hotel prices and their distance from the city centre for your selected city. Make sure that 'hotelbookingdata.csv' file is in your Working Directory and then   run the starter R script. to create a dataset for your assigned city.

2.1 Getting started

Note that the starter file uses the following code to create the dataset for subsequent analysis:

city_data <- filter(df, city == 'London', year == 2017 & month ==11 & weekend ==0) |>

separate(accommodationtype,'@', into = c('garbage','acc_type')) |> separate(distance, ' ', into = c('distance','miles')) |>

select (-garbage, -miles) |>

filter(acc_type == 'Hotel') |>

select(hotel_id, distance, price, neighbourhood, starrating)|>

mutate(distance = as.numeric(distance))|> distinct()

Explain what this code does line-by-line.

In what follows please include all graphs that you produce and discuss into your report.

2.2 (5 marks) Explore the Distribution of the Variables

Begin by creating histograms for both the distance (distance from the city centre) and price (hotel price) variables. Additionally, generate a scatterplot with distance on the x-axis and price on they-axis to visualize the relationship between these two variables. In your analysis, carefully examine the patterns visible in the data. Consider the following:

(i) How do the distributions of distance and price look? Are they normal, skewed, or uniform?

(ii) Are there any outliers or extreme values in either variable?

(iii) Is there any visible relationship between distance and price in the scatterplot? Does it appear linear, non-linear, or are there any clusters?

Based on your observations, discuss the patterns in detail. Describe any peculiarities in the distributions or the scatterplot and highlight any extreme values that may warrant further investigation.

2.3 (10 marks) Addressing Extreme Values

Next, focus on dealing with extreme values (outliers) in the distance and price variables. For any potential outliers you identify, carefully examine the values of other variables in the dataset that could provide context for these observations, such as:

The hotel’sstar rating

•   The city district or neighbourhood where the hotel is located

To better understand these extreme values, you may need to refer to external sources like Google Maps to verify the hotel’s location. This can help you understand if these extreme values are due to real-world factors   (e.g., hotels located far from the centre but offering luxury services) or data errors. After analysing these factors, decide how to handle the outliers.

Decide if you should:

(a) Remove the extreme values from your dataset?

(b) Transform. the variables or apply specific filters to exclude unrealistic data points?

Clearly state your criteria for handling the extreme values. Justify your approach (e.g., removing hotels with prices above a certain threshold, or restricting distances to within a reasonable range based on the city’s geography). Once you’ve finalized the dataset, recheck the histograms and scatterplot. Ensure that the patterns are now consistent with your expectations and that outliers have been appropriately addressed.

2.4 (6 marks) Linear regression

(a) Non-Parametric Lowess Fit

Begin by fitting a non-parametric Lowess (Locally Weighted Scatterplot Smoothing) curve to model the relationship between price and distance. This method allows you to visually explore the trend without assuming a specific functional form. Plot the Lowess curve on top of the scatterplot from Section 2.2.

After generating the Lowess curve, discuss the observed relationship. Is the trend linear, or does it show non-linear patterns such as curves or plateaus? Does the price drop or increase at certain distances, or is there a consistent trend? Based on the shape of the curve, evaluate whether a linear model seems appropriate for your data.

(b) Fitting a Linear Regression Model

Now, fit a linear regression model to the data to explore the relationship between hotel price (dependent variable) and distance (independent  variable) from the city centre. The equation will be of the form.

price =  α  +  β * Distance +  ε

After estimating the model, reproduce the estimated regression line on the scatterplot to compare it with the Lowess curve. This will help you visualize  the differences between the non-parametric fit and the linear model.

Present the regression results in a table format. The table should include: the intercept and slope coefficients, standard errors, t-values, the R- squared value. Interpret the coefficients and comment on their significance.

2.5 (8 marks) Log transformations

(a) Creating Log-Transformed Variables Start by creating two new variables:

Log-Price: The natural logarithm of the price variable.

Log-Distance: The natural logarithm of the distance variable.

Produce histograms for both the log-price and log-distance. Comment on their distributions. Are they more normally distributed than the original variables? Are there any extreme values or outliers? Compare the shape of the log-transformed distributions with the original histograms from Section 2.2. Are the transformations useful in making the variables more evenly distributed?

(b) Scatterplots and model choice

Next, create three scatterplots to explore the relationship between price and distance under different specifications:

1. Log-Log Scatterplot: Plot log-price on they-axis and log-distance on the x-axis.

2. Log-Level Scatterplot: Plot log-price on they-axis and distance on the x- axis.

3. Level-Log Scatterplot: Plot price on they-axis and log-distance on the x-axis.

Evaluate which specification makes the most sense for your dataset. Is the relationship more linear in any particular specification? Does one model explain the variation in the data better? Are there any clear outliers or patterns in one scatterplot versus the others?

Based on the analysis, choose your preferred specification. Justify your  decision based on the patterns in the data and the interpretability of the model. For example, if the log-log specification gives a clearer linear relationship, you may prefer that. Alternatively, if distance affects prices in absolute terms, a level-log or log-level specification might be more suitable.

(c) Now, fit a linear regression model based on the specification you selected (log-log, log-level, or level-log). The regression equation will depend on your chosen model. After estimating the model reproduce the estimated regression line on the scatterplot (just like instep 2.4). Present the regression results in a table format, discuss their statistical significance and interpret them. Note that interpretation of the regression coefficients will depend on the model that you have chosen.

2.6 (6 marks) Linear piecewise spline

Now fit the linear piecewise spline model in levels (no logs) with one knot (cutoff point).

To choose the knot first, examine the Lowess fit from Section 2.4(a) to visually identify where the relationship between price and distance seems to change. This point of change will serve as the knot (cutoff) in the spline regression model. For example, you might notice that after a certain distance (say 3 miles from the city centre), the relationship between price and distance becomes less steep or even reverses. This distance would be your knot.

Fit a piecewise linear spline model that divides the relationship into two parts—before and after the knot—allowing for different slopes on each side of the cutoff point. Present the regression results in a table format, discuss  their statistical significance and interpret model coefficients.

2.7 (15 marks) Choosing the best model

Now you’ve estimated three different models:

1. Linear regression: This model assumed a linear relationship between price and distance.

2. Your chosen model with log transformations: This could be either a log- log, log-level, or level-log model, which transformed the variables to better fit the relationship.

3. Piecewise linear spline model: This model allowed for different slopes before and after a chosen cutoff (knot), providing flexibility in the price-  distance relationship.

Which of these models explains the data better? Justify and explain in detail your choice!

Once you have chosen the best model. Use it to conduct residual analysis to find the best priced hotels in your city.  Save the residuals and the fitted as new variables in your dataset. Use residual analysis to identify the best and worst hotel deals. Include a table of best and worst hotel deals (identified by their ids) in your report.

Produce the scatterplot of data showing the estimated regression line for the best model and five best deals (in green) and five worst deals (in red). All other observations must be of the same uniform. colour.

Finish this part of the report by discussing the results of your analysis. What are its limitations? Which steps would you take to produce a better model   of the relationship between hotel price and distance.





热门主题

课程名

mktg2509 csci 2600 38170 lng302 csse3010 phas3226 77938 arch1162 engn4536/engn6536 acx5903 comp151101 phl245 cse12 comp9312 stat3016/6016 phas0038 comp2140 6qqmb312 xjco3011 rest0005 ematm0051 5qqmn219 lubs5062m eee8155 cege0100 eap033 artd1109 mat246 etc3430 ecmm462 mis102 inft6800 ddes9903 comp6521 comp9517 comp3331/9331 comp4337 comp6008 comp9414 bu.231.790.81 man00150m csb352h math1041 eengm4100 isys1002 08 6057cem mktg3504 mthm036 mtrx1701 mth3241 eeee3086 cmp-7038b cmp-7000a ints4010 econ2151 infs5710 fins5516 fin3309 fins5510 gsoe9340 math2007 math2036 soee5010 mark3088 infs3605 elec9714 comp2271 ma214 comp2211 infs3604 600426 sit254 acct3091 bbt405 msin0116 com107/com113 mark5826 sit120 comp9021 eco2101 eeen40700 cs253 ece3114 ecmm447 chns3000 math377 itd102 comp9444 comp(2041|9044) econ0060 econ7230 mgt001371 ecs-323 cs6250 mgdi60012 mdia2012 comm221001 comm5000 ma1008 engl642 econ241 com333 math367 mis201 nbs-7041x meek16104 econ2003 comm1190 mbas902 comp-1027 dpst1091 comp7315 eppd1033 m06 ee3025 msci231 bb113/bbs1063 fc709 comp3425 comp9417 econ42915 cb9101 math1102e chme0017 fc307 mkt60104 5522usst litr1-uc6201.200 ee1102 cosc2803 math39512 omp9727 int2067/int5051 bsb151 mgt253 fc021 babs2202 mis2002s phya21 18-213 cege0012 mdia1002 math38032 mech5125 07 cisc102 mgx3110 cs240 11175 fin3020s eco3420 ictten622 comp9727 cpt111 de114102d mgm320h5s bafi1019 math21112 efim20036 mn-3503 fins5568 110.807 bcpm000028 info6030 bma0092 bcpm0054 math20212 ce335 cs365 cenv6141 ftec5580 math2010 ec3450 comm1170 ecmt1010 csci-ua.0480-003 econ12-200 ib3960 ectb60h3f cs247—assignment tk3163 ics3u ib3j80 comp20008 comp9334 eppd1063 acct2343 cct109 isys1055/3412 math350-real math2014 eec180 stat141b econ2101 msinm014/msing014/msing014b fit2004 comp643 bu1002 cm2030
联系我们
EMail: 99515681@qq.com
QQ: 99515681
留学生作业帮-留学生的知心伴侣!
工作时间:08:00-21:00
python代写
微信客服:codinghelp
站长地图