
QBUS6840 Group Assignment

Key information

1. Required submissions:

a. ONE written report (Word or PDF format), submitted through Canvas-Assignments-Report submission (group assignment).

b. ONE code file (Jupyter Notebook “.ipynb” or Python “.py”), submitted through Canvas-Assignments-Code submission (group assignment).

2. Each group should choose a group representative, who submits both files. Each group should submit only one report and one code file.

3. Due date/time: Thursday 23 May 2024, 23:59 (Report and Code submission).

4. The late penalty for the assignment is 5% of the maximum mark per day. The closing date, Sunday 2 June 2024, 23:59, is the last date on which an assignment will be accepted for marking.

5. Weight: 25% of the total mark of the unit.

6. The full marks of this group assignment are 65 marks, excluding the bonus marks.

a. The maximum bonus marks based on the class forecasting competition are 3 marks.

b. The maximum bonus marks for using Transformer for the forecasting task are 3 marks.

c. Therefore, even if you receive no bonus marks, your maximum possible mark for this assignment is still 100%. This keeps the group assignment fair to all groups, while groups with good forecasting results or a successful Transformer implementation can receive additional bonus marks in recognition of their quality work.

7. Groups: complete this group project in a group of four students. You must follow the group allocation on the Canvas-People-QBUS6840 group page.

8. Presentation: please refer to the Presentation Instructions section of this file for more detailed instructions, including the length requirement of the report, font size, etc. To facilitate your report writing process, a Report_Instructions.pdf file is also provided on Canvas.

9. Numbers with decimals should be reported to four decimal places.

10. Marking Criteria: please refer to the Marking Criteria section of this file for more detailed instructions.

11. Please include the names and student IDs of all group members, and your group ID, in the submitted report and code file. You do NOT need to include a cover page or table of contents. Name your report and code files using the following formats respectively, replacing "123" with your group ID: Group_123_Report, Group_123_Code.

Key rules

• Read the assignment requirements carefully.

• Follow any further instructions announced on Canvas.

• You must use Python for the assignment.

• If training your model involves generating random numbers, you must fix the random seed in your Python code by using np.random.seed(0).

• Reproducibility is fundamental in data analysis, so you are required to submit a code file that generates your results. Failing to submit your code will lead to a loss of 50% of the assignment marks.

• Failure to read the information and follow the instructions may lead to a loss of marks. It is also your responsibility to be informed of, and to follow, the rules and guidelines of the University of Sydney and the Business School.

• Referencing: use the Harvard Referencing System (details at: http://libguides.library.usyd.edu.au/c.php?g=508212&p=3476130).
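Below is a minimal sketch of fixing the seeds at the top of the code file. It assumes that NumPy and Python's built-in `random` module are the only sources of randomness. If a deep-learning library is used, its own seed has to be fixed as well. The TensorFlow and PyTorch calls in the comment are examples only. The assignment does not require either library.

```python
import random

import numpy as np

SEED = 0  # the assignment requires np.random.seed(0)


def set_seeds(seed: int = SEED) -> None:
    """Fix the seeds of Python's and NumPy's global random number generators."""
    random.seed(seed)
    np.random.seed(seed)
    # If a deep-learning library is used (an assumption, not a requirement),
    # also fix its seed, e.g. tf.random.set_seed(seed) or torch.manual_seed(seed).


set_seeds()
first_draw = np.random.normal(size=3)
set_seeds()
second_draw = np.random.normal(size=3)
print(np.array_equal(first_draw, second_draw))  # True: same seed, same numbers
```

Call `set_seeds()` once before any model training. Re-running the notebook from the top then reproduces the same random draws on the same machine.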

Background

The underemployment rate is the number of underemployed people expressed as a proportion of the labour force. Underemployment refers to the condition in which people in the labour force are employed in jobs that are less than full-time or regular, or in jobs that are inadequate with respect to their training or economic needs. The underemployment rate is reported by the relevant government department in most countries. A country's central bank can use it as an important indicator of the health of the economy when setting monetary policy.

Tasks and Datasets

For this group project, we have obtained the monthly historical underemployment rate data of a country (name omitted on purpose) from February 1978 to December 2017. The data are provided in the dataset UnderemploymentRate_InSample.csv, which can be downloaded from Canvas. The dataset contains the Date (in the form 1/month/year, i.e. monthly data) and the underemployment Rate.
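Below is a minimal loading-and-checking sketch using pandas. It makes two assumptions that you should check against the actual file header and date strings. First, the CSV columns are literally named `Date` and `Rate`. Second, dates are written day/month/year (e.g. `1/02/1978`).

The function re-indexes the series to a complete month-start calendar. Any missing month therefore appears as NaN, which supports the completeness check in the pre-processing step. The toy CSV is made-up data, not the assignment data.

```python
import io

import pandas as pd


def load_monthly_series(source, date_col="Date", value_col="Rate"):
    """Read a monthly CSV into a pandas Series indexed by month-start dates.

    The column names and the day/month/year date format are assumptions.
    """
    df = pd.read_csv(source)
    df[date_col] = pd.to_datetime(df[date_col], format="%d/%m/%Y")
    series = df.set_index(date_col)[value_col].sort_index()
    # Re-index to a complete month-start ("MS") calendar: missing months become NaN.
    return series.asfreq("MS")


# Made-up example with April 1978 deliberately missing.
toy_csv = "Date,Rate\n1/02/1978,4.5\n1/03/1978,4.7\n1/05/1978,4.6\n"
y = load_monthly_series(io.StringIO(toy_csv))
print(y.isna().sum())  # 1 missing month
```

With the real data, you would call `load_monthly_series("UnderemploymentRate_InSample.csv")`. After that, decide how to handle any NaN values the check reveals.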

Your task is to develop a predictive model, trained with UnderemploymentRate_InSample.csv, to forecast the monthly underemployment rate from January 2018 to December 2019. Note this is a 24-step-ahead forecasting task.

An out-of-sample test dataset, UnderemploymentRate_OutofSample.csv, is provided on Canvas. It contains the true 2018 and 2019 underemployment rates in the same format as the in-sample data, and it will be used to assess the forecast accuracy of your models. You should assume that the out-of-sample data is completely hidden from your model training/selection process, so you must NOT use the out-of-sample test dataset in your model training/selection process. Otherwise, your model training process will be treated as having critical issues, and you will receive significant mark deductions on the methodology and forecasting results, no matter how good your forecasting results are.

In other words, the out-of-sample test dataset should be only used to evaluate your forecast accuracy (details to be shown later).

Please note the assignment tasks are designed to be open-ended questions. This gives more freedom for you to explore a good solution and is similar to the situations that you might encounter in the real world.

You need to prepare a report for this assignment. The purpose of the report is to describe, explain, and justify your solutions with polished presentation. Be concise and objective. Find ways to say more with less.

You MUST submit Python code that can be used to replicate the results in your report. Even if you fix the random seed with np.random.seed(0), running on a different computer/CPU may change the random number generation and produce slightly different results. The key purpose of replicable results is to make sure that every group reports genuine results. Slightly different results across runs or computers are therefore acceptable, as long as the marker can re-run your code and obtain results close to yours.

Suggested Report Outline:

1. (2 marks) At the beginning (the first line) of your report, you should report your best out-of-sample forecasting result, by stating: “The best out-of-sample forecasting Root Mean Squared Error of our group is: ……”. Please note the markers will run your code and check whether your reported results can be produced/replicated.

Deliberately reporting false results can result in a deduction of up to 30% of the assignment marks.

2. (5 marks) Introduction. Write a few paragraphs stating the business problem and summarising your work. Use plain English and avoid technical language as much as possible in this section (it is intended for a general audience).

3. (10 marks) Data pre-processing and exploratory data analysis (EDA). Write a Python program to clean the data. For example, check for and delete incomplete information if any, make sure the data is complete, or transform the data if needed. It is up to you whether and how to transform the data so that the resulting dataset can be well incorporated into training your chosen models.

Conduct an initial analysis of the time series by plotting it, or do what you can to reveal any patterns. Summarise what you have revealed or observed. In your report, carefully present your EDA procedure and findings, and discuss how the EDA results inform your methodology section.
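One possible starting point for the EDA is sketched below with pandas. It computes two things:

- a 12-month centred rolling mean, to expose the trend;
- the average deviation from that trend for each calendar month, to expose any seasonal pattern.

The helper name is illustrative. The synthetic series is made-up data, used only to show that a known seasonal peak is recovered. Whether the real series is seasonal is for your own EDA to establish.

```python
import numpy as np
import pandas as pd


def trend_and_seasonal_profile(y: pd.Series, period: int = 12):
    """Return a centred rolling-mean trend and the mean detrended value per calendar month."""
    # A rolling mean over one full year averages out a stable seasonal pattern.
    trend = y.rolling(period, center=True).mean()
    # Deviations from the trend, averaged by calendar month (1 = January, ..., 12 = December).
    detrended = y - trend
    profile = detrended.groupby(detrended.index.month).mean()
    return trend, profile


# Made-up monthly series: linear trend plus a seasonal peak in April.
idx = pd.date_range("2000-01-01", periods=120, freq="MS")
t = np.arange(len(idx))
month = np.asarray(idx.month)
toy = pd.Series(5 + 0.01 * t + np.sin(2 * np.pi * (month - 1) / 12), index=idx)

trend, profile = trend_and_seasonal_profile(toy)
print(profile.idxmax())  # 4, i.e. April
```

Plotting the series together with `trend` (e.g. with matplotlib), and plotting `profile` as a bar chart, gives the kind of figures an EDA section would usually discuss.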

4. (40 marks) Methodology and forecasting results. In your report, you should present the details of your three best ranked models. The three models should be different types of models. For example, ARIMA(1,1,0) and ARIMA(2,0,1) are the same type of model, whereas ARIMA and Seasonal ARIMA models count as different types of models. Simple Exponential Smoothing, Trend Corrected Exponential Smoothing, Additive Seasonal Holt-Winters Smoothing and Multiplicative Seasonal Holt-Winters Smoothing models count as different types of models. Neural Network Autoregression and Recurrent Neural Network (RNN) models count as different types of models.

The details of the methodology/model should include: your rationale, how you train your models, model selection process, some interpretations, your findings and justifications of your choices. You can try models that are not covered in our unit. However, for the three models presented, at least two models should be the models that we have covered in the lecture. The types of models could be the Moving Average, Decomposition method, Exponential Smoothing, ARIMA, Neural Networks Autoregression Model, RNN, Forecasting Simple Average, Forecasting Combination, etc. This choice is yours.
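Forecast combination, listed above as one model type, can be as simple as averaging the 24-step forecasts of several models. Below is a minimal NumPy sketch using a hypothetical helper, `combine_forecasts`. If you estimate unequal weights, estimate them on validation data only, never on the out-of-sample data.

```python
import numpy as np


def combine_forecasts(*forecasts, weights=None):
    """Combine several equal-length forecast vectors by a (weighted) average."""
    stacked = np.vstack([np.asarray(f, dtype=float) for f in forecasts])
    if weights is None:
        # Equal weights: the simple average of the individual forecasts.
        weights = np.full(stacked.shape[0], 1.0 / stacked.shape[0])
    weights = np.asarray(weights, dtype=float)
    if not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must sum to 1")
    return weights @ stacked


# Made-up 3-step forecasts from two hypothetical models.
combined = combine_forecasts([5.0, 5.2, 5.4], [5.4, 5.2, 5.0])
print(combined)  # [5.2 5.2 5.2]
```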

In particular, you may be eligible for a maximum of 3 bonus marks if all three of the following hold: you choose to use a Transformer model, it ranks as one of your top three performing models, and you present the details of your Transformer implementation. The marker will still check the validity of your Transformer modelling process. The Transformer architecture is a type of deep learning model introduced in the paper "Attention is All You Need" by Vaswani et al. in 2017 (paper attached on Canvas). ChatGPT is based on a Generative Pre-trained Transformer (GPT) model, which is built on the Transformer architecture. Transformers can also be used for time series forecasting tasks. You may search online resources on using Transformers for time series forecasting, to see how to organize the given time series and fit it into the Transformer architecture to generate the required forecasts. Please note that using a Transformer is OPTIONAL for the assignment.
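How the series is organised for a Transformer (or a neural network autoregression) is up to you. One common arrangement is a sliding window, sketched here as an assumption rather than a prescribed method. Each input is a `lookback`-month stretch of the series, and each target is the following 24 months. A model trained on these pairs therefore produces all 24 forecasts in one pass.

`make_windows` and the `lookback` value of 36 are illustrative. `lookback` is a hyperparameter to choose on validation data.

```python
import numpy as np


def make_windows(values, lookback: int, horizon: int = 24):
    """Turn a 1-D series into (inputs, targets) pairs for a sequence model.

    Each input holds `lookback` consecutive values; each target holds the next
    `horizon` values (a direct multi-output arrangement).
    """
    values = np.asarray(values, dtype=float)
    n_windows = len(values) - lookback - horizon + 1
    if n_windows < 1:
        raise ValueError("series too short for this lookback and horizon")
    inputs = np.stack([values[i : i + lookback] for i in range(n_windows)])
    targets = np.stack([values[i + lookback : i + lookback + horizon] for i in range(n_windows)])
    # Sequence layers usually expect (batch, sequence length, features): add a feature axis.
    return inputs[..., np.newaxis], targets


# Made-up series of 100 values.
X, Y = make_windows(np.arange(100), lookback=36, horizon=24)
print(X.shape, Y.shape)  # (41, 36, 1) (41, 24)
```

To forecast January 2018 to December 2019, the trained model would receive the last `lookback` in-sample values, with shape `(1, lookback, 1)`. If you scale the data, fit the scaler on the training data only.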

The list below contains some further clarifications on the methodology/modelling.

• As mentioned above, "In your report, you should present the details of your three best ranked models." In the overall working process, however, you should generally try more than three models, because you need to provide a rationale and justifications for your choices. Why did you initially choose to test these 5 or 10 models (rationale)? Why did you finally decide to present these 3 models (justification of your choices)? If I were working on the assignment, I would try 5, 10 or even more models. I would then use a model selection technique (train/validation split) and/or the out-of-sample forecasting results to decide on my final three models.

• Then, in your report, you can present the details of the final three models and explain your whole working process. You could also briefly include the working process and test RMSE values of other models you have tried. Following this strategy provides a strong rationale and justification for your final choice.

• Rationale here means why a model is initially chosen. For example, suppose your EDA has revealed some seasonality in the data. The rationale is then that you wanted to try models that can account for seasonality; in other words, your decision follows reason or logic. In the rationale part, you could give the theoretical definition of this seasonal model with mathematical formulas, i.e., how seasonality is modelled in the framework. You could also explain why you think this model is a good candidate. Later, the model training/selection and evaluation process can provide further justification for your choice.

• If your selected model does not require a model selection process, clearly document why it was selected. For example, based on the EDA, you could argue that the additive Holt-Winters (HW) exponential smoothing model is suitable for the given time series. Since the additive HW exponential smoothing model has fixed model complexity, you do not need a model selection process with a train and validation split. However, you may still choose the additive HW exponential smoothing model and use a train and validation split to evaluate its forecasting performance before the final out-of-sample test; this is also fine.

• If the selected model requires a model selection process (such as ARIMA, NN or other models you select), you must implement and clearly document a formal model selection process.

o For example, if your selected model is an ARIMA or NN model, you must have a model selection process with a train and validation split.

o This means you need to select one ARIMA model from many candidate ARIMA models with different lags (including seasonal ones), using the best validation performance (or criteria such as AIC/BIC). Do the same to select one NN model from candidates with different numbers of hidden layers and hidden neurons, and so on.

o With the selected ARIMA model specification/complexity, re-train the selected model on the whole in-sample data and report its final out-of-sample forecasting RMSE.

o Always remember: "Since you should assume the out-of-sample data is completely hidden from your model training/selection process, you must NOT use the out-of-sample test dataset in your model training/selection process."
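The selection steps above can be sketched end to end with NumPy alone. To stay dependency-free, the sketch swaps in a plain least-squares AR(p) model, i.e. ARIMA(p,0,0) with no differencing or MA terms, in place of a full (seasonal) ARIMA search. With a library such as statsmodels, you would loop over candidate orders in the same way.

The sketch holds out the last 24 in-sample months as the validation set, so that validation mimics the 24-step-ahead task. This split is an assumption, not a requirement. The simulated series is made-up data, and the out-of-sample file is never touched.

```python
import numpy as np


def fit_ar(y, p):
    """Least-squares fit of y_t = c + phi_1*y_{t-1} + ... + phi_p*y_{t-p} + e_t."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Column i (i = 1..p) holds the lag-i values aligned with the targets y[p:].
    lags = [y[p - i : n - i] for i in range(1, p + 1)]
    X = np.column_stack([np.ones(n - p)] + lags)
    coef, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return coef  # [c, phi_1, ..., phi_p]


def forecast_ar(y, coef, horizon):
    """Recursive multi-step forecast from the end of y: each prediction becomes a lag."""
    history = list(np.asarray(y, dtype=float))
    p = len(coef) - 1
    preds = []
    for _ in range(horizon):
        recent = history[-1 : -p - 1 : -1]  # [y_{t-1}, ..., y_{t-p}]
        preds.append(coef[0] + np.dot(coef[1:], recent))
        history.append(preds[-1])
    return np.array(preds)


def select_ar_order(y, candidate_orders, horizon=24):
    """Pick p by the RMSE of a `horizon`-step forecast of the last `horizon` in-sample points."""
    y = np.asarray(y, dtype=float)
    train, valid = y[:-horizon], y[-horizon:]
    scores = {}
    for p in candidate_orders:
        pred = forecast_ar(train, fit_ar(train, p), horizon)
        scores[p] = float(np.sqrt(np.mean((valid - pred) ** 2)))
    return min(scores, key=scores.get), scores


# Made-up AR(2) data standing in for the in-sample series.
np.random.seed(0)
noise = np.random.normal(scale=0.3, size=300)
sim = np.zeros(300)
for t in range(2, 300):
    sim[t] = 1.0 + 0.6 * sim[t - 1] + 0.2 * sim[t - 2] + noise[t]

best_p, scores = select_ar_order(sim, candidate_orders=range(1, 7))
# Re-train the chosen order on ALL in-sample data, then forecast 24 steps ahead.
final_forecast = forecast_ar(sim, fit_ar(sim, best_p), horizon=24)
print(best_p, final_forecast.shape)
```

The `scores` dictionary holds the validation RMSE of every candidate, which is the kind of evidence the report can tabulate to justify the chosen order.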

You then report the out-of-sample forecasting Root Mean Squared Error (RMSE) results of your three presented models. In particular, your best model's out-of-sample RMSE should be presented at the beginning of the report, as mentioned in point (1) above. This best model's result will decide your group's forecast competition bonus marks; refer to the Marking Criteria section for more details.

Calculation of the out-of-sample forecasting results. You need to use your trained models to predict the 24 underemployment rates of 2018 and 2019. This is a 24-step-ahead forecasting task, because we assume you are in December 2017 (time stamp T) and have no knowledge of 2018 and 2019. Therefore, as mentioned, you should assume the out-of-sample data is completely hidden from your model training/selection process, and you must NOT use the out-of-sample test dataset in your model training/selection process.
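Every forecast is produced at the origin T (December 2017), so all 24 forecasts may use only Y_1, ..., Y_T; 2018 values are never fed back in. The sketch below uses pandas to build the forecast dates from January 2018 to December 2019. To illustrate the rule, it also builds a naive benchmark that repeats the last in-sample value for every h. The naive benchmark and both helper names are illustrative additions, not models required by the assignment.

```python
import pandas as pd


def forecast_index(last_in_sample_date, horizon=24):
    """Month-start dates for the `horizon` months after the forecast origin T."""
    first = pd.Timestamp(last_in_sample_date) + pd.DateOffset(months=1)
    return pd.date_range(first, periods=horizon, freq="MS")


def naive_forecast(y: pd.Series, horizon=24):
    """Illustrative benchmark: repeat Y_T, the last in-sample value, for h = 1, ..., horizon."""
    return pd.Series(float(y.iloc[-1]), index=forecast_index(y.index[-1], horizon))


idx = forecast_index("2017-12-01")
print(idx[0].date(), idx[-1].date(), len(idx))  # 2018-01-01 2019-12-01 24

toy = pd.Series([4.9, 5.1], index=pd.to_datetime(["2017-11-01", "2017-12-01"]))  # made-up values
bench = naive_forecast(toy)
```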

The 24 predicted values of the underemployment rate should be used to calculate the out-of-sample forecasting error. More specifically, you need to use the Root Mean Squared Error (RMSE) to evaluate forecast accuracy. The RMSE, computed on the out-of-sample data, is defined as follows. Let $\hat{Y}_{T+h|1:T}$ be the $h$-step-ahead point forecast based on the in-sample data $Y_{1:T} = \{Y_1, Y_2, Y_3, \ldots, Y_{T-1}, Y_T\}$. The true $h$-th underemployment rate value $Y_{T+h}$ is included in the out-of-sample data UnderemploymentRate_OutofSample.csv. The out-of-sample RMSE is computed as follows:

$$\text{RMSE} = \sqrt{\frac{1}{24}\sum_{h=1}^{24}\left(Y_{T+h} - \hat{Y}_{T+h|1:T}\right)^2}$$

where 24 is the number of observations in the out-of-sample data.
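The RMSE formula translates directly into Python. The sketch assumes the 24 forecasts and the 24 out-of-sample values are stored as pandas Series indexed by date. The date check guards against comparing misaligned months. The result is printed to four decimal places, as the report requires. The numbers are made up.

```python
import numpy as np
import pandas as pd


def out_of_sample_rmse(actual: pd.Series, forecast: pd.Series) -> float:
    """sqrt(mean over h of (Y_{T+h} - Yhat_{T+h|1:T})^2), computed on matching dates."""
    if not actual.index.equals(forecast.index):
        raise ValueError("forecast dates must match the out-of-sample dates")
    errors = actual.to_numpy(dtype=float) - forecast.to_numpy(dtype=float)
    return float(np.sqrt(np.mean(errors ** 2)))


dates = pd.date_range("2018-01-01", periods=24, freq="MS")
actual = pd.Series(np.full(24, 5.0), index=dates)    # made-up "true" values
forecast = pd.Series(np.full(24, 5.5), index=dates)  # made-up forecasts, all 0.5 too high
rmse = out_of_sample_rmse(actual, forecast)
print(f"{rmse:.4f}")  # 0.5000
```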

5. (3 marks) Final analysis, conclusion, limitations and future steps (non-technical).

6. Appendix. In the Appendix section, you MUST include three sets of meeting minutes, using the Minutes Template provided on Canvas. More detailed instructions are given below. You can also put any other material you see as appropriate into the Appendix. The Appendix does NOT count towards the length of the main report, and there is no page limit for the Appendix.

Meeting Minutes

• Your group is required to submit three sets of meeting minutes, attached to the report as the Appendix. Your group should use the Minutes Template on Canvas for preparing agendas and meeting minutes.

• Each set of minutes should record at least the following information:

o Meeting dates/time/duration;

o Key points of the discussion, such as who said/did what;

o Action list, responsible member(s), task due time, etc. It is crucial that you clearly document the actions and work of each member at each meeting;

o Review/group judgement on the quality of individually completed/responsible tasks. The purpose of this is to infer whether a member is doing their share of the work;

o The minutes template contains some example input.

If a problem arises within a group, we will request the minutes of all group meetings. We will make an individual adjustment to the group mark if there is sufficient evidence that a student has done significantly less work than the other members. If a student has truly done very little work, that student will be awarded a mark of 0.

Marking Criteria

The full marks of this group project are 65 marks, including 60 marks for the report and 5 marks for the presentation. In addition, the maximum bonus marks based on the class forecast competition are 3 marks. The maximum bonus marks for successfully using a Transformer (ranked as one of your top 3 performing models) for the forecasting task are also 3 marks. More details are shown below:

• The content of your report, Group_123_Report, is worth 60 marks (see the suggested report structure and mark breakdown in the Suggested Report Outline section above):

o Focus on the appropriateness of the chosen forecasting methods, and provide a full explanation and interpretation of any results you obtain in your report. Output without explanation will receive 0 marks.

o Describe your data analysis procedure in detail: how the data pre-processing is completed, how the EDA is done, which models are used and why, how the models are trained, the model selection process, some interpretations, your findings and justifications of your choices. The descriptions should be detailed enough for other data scientists to understand and implement your work, assuming they have a background in your field.

o Clearly and appropriately present any relevant graphs and tables.

o You may insert small sections of your code into the report to aid interpretation when necessary.

• The Python implementation. The main program file should be named Group_123_Code.ipynb (or Group_123_Code.py). Your program must be runnable and your out-of-sample forecasting RMSE results must be replicable. Deliberately reporting false results can result in a deduction of up to 30% of the assignment marks.

The marker will run your Group_123_Code.ipynb (or Group_123_Code.py) with the in-sample training data UnderemploymentRate_InSample.csv and the out-of-sample test data UnderemploymentRate_OutofSample.csv in the same folder as the Python file. The marker then expects to see the same (or at least a close) out-of-sample RMSE value as you reported. The code file should contain sufficient explanations for the marker to know how to run your code.

• Presentation is part of the assessment. The marker will assign 5 marks for presentation. Detailed instructions are given in the Presentation Instructions section below.

• We will allocate a maximum of 3 bonus marks (for each student in the group) for the forecast competition among the groups. Groups will receive marks according to the rank of their best out-of-sample forecast RMSE value (the value reported on the first line of the report; smaller is better), under the following rules:

o If the out-of-sample forecast RMSE of your forecast is within top 5 percent in the class, then the full 3 bonus marks (for each student in the group) will be awarded;

o If the out-of-sample forecast RMSE of your forecast is between 5.1 percent (using one decimal rounding) and 20 percent in the class, then 2 bonus marks (for each student in the group) will be awarded;

o If the out-of-sample forecast RMSE of your forecast is between 20.1 percent (using one decimal rounding) and 40 percent in the class, then 1 bonus mark (for each student in the group) will be awarded;

o Otherwise, 0 bonus marks will be awarded.

• We will allocate a maximum of 3 bonus marks (for each student in the group) to groups that successfully use a Transformer (ranked as one of their top 3 performing models) for the forecasting task. The markers will check your implementation details and decide the marks to be awarded.

Presentation Instructions

• Your report should be provided as a Word or PDF document.

• Each group should submit one report and one code file, via the group representative.

• To facilitate your report writing process, a Report_Instructions.pdf file is provided.

• The report should be NO more than 15 pages (excluding the Appendix and Reference list), with a font size no smaller than 11pt. The page limit applies to all content in your report (text, figures, tables, small sections of inserted code, etc.) but excludes the Appendix and Reference list. Violating this rule will incur a deduction on the presentation marks.

• You do NOT need to include a cover page or table of contents.

• Numbers with decimals should be reported to four decimal places.

• Your report should:

o Include the sections suggested in the Suggested Report Outline section.

o Include all the methodology details and steps mentioned above.

o Demonstrate an understanding of the relevant principles of the predictive analytics approaches used.

o Clearly and appropriately present any relevant figures and tables.

• Your group is required to submit three sets of meeting minutes. Your group should use the Minutes Template provided on Canvas to prepare agendas and meeting minutes. Not providing the meeting minutes will incur a deduction on the presentation marks.

• Later, the unit coordinator will collect peer feedback on the performance of each group member. Therefore, it is crucial that each group member contributes genuinely to the group assignment.
