INFS5720

Business Analytics Methods

Individual Assignment

Term 1, 2025

This assignment covers Lectures 1 to 3. It accounts for 15% of the final grade for Business Analytics Methods. The deadline is 21 March 2025, 15:00:00. Do not wait until the last minute. Submissions that are late, even by a few seconds, will be marked as late by Moodle, and the teaching team strictly follows Moodle's late flag. UNSW has a standard late submission penalty of:

• 5% of the full marks per day

• capped at five days (120 hours) from the assessment deadline, after which a student cannot submit an assessment

• no permitted variation

You are to submit a Word document (not PDF) to Moodle: Left menu > Assessments Hub > Individual Assignment > Individual Assignment Submission. Turnitin is turned on to check the similarity score among all submissions. To avoid a high Turnitin score, do NOT copy the assignment questions into the report. The similarity score is not generated upon submission; this prevents students from relying on the Turnitin score and tuning it through repeated resubmission. If the work is done independently, the similarity score should not be an issue.

Every page’s header should contain Your zID, similar to this Individual Assignment guideline file. Do NOT write your name. A cover page is optional.

Please use "Your zID" for Submission Title when you upload. The file name should also be “Your zID.docx”. Submissions that do not adhere to this will be penalized.

Details of report format:

Length: should not exceed 4 pages, including the relevant graphs, tables, references, screenshots, and appendices (if any), but excluding the cover page (a cover page is optional). This limit is deliberately set at 4 pages to ensure that AI’s lengthy answers are summarized succinctly and to the point.

Font style: Times New Roman for writing; Courier New for code (if any)

Font size: 12 for writing; 10 for code (if any)

Line spacing: 1

Margins: 1 inch or 2.5cm for the top, bottom, right and left

Include the page number on each page

A penalty of up to 25% of full marks will be imposed for inappropriate or poor paraphrasing. Serious cases will be investigated. More information on effective paraphrasing strategies can be found at

https://www.student.unsw.edu.au/paraphrasing-summarising-and-quoting.

Your writing should be succinct but not at the expense of excluding relevant details.

Use plain and simple language. Some questions may not come with absolutely right or wrong answers, and you have the liberty to express your views about the problem.

However, your points must be supported by evidence and sound reasoning. It is the quality and not the length that counts. Make sure you follow the report guidelines and style specified in this assignment.

Please follow the APA style of referencing. More details can be found at

https://www.student.unsw.edu.au/apa. Where students use ChatGPT or any Generative AI tool in their work, this must be appropriately cited according to discipline norms, e.g., right below the written paragraph that used Generative AI, or in an appendix. Guidance on referencing Generative AI in APA style can be found at

https://apastyle.apa.org/blog/how-to-cite-chatgpt

Any student may be called upon to provide a viva voce (from the Latin meaning ‘living voice’) for any assignment. A viva voce is an interview-style meeting where you will be asked to explain, discuss, or use information related to any assignment or work produced for this course. These can be used to ascertain knowledge and ability, including the extent to which the student has undertaken the required reading, done preparatory work and can demonstrate understanding of what they have written or presented. Viva voces are used in conjunction with submitted assessment work, not instead of submitted work. (Used with permission; created by Assoc Prof. Lynn Gribble, UNSW Sydney.)

The answers should be presented in the same order as the questions listed in the assignment; that is, in the order of Q1 a), Q1 b), Q2 a), etc. You can have several sub-sections within a section if you deem it appropriate. The report must be self-contained. It is essential to include all relevant tables and figures as evidence to support your answers.

Summary:

• Write in plain English clearly and succinctly

• Write appropriately to the context (AI’s answer is usually too generic)

• Provide a reference at the end of the report

• Good overall presentation of the report

Overview

“Individual Assignment.ipynb” guides students through standard operations on the data set and, in some cases, provides an almost complete model implementation, so that students can focus on interpreting the results. Do NOT submit the .ipynb file.

This assignment is worth 60 marks in total.

As an Analyst in the Analytics team of a women's hospital, your role is to analyse patient data from the diabetes diagnosis process. Your goal is to uncover patterns, assess risk factors, and provide insights that can help improve early detection, patient care, and treatment strategies for diabetes within the female patient population.

The dataset is in ‘Diabetes_Diagnosis.csv’. The description of the table is in ‘Diabetes_Diagnosis_Description.xlsx’.

Before you run any code for a sub-question, please read the description and the instructions for that sub-question in the code file very carefully, to understand the purpose of the code and how to run it correctly.

Question 1

We will use K-means to study the hidden patterns in this dataset. The pre-processing step normalises the data using MinMaxScaler with predetermined minimum and maximum values, reducing the range of every column to [0,1]. This matters because it gives all variables equal impact on the clustering results.
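The arithmetic behind min-max scaling with predetermined bounds can be sketched in plain Python as below. This is only an illustration of the formula, not the notebook's code; the function name `min_max_scale` and the bounds and patient values are hypothetical and are not taken from the dataset.

```python
def min_max_scale(value, col_min, col_max):
    """Map a value from [col_min, col_max] onto [0, 1]."""
    if col_max == col_min:
        raise ValueError("col_min and col_max must differ")
    return (value - col_min) / (col_max - col_min)


# Hypothetical predetermined bounds, for illustration only.
bounds = {"SkinThickness": (0.0, 100.0), "Pregnancies": (0.0, 20.0)}
# Hypothetical patient record, for illustration only.
patient = {"SkinThickness": 25.0, "Pregnancies": 15.0}

scaled = {name: min_max_scale(patient[name], *bounds[name]) for name in patient}
print(scaled)  # {'SkinThickness': 0.25, 'Pregnancies': 0.75}
```

Because both columns end up on the same [0,1] scale, neither dominates the Euclidean distances that K-means relies on.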

(a) There are two options for running the K-means clustering algorithm. Option 1 is to use all columns. Option 2 is to exclude the ‘Outcome’ column. The given code produces each variable’s distribution in each cluster and compares the mean and median values of each scaled and original column across all clusters. Discuss which option produces more useful clustering results and why. (10 marks)

(b) In the given code, we run K-means with k ranging from 2 to 15 and plot the elbow line with respect to Sum of Squared Distances. A plot regarding the Average Silhouette Score is also provided for your reference. Pick the best k in your opinion and state your reason why this k value is the best. (10 marks)
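As a reminder of what the elbow plot measures, the sketch below computes the Sum of Squared Distances by hand for a fixed cluster assignment. It does not run K-means itself; the helper name `sum_squared_distances` and the four toy points are hypothetical and unrelated to the assignment data.

```python
def sum_squared_distances(points, labels):
    """Sum of squared Euclidean distances from each point to its cluster mean."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    total = 0.0
    for members in clusters.values():
        dims = len(members[0])
        # The centroid is the per-dimension mean of the cluster's members.
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dims)]
        total += sum((p[d] - centroid[d]) ** 2 for p in members for d in range(dims))
    return total


# Toy data: two well-separated pairs of points (illustration only).
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]

ssd_k1 = sum_squared_distances(points, [0, 0, 0, 0])  # everything in one cluster
ssd_k2 = sum_squared_distances(points, [0, 0, 1, 1])  # one cluster per pair
print(ssd_k1, ssd_k2)  # 104.0 4.0
```

The large drop from one cluster to two is the kind of bend an elbow plot makes visible. The Sum of Squared Distances keeps shrinking as k grows, so the point of interest is where the improvement flattens out rather than where the value is smallest.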

(c) Rerun K-means with the k value you consider best. Run the given code to see the data distribution of all columns in each cluster. Based on the variables that differ significantly across clusters, study the unique characteristics of each cluster and give each cluster an intuitive name, so that you can quickly convey the cluster results to the medical team. For each cluster, suggest how the various medical teams should handle it differently in the next steps, e.g. follow-up consultations, health check-up reminders, etc. (10 marks)

Question 2

Your next task is to predict whether the patient has diabetes, by building a Logistic Regression Model. You are predicting the ‘Outcome’ column, using all other columns as input variables.

(a) Run the given code for Logistic Regression. Discuss the p-values and coefficients generated for two variables: ‘SkinThickness’ and ‘Pregnancies’. Explain in plain English the impact of these two input variables on the target variable Outcome. (10 marks)
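One common way to read a logistic regression coefficient is to convert it into an odds ratio. The sketch below shows that conversion, together with a p-value check against a significance level. The coefficient and p-value are hypothetical and are NOT results from the assignment data; the 5% level is a conventional default and is an assumption here, as are the helper names.

```python
import math


def odds_ratio(coefficient):
    """Multiplicative change in the odds of Outcome=1 per one-unit increase."""
    return math.exp(coefficient)


def is_significant(p_value, alpha=0.05):
    """True when the p-value falls below the chosen significance level."""
    return p_value < alpha


# Hypothetical values for illustration; NOT results from the assignment data.
example_coef = 0.12
example_p = 0.003

ratio = odds_ratio(example_coef)
print(f"odds ratio: {ratio:.3f}")
print(f"significant at 5%: {is_significant(example_p)}")
```

A ratio above 1 means the odds of the positive class rise as the input increases, and a ratio below 1 means they fall. Here "one unit" refers to the input as it was fed to the model, so the reading depends on whether the column was scaled before fitting.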

(b) We define target variable Outcome=1 as the positive class, i.e., the patient has diabetes. Explain in plain English what a False Negative (FN) case and a False Positive (FP) case are. Discuss which one, FN or FP, is worse and whether your hospital's predictive model should be optimized for Precision or Recall. (10 marks)
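For reference, Precision and Recall follow directly from the counts of true positives, false positives and false negatives. The sketch below computes them in plain Python; the function name `precision_recall` and the labels and predictions are hypothetical and are not taken from the dataset or the notebook.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall), treating label 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Precision: of the patients flagged as positive, how many truly are.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of the truly positive patients, how many were flagged.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels and predictions, for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

precision, recall = precision_recall(y_true, y_pred)
print(precision, recall)  # 0.6 0.75
```

In this toy example there is one false negative (the fourth patient) and two false positives (the fifth and sixth), which is why Recall is 3/4 and Precision is 3/5.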

(c) The model above uses a default threshold of 0.5 for diagnosing diabetes. Run the given code to try thresholds from 0.1 to 0.9. As the threshold rises from 0.1 to 0.9, what do you observe about Precision and Recall? Based on the hospital’s goal of minimising misdiagnosed cases while ensuring timely intervention, suggest the best threshold and justify your choice. (10 marks)
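The mechanics of a threshold sweep can be sketched as follows: each predicted probability is turned into a 0/1 label by comparing it with the threshold, and Precision and Recall are recomputed each time. This is a minimal plain-Python illustration, not the notebook's code; the function name `precision_recall_at`, the labels and the scores are hypothetical.

```python
def precision_recall_at(y_true, scores, threshold):
    """Precision and recall when scores >= threshold are predicted positive."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical true labels and predicted probabilities, for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.95, 0.85, 0.55, 0.25, 0.65, 0.35, 0.15, 0.05]

thresholds = [i / 10 for i in range(1, 10)]  # 0.1, 0.2, ..., 0.9
results = [precision_recall_at(y_true, scores, t) for t in thresholds]

for t, (p, r) in zip(thresholds, results):
    print(f"threshold={t:.1f}  precision={p:.2f}  recall={r:.2f}")
```

Raising the threshold can only remove predicted positives, so Recall never increases along the sweep. Precision tends to rise but need not do so at every step, which is why both columns are worth inspecting before choosing a threshold.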

