TERM 3 2023

COMM1190: DATA, INSIGHTS, AND DECISIONS

FINAL EXAMINATION

QUESTION 1 30 MARKS

PART A – Data Communication 20 MARKS

You are working as a junior data analyst at CoinDesk, a financial services provider that offers its customers data visualisation tools to monitor and compare cryptocurrencies. Your manager has asked you to write a business report that compares two of the major cryptocurrencies, BTC and ETH, unveils insights, and suggests actionable recommendations to stakeholders on how to invest effectively in cryptocurrencies. You generated the chart in Figure 1 to include in your report:

Figure 1. Comparison of BTC and ETH prices over time

You have received feedback from your manager that some issues in your visualisation might mislead stakeholders in their decision to invest. According to your manager, the visualisation suggests that BTC and ETH prices have evolved similarly over the last couple of weeks, which could lead stakeholders to think that both cryptocurrencies generate a similar level of return. You decide to go back to the drawing board and figure out how to improve your visualisation.

Required:

Regarding the above case, please answer all the following questions:

a)     Critically evaluate the visualisation in Figure 1 and identify two aspects you could improve to better address your manager’s feedback above. [max 200 words] (5 marks)

Your manager asked you to suggest more effective visualisations with a focus on (i) the evolution of prices, (ii) the Market Cap of each cryptocurrency, and (iii) whether the spikes/drops in prices are associated with positive or negative news (based on social media sentiment). You decide to generate three charts to address your manager’s requests.

b)     Using the four-chart-types framework (conceptual vs data-driven / declarative vs exploratory), identify the type of chart that you would use to address each of the manager’s requests and justify your answers. [max 200 words] (6 marks)

c)     Sketch three charts to address your manager’s requests. For each chart, provide a brief explanation of your design choices and explain how the chart can unveil interesting insights that could help stakeholders in their decisions. To sketch the chart, you can use any tool you want (e.g., you can use a software tool like Infogram, Excel, or R). Alternatively, you can sketch the chart using pencils, pens, or markers on paper, then take a picture of the charts and paste them into your solutions document. You can use illustrative data. [max 300 words] (9 marks)

PART B – Data Ethics 10 MARKS

While Artificial Intelligence (AI) has contributed remarkably to the success of Amazon's e-commerce (e.g. AI recommendation systems, AI for product sales and demand forecasting, AI for product delivery optimization, etc.), the AI hiring system that Amazon adopted to recruit talented candidates for tech jobs has encountered unprecedented criticism and backlash. The story begins in 2015, when a team of machine-learning (ML) specialists working for Amazon uncovered a bias in their new recruiting engine. They realised that Amazon's system was not rating candidates for software developer jobs and other technical posts in a gender-neutral way and had taught itself that male candidates were preferable. This is because Amazon’s ML models were trained to vet applicants by observing patterns in candidates’ resumes submitted to the company over 10 years. Most came from men, a reflection of male dominance across the tech industry. According to experts in ML, the technology penalized resumes that included the word “women's”, as in “women's chess club captain”, and downgraded graduates of two all-women's colleges. Instead, it favoured candidates who described themselves using verbs more commonly found on male engineers’ resumes, such as “executed” and “captured”.

The story went viral, with critics pointing out that Amazon’s AI-hiring algorithm secretly promoted discrimination because it showed bias against women. Gender bias was not the only issue highlighted: problems with the data that underpinned the ML models’ judgments also led the system to recommend unqualified candidates for jobs. With the technology returning results almost at random, Amazon claimed it shut down the project. Indeed, employers have long dreamed of harnessing technology to analyse resumes and job applications, in order to optimise the recruitment process and reduce reliance on the subjective opinions of human recruiters. But Nihar Shah, a computer scientist and ML researcher at Carnegie Mellon University, claims that there is still much work to do to leverage technology in recruitment and figure out “how to ensure that the algorithm is fair, how to make sure the algorithm is interpretable and explainable - that’s still quite far off”.

Extracted/adapted from “Amazon scraps secret AI recruiting tool that showed bias against women” (Jeffrey Dastin)

https://www.reuters.com/article/us-amazon-com-jobs-automation-insight-idUSKCN1MK08G?fbclid=IwAR0KHgyudbhIotZTGQJtuvTx28_UevOVT1dRmtf_vsYh_AwYRj0ufpIpQxo

Required:

Regarding the above case, please answer all of the following questions:

a)     Discuss TWO (2) ethical issues in Amazon's AI-hiring algorithm. [max 200 words] (5 marks)

b)     Propose  TWO  (2)  recommendations on how to address these ethical issues. Justify your arguments with examples relevant to the use of AI in the recruitment process. [max 200 words] (5 marks)

QUESTION 2 40 MARKS

PART A - Linear Regression Model 21 MARKS

Suppose you are given a dataset with information about a class of 30 students. For each student, we can observe the marks (from 1 to 10) in Mathematics, Information Technology, Physics, Literature, and Arts, and the average number of hours spent studying at home. You want to investigate the relationship between the mark in Mathematics and all other variables in the dataset using a linear regression model. The output obtained is shown in Figure 2.

Figure 2. Regression output

Required:

Regarding the above case, please answer all of the following questions:

a)     Write  down the regression equation for the relationship between the grade in Maths and all other variables. Please provide an interpretation of the regression coefficients, including the intercept. [max 200 words] (5 marks)

b)     Use the output from the  regression to discuss whether the  marks in Arts and Literature are significant predictors of the mark in Maths. Enrich the discussion by  describing  the  hypothesis  testing  procedure,  and  the  decision-making process. [max 200 words] (5 marks)

c)     Consider the following scatterplot, as shown in Figure 3:

Figure 3. Relationship between the Maths and Literature marks

How does it align with the regression output in Figure 2? Provide a justification to support your answer, and comment on the overall goodness of fit by discussing at least two relevant statistics from the output. [max 200 words] (5 marks)

d)     What is the difference between $R^2$ and the adjusted $R^2$? How can you explain the difference between these two values in this linear regression? How can the model be improved to yield better predictions of future marks in Maths? [max 100 words] (2 marks)
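As background for part (d), the standard adjusted R-squared formula penalises R-squared for the number of predictors relative to the sample size. The sketch below is illustrative only: the function name is hypothetical and the R-squared value of 0.60 is an assumed number, not the value reported in Figure 2. The sample size (30 students) and the five predictors (Information Technology, Physics, Literature, Arts, study hours) come from the case description.

```python
def adjusted_r_squared(r_squared: float, n_obs: int, n_predictors: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    residual_df = n_obs - n_predictors - 1
    if residual_df <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / residual_df


# Assumed R^2 of 0.60 (hypothetical); n = 30 and p = 5 follow the case.
example = adjusted_r_squared(0.60, n_obs=30, n_predictors=5)
print(round(example, 4))
```

With few observations and several predictors, the gap between the two statistics widens, which is why the adjusted value is the more cautious guide when comparing models of different size.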

e)     Based on the QQ-plot shown in Figure 4, discuss the Normality assumption of the Maths grades. [max 200 words] (4 marks)

Figure 4. Normal Q-Q Plot

PART B - Logistic Regression Model 11 MARKS

Suppose you are a business analyst in an insurance company, and you want to assess the probability that a car policyholder makes a claim. For each policyholder, we can observe:

•    Car age (car_age);

•    Car value (car_value);

•    Age of the policyholder (driver_age);

•    Gender (gender);

•    Type of area where the car is mostly used (driving_prevalence, which takes three values: “Urban” (1), “Rural” (2), “Highway” (3));

•    Average speed (avg_speed).

The following output from the regression is obtained, as shown in Figure 5:


Figure 5. Regression output

Required:

Regarding the above case, please answer all of the following questions:

a)     Please comment on the statistical significance of the average speed. [max 100 words] (3 marks)

b)     Given  the  confusion  matrix  shown  in  Table  1  based on  1000  new  policies, compute the accuracy rate (overall, for claiming policies and for non-claiming policies), and comment on the results. [max 100 words] (3 marks)

Table 1. Confusion Matrix
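The three accuracy rates requested in part (b) can be computed from the four cells of a 2x2 confusion matrix. The following is a minimal sketch: the function name and the cell counts are hypothetical (the counts sum to 1000 but are not the values in Table 1), and "claims" is assumed to be the positive class.

```python
def accuracy_rates(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Return overall and per-class accuracy from confusion-matrix counts.

    tp: claims predicted as claims        fn: claims predicted as non-claims
    fp: non-claims predicted as claims    tn: non-claims predicted as non-claims
    """
    total = tp + fn + fp + tn
    return {
        "overall": (tp + tn) / total,      # share of all policies classified correctly
        "claims": tp / (tp + fn),          # accuracy among actual claims
        "non_claims": tn / (tn + fp),      # accuracy among actual non-claims
    }


# Illustrative counts only; replace with the cells of the actual matrix.
rates = accuracy_rates(tp=120, fn=80, fp=50, tn=750)
print(rates)
```

A high overall rate can hide a weak rate for the rarer class, so the per-class figures are usually worth reporting alongside it.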

c)     The  boxplots  shown  in  Figure  6  indicate  the  fitted  probability  of  claim  by  a policyholder by driving prevalence and the average speed by driving prevalence. Please comment on these results and their relationship with the results from the logistic regression. [max 200 words] (5 marks)


Figure 6. Box plots

PART C - Data Analysis and Summary Statistics 8 MARKS

a)     Suppose  you would  like to analyse the number of deaths by cause for each calendar year between 2000 and 2023 in Australia. Which graphical tool would you use to obtain a visual preliminary insight? [max 100 words] (3 marks)

The  histogram shown  in  Figure  7  indicates  the distribution of the  life  expectancy beyond age 70. The summary statistics of the data are as follows:

•    Mean = 75.03

•    Median = 74.35

•    Skewness = 1.22

•    Kurtosis = 1.94

Figure 7. Life expectancy beyond age 70

b)     Please comment on the results from the summary statistics, and how these data can be processed to ease statistical analysis and modelling. [max 200 words] (5 marks)
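For context on part (b), the sketch below shows how moment-based skewness and excess kurtosis can be computed, and how a log transform (one common option for positive, right-skewed data) changes the skewness. The data values and function names are hypothetical and are not the data behind Figure 7. Statistical packages differ in their bias corrections and in whether they report raw or excess kurtosis, so the numbers here need not match any particular software convention.

```python
import math


def skewness_and_excess_kurtosis(values):
    """Moment-based (uncorrected) skewness and excess kurtosis of a sample."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


# Hypothetical right-skewed ages (illustrative only).
ages = [71, 72, 72, 73, 73, 74, 74, 75, 76, 78, 82, 90]
raw_skew, raw_kurt = skewness_and_excess_kurtosis(ages)

# Log transform: compresses the long right tail of positive data.
log_skew, log_kurt = skewness_and_excess_kurtosis([math.log(a) for a in ages])

print(f"raw skewness: {raw_skew:.3f}, log skewness: {log_skew:.3f}")
```

A mean above the median together with positive skewness points to a right tail; whether a transformation is appropriate depends on the model that will be fitted afterwards.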

QUESTION 3 - Research Design and Experiments 30 MARKS

Genetically modified organism (GMO) labelling is mandatory in Australian grocery stores, but there is no legal requirement to specify it in restaurants. A local fast-food company is running an experiment that introduces GMO labelling on some of the GMO ingredients of selected products in eight stores in the Eastern suburbs of Sydney over the two quarters that ended 30 June 2023. A data scientist, Jack, was hired as a research consultant to design and implement this experiment. The Eastern suburbs stores where GMO labelling has been trialled are classified as the treatment group, whilst stores in other suburbs form the control group. The stores’ quarterly sales data for the control and treatment groups are captured before and after the intervention. Model 1 compares the treatment and control groups after the intervention, whilst Model 2 compares the change in the treatment group with the change in the control group from before to after the intervention. Table 2 describes the data before the intervention.

Table 2. Summary Stores’ Statistics in Eastern (Treatment) and Other Suburbs (Control)

Jack analysed the experimental data by running the following two regression models and generated the results in Table 3. He used 52 observations: 2 quarters for each of 26 stores.

Model 1: $y_i = \beta_0 + \beta_1 \, \mathrm{Treated}_i + \varepsilon_i$

Model 2: $y_i = \beta_0 + \beta_1 \, \mathrm{Treated}_i + \beta_2 \, \mathrm{After}_i + \beta_3 \, (\mathrm{Treated}_i \times \mathrm{After}_i) + \varepsilon_i$

where,

$y_i$ is the quarterly sales at each store;

$\mathrm{After}_i$ takes the value of 1 if the observation is after the intervention, and 0 otherwise;

$\mathrm{Treated}_i$ is a binary variable that takes the value of 1 (treatment) if the store is based in an Eastern suburb, and 0 otherwise.
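Because Model 2 contains only two binary indicators and their interaction, its OLS coefficients coincide with differences between the four group-by-period cell means. The sketch below illustrates that link with made-up sales figures; the function name and all numbers are hypothetical and do not reproduce Table 2 or Table 3.

```python
from statistics import mean


def did_coefficients(rows):
    """rows: list of (sales, treated, after) tuples, treated/after in {0, 1}.

    Returns the four coefficients of the saturated difference-in-differences
    regression, computed directly from the cell means.
    """
    def cell_mean(treated, after):
        cell = [y for y, t, a in rows if t == treated and a == after]
        if not cell:
            raise ValueError(f"no observations for treated={treated}, after={after}")
        return mean(cell)

    b0 = cell_mean(0, 0)                        # control stores, before
    b1 = cell_mean(1, 0) - b0                   # pre-existing gap between groups
    b2 = cell_mean(0, 1) - b0                   # time trend in the control group
    b3 = (cell_mean(1, 1) - cell_mean(1, 0)) - b2  # difference in differences
    return {"b0": b0, "b1": b1, "b2": b2, "b3": b3}


# Illustrative quarterly sales (hypothetical numbers).
data = [
    (100, 0, 0), (110, 0, 0),   # control, before
    (108, 0, 1), (112, 0, 1),   # control, after
    (120, 1, 0), (130, 1, 0),   # treated, before
    (140, 1, 1), (150, 1, 1),   # treated, after
]
coefs = did_coefficients(data)
print(coefs)
```

In this toy data the treated stores rise by 20 and the control stores by 5, so the interaction term attributes 15 to the intervention, under the assumption that both groups would otherwise have followed the same trend.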

Table 3. Regression results of two models

Required:

Regarding the above case, please answer all of the following questions:

a)     Interpret $\beta_1$ in Model 1 (i.e. the first regression) and explain the problems associated with interpreting it as causal. [max 180 words] (6 marks)

b)     Interpret  each  of the  estimated  coefficients  in  the  difference  in  differences regression in model 2 (i.e. second regression) from Table 3. What is the new estimate for the  treatment  effect  of the  GMO  labelling  change?  Provide  an example of an unobservable that can lead to biased estimates of the treatment effect in model 2. [max 250 words] (8 marks)

c)     Discuss two factors to consider in order to enhance the internal validity of the research experiment. [max 250 words] (8 marks)

d)     Jack suggests revisiting his initial research design and proposes using stores in another city, Brisbane, as the control group while keeping Sydney’s Eastern suburbs as the treatment group. Describe how he would do this and critically evaluate the revised design of the experiment. [max 250 words] (8 marks)


