STATS 3DA3

Homework Assignment 2

Instructions

Due before 10:00 PM on Tuesday, February 11, 2025.

Submit a PDF copy of your solution to Avenue to Learn. You don’t need to write the questions in your answers.

Late Penalty for Assignments: A 15% penalty will be applied for each day an assignment is submitted after 72 hours past the due date (rounded up).  This includes accommodations for extended time through SAS.

Assignments submitted after 72 hours will receive a grade of zero.

•  Your assignment must conform to the Assignment Standards listed below.

Assignment Standards

•  Write your name and student number on the title page. We will not grade assignments without the title page.

•  Quarto Jupyter Notebook is strongly recommended.

•  Eleven-point font (Times or similar) must be used with 1.5 line spacing and margins of at least 1 inch all around.

•  Use newpage to start the solution to each question (Question 1, 2, 3) on a new page.

•  No screenshots are accepted for any reason.

•  The writing and referencing should be appropriate to the undergraduate level.

•  You may discuss homework problems with other students, but you have to prepare the written assignments yourself.

•  Various tools, including publicly available internet tools, may be used by the instructor to check the originality of submitted work.

•  Assignment policy on the use of generative AI

–  Generative AI is not permitted in the assignments, except for the use of GitHub Copilot as an assistant for coding.

Clearly indicate in the code comments where GitHub Copilot was used as a coding assistant.

In alignment with McMaster academic integrity policy, it “shall be an offence knowingly to submit academic work for assessment that was purchased or acquired from another source”. This includes work created by generative AI tools. Also stated in the policy is the following: “Contract Cheating is the act of ‘outsourcing of student work to third parties’ with or without payment.” Using generative AI tools is a form of contract cheating. Charges of academic dishonesty will be brought forward to the Office of Academic Integrity.

For all the questions, use Python 3.11.5 in a virtual environment. Then, install the required libraries for text mining and Shiny visualization.

Question 1:  Word Cloud Analysis

Let’s explore the article “Data Science and Engineering With Human in the Loop, Behind the Loop, and Above the Loop” by Xiao-Li Meng  (2023).  Follow the steps below to create and analyze a word cloud for pages 2–5 of the article.

(1)  Add the article “Data Science and Engineering With Human in the Loop, Behind the Loop, and Above the Loop” by Xiao-Li Meng (2023) to your reference list.

(2)  Download the PDF of the article.

Hint:

•  Access the article via https://doi.org/10.1162/99608f92.68a012eb.

•  Click the Download button in the top-right corner, and choose the PDF format.

•  Move the downloaded file to your working folder and rename it as paper.pdf.

(3)  Use pdfplumber.open() to open the PDF.

(4)  Extract the text from pages 2 to 5.

(5)  Combine the text from these pages into a single string.

(6)  Split the string by lines using \n.

(7)  Create a pandas data frame named df with a column labeled line containing the split lines.

(8)  Break each line into individual words.

(9)  Convert each word into a separate row in the data frame.

(10)  Convert all words to lowercase.

(11)  Remove stop words.

(12)  Remove unsuitable words using the following steps:

Hint:

(i)  Remove rows where the word column contains punctuation using

str.contains(r'[,. •‘”“:’;\(\)\[\]]', regex=True)]

(ii)  Remove rows where the word column contains numbers using: str.contains(r'\d', regex=True)]

(iii)  Remove rows where the word column contains single letters using: str.contains(r'^[a-z]$', regex=True)]

(13)  Create a term-frequency data frame.

Hint:

(i)  Calculate the frequency of each unique word using:  value_counts().reset_index()

(ii)  Save the result in a DataFrame called freq.

(14)  Generate a word cloud for the most frequently occurring words (e.g., the top 10 words).

(15)  Write a summary paragraph (at least two statements) about your word cloud. The summary can include any limitations of your analysis and provide context based on the chosen text.
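Steps (3) to (13) above can be sketched roughly as follows. This is only an illustration under stated assumptions: the helper name extract_pages, the stand-in text, the tiny stop-word list, and the column name count are all made up for this sketch and are not part of the assignment. The punctuation, digit, and single-letter patterns are the ones given in the hints.

```python
import pandas as pd


def extract_pages(path, first_page, last_page):
    """Steps (3)-(5): join the text of pages first_page..last_page (1-based)."""
    import pdfplumber  # imported here so the rest of the sketch runs without it

    with pdfplumber.open(path) as pdf:
        pages = pdf.pages[first_page - 1:last_page]
        # extract_text() can return None for a page without a text layer
        return "\n".join(page.extract_text() or "" for page in pages)


# Stand-in text (made up). With the article one would instead call
# extract_pages("paper.pdf", 2, 5).
text = "Data science needs data.\nThe loop has 3 parts: a human and data science"

# Tiny illustrative stop-word list (assumption); a real analysis needs a fuller one.
stop_words = {"the", "a", "and", "has", "needs"}

# (6)-(7): one row per line
df = pd.DataFrame({"line": text.split("\n")})

# (8)-(10): split each line into words, one word per row, in lowercase
words = (
    df.assign(word=df["line"].str.split())
    .explode("word")
    .dropna(subset=["word"])
    .reset_index(drop=True)
)
words["word"] = words["word"].str.lower()

# (11): remove stop words
words = words[~words["word"].isin(stop_words)]

# (12): remove words containing punctuation, digits, or a single letter
words = words[~words["word"].str.contains(r"[,. •‘”“:’;\(\)\[\]]", regex=True)]
words = words[~words["word"].str.contains(r"\d", regex=True)]
words = words[~words["word"].str.contains(r"^[a-z]$", regex=True)]

# (13): term frequencies; column names are set explicitly because the
# default names after reset_index() differ between pandas versions
freq = words["word"].value_counts().reset_index()
freq.columns = ["word", "count"]
print(freq)
```

Note that a token such as "data." is dropped entirely rather than trimmed to "data", because the hint removes any row whose word contains punctuation. This is one limitation worth keeping in mind when interpreting the word cloud.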

Question 2

Greenhouse gases (GHGs) play a significant role in global warming by capturing and retaining solar heat energy, leading to elevated global temperatures. In 2004, Canada launched the Greenhouse Gas Reporting Program (GHGRP) to monitor and record emissions from facilities that release 10 kilotonnes or more of greenhouse gases, measured in CO2-equivalent units. Facilities meeting this threshold are required to submit annual reports to Environment and Climate Change Canada. The dataset is publicly accessible through Canada’s Open Government Portal: Greenhouse Gas Reporting Program (GHGRP) - Facility Greenhouse Gas (GHG) Data.

For Question 2, we have downloaded the dataset PDGES-GHGRP-GHGEmissionsGES-2004-Present.csv from the portal.

This analysis focuses on creating a Shiny App to explore trends in greenhouse gas emissions across Canada’s provinces and territories, measured in CO2-equivalent units.

Data dictionary:

The dataset, spanning from 2004 to the present, includes emissions data (in tonnes and CO2-equivalent tonnes) for each facility, categorized by gas type, including carbon dioxide (CO2), methane (CH4), nitrous oxide (N2O), hydrofluorocarbons (HFCs), perfluorocarbons (PFCs), and sulphur hexafluoride (SF6). It also provides the province or territory where each facility is located. For further details, refer to the Greenhouse Gas Reporting Program (GHGRP) - Facility Greenhouse Gas (GHG) Data.

Pre-Processing Steps

To simplify the task of creating a Shiny App, we have pre-processed the data as follows. We start by importing the necessary libraries for data transformation:

import numpy as np

import pandas as pd

import re

Next, we read the downloaded dataset in CSV format with the specified encoding (latin1):

df = pd.read_csv("GHG_Emissions.csv", encoding='latin1')

The column names in the dataset are a mix of English and French. We use the clean_column_names() function to standardize the column names by removing French names, non-ASCII characters, and unnecessary symbols.

Here is the clean_column_names() function:

# clean_column_names function
def clean_column_names(column_names):
    cleaned_names = []
    # loop through each column name
    for name in column_names:
        # convert names to ASCII and remove non-ASCII characters
        name = name.encode('ascii', 'ignore').decode('ascii')
        # remove everything after '/' (French column name)
        name = re.sub(r'/.*', '', name)
        # remove parentheses
        name = re.sub(r'[()]', '', name)
        # remove extra whitespace
        name = ' '.join(name.split())
        cleaned_names.append(name)
    # return new column names
    return cleaned_names

We then apply this function to clean the column names in the DataFrame.

df.columns = clean_column_names(df.columns)
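To see what the cleaning does, the function can be tried on a few bilingual headers. The sample headers below are made up for illustration and are not copied from the GHGRP file. The function body repeats the one above only so that the snippet runs on its own.

```python
import re


def clean_column_names(column_names):
    """Same logic as the function above, repeated so this snippet is standalone."""
    cleaned_names = []
    for name in column_names:
        # drop non-ASCII characters (accented letters disappear)
        name = name.encode("ascii", "ignore").decode("ascii")
        # drop the French part after '/'
        name = re.sub(r"/.*", "", name)
        # drop parentheses
        name = re.sub(r"[()]", "", name)
        # collapse repeated whitespace and trim the ends
        cleaned_names.append(" ".join(name.split()))
    return cleaned_names


# Hypothetical bilingual headers (assumed for illustration only)
sample = [
    "Reference Year / Année de référence",
    "CO2 (tonnes)",
    "HFC Total (tonnes CO2e / tonnes éq. CO2)",
]
cleaned = clean_column_names(sample)
print(cleaned)
# ['Reference Year', 'CO2 tonnes', 'HFC Total tonnes CO2e']
```

Because everything after the first '/' is removed before the parentheses are stripped, a unit such as "tonnes CO2e" survives only when it appears before the slash.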

Next, we select the relevant columns for the analysis:

•  Reference Year - the year in which the GHG emissions were recorded.

•  GHGRP ID No. - the facility identifier.

•  Facility Province or Territory - province or territory of the facility.

•  CO2 tonnes - emissions (in tonnes and tonnes of CO2 eq.) of carbon dioxide (CO2).

•  CH4 tonnes - emissions (in tonnes and tonnes of CO2 eq.) of methane.

•  N2O tonnes - emissions (in tonnes and tonnes of CO2 eq.) of nitrous oxide.

•  SF6 tonnes - emissions (in tonnes and tonnes of CO2 eq.) of sulphur hexafluoride.

•  HFC Total tonnes CO2e - emissions (in tonnes and tonnes of CO2 eq.) of hydrofluorocarbons.

•  PFC Total tonnes CO2e - emissions (in tonnes and tonnes of CO2 eq.) of perfluorocarbons.

selected_cols = [
    "Reference Year", "GHGRP ID No.", "Facility Province or Territory",
    "CO2 tonnes", "CH4 tonnes", "N2O tonnes", "SF6 tonnes",
    "HFC Total tonnes CO2e", "PFC Total tonnes CO2e"
]

df = df[selected_cols]

We rename the columns to make them more concise and consistent:

df.rename(columns={
    "Reference Year": "Year",
    "GHGRP ID No.": "Facility_ID",
    "Facility Province or Territory": "Province_Territory",
    "CO2 tonnes": "CO2",
    "CH4 tonnes": "CH4",
    "N2O tonnes": "N2O",
    "SF6 tonnes": "SF6",
    "HFC Total tonnes CO2e": "HFC",
    "PFC Total tonnes CO2e": "PFC"
}, inplace=True)

print(df.head())

Finally, we save the pre-processed data to a new CSV file:

df.to_csv("cleaned_GHG_Emissions.csv", index=False)

The pre-processed dataset is now available for analysis and can be accessed at:

https://raw.githubusercontent.com/PratheepaJ/datasets/refs/heads/master/cleaned__GHG__Emissions.csv.

You will use this dataset for Question 2.

Next Steps

The following questions guide you through creating a Shiny App to explore trends in CO2, CH4, and N2O emissions across provinces and territories in Canada from 2004 to 2022.

(1)  Read the pre-processed data from the provided link.

(2)  Ensure that the year variable is in the correct format. If not, convert it to the date-time format and extract the year. Replace the original ‘Year’ variable with the extracted year.

Hint: Use the following command to convert the year:

to_datetime(df['Year'], format='%Y').dt.year

(3)  Some territories may have no facilities reported in early years. Group the data by Year and Province_Territory to count distinct Facility_ID values. Find which territories are missing in 2004.

Hint: Use the following code to group the data and find missing territories:

df.groupby(['Year', 'Province_Territory']).agg(
    facilities=('Facility_ID', 'nunique')
).reset_index()

(4)  Find the earliest and latest year emissions were recorded.

(5)  Group the data by Year and Province_Territory and sum the emissions of CO2, CH4, and N2O for each province.

Hint: Use the following code to calculate the total emissions:

df.groupby(['Year', 'Province_Territory']).agg(
    CO2=('CO2', 'sum'),
    CH4=('CH4', 'sum'),
    N2O=('N2O', 'sum')
).reset_index()

(6)  Plot the CO2 changes over the years for each province and territory, using colored lines to differentiate between them.

Note: you will use the dataset obtained in (5) for this plot.

(7)  Provide a description of the CO2 emission trend across provinces and territories based on the plot in (6).

(8)  Develop a Shiny app that allows the user to input a start year (from 2004 to 2022), an end year (from 2004 to 2022), and select a gas type (CO2, CH4, N2O).

•  Use ui.input_select to allow the user to specify the start year (between 2004 and 2022).

•  Use ui.input_select to allow the user to specify the end year (between 2004 and 2022).

•  Use ui.input_select to allow the user to select the gas type (CO2, CH4, or N2O).

You can start by using the following Shiny app template to structure your app. When writing the app in app.py, remove the template instructions and replace them with your implementation.

You will also need to copy-paste your app.py in your assignment answers, similar to the template provided here:

# load the required libraries

# define the UI for the Shiny app
app_ui = ui.page_fluid(
    ui.input_select(
        id='emissiontype',
        label='Choose emission type',
        # Add more gases as necessary in ...
        choices=['CO2', '...', '...'],
        selected='CO2'
    ),
    ui.input_select(
        "start_year",
        "Start Year",
        [str(year) for year in range(2004, 2023)]
    ),
    ui.input_select(
        "end_year",
        "End Year",
        [str(year) for year in range(2004, 2023)]
    ),
    ui.output_plot('myplot')
)

# define the server function for the Shiny app
def server(input, output, session):

    @output
    @render.plot
    def myplot():
        # Read the pre-processed data
        # from the provided link
        df = ...

        # Convert 'Year' column to date-time
        # format and extract the year
        df['Year'] = ...

        # Filter data based on the
        # selected start and end year
        start_year = int(input.start_year())
        end_year = ...
        df = df[(df['Year'] >= start_year)
                & (df['Year'] <= end_year)]

        # Select the emission type based on user input
        emission_type = input.emissiontype()

        # Create a plot to visualize the emission trends
        plt.figure(figsize=(10, 6))
        sns.lineplot(data=df,
                     x='...',
                     y=emission_type,
                     # Color lines by province/territory
                     hue='...', marker='o')

        # complete the title with your choice of text
        plt.title(f'{emission_type} ...')
        plt.xlabel('...')
        plt.ylabel(f'Total {emission_type} Emissions (Tonnes)')
        plt.legend(title='Province/Territory',
                   bbox_to_anchor=(1.05, 1), loc='upper left')
        # one tick per year in the filtered range
        plt.xticks(ticks=np.arange(df['Year'].min(),
                                   df['Year'].max() + 1, 1),
                   rotation=45)
        plt.grid(True)
        return plt.gcf()

# Run the app
app = App(app_ui, server)

(9)  Deploy your Shiny App at https://www.shinyapps.io/.  Then, provide the link to the App. For example, the link to my app is https://pratheepaj.shinyapps.io/my_app/.
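Before running steps (2) to (5) on the full dataset, it may help to try them on a small toy frame where the expected answers are obvious. The rows below are invented for illustration and are not real GHGRP values. The variable names toy, missing_2004, and totals are also assumptions of this sketch. "Missing in 2004" is taken here to mean a province or territory that appears in some year of the data but has no row in 2004.

```python
import pandas as pd

# Made-up rows using the cleaned column names; not real GHGRP values.
toy = pd.DataFrame({
    "Year": ["2004", "2004", "2004", "2005", "2005"],
    "Facility_ID": ["G1", "G2", "G3", "G1", "G4"],
    "Province_Territory": ["Ontario", "Ontario", "Alberta", "Ontario", "Yukon"],
    "CO2": [10.0, 5.0, 7.0, 12.0, 1.0],
    "CH4": [1.0, 0.5, 2.0, 1.5, 0.1],
    "N2O": [0.2, 0.1, 0.3, 0.2, 0.0],
})

# (2) parse the year and keep only the integer year
toy["Year"] = pd.to_datetime(toy["Year"], format="%Y").dt.year

# (3) distinct facilities per year and province/territory
facilities = toy.groupby(["Year", "Province_Territory"]).agg(
    facilities=("Facility_ID", "nunique")
).reset_index()

# regions present somewhere in the data but absent in 2004
all_regions = set(toy["Province_Territory"])
regions_2004 = set(
    facilities.loc[facilities["Year"] == 2004, "Province_Territory"]
)
missing_2004 = all_regions - regions_2004

# (4) earliest and latest reporting year
first_year, last_year = toy["Year"].min(), toy["Year"].max()

# (5) total emissions per year and province/territory
totals = toy.groupby(["Year", "Province_Territory"]).agg(
    CO2=("CO2", "sum"),
    CH4=("CH4", "sum"),
    N2O=("N2O", "sum"),
).reset_index()

print(missing_2004, first_year, last_year)
print(totals)
```

A region that never reports in any year cannot be detected this way, since it has no rows at all. Comparing against an external list of provinces and territories would be needed for that.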



