
SCHOOL OF INFORMATICS & IT

DIPLOMA IN BIG DATA & ANALYTICS

AY 2025/2026 April Semester

Project (60%)

Analysis Data Pipeline for Danish Electricity and Gas supply

DATA ENGINEERING IN THE CLOUD (CDA2C06)

SUBJECT LEVEL: 2

GENERAL INSTRUCTIONS

1. This document consists of 13 pages (including cover page and marking rubrics).

2. Please complete ALL tasks in this assignment.

1. Background

Energinet is the independent public enterprise responsible for Denmark's transmission systems for electricity and gas. Through its commitment to transparency and the green transition, Energinet provides publicly accessible energy datasets via its Energy Data Service platform (https://en.energinet.dk/). These datasets support national and international energy research, system planning, market analysis, and innovation in energy technologies.

The Energinet Data Service API provides programmatic access to a wide range of datasets, including electricity spot prices, consumption patterns, production data by source (wind, solar, fossil), emissions, grid balance, gas consumption, and more. These datasets are particularly valuable for analytics projects involving energy forecasting, market behavior analysis, and sustainability impact studies.

The platform aligns with the open data policies of the Danish Government and the EU, promoting the use of open data to foster data literacy, transparency, and innovation. Source: https://en.energinet.dk

2. Data

2.1 Data Source

The dataset for this project will be drawn from Energinet’s "Electricity Production and Exchange 5 min Realtime" dataset, available through the Energy Data Service.

https://www.energidataservice.dk/tso-electricity/ElectricityProdex5MinRealtime

This dataset provides high-frequency, near real-time electricity production data in Denmark at 5-minute intervals. It includes breakdowns by energy production type (e.g., wind, solar, thermal) and tracks international electricity exchanges with neighboring countries.

This dataset is ideal for students interested in:

•    Monitoring fluctuations in electricity generation by source

•    Studying the real-time balance between domestic production and cross-border electricity exchanges

•    Analyzing renewable energy contribution in near real-time

•    Building time-series models to forecast short-term grid behavior

It serves as a rich source for data engineering workflows and real-time analytics and is updated frequently to reflect live operational data from Denmark’s national grid.

2.2 Data Format

The data is accessible via a RESTful API that supports GET requests and returns results in JSON format, making it suitable for ingestion by AWS services or analytics platforms using Python, R, or SQL-based engines like Athena.

API Explorer by Energinet Data Service:

Figure 2.1 (screenshot of 2 days of Electricity Production and Exchange at 5-minute intervals)

API requests can be customized using parameters such as:

•    start and end (to filter by datetime range)

•    filter (to target specific production types or bidding zones)

•    sort and time-zone options
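To make the parameter usage above concrete, here is a minimal sketch of assembling a filtered request URL in Python. The API host and parameter names are taken from the Energy Data Service documentation as an assumption; verify them against the API Explorer before relying on them.

```python
from urllib.parse import urlencode

# Dataset endpoint from Section 2.1 (host is an assumption based on the
# Energy Data Service platform).
BASE_URL = "https://api.energidataservice.dk/dataset/ElectricityProdex5MinRealtime"

def build_query(start, end, sort="Minutes5UTC", limit=100):
    """Assemble a GET URL filtered by datetime range, sorted, and limited."""
    params = {"start": start, "end": end, "sort": sort, "limit": limit}
    return f"{BASE_URL}?{urlencode(params)}"

# Example: a two-day window of 5-minute records
url = build_query("2025-04-01T00:00", "2025-04-03T00:00")
# The URL can then be fetched with urllib or the requests library.
```

The same parameters can be pasted into Postman (as in Figure 2.2) to inspect the response interactively before automating the call.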

Sample API Request (by Postman):

Figure 2.2 (screenshot from Postman, customized request with offset, start, end and sort parameters)

The returned JSON object contains fields such as:

•    Minutes5UTC – timestamp in UTC

•    PriceArea – DK1 or DK2 market zone

•    Production – electricity production in MW by source (e.g., solar, wind onshore/offshore, thermal)

•    Exchange – electricity flow (import/export) to/from neighboring countries

This structured format enables students to automate data ingestion, catalog metadata using AWS Glue, and query with Amazon Athena for exploratory and analytical tasks.
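A minimal sketch of reading that structure: the payload wraps rows in a "records" list, and each record carries the fields listed above. The sample values and the exact column names below are illustrative assumptions, not real data.

```python
def to_rows(payload, columns=None):
    """Flatten the JSON payload into a list of row dicts, optionally
    restricted to the given columns (missing fields become None)."""
    rows = []
    for rec in payload.get("records", []):
        rows.append({c: rec.get(c) for c in columns} if columns else dict(rec))
    return rows

# Illustrative payload mimicking the documented shape
sample = {"records": [{"Minutes5UTC": "2025-04-01T00:00:00",
                       "PriceArea": "DK1",
                       "SolarPower": 0.0,
                       "OnshoreWindPower": 812.4}]}
rows = to_rows(sample, columns=["Minutes5UTC", "PriceArea", "SolarPower"])
```

Restricting to a column subset early keeps the staged files small, which matters later when Athena charges by data scanned.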

3. Tasks

You are part of Energinet's data engineering team and are tasked with evaluating, designing, and building an AWS data pipeline as a proof of concept.

You are required to use Energinet's "Electricity Production and Exchange 5 min Realtime" dataset. You will build a data engineering solution in your AWS Learner Lab using the AWS services that you are familiar with. By working with this data source, you will be able to test whether the solution you build can support a much larger dataset in an actual implementation.

The objective of this project is to achieve sustainable and seamless data synchronization and a better front-end data service for data consumption. Prior to building a robust and reliable solution, you shall start with an architecture design proposal (considered the Project Report) and a viable prototype (considered the Project Solution). The aim of the prototype is to present a Proof of Concept (POC) to judge the feasibility of an actual implementation.

This project will challenge you to do the following:

Basic Requirements

Use an AWS Cloud9 integrated development environment (IDE) instance.

Collect and ingest the data from the web source.

Store the data in Amazon S3 and create a data catalogue.

Create an AWS Glue crawler to infer the structure of the data, and transform the data into a human-readable format (such as CSV).

Use Amazon Athena to query the data and create views for analysis purposes.

Create an analysis dashboard in a relevant visualization platform.

Project completed within the allocated budget (capped at USD 50).

Advanced Requirements (in addition to the Basic Requirements)

Data Wrangling (further data processing using boto3, AWS Data Wrangler, and other relevant Python libraries)

Orchestration and Deployment (using APIs and Step Functions)

Monitoring and Notifications

Project running costs optimally managed with no wastage, based on services implemented versus cost (judged not by current usage amount but by method of implementation; for example, 100 employees reading the same file vs. 100 employees creating 100 files to read).

Refer to the Project Cost Estimate Report for the services recommended for this project here:

4. Deliverables

There are TWO deliverables for this project, namely:

Project Report with Group Presentation (for due date, refer to Teaching Plan)

Project Solution and Individual Presentation (for due date, refer to Teaching Plan)

4.1 Project Report – (Group 10%, PDF, 20 marks)

Form a group with 4 – 6 members.

Prepare a Project Proposal Report (in PDF format) that details the following and states the contributing member for each section:

4.1.1 Architecture Design (10 marks)

Identify business requirements: List the business requirements for the proof-of-concept, such as enabling the data science team to perform SQL analysis and managing data access based on job roles. Start by considering the four main parts of the pipeline (ingestion, storage, processing, serving) and expand from there.

Select relevant components: Identify the components that meet these requirements.

Justify component choices: Provide explanations for why each component was chosen and how it contributes to the intended data pipeline.

4.1.2 Configuration Checklist (10 marks)

Recommend configurations: Based on the identified components, recommend necessary non-default configurations for each service, such as enabling versioning for an S3 bucket.

Justify configurations: Explain the reasons for each recommended configuration.

Address access control and security: Include configurations for access control management (such as IAM users, roles, and policies) and ensure configurations are optimized for threat prevention, data integrity, and compliance. Due to the limitations of the IAM module in the project prototyping environment, keep the recommended configurations for future reference.

4.2 Group Presentation (Group 10%, 10 marks)

All team members are required to present:

•    Identified Business Requirements

•    Configuration Checklist

•    Question & Answer

4.3 Project Solution – (Individual 30%, PDF, 60 marks)

Prepare a Project Solution Report detailing the following:

4.3.1 Basic Requirements

1.   Data Ingestion (to acquire data from JSON endpoint using API)

Suggested service/platform: AWS Cloud9 IDE or AWS Lambda. You may use either one of them to access the data directly from the JSON endpoint.

Be mindful of Python version and its compatibility with the required libraries.

Consider creating a layer in AWS Lambda to keep all the library dependencies in a single repository.

Be careful of the nested JSON structure and read the data accordingly.

Test your Python code in a Jupyter Notebook first (especially the iteration logic and conditional statements).
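The ingestion hints above can be sketched as a Lambda-style handler. The bucket/key naming and the API host are assumptions for illustration; the sketch uses only the standard library for the fetch, so no Lambda layer is needed for this part.

```python
import json
import urllib.request
from datetime import datetime, timezone

# Dataset endpoint (host is an assumption); 576 rows ≈ 2 days of 5-min data.
DATASET_URL = ("https://api.energidataservice.dk/dataset/"
               "ElectricityProdex5MinRealtime?limit=576")

def object_key(now=None):
    """Derive a landing-zone S3 key such as landing/2025/04/01/prodex_0000.json."""
    now = now or datetime.now(timezone.utc)
    return now.strftime("landing/%Y/%m/%d/prodex_%H%M.json")

def lambda_handler(event, context):
    """Fetch the JSON endpoint and stage the nested records for storage."""
    with urllib.request.urlopen(DATASET_URL) as resp:
        payload = json.load(resp)
    records = payload.get("records", [])
    # In the real handler you would put json.dumps(records) to S3 under
    # object_key() using boto3 (omitted here to keep the sketch AWS-free).
    return {"count": len(records), "key": object_key()}
```

Date-partitioned keys like this make the later Glue crawl and Athena partition pruning cheaper.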

2.   Data Storage (to save data in temporary storage for processing)

Suggested service/platform: Amazon S3 can be considered for temporary storage. During the preceding data ingestion step, structuring the data into tabular format (rows and columns) is required.

Be reminded of the landing zone concept during data collection. [Refer to the L02a. Modern Data Architecture Infrastructure lecture taught in Week 2]

You may consider saving the acquired data in a cross-platform readable format (such as CSV, TXT, or XLSX).
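A hedged sketch of the storage step: serialize the flattened rows to CSV (keeping the data cross-platform readable, as suggested above) and put the file into a landing bucket. The bucket and key names are placeholders.

```python
import csv
import io

def rows_to_csv(rows):
    """Render a list of row dicts as a CSV string (header from the first row)."""
    if not rows:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

def upload_csv(rows, bucket="energinet-landing", key="landing/prodex.csv"):
    """Put the CSV into the landing zone (bucket/key are placeholders)."""
    import boto3  # deferred so rows_to_csv can be tested without AWS credentials
    boto3.client("s3").put_object(Bucket=bucket, Key=key,
                                  Body=rows_to_csv(rows).encode("utf-8"))
```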

3.   Data Process (to prepare table schema for usable data formats)

Suggested service/platform: AWS Glue can read the data from a data store, recognize the format, and assign a schema with appropriate data types.

You may still need to adjust the data type for some columns manually (because STRING is the default type in AWS Glue).
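The crawler itself can be created and run from code. In this sketch the crawler name, database name, and bucket are placeholders; "LabRole" assumes the pre-provisioned Learner Lab role.

```python
def crawler_config(bucket="energinet-landing", prefix="landing/"):
    """Build the create_crawler arguments for the landing-zone data
    (names and role are placeholders for your environment)."""
    return {
        "Name": "prodex-crawler",
        "Role": "LabRole",  # Learner Lab's pre-provisioned role (assumption)
        "DatabaseName": "energinet_db",
        "Targets": {"S3Targets": [{"Path": f"s3://{bucket}/{prefix}"}]},
    }

def run_crawler():
    import boto3  # deferred: only needed when actually calling AWS
    glue = boto3.client("glue")
    glue.create_crawler(**crawler_config())
    glue.start_crawler(Name="prodex-crawler")
```

After the crawl, inspect the inferred table in the Glue console and correct any columns the crawler left as STRING.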

4.   Data Serving (to provide the data in analytic-ready state)

Suggested service/platform: Amazon Athena can work with most analytics platforms, ranging from visualization tools to machine learning models. Once data discovery has been conducted by AWS Glue, you should have tabular data with proper data types.

Athena would serve as the connector between your data and any analytics platform.

Be reminded to set the output folder in an S3 bucket for the processed data.

[Refer to P03.Querying Data in the Cloud taught in Week 3]

Depending on your creativity, data aggregation, grouping, and filtering can be done in Athena to reduce the cost of scanning the whole dataset each time you run a query.
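As one example of such pre-aggregation, a view can roll the 5-minute rows up to daily renewable averages so dashboards scan far less data. The table and column names below are illustrative assumptions; match them to what the Glue crawler actually inferred.

```python
# Illustrative DDL; adjust names to your crawled table and columns.
VIEW_SQL = """
CREATE OR REPLACE VIEW daily_renewables AS
SELECT date(from_iso8601_timestamp(Minutes5UTC)) AS day,
       PriceArea,
       avg(SolarPower + OnshoreWindPower + OffshoreWindPower) AS avg_renewable_mw
FROM energinet_db.prodex
GROUP BY 1, 2
"""

def create_view(output="s3://energinet-landing/athena-results/"):
    """Run the view DDL in Athena; the S3 output folder is mandatory."""
    import boto3  # deferred: only needed when actually calling AWS
    return boto3.client("athena").start_query_execution(
        QueryString=VIEW_SQL,
        ResultConfiguration={"OutputLocation": output},
    )
```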

5.   Data Analysis: (to query and analyse data assets in place)

Suggested service/platform: Use an appropriate visualization platform (such as Power BI or Tableau) to integrate with the AWS data sources.

You may consider connecting to Amazon Athena directly from self-service BI.

You may need to install ODBC Data Sources locally with the necessary driver to connect to AWS services from your computer.

4.3.2 Advanced Requirements

6.   Data Wrangling (enhance the data quality, aggregation, and analysis)

Notes: Of the four components of data wrangling, you may ignore the two below:

Structuring (given that the data has been prepared and formatted in Amazon Athena and they are interpretable)

Normalizing and de-normalizing (since you are only focusing on the minimum level of segregation, such as countries and species)

Suggested service/platform: Amazon S3, AWS Data Wrangler and Jupyter Notebook

You are encouraged to work on:

Cleaning: explore and validate raw data, transforming them from their messy state and complex forms into high-quality data with the intent of making them more consumable and useful for analytics. This includes tasks like standardizing inputs, deleting duplicate values or empty cells, removing outliers, fixing inaccuracies, and addressing biases. Remove errors that might distort or damage the accuracy of your analysis.

Enriching: transform and aggregate the current data to produce valuable insights and guide business decisions. Once you've transformed your data into a more usable form, consider whether you have all the data you need for your analysis. If you don't, you can enrich it by integrating values from other datasets. You may also want to add metadata to your database at this point.
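The cleaning tasks listed above can be sketched with pandas (AWS Data Wrangler / awswrangler would read and write the same DataFrames to and from S3). Column names are illustrative, and treating negative production readings as sensor noise to clip is an assumption, not a rule from the brief.

```python
import pandas as pd

def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Deduplicate, drop rows without a timestamp, and clip negative
    production readings (clipping is an illustrative assumption)."""
    df = df.drop_duplicates(subset=["Minutes5UTC", "PriceArea"])
    df = df.dropna(subset=["Minutes5UTC"]).copy()
    power_cols = [c for c in df.columns if c.endswith("Power")]
    df[power_cols] = df[power_cols].clip(lower=0)
    return df
```

With awswrangler the same function slots between `wr.s3.read_csv(...)` and `wr.s3.to_parquet(...)` calls, so the cleaned output lands back in S3 in a columnar, Athena-friendly format.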

7.   Deployment (to orchestrate the services and automate the workflow). Orchestrating the multiple services and components is good practice for DataOps. Package the services you used in the project using Step Functions and automate the flow with scheduling for seamless integration and minimal human intervention.

Suggested service/platform: AWS Step Functions, Amazon EventBridge, AWS Lambda
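A minimal sketch of the orchestration: an Amazon States Language definition that chains the ingestion Lambda and the Glue-crawl step, which an EventBridge schedule rule (e.g. `rate(1 hour)`) would then trigger. The ARNs are placeholders supplied by the caller.

```python
import json

def state_machine_definition(ingest_arn, crawl_arn):
    """Amazon States Language chaining ingestion then crawling
    (the Lambda/task ARNs are placeholders)."""
    return json.dumps({
        "Comment": "Energinet prodex pipeline (POC)",
        "StartAt": "Ingest",
        "States": {
            "Ingest": {"Type": "Task", "Resource": ingest_arn, "Next": "Crawl"},
            "Crawl": {"Type": "Task", "Resource": crawl_arn, "End": True},
        },
    })
```

The JSON string goes straight into `stepfunctions.create_state_machine(definition=...)`; keeping it generated from code makes the flow easy to extend with retry and error-handling states later.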

8.   Monitoring (to assess the log files periodically and alert the system admin).

Understanding performance metrics and continuously improving the data pipeline is part of data engineering work. Observe the performance of components and make necessary adjustments to the configurations to make your data workflow efficient and reliable.

Suggested service/platform: Amazon CloudWatch, CloudTrail, SNS

You may consider assessing log files often using the built-in monitoring tool (Amazon CloudWatch), which is well integrated with most components and generates their logs. For some components, utilization and performance metrics are available with useful charts. Use them accordingly to report performance issues and possible fine-tuning.

You may also integrate SNS with CloudWatch to trigger notifications to the system administrator in case of errors and unexpected events.
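The CloudWatch-plus-SNS pattern above can be sketched as an alarm on Lambda errors that notifies an SNS topic. The function name, topic ARN, and thresholds are placeholder assumptions for your environment.

```python
def alarm_config(function_name="prodex-ingest",
                 topic_arn="arn:aws:sns:us-east-1:123456789012:pipeline-alerts"):
    """Arguments for CloudWatch put_metric_alarm: fire on any Lambda error
    within a 5-minute window and notify the SNS topic (ARNs are placeholders)."""
    return {
        "AlarmName": f"{function_name}-errors",
        "Namespace": "AWS/Lambda",
        "MetricName": "Errors",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Sum",
        "Period": 300,
        "EvaluationPeriods": 1,
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [topic_arn],
    }

def create_alarm():
    import boto3  # deferred: only needed when actually calling AWS
    boto3.client("cloudwatch").put_metric_alarm(**alarm_config())
```

Subscribing an email address to the SNS topic completes the notification path to the system administrator.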

4.4 Individual Presentation (10%, 10 marks)

Your demonstration for the project implementation should meet the following criteria:

4.4.1    Detailed explanation of the data pipeline including, but not limited to

Ingestion Layer

Storage Layer

Processing Layer

4.4.2    Demonstrate the collection, ingestion, serving and analysis of your prototype to convince the CIO to consider it for actual implementation.

4.4.3    In addition, you may also highlight the possible vulnerabilities and security concerns for your data pipeline and how your configuration can mitigate them to lower the risk.

5. How the Project is assessed

Refer to the Appendix A (Page #10) for marking rubrics in detail.

First Submission: Project Proposal Report (Group) – 10%

Font – Times New Roman

Font Size – 11

Format – PDF

Page Limit – 20 maximum including the references.

Template – TP-LMS – DAEC – Assessment DAEC Project Template.docx

Submission – Refer to the Teaching Plan

Group Presentation (10%)

Duration – Each group is allowed 20 minutes to explain the architecture proposal and configuration checklist.

Submission – The presentation will be scheduled during the timetabled lesson. Your tutor will inform you of the venue and schedule nearer to the date.

Second Submission: Project Solution Report (Individual) – 30%

Development Environment – Use AWS Academy (Learner Lab Environment) for project prototyping. A $50 credit will be provided to use the AWS services appropriate for this project.

Font Size – 11

Content – Explanations with relevant screenshots for all work done in Section 4.3

Format – PDF

Page Limit – 50 maximum including the references.

Template – TP-LMS – DAEC – Assessment DAEC Project Template.docx

Submission – Refer to the Teaching Plan

Individual Presentation (10%)

Duration and scope – Each student is allowed 10 minutes to demonstrate how the data pipeline and workflow are implemented for the prototype and to address any questions raised.

Late submissions

Penalties will be applied to late submissions:

•    Late and < 1 day – 10% deduction from the absolute marks given for that part of the work, e.g. if the assignment was worth 100 marks and you were given 75 marks for the work, after the penalty you are left with 65 marks, i.e. 75 − 100×10%

•    Late >= 1 and < 2 days – 20% deduction from absolute marks

•    Late >= 2 days – no marks will be awarded


