Resit Coursework - EMATM005/SEMATM36/SEMAM0014 Large Scale Data Engineering [Data Science, FinTech and EwDS cohorts]
Version: 24.02.2024 v1.1
Changes:
24.02.2025 v1.1 – Initial release
Summary
This coursework is divided into two parts:
Task 1: A written task (only) to design the architecture of a simple application on the AWS cloud, for which you need a deep understanding of AWS services and how they work together within an application. The design should demonstrate your knowledge of the AWS services covered throughout the entire LSDE course.
Task 2: A combined practical and written activity architecting a scaling application on the Cloud, where you will be required to use knowledge gained and a little further research to implement the scaling infrastructure, followed by a report that will focus on your experience in the practical activity together with knowledge gained in the entire LSDE course.
Weighting: This assessment is worth 100% of the total mark for this 20-credit unit.
Please note that the category of generative AI use for this assessment is Category 2: Minimal – for example, using spell and grammar checkers to help identify mistakes but not to rewrite chunks of text. More information is available at https://www.bristol.ac.uk/bilt/sharing-practice/guides/guidance-on-ai/using-ai-in-assessment/.
Please note that all information shown in the screenshots must be in English. Screenshots will be considered invalid if they include any text in a language other than English.
Pre-requisites:
• You must have completed the AWS Academy Cloud Foundations course set in weeks 1-8
• You will require an AWS Academy Lab account for the practical activity. You should receive an invite when this document is released. Please contact the LSDE Unit Director if you have no invitation email or are having issues with the registration.
• A Secure Shell (SSH) client, such as Terminal on macOS or PuTTY on Windows, for server admin.
Submission:
Via the LSDE BlackBoard coursework assessment page, submit one .pdf file (named ‘your_username.pdf’, e.g. tl18303.pdf), containing:
• A report (‘report.pdf’) in PDF format containing:
o Task 1
o Task 2
o Your AWS Academy account credentials (username, password)
In this document we provide a detailed explanation of the tasks and the approach to marking.
Task 1: (25%)
In Task 1, we require you to design the architecture for an application, “Voice2Text”, running on AWS cloud. This is a service which allows your registered users to convert their audio file to text. The application needs to meet the following requirements:
• The application mainly serves registered users in Europe, not globally.
• The application should work with low latency and high availability.
• The audio data uploaded by users can be stored. The stored audio data must not be accessible to other users.
• The resulting text will be sent automatically to the user by email.
You should include your own descriptions of the following, 500-800 words and no more than 2 A4 pages:
• List the AWS services used in your design and explain in detail how these services work to ensure high performance, security and cost-efficiency in this application.
• Use a diagram to demonstrate the architecture of this application, in particular showing how the AWS services interact and your network design. Please also describe how the application works when one of your users uploads an audio file and expects to get the result via email.
You don't need to implement these ideas in your lab account.
Task 2: Scaling the WordPress Application (75%)
Write a report of no more than 2500 words and 18 A4 pages (there is NO minimum), including: Task A, B, C, D, E.
Overview
WordPress is by far the most popular open-source software for hosting online blogs and small-scale websites. It is a PHP application, backed by a MySQL database (NOTE: you are NOT expected to understand or modify the source code in any way).
WordPress includes a password-secured browser admin interface that enables blog posts and other content to be created, management of users, review of blog metrics, installation of extensions (known as ‘plugins’), and so on.
WordPress is typically installed on a single EC2 server, but as we saw in the Cloud Foundations course, a single server has limitations in availability, scalability, performance, etc. This can affect the speed of response (latency) and thus performance and cost (see this article).
Your task will be to take a default, minimal installation of WordPress and implement a resilient, highly available, scalable, cost effective and secure architecture for it on AWS. This will include performing load testing on your application to demonstrate improved performance under stress.
You will be required to initially set up and test the application, using instructions given with the zip download file. You will then need to identify how to scale and improve the application architecture, based on principles learned in the CF course. Finally, you will write a report covering this process, along with some extra material.
Task A – Install the Application
Ensure you have set up access to your AWS Academy Lab account and have at least $10 credit (you are provided with $50 to start with). If you are running short of credit, please inform your instructor.
Refer to the WordPress installation instructions in the coursework.zip download on the BlackBoard site, to install and configure the application in your AWS Academy Lab account. These instructions do not cover every step – you are assumed to be confident in certain tasks, such as in the use of IAM permissions, launching an EC2 instance, etc.
You will set up a single server installation of WordPress, using a pre-built community AMI, then configure it appropriately for this assessment.
Before moving on to the next task, ensure that:
• You can access the WordPress administration interface and can create & manage blog posts.
• You have the required plugin(s) installed and configured.
• You have successfully set up SSH (command line) access to the WordPress instance.
• You have successfully set up the load testing site and run some trial load tests.
In this Task you will need to 1) give a brief summary of how the application works (without any reference to the code functionality); 2) give details of any issues you had and whether you resolved them.
[NOTE: The application and plugin code are programmed in the PHP language. You are NOT expected to understand or modify it. Any code changes will be ignored and may lose marks.]
Task B - Design and Implement Auto-scaling
Review the architecture of the existing application. Although the website is usable for one visitor (client), when run under the load tester for multiple clients the response (latency) becomes noticeably slow (5000ms / 5 seconds or more to load a page).
To better handle multiple clients, we need to add scaling to the application. This should function as follows:
• When a given maximum performance metric threshold is exceeded, an identical WordPress instance is launched (up to 3 instances) and begins to also respond to incoming requests.
• When the performance metric falls below a given minimum threshold, the most recently launched WordPress instance is removed (terminated).
• There must always be at least one WordPress instance available to respond to incoming requests when the WordPress website architecture is 'live'.
Using the knowledge gained from the Cloud Foundations course, you need to architect and implement auto-scaling functionality for the WordPress application. Additionally, you need to identify a CloudWatch performance metric to use for the ‘scale out’ and ‘scale in’ rules and justify your choice. It’s wise to review CloudWatch metrics for the EC2 service after running the application for a while under load, to pinpoint the most appropriate metric. You can refer to Lab 6 in Module 10, which also covers a web application.
[NOTE: The free version of loader.io only provides 1-minute tests, so you may need to run tests manually one after another to show the performance of auto-scaling.]
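The scale-out and scale-in rules listed in Task B can be pictured with a small simulation. This is only an illustrative sketch in plain Python, not an AWS configuration: the threshold values (70 and 30) and the idea of feeding in one metric reading per evaluation period are assumptions made for the example, and the names `ScalingRule`, `next_instance_count` and `simulate` are hypothetical. Only the limits of at least 1 and at most 3 instances come from the brief.

```python
from dataclasses import dataclass


@dataclass
class ScalingRule:
    """Hypothetical thresholds; the brief leaves the metric and its values to you."""
    scale_out_above: float = 70.0  # assumed upper threshold (e.g. a percentage)
    scale_in_below: float = 30.0   # assumed lower threshold
    min_instances: int = 1         # from the brief: always at least one instance
    max_instances: int = 3         # from the brief: up to 3 instances


def next_instance_count(current: int, metric: float, rule: ScalingRule) -> int:
    """Apply one scaling decision and return the new instance count."""
    if metric > rule.scale_out_above:
        # Scale out by one instance, but never beyond the maximum.
        return min(current + 1, rule.max_instances)
    if metric < rule.scale_in_below:
        # Scale in by one instance, but never below the minimum.
        return max(current - 1, rule.min_instances)
    return current


def simulate(metrics, rule=None, start=1):
    """Return the instance count after each metric reading in turn."""
    rule = rule or ScalingRule()
    counts = []
    current = start
    for reading in metrics:
        current = next_instance_count(current, reading, rule)
        counts.append(current)
    return counts


if __name__ == "__main__":
    # Made-up readings: load rises, then falls away again.
    print(simulate([90, 95, 99, 50, 10, 10, 10]))
```

Readings between the two thresholds leave the instance count unchanged, which is what stops the group from launching and terminating instances on every small fluctuation.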
Task C - Perform Load Testing
Once you have set up your auto-scaling infrastructure, test that it works. Set (edit) the test in the load tester to use initially 250 clients, then 350 clients, then 500 clients.
If your autoscaling functionality is configured correctly, you should be able to achieve a response latency of about 1 second with 500 clients. If the load tester produces an error during testing, the response time is too high and you will need to fine-tune your auto-scaling parameters further.
Please watch and record the following behaviours, and document all load tests run while optimising auto-scaling:
• Watch the behaviour of your WordPress application, to check the scale out (add instances) and scale in (remove instances) behaviour works.
• Take screenshots of the EC2 instance page showing launched / terminated instances along with the load tester graphs.
• Try to optimise the scaling operation so that instances are launched quickly when required and terminated soon (but not immediately) when not required. Note settings you used and the fastest processing time you can achieve.
• Try using a few different EC2 instance types – with more CPU power, memory, etc. Note down any changes in processing time.
[NOTE: You are expected to list all the scaling policy configurations and instance types you tried, and to test the impact of these configurations on the performance of your auto-scaling. You should record the time for each experiment and discuss how the operation could be optimised.]
[NOTE: Academy Lab accounts are limited in which EC2 Types and services they can use.]
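One possible way to keep the experiment records requested in Task C consistent is a small log structure such as the sketch below. It is an assumption-laden illustration, not a required format: the class `LoadTestRun`, the helpers `meets_target` and `summarise`, and the reading of "about 1 second" as a 1000 ms cut-off are all hypothetical, and the instance types and timings in the usage example are placeholders rather than measured results.

```python
from dataclasses import dataclass

# Assumption: "about 1 second" from the brief is treated as a 1000 ms cut-off.
TARGET_LATENCY_MS = 1000


@dataclass
class LoadTestRun:
    """One load test, as you might note it down for the report."""
    clients: int
    instance_type: str
    avg_response_ms: float
    errors: int = 0


def meets_target(run: LoadTestRun, target_ms: float = TARGET_LATENCY_MS) -> bool:
    """A run passes if it had no errors and stayed within the target latency."""
    return run.errors == 0 and run.avg_response_ms <= target_ms


def summarise(runs):
    """Return one summary line per run, grouped by instance type and client count."""
    lines = []
    for run in sorted(runs, key=lambda r: (r.instance_type, r.clients)):
        status = "ok" if meets_target(run) else "tune further"
        lines.append(
            f"{run.instance_type} {run.clients} clients: "
            f"{run.avg_response_ms:.0f} ms ({status})"
        )
    return lines


if __name__ == "__main__":
    # Placeholder values only; replace with your own measurements.
    example_runs = [
        LoadTestRun(clients=500, instance_type="t3.small", avg_response_ms=950.0),
        LoadTestRun(clients=250, instance_type="t3.small", avg_response_ms=400.0),
        LoadTestRun(clients=500, instance_type="t2.micro", avg_response_ms=5200.0, errors=3),
    ]
    for line in summarise(example_runs):
        print(line)
```

Recording the scaling policy settings alongside each run (for example as extra fields) would make it easier to relate a change in configuration to a change in response time.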
Task D - Secure and Optimise the WordPress Architecture
Based on only AWS services and features learned from the Cloud Foundations course, describe how you could re-design the WordPress application’s current cloud architecture (i.e. not changing the application’s functionality or code) to improve the architecture in the following areas:
• Increase resilience and availability of the application against component failure.
• Long-term backups of valuable data required.
• Cost-effective and efficient application for occasional use. Processing does not need to be immediate.
• Prevent unauthorised access.
Your description should ideally include a diagram and include the AWS services required together with a high-level explanation of features & configuration for each requirement.
[NOTE: You are NOT expected to modify the Web application.] You don't need to implement these ideas in your lab account.
Task E – Challenges
Basic: A newly added blog post may not always appear in a scaled architecture with multiple running instances. Could you briefly explain the reason for this issue?
High-level: Based on the services and frameworks covered in the LSDE course, describe step by step how you could resolve this issue using a relevant scalable, highly available, managed service of your choice.
[NOTE: Do not implement this idea, just explain the basic workflow of configuring this for your WordPress architecture]
Task F - Create the Final Report
Combine Task 1 and Task 2 into a single PDF. You will also need to give us your AWS Academy account credentials (username, password) at the end of your report.
The report should be a single PDF. It does not need to follow any specific format, but you should use grammar and spelling checkers on it and make good use of paragraphs and sub-headings. Double-spacing is not required. Use diagrams where they make sense and include captions & references from the text.
[IMPORTANT: All text not originally created by you must be cited, leading to a final numbered reference section (based on e.g. the British Standard Numeric System) to avoid accusations of plagiarism.]
[IMPORTANT: Disable autoscaling at the end of each lab session: Desired capacity = 0; Minimum capacity = 0. This saves credit and prevents multiple instances from launching and terminating when starting / stopping a lab session.]
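The end-of-session step above can be done in the console, but it could also be scripted. The following is a minimal sketch assuming the boto3 library is installed and credentials for the lab session are configured; the group name `wordpress-asg` is a hypothetical placeholder and the helper names are illustrative. It relies on the boto3 Auto Scaling client call `update_auto_scaling_group` with its `AutoScalingGroupName`, `MinSize` and `DesiredCapacity` parameters.

```python
def disable_scaling_params(group_name: str) -> dict:
    """Build the arguments that set an Auto Scaling group to zero instances."""
    return {
        "AutoScalingGroupName": group_name,
        "MinSize": 0,          # Minimum capacity = 0
        "DesiredCapacity": 0,  # Desired capacity = 0
    }


def disable_scaling(group_name: str, client=None) -> None:
    """Set minimum and desired capacity to 0 for the named group.

    A client can be passed in (useful for testing); otherwise a boto3
    Auto Scaling client is created from the configured credentials.
    """
    if client is None:
        # Imported here so the parameter helper works without boto3 installed.
        import boto3
        client = boto3.client("autoscaling")
    client.update_auto_scaling_group(**disable_scaling_params(group_name))


if __name__ == "__main__":
    # "wordpress-asg" is a placeholder; use the name of your own group.
    disable_scaling("wordpress-asg")
```

To resume work in the next session, the same call with the original minimum and desired capacity values would restore the group.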