Drs. J. Senk & P. Wijeratne: Applied Machine Learning @ University of Sussex – Spring 2025
Coursework Assignment
1 Assignment Overview
This assignment will involve you designing, building, testing and critiquing systems for two applied machine learning tasks.
1. Task 1 (50%): A system for performing spam detection, i.e. classifying text as spam or non-spam.
2. Task 2 (50%): A system for performing face alignment, i.e. locating facial landmarks in images of people.
This assignment is worth 100% of the grade for this module. It is designed to ensure you can demonstrate achieving the learning outcomes for this module, which are to:
• Determine the applicability of different machine learning models to data found in real-world applications.
• Propose designs for simple systems, including appropriate pre-processing, to solve practical problems using machine learning.
• Implement and document a computer program that learns and applies machine learning models to realistic data.
• Critically evaluate the efficacy of proposed systems and appropriately communicate this analysis.
2 What to hand in?
1. A report that comprises a maximum of 10 pages and 3000 words, including captions but excluding references. We expect several pictures, diagrams and flowcharts to be included. Please only use the .zip archive format for your submission.
The report should be written in two sections, one for each task. For each task, you should cover the following points. More detail is provided in Sections 3 and 4 below.
• A summary and justification for all the steps in your system, including preprocessing, choice of features and prediction model. Explaining the system diagrammatically is very welcome.
• Results of your experiments. This should include some discussion of qualitative (example based) and quantitative (number based) comparisons between different approaches that you have experimented with.

• Examples of failure cases in your system and a critical analysis of these, identifying potential biases of your approach.
2. Either .ipynb files or .py files containing annotated code for all data preprocessing, model training and testing.
3. For Task 1: A csv file that contains the predicted labels on the test set of text, found in the csv file (spam detection test data.csv) here. You must use the provided “save as csv” function in the Colab worksheet to process an array of shape (number test data, 1) to a csv file. Please make sure you run this on the right data and submit in the correct format to avoid losing marks.
4. For Task 2: A csv file that contains the face landmark positions on the test set of images, found in the compressed numpy file (face alignment test images.npz) here. You must use the provided “save as csv” function in the Colab worksheet to process an array of shape (number test image, number points, 2) to a csv file. Please make sure you run this on the right data and submit in the correct format to avoid losing marks.
Note! Do not reorder the test data or it will not match up with the test labels / points!
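Before passing predictions to the provided "save as csv" function, it may help to confirm that the arrays have the shapes listed above. The following is only a minimal sketch: the helper names and the example sizes are assumptions made for illustration and are not part of the Colab worksheet.

```python
import numpy as np

def as_label_column(labels):
    """Reshape a flat sequence of 0/1 labels to (number of test items, 1)."""
    return np.asarray(labels).reshape(-1, 1)

def check_landmark_array(points, n_images, n_points):
    """Raise an error unless points has shape (n_images, n_points, 2)."""
    points = np.asarray(points)
    expected = (n_images, n_points, 2)
    if points.shape != expected:
        raise ValueError(f"expected shape {expected}, got {points.shape}")
    return points

# Hypothetical sizes: 4 test texts, and 3 test images with 5 landmarks each.
spam_pred = as_label_column([0, 1, 1, 0])
face_pred = check_landmark_array(np.zeros((3, 5, 2)), n_images=3, n_points=5)
print(spam_pred.shape, face_pred.shape)
```

The rows must stay in the same order as the test data, so any shuffling should be applied to training data only.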
3 Task 1: Spam Detection
3.1 Mark allocation
25 marks will be awarded for writing and presentation and 25 for coding and data analysis.
10 Marks Accuracy and robustness of spam detection
These marks are allocated based on the performance of the spam detection method. This will be evaluated on the held-out test set. The test data (without labels) are provided in the csv file (spam detection test data.csv) here and the error on the predicted labels will be calculated after submission. Marks will be awarded for average accuracy and robustness (based on the confusion matrix of your predictions). Note that only we have the test labels!
25 Marks Outline of methods employed
Justifying and explaining design decisions for the spam detection. This does not have to be in depth, and we do not expect you to regurgitate the contents of the lecture notes/papers. You should state clearly:
• any text pre-processing steps you have used, and why.
• what text features/representation you have used, briefly describe how they were calculated, and why you chose them.
• what prediction methods you have used; what ML task this corresponds to, the type of model that you have used, and the loss function that your system is trained with.
• design/parameter decisions should be explained and justified.
For top marks, you should clearly demonstrate a creative and methodical approach for designing your system, drawing ideas from different sources and critically evaluating your choices. Explaining using diagrams and/or flowcharts is very welcome.

15 Marks Analysing results and failure cases
Critically evaluate the results produced by your system on validation data. You should include quantitative (number based) and qualitative (example based) comparisons between different approaches that you have tried (on a held-out validation set).
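One way to make the quantitative comparison concrete is a confusion matrix, from which accuracy, precision and recall follow. The sketch below is a minimal illustration assuming binary labels with 1 == spam and 0 == non-spam; the function names and the toy validation labels are made up for this example.

```python
import numpy as np

def confusion_matrix(y_true, y_pred):
    """2x2 count matrix: rows are true labels, columns are predicted labels."""
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    cm = np.zeros((2, 2), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def summary_metrics(cm):
    """Accuracy, precision and recall for the positive (spam) class."""
    tn, fp, fn, tp = cm.ravel()
    accuracy = (tp + tn) / cm.sum()
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall

# Toy validation labels and predictions (not real results).
y_val = [0, 0, 1, 1, 1, 0]
y_hat = [0, 1, 1, 0, 1, 0]
cm = confusion_matrix(y_val, y_hat)
accuracy, precision, recall = summary_metrics(cm)
print(cm)
print(accuracy, precision, recall)
```

Reporting precision and recall alongside accuracy shows whether errors are mostly missed spam or wrongly flagged non-spam, which accuracy alone hides when the classes are imbalanced.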
Quantitative measures include calculating the confusion matrix of your predictions. Please note that we are interested in your final prediction results, rather than how the cost function changes during training. Please explicitly define any evaluation metrics and ensure they are appropriate for the task.
3.2 Most important links
Contents | filetype | links
Training text and label data | csv file (spam detection training data.csv) | link
Test text data (without labels) | csv file (spam detection test data.csv) | link
Colab worksheet with some useful functions | Colab worksheet | link
3.3 Where to start?
Text classification is covered in lecture 14, so that’s a good place to look for information. Other lectures (e.g., lecture 15) are also helpful.
We have included a very basic Colab worksheet illustrating how to load the data and print random text examples based on their labels. An example print-out is shown in Figure 1.
The simplest approach would be to treat this as a classification problem, where given text data you want to predict whether or not it is spam.
To follow this approach you will need to consider what natural language pre-processing steps are necessary to obtain suitable features for your predictive model.
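As an illustration of what such pre-processing and feature extraction could look like, here is a minimal bag-of-words sketch using only the standard library and numpy. The tokenisation rule, the function names and the two example sentences are assumptions for this example; they are not taken from the worksheet or the dataset.

```python
import re
from collections import Counter

import numpy as np

def tokenize(text):
    """Lower-case the text and keep runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())

def build_vocabulary(texts, min_count=1):
    """Map each token seen at least min_count times to a column index."""
    counts = Counter(tok for text in texts for tok in tokenize(text))
    words = sorted(w for w, c in counts.items() if c >= min_count)
    return {w: i for i, w in enumerate(words)}

def bag_of_words(texts, vocab):
    """Count matrix of shape (number of texts, vocabulary size)."""
    X = np.zeros((len(texts), len(vocab)), dtype=float)
    for row, text in enumerate(texts):
        for tok in tokenize(text):
            col = vocab.get(tok)
            if col is not None:  # tokens unseen in training are ignored
                X[row, col] += 1
    return X

# Made-up example texts.
train_texts = ["Win a FREE prize now", "Are we still meeting now?"]
vocab = build_vocabulary(train_texts)
X_train = bag_of_words(train_texts, vocab)
print(X_train.shape)
```

The vocabulary is built from the training texts only and then reused for validation and test texts, so that no information from held-out data leaks into the features.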
Figure 1: Example of non-spam text (label == 0) in the training dataset (spam detection training data.csv).

4 Task 2: Face Alignment
4.1 Mark allocation
25 marks will be awarded for writing and presentation and 25 for coding and data analysis.
10 Marks Accuracy and robustness of face alignment
These marks are allocated based on the performance of the face alignment method. This will be evaluated on the held-out test set. The test images, without annotations, are provided in the compressed numpy file (face alignment test images.npz) here and the error on the predicted points will be calculated after submission. Marks will be awarded for average accuracy and robustness (% of images with error below a certain threshold). Note that only we have the test points!
25 Marks Outline of methods employed
Justifying and explaining design decisions for the landmark finding. This does not have to be in depth, and we do not expect you to regurgitate the contents of the lecture notes/papers. You should state clearly:
• any image pre-processing steps you have used, and why.
• what image features/representation you have used, briefly describe how they were calculated, and why you chose them.
• what prediction methods you have used; what ML task this corresponds to, the loss function that your system is trained with, and a description of any regularisation that you may have used.
• design/parameter decisions should be explained and justified.
For top marks, you should clearly demonstrate a creative and methodical approach for designing your system, drawing ideas from different sources and critically evaluating your choices. Explaining using diagrams and/or flowcharts is very welcome.
15 Marks Analysing results and failure cases
Critically evaluate the results produced by your system on validation data. You should include quantitative (number based) and qualitative (example based) comparisons between different approaches that you have tried (on a held-out validation set).
Quantitative measures include measuring the cumulative error distribution (see lecture slides) or using boxplots or other plots to compare methods. Please note that we are interested in your final prediction results, rather than how the cost function changes during training. Please explicitly define any evaluation metrics and ensure they are appropriate for the task.
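A cumulative error distribution can be computed in a few lines. The sketch below assumes the error for one image is the mean Euclidean distance between predicted and true points, which is one common choice rather than necessarily the metric used for marking; the function names and the synthetic data are illustrative only.

```python
import numpy as np

def mean_point_error(pred, truth):
    """Mean Euclidean distance per image; inputs have shape (N, P, 2)."""
    return np.linalg.norm(pred - truth, axis=2).mean(axis=1)

def cumulative_error_distribution(errors, thresholds):
    """Fraction of images whose error is at or below each threshold."""
    errors = np.asarray(errors)
    return np.array([(errors <= t).mean() for t in thresholds])

# Synthetic stand-in for validation data: 50 images, 5 points each.
rng = np.random.default_rng(0)
truth = rng.uniform(0, 100, size=(50, 5, 2))
pred = truth + rng.normal(0, 2.0, size=truth.shape)

errors = mean_point_error(pred, truth)
thresholds = np.linspace(0, 10, 21)
ced = cumulative_error_distribution(errors, thresholds)
print(np.round(ced, 2))
```

Plotting `ced` against `thresholds` for each method on the same axes gives a direct visual comparison: a curve that rises faster indicates more images with small error.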
4.2 Most important links
Contents | filetype | links
Training images and points | compressed numpy array (face alignment training images.npz) | link
Test images (without points) | compressed numpy file (face alignment test images.npz) | link
Colab worksheet with some useful functions | Colab worksheet | link

4.3 Where to start?
Face alignment is covered in lecture 8, so that’s a good place to look for information. Other lectures (e.g., lecture 7) are also helpful.
We have included a very basic Colab worksheet illustrating how to load the data and visualise the points on the face. A visualisation of the average face and points across all training images is given in Figure 2.
The simplest approach would be to treat this as either a regular or a cascaded regression problem, where given an image you want to predict the set of continuous landmark coordinate locations.
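To illustrate the regular (non-cascaded) regression view, the sketch below fits a ridge regression in closed form, i.e. it minimises squared error plus an L2 penalty on the weights. It is a minimal example on synthetic data: the feature dimension, number of landmarks, `alpha` value and function names are all assumptions chosen for illustration.

```python
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression with an unregularised bias term."""
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - x_mean, Y - y_mean
    A = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    W = np.linalg.solve(A, Xc.T @ Yc)
    b = y_mean - x_mean @ W
    return W, b

def predict_points(X, W, b, n_points):
    """Predict flattened coordinates and reshape to (N, n_points, 2)."""
    return (X @ W + b).reshape(len(X), n_points, 2)

# Synthetic data standing in for image features and landmark targets.
rng = np.random.default_rng(0)
n_images, n_features, n_points = 200, 16, 5
X = rng.normal(size=(n_images, n_features))
true_W = rng.normal(size=(n_features, n_points * 2))
points = (X @ true_W + 50.0).reshape(n_images, n_points, 2)

# Targets are flattened so each image has one row of 2 * n_points values.
W, b = fit_ridge(X, points.reshape(n_images, -1), alpha=1e-6)
pred = predict_points(X, W, b, n_points)
print(np.abs(pred - points).max())
```

On real features a larger `alpha` would normally be selected on a held-out validation set, since it trades training fit against overfitting.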
To follow this approach you will need to consider what image features are helpful to predict the landmarks and what pre-processing is required on the data. Although you could directly use the flattened image as input, this will not be the optimal data representation for this task.
A better representation would be to describe a set of locations, either evenly spaced across the image, or in some more useful pattern (think about where in the image you might want to calculate more information) using a feature descriptor, such as Scale-Invariant Feature Transform (SIFT). These descriptions can then be concatenated together and used as input into a linear regression model. Note that you do not need to use the keypoint detection process for this task - rather the descriptors should be computed at defined locations (hint: look at sift.compute() or similar) to create a representation of the image that is comparable across the dataset.
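A rough sketch of computing descriptors at fixed grid locations is given below. It assumes OpenCV 4.4 or later (where `cv2.SIFT_create()` is available) and an 8-bit greyscale input image; the grid spacing, patch size and function names are illustrative assumptions, not values recommended by the assignment.

```python
import numpy as np

def grid_locations(height, width, step):
    """(x, y) grid centres, starting step // 2 pixels in from the top-left."""
    xs = np.arange(step // 2, width, step)
    ys = np.arange(step // 2, height, step)
    gx, gy = np.meshgrid(xs, ys)
    return np.stack([gx.ravel(), gy.ravel()], axis=1)

def sift_at_locations(gray_image, locations, patch_size=16.0):
    """Concatenate SIFT descriptors computed at fixed locations.

    No keypoint detection is run; gray_image is assumed to be uint8.
    """
    import cv2  # imported here so the grid helper works without OpenCV
    sift = cv2.SIFT_create()
    keypoints = [
        cv2.KeyPoint(float(x), float(y), float(patch_size))
        for x, y in locations
    ]
    keypoints, descriptors = sift.compute(gray_image, keypoints)
    # It is worth checking len(keypoints) == len(locations) so that every
    # image yields a feature vector of the same length.
    return descriptors.reshape(-1)

# The same grid is reused for every image so features are comparable.
locations = grid_locations(height=64, width=64, step=16)
print(locations.shape)
```

Each SIFT descriptor has 128 values, so a 4 x 4 grid like this one would give a 2048-dimensional feature vector per image if all locations are retained.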
Figure 2: Illustration of the 0-indexed (counting from 0 as you would in Python) locations of the points on the average face. For example, if we wanted to find the nose, that’s index 2 so we would look up points[2,:], which would give you the x and y coordinates.

5 General Points on the report
• Read things! Provide references to anything you find useful. You can take figures from other works as long as you reference them appropriately.
• Diagrams, flowcharts and pictures are very welcome! Make sure you label them properly and refer to them from the text.
• All plots should have correctly labelled axes and the font sizes must be readable in A4 page format.
• All figures (including plots) should have descriptive captions.
6 Notes on using Colab
You can complete this project either using Google Colab, which gives you a few hours of computing time completely free of charge, or using your personal/lab machine. The lab machines are fairly powerful, so if you need more computing resources then try those!
If you are using Google Colab, try and familiarise yourself with some of its useful features.
To keep your saved models, preprocessed data etc. you can save it to Google drive following the instructions here. You can also directly download a file you make in Colab using the code below:
from google.colab import files
files.download(filename)
If you refactor code into extra .py files, these should be stored in your Google Drive as well, or on Box, so that they are easy to load into your Colab worksheet.
7 What software functionality can I use?
You are not allowed to use generative AI tools (e.g., ChatGPT, Deepseek, etc.) to solve these tasks or write your report.
You are not allowed to use library functions that have been written to directly solve the tasks you have been given, i.e. text classification and face alignment. You cannot use the dlib or mediapipe face alignment tools or anything that provides similar functionality. Also, face detection is not required on this data.
You are free to use fundamental components and functions from libraries such as NLTK, OpenCV, numpy, scipy and scikit-learn to solve this assignment, although you don’t have to. Here, fundamental components refers to things like regression / classification models and pre-processing / feature extraction steps and other basic functionality.
In terms of tools and frameworks, it’s absolutely fine to use convolutional neural networks (CNNs) or recurrent neural networks (RNNs) if you want to. The best packages would be either TensorFlow (probably with Keras) or PyTorch. If you use such an approach you should be sure to document how you chose the architecture and loss functions. A well justified and high performing deep learning approach will receive equivalently high marks as if you had built it any other way.
In terms of sourcing additional labelled data, this is not allowed for this assignment. This is because in real-world commercial projects you will typically have a finite dataset, and even if there are possibly useful public datasets available, their license normally prohibits commercial use. On the other hand data augmentation, which effectively synthesises additional training examples from the labelled data that you have, is highly encouraged. If you use this, please try and add some text or a flow-chart of this process in your report.
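For the face alignment task, a minimal augmentation sketch might look like the following. It shows two simple transformations, a translation (which must also be applied to the landmark points) and a brightness change (which leaves the points untouched). The function names, the zero padding and the toy image are assumptions made for illustration.

```python
import numpy as np

def translate(image, points, dx, dy):
    """Shift the image by (dx, dy) pixels with zero padding; shift points too."""
    h, w = image.shape[:2]
    shifted = np.zeros_like(image)
    src_x = slice(max(0, -dx), min(w, w - dx))
    dst_x = slice(max(0, dx), min(w, w + dx))
    src_y = slice(max(0, -dy), min(h, h - dy))
    dst_y = slice(max(0, dy), min(h, h + dy))
    shifted[dst_y, dst_x] = image[src_y, src_x]
    return shifted, points + np.array([dx, dy], dtype=float)

def scale_brightness(image, factor):
    """Scale pixel intensities; landmark positions are unaffected."""
    scaled = np.clip(image.astype(float) * factor, 0, 255)
    return scaled.astype(image.dtype)

# Toy image with one bright pixel at (x=4, y=3) and a landmark on it.
image = np.zeros((8, 10), dtype=np.uint8)
image[3, 4] = 200
points = np.array([[4.0, 3.0]])

aug_image, aug_points = translate(image, points, dx=2, dy=1)
dimmer = scale_brightness(image, 0.5)
print(aug_points, aug_image[4, 6], dimmer[3, 4])
```

Augmented copies should only be generated from training images, so that validation results still reflect performance on unmodified data.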
8 Top Tips for Success
• Refer to lecture slides and labs - all the information is in there to complete these tasks!
• Remember Occam’s razor: complexity should not be added unnecessarily. The more complicated your system the more things to explain/justify etc.
• Start with a simple achievable goal and use that as a baseline to test against. Keep track of early models/results to use as points of comparison.
• Remember that even if it doesn’t work well, having a go at both tasks is worthwhile. We’re only looking for simple solutions and your explanation of your system design.
• Think about things that you have learned about in Applied Machine Learning. Dimensionality reduction could be helpful. Overfitting and outliers may be an issue, and you should consider using methods to minimise this.
• For Task 2: You don’t need to work at very high resolution to get accurate results. Particularly when doing initial tests, resize your images to a lower resolution. Make sure you also transform your training points so they are in the same geometry as the image (i.e., if you halve the size of the image along both axes, then make sure to halve the (x,y) position of the training points too). For your predicted points, make sure these are all at the same resolution as the original images.
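The point about keeping images and points in the same geometry can be sketched as follows. This minimal example downsamples by simple pixel skipping to stay dependency-free (a proper resize with smoothing, for instance via OpenCV, would usually be preferable); the image size, factor and function names are assumptions for illustration.

```python
import numpy as np

def downsample(image, points, factor):
    """Keep every factor-th pixel and divide the (x, y) points by factor."""
    small = image[::factor, ::factor]
    return small, points / factor

def to_original_resolution(pred_points, factor):
    """Map points predicted on the small image back to full resolution."""
    return pred_points * factor

# Hypothetical 256 x 256 image with two landmark points.
image = np.zeros((256, 256), dtype=np.uint8)
points = np.array([[128.0, 64.0], [200.0, 100.0]])

small, small_points = downsample(image, points, factor=2)
restored = to_original_resolution(small_points, factor=2)
print(small.shape, small_points.tolist(), restored.tolist())
```

Applying `to_original_resolution` to the test predictions before saving keeps the submitted points at the resolution of the original images.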

