
COMP4139 Machine Learning

Assignment 2

Machine Learning for Breast Cancer Treatment

Response Prediction

1. Introduction

This assignment assesses your practical skills in applying machine learning methods to a real-world problem. The implementation will be based on Python and third-party machine learning libraries. As in Assignment 1, you must work in the same group, and member 1 of each group must submit the work on Moodle by 3 pm UK time on 12th December 2025. You can split and distribute the work among individual members, but each individual is expected to understand every aspect of the work.

2. Background

Breast cancer is the most common cancer in the UK for women. Chemotherapy is a commonly used treatment strategy to reduce the size of locally advanced tumours before surgery. However, chemotherapy is toxic to the human body and is not effective for everyone. Complete tumour resolution at surgery, known as pathological complete response (PCR), has a high likelihood of achieving a cure and a longer relapse-free survival (RFS) time. RFS is the length of time after primary treatment for cancer ends that the patient survives without any signs or symptoms of that cancer. However, only 25% of patients receiving chemotherapy will achieve a PCR, with the remaining 75% having residual disease and a range of prognoses. Better patient stratification and treatment could be achieved if PCR and RFS could be predicted using information available prior to chemotherapy treatment.

3. Aim

You are asked to use advanced machine learning methods to predict PCR (classification) and RFS (regression) using both clinically measured features and features derived from magnetic resonance images (MRI) prior to chemotherapy treatment.

4. Data

Based on the public dataset from The American College of Radiology Imaging Network (I-SPY 2 TRIAL), a simplified dataset is generated for this assignment.

Each patient record in this dataset contains 11 clinical features (Age, ER, PgG, HER2, TrippleNegative Status, Chemotherapy Grade, Tumour Proliferation, Histology Type, Lymph node Status, Tumour Stage and Gene) and 107 MRI-based features. The image-based features were extracted from the tumour region of the MRIs using a radiomics feature extraction package (known as Pyradiomics: https://pyradiomics.readthedocs.io/en/latest/ ). You do not need to understand the meaning of these clinical and image-based features to complete this assignment, but it is worth reading the background information on the I-SPY 2 Trial website. “999” in the spreadsheet means a missing data value. A training dataset (trainDataset.xls) containing 400 patients is available on Moodle. A test dataset containing N patients is reserved (hidden from you) for the final performance evaluation. You can assume that the test set and training set are sampled from the same data distribution, but the ratio of PCR positive to PCR negative cases could be different.
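As a minimal sketch of the missing-value convention described above, the snippet below converts the sentinel value 999 into NaN so that standard imputers can recognise it. The toy rows and the column names `ID`, `Age` and `ER` are assumptions made for illustration; the real spreadsheet would typically be loaded with `pd.read_excel("trainDataset.xls")`, which needs an Excel reader such as `xlrd` installed.

```python
import numpy as np
import pandas as pd

MISSING_SENTINEL = 999  # value the brief uses to flag missing data


def mark_missing(df: pd.DataFrame, sentinel: float = MISSING_SENTINEL) -> pd.DataFrame:
    """Return a copy of df in which every sentinel entry is replaced by NaN."""
    return df.replace(sentinel, np.nan)


# Toy rows standing in for the spreadsheet (column names are assumptions).
toy = pd.DataFrame(
    {
        "ID": ["P001", "P002", "P003"],
        "Age": [41.0, 999, 56.0],
        "ER": [1, 0, 999],
    }
)

clean = mark_missing(toy)
missing_per_column = clean.isna().sum()  # count of missing entries per column
print(missing_per_column)
```

Replacing the sentinel everywhere assumes that no feature legitimately takes the value 999; it may be worth checking this for the radiomics columns before relying on it.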

5. Implementation Requirement

You are asked to build a machine-learning model for each of the PCR (classification) and RFS (regression) predictions. You need to consider and implement methods for data pre-processing (e.g. how to handle missing data, outliers, normalisation, etc., if needed), data imputation, feature selection, machine learning modelling, hyperparameter tuning (if applicable) and method evaluation. There is no restriction or requirement on the selection of methods. However, you will likely need to compare several methods to pick the best one with the best parameter setting. When you perform feature selection, ER, HER2 and Gene are very important features that must be retained and used in the modelling process.
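One possible way to honour the requirement that ER, HER2 and Gene are always retained is to route them around the feature selector. The sketch below does this with scikit-learn's `ColumnTransformer`; it is an illustration under assumptions, not a prescribed method. The synthetic data, the column names `f0`..`f9`, the helper `build_pcr_pipeline`, the choice of `SelectKBest` and of logistic regression are all hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

MANDATORY = ["ER", "HER2", "Gene"]  # features the brief says must be retained


def build_pcr_pipeline(other_columns, k=5):
    """Impute everything, but apply feature selection only to non-mandatory columns."""
    keep = Pipeline([("impute", SimpleImputer(strategy="most_frequent"))])
    select = Pipeline(
        [
            ("impute", SimpleImputer(strategy="median")),
            ("scale", StandardScaler()),
            ("select", SelectKBest(score_func=f_classif, k=k)),
        ]
    )
    features = ColumnTransformer(
        [
            ("mandatory", keep, MANDATORY),
            ("selected", select, list(other_columns)),
        ]
    )
    model = LogisticRegression(class_weight="balanced", max_iter=1000)
    return Pipeline([("features", features), ("model", model)])


# Synthetic stand-in for the training data (purely illustrative).
rng = np.random.default_rng(0)
n = 60
X = pd.DataFrame(rng.normal(size=(n, 10)), columns=[f"f{i}" for i in range(10)])
for name in MANDATORY:
    X[name] = rng.integers(0, 2, size=n).astype(float)
X.iloc[0, 0] = np.nan    # a missing value in a selectable column
X.loc[1, "ER"] = np.nan  # a missing value in a mandatory column
y = np.array([0, 1] * (n // 2))

other = [c for c in X.columns if c not in MANDATORY]
pipe = build_pcr_pipeline(other, k=5)
pipe.fit(X, y)
n_features_used = pipe.named_steps["features"].transform(X).shape[1]
print(n_features_used)  # 3 mandatory + 5 selected
```

Keeping imputation, scaling and selection inside the pipeline means they are refitted on each training fold during cross-validation, which avoids leaking information from the validation fold.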

Your code will be tested on a reserved test set after it is submitted. An example test file (testDatasetExample.xls) is provided that contains only 3 examples. It is your responsibility to ensure your code can run on a test file in a similar format that contains more patients. You must name your final test code “FinalTestPCR.py” or “FinalTestPCR.ipynb” for PCR prediction, and “FinalTestRFS.py” or “FinalTestRFS.ipynb” for RFS prediction, so that they can be run on the test dataset. The code for method development needs to be in a separate file, not in the “FinalTestXXX” file.

The test set will be released on 11th December 2025 at 9 am, and you need to run your code to produce the predictions for the test set and submit them on Moodle by 12th December 2025 at 3 pm together with the other deliverables (section 7). One spreadsheet for PCR and one for RFS must be generated to store the prediction outcome. The output files must be spreadsheets (.csv) that contain the predicted outcome for each tested patient (i.e. the first column is the patient ID, and the second column is either the predicted PCR or RFS outcome). Name the files: PCRPrediction.csv and RFSPrediction.csv. Balanced classification accuracy will be used to evaluate PCR prediction. Mean Absolute Error will be used to evaluate RFS estimation.
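To make the two evaluation measures and the output layout concrete, here is a small standard-library sketch. Balanced accuracy is computed as the mean of the per-class recalls and Mean Absolute Error as the average absolute difference. The helper names, the toy values and the header row (`ID`, `PCR`) are assumptions; the brief only fixes the column order and the file names.

```python
import csv
import io


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls over the classes present in y_true."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        correct = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def write_predictions(handle, ids, predictions, outcome_name):
    """Write patient ID in the first column and the prediction in the second."""
    writer = csv.writer(handle)
    writer.writerow(["ID", outcome_name])  # header row is an assumption
    writer.writerows(zip(ids, predictions))


# Toy values for illustration only.
pcr_true, pcr_pred = [0, 0, 0, 1], [0, 0, 1, 1]
rfs_true, rfs_pred = [10.0, 20.0, 30.0], [12.0, 18.0, 33.0]

bal_acc = balanced_accuracy(pcr_true, pcr_pred)  # (2/3 + 1/1) / 2
mae = mean_absolute_error(rfs_true, rfs_pred)    # (2 + 2 + 3) / 3

buffer = io.StringIO()  # stands in for open("PCRPrediction.csv", "w", newline="")
write_predictions(buffer, ["P001", "P002", "P003", "P004"], pcr_pred, "PCR")
print(bal_acc, mae)
print(buffer.getvalue())
```

In the toy example plain accuracy would be 3/4, while balanced accuracy is 5/6, because each class contributes equally regardless of how many patients it contains. This matters here since the PCR ratio in the test set may differ from the training set.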

All implementations need to use the Python programming language. Any machine learning libraries are allowed (e.g. Scikit-learn, Scipy, Pandas, Tensorflow, Pytorch, etc.). Grid search for automatic hyperparameter tuning is allowed. However, any autoML-based package or Large Language Model (e.g. ChatGPT or other methods that accept the raw data and automatically select the best ML method and optimise the parameters for you) is NOT allowed.
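Since grid search is explicitly permitted, a minimal sketch of it with scikit-learn's `GridSearchCV` is given below, scored with balanced accuracy to match the PCR evaluation measure. The synthetic dataset, the random forest and the parameter grid are placeholders chosen for illustration, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic, imbalanced stand-in for the PCR task (roughly 75% / 25%).
X, y = make_classification(
    n_samples=120, n_features=10, weights=[0.75, 0.25], random_state=0
)

# Hypothetical grid: every combination is evaluated by cross-validation.
param_grid = {"n_estimators": [25, 50], "max_depth": [2, 4]}

search = GridSearchCV(
    RandomForestClassifier(class_weight="balanced", random_state=0),
    param_grid=param_grid,
    scoring="balanced_accuracy",  # same measure used to mark PCR prediction
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```

Stratified folds keep the class ratio similar in every split, which tends to make the cross-validated score more stable on imbalanced data.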

6. Assessment

Assignment 2 is worth 80% of the coursework mark (i.e. 24% of the whole course mark). Marking will be based on the objective performance on the test set, the quality of the code and the quality of the technical writing. The marking criteria are provided in section 8. A single mark and feedback will be given to each group. The final mark for individual students will be calculated based on the contribution table described in section 7.

7. Deliverables

For the completion of Assignment 2, the following have to be submitted on Moodle. One report (.pdf) and one zipped code file need to be submitted per group.

1. The Python code for implementing the two tasks (PCR and RFS prediction). Besides the code for method development, two files “FinalTestPCR” and “FinalTestRFS” must be included for running on the test set. The two .csv files for PCR and RFS predictions of the test set should also be included in the code folder (note: the test set will be released on Moodle on 11th December at 9 am).

2. A report in the format of an IEEE conference paper. Technical paper writing will be introduced in one of the lectures. A template of the required format will be provided in Word and LaTeX. Based on the given format, a maximum of 4 pages is allowed, excluding references (references can be on the 5th page).

3. At the end of the paper (excluded from the 4 pages), the following contribution table needs to be completed and agreed upon by all members. It will be used to calculate each student’s final mark.

| Task and Weighting | Data pre-processing (10%) | Feature Selection (25%) | ML method development (25%) | Method Evaluation (10%) | Report Writing (30%) |
| --- | --- | --- | --- | --- | --- |
| Name of member 1 | 30% | 15% | 20% | 20% | 20% |
| Name of member 2 | 0% | 25% | 30% | 0% | 20% |
| Name of member 3 | 30% | 20% | 20% | 10% | 20% |
| Name of member 4 | 0% | 10% | 30% | 30% | 20% |
| Name of member 5 | 40% | 30% | 0% | 40% | 20% |

The percentage of contribution in the above table is an example, which will be different for each group depending on the true contribution of each member. However, the task names and their weighting highlighted in red in the table should NOT be changed, and the sum of the contributions from all members for each task (i.e. each column) should be 100%. Note that each student can contribute to multiple tasks and each task can involve multiple students.
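The rule that each task column must sum to 100% is easy to check mechanically before submitting the report. The sketch below does so for the example figures in the table above. The data layout and the helper names are assumptions, and the formula for turning contributions into individual marks is not given in this brief, so it is deliberately not modelled.

```python
# Task weightings from the table header (fixed by the brief).
TASK_WEIGHTS = {
    "Data pre-processing": 10,
    "Feature Selection": 25,
    "ML method development": 25,
    "Method Evaluation": 10,
    "Report Writing": 30,
}

# Example contributions (in %), one list per member, in task order.
contributions = {
    "member 1": [30, 15, 20, 20, 20],
    "member 2": [0, 25, 30, 0, 20],
    "member 3": [30, 20, 20, 10, 20],
    "member 4": [0, 10, 30, 30, 20],
    "member 5": [40, 30, 0, 40, 20],
}


def column_totals(table):
    """Sum the contributions of all members for each task (each column)."""
    return [sum(column) for column in zip(*table.values())]


def invalid_tasks(table, tasks):
    """Return the names of tasks whose contributions do not sum to 100%."""
    return [task for task, total in zip(tasks, column_totals(table)) if total != 100]


totals = column_totals(contributions)
problems = invalid_tasks(contributions, TASK_WEIGHTS)
print(totals, problems)
```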

Besides the report and code required, you also need to submit a recorded video presentation to present your work as a group. The content of the presentation should cover background, a literature review on existing solutions, the proposed method, evaluation results, and conclusions and discussion. The presentation should be less than 10 minutes and involve all group members (preparing the slides, presenting, or both). Save the video in .mp4 format and submit it on Moodle (file size should be less than 250MB).

8. Marking Criteria

| Elements | % mark |
| --- | --- |
| Performance on test set (objective) | 25% |
| Code quality (e.g. comments, easy to read, robustness, etc.) | 10% |
| Description of Method | 25% |
| Explanation and presentation of the results obtained | 10% |
| Discussion of the strengths and weaknesses of the chosen method | 10% |
| Scientific writing and clarity | 10% |
| Presentation | 10% |

A plagiarism check will apply, meaning that high similarity across different groups is not expected. Late submission of each assignment will result in a 5% penalty per day (days rounded up to the next integer).
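The rounding rule for late submissions can be illustrated with a few lines of Python. This is only a sketch of the arithmetic: the function name is hypothetical, and the brief does not say whether the penalty is capped or exactly how it is applied to the mark, so the function returns the penalty percentage alone.

```python
import math

PENALTY_PER_DAY = 5  # percent per day, from the brief


def late_penalty(hours_late: float) -> int:
    """Penalty in percent, with any part of a day counted as a full day."""
    if hours_late <= 0:
        return 0
    return math.ceil(hours_late / 24) * PENALTY_PER_DAY


print(late_penalty(0.5), late_penalty(24), late_penalty(25))
```

Under this reading, a submission 30 minutes late and one exactly 24 hours late both count as one day, while 25 hours late counts as two days.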

9. Common Q&As

-    What is the performance of each task we are expecting to achieve?

This is a real-world dataset for a challenging clinical task, so I don’t have an estimate of the expected performance. However, a classification accuracy above 90% is too good to be true for this task. RFS estimation is even more challenging. Performance is expected to vary across groups. You need to consider practical issues, including missing data in both the training and test sets, data imbalance, etc. You are free to use any machine learning methods, not only those introduced in the lectures.

-    Why don’t we use an anonymised peer-assessment form to score the contribution of each member?

An anonymised peer-assessment form was used in previous years. Occasionally, members could not settle on an agreed distribution, and several rounds of interviews were needed to decide the final percentage of contribution. Hence, it has been replaced by a more transparent and quantitative contribution table.

You should split the tasks and agree on the percentage of contributions before starting the assignment, then increase or reduce the percentages depending on the final delivery and quality of completion by each member. That way, there should be no surprises when you see your individual mark. Remember that each group is a team rather than a set of individual competitors. An ideal case for a group of 5 students is that each member contributes ~20%, but I don’t expect this to happen for all groups. Please split the tasks based on your group's experience from Assignment 1. The highest mark a member can get is the group mark, which is based on the quality of the work. Hence, marking down the contributions of other members won’t get the top performer a higher mark. So help each other rather than kill each other.



