CMT122 Machine Learning for NLP 2024-2025

Module Code CMT122

Academic Year 2024-2025

Module Title Machine Learning for NLP

Assessment Title Coursework 1

Assessment Number 1

Date Set Thursday, October 24th, 12:00pm

Submission Date and Time Friday, November 22nd at 9:00am

Return Date TBA

CMT122 Coursework 1

This assignment is worth 50% of the total marks available for this module. If coursework is submitted late (and where there are no extenuating circumstances):

1.  If the assessment is submitted no later than 24 hours after the deadline, the mark for the assessment will be capped at the minimum pass mark;

2.  If the assessment is submitted more than 24 hours after the deadline, a mark of 0 will be given for the assessment.

3.  You need to submit a cover sheet (that can be downloaded from LC).

Please note that by submitting your work you declare that:

●  You have read and understood the academic regulations.

●  You are aware of the consequences of applying for extenuating circumstances.

●  You are aware of the consequences of applying for a deferral.

●  Your submission (or your contribution to it in the case of a group submission) is in accordance with the academic integrity policy, which covers a range of topics including cheating, collusion, plagiarism and the use of generative AI.

You can find the academic regulations here:

● https://www.cardiff.ac.uk/public-information/policies-and-procedures/academic-regulations

The academic regulations for COMSC (which note which rules apply to degree programmes in COMSC) can be found under ‘Assessment & Feedback’ in the COMSC-ORG-SCHOOL organisation on Learning Central.

If you wish to apply for extenuating circumstances, please see

● https://intranet.cardiff.ac.uk/students/study/exams-and-assessment/extenuating-circumstances

● https://intranet.cardiff.ac.uk/students/study/exams-and-assessment/extenuating-circumstances/extenuating-circumstances-policy-for-undergraduate-and-postgraduate-taught-students/new-system-pilot

If you wish to apply for a deferral, please see

● https://intranet.cardiff.ac.uk/students/study/exams-and-assessment/extenuating-circumstances

● https://intranet.cardiff.ac.uk/students/living-here/international-students/visas-and-immigration/making-a-change-to-your-studies/deferring-assessments

You can find the academic integrity policy here:

● https://intranet.cardiff.ac.uk/students/study/exams-and-assessment/academic-integrity

Submission Instructions

This coursework consists of a portfolio divided into two parts with equal weight:

- Part 1 consists of selected homework similar to that handed in throughout the course. The final deliverable consists of a single PDF file, which may include the methodology, snippets of Python code and solved exercises.

- Part 2 consists of a machine learning project where the students implement a basic machine learning algorithm for solving a given task. The deliverable is a zip file with the code, a readme, and a written summary (up to 1,200 words) describing the solutions, design choices and a reflection on the main challenges faced during development.

- Any code submitted will be run in Python 3 (Linux) and must be submitted as stipulated in the instructions above.

- Note that we are using the Harvard referencing style for citing any related work. You can refer to the following guidelines: https://xerte.cardiff.ac.uk/play_4191?utm_source=refexamples&utm_medium=referral#page1

- Any deviation from the submission instructions above (including the number and types of files submitted) will result in a mark of zero for the assessment or the question part.

- Staff reserve the right to invite students to a meeting to discuss the coursework submissions.

Assignment

In this coursework, students demonstrate their familiarity with the topics covered in the module via two separate parts with equal weights (first part: 50%; second part: 50%).

Part 1 (50%)

In Part 1, students are expected to answer two practical questions.

Practice

1. Your algorithm gets the following results in a classification experiment. Please compute the precision, recall, F-measure and accuracy *manually* (without the help of your computer/Python). Provide all the formulas and every computation step needed to reach the final result. (20 points)

ID    Prediction    Gold
1     positive      positive
2     positive      negative
3     positive      negative
4     negative      negative
5     negative      neutral
6     neutral       positive
7     neutral       neutral
8     neutral       positive
9     positive      negative
10    negative      negative
11    negative      positive
12    positive      positive
13    positive      positive
14    neutral       positive
15    positive      negative
16    negative      negative
17    negative      positive
18    positive      positive
19    positive      negative
20    negative      negative
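
For reference, the standard per-class definitions of the requested metrics can be written as below. This is a sketch only: the brief does not say whether the metrics should be reported per class or averaged, so the macro-average shown at the end is an assumption, as is reading "F-measure" as the balanced F1 score. Here TP_c, FP_c and FN_c are the true positives, false positives and false negatives for class c, C is the set of classes, and N is the total number of instances.

```latex
% Per-class precision, recall and F1 for each class c
\[
P_c = \frac{TP_c}{TP_c + FP_c}, \qquad
R_c = \frac{TP_c}{TP_c + FN_c}, \qquad
F_{1,c} = \frac{2 \, P_c \, R_c}{P_c + R_c}
\]

% Accuracy: correctly predicted instances over all instances
\[
\text{Accuracy} = \frac{\sum_{c \in C} TP_c}{N}
\]

% Macro-averaging (an assumed choice): unweighted mean over classes
\[
P_{\text{macro}} = \frac{1}{|C|} \sum_{c \in C} P_c, \qquad
R_{\text{macro}} = \frac{1}{|C|} \sum_{c \in C} R_c, \qquad
F_{1,\text{macro}} = \frac{1}{|C|} \sum_{c \in C} F_{1,c}
\]
```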

2.  You are given a dataset (named “IMDB reviews”) with movie reviews and their associated sentiments (dataset available in Learning Central). Your goal is to train machine learning models on the training set to predict the sentiment of a review in the test set. The problem should be framed as both a regression and a classification problem. The task is therefore to train two machine learning models (a regression and a classification model) and check their performance.

You can choose which scikit-learn (sklearn) models to use to solve this problem.

Write, for each of the models, the main Python instructions to train and predict the labels (one line each, no need to include any data preprocessing instructions in the pdf) and the performance on the test set in terms of Root Mean Squared Error (regression) and accuracy (classification).

While you will need to write the full code to get the results, only these instructions are required in the pdf. (30 points)
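
The general shape of the train, predict and evaluate instructions in scikit-learn can be sketched as follows. This is an illustration under stated assumptions, not a model answer: the toy texts and 0/1 labels are hypothetical stand-ins for the IMDB reviews data, and `Ridge`, `LogisticRegression` and `TfidfVectorizer` are arbitrary choices among the models and preprocessing steps scikit-learn offers.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression, Ridge
from sklearn.metrics import accuracy_score, mean_squared_error

# Hypothetical toy data; the real dataset and its label format are not shown in this brief.
train_texts = ["great film", "terrible plot", "loved it", "awful acting"]
train_labels = np.array([1, 0, 1, 0])
test_texts = ["great acting", "terrible film"]
test_labels = np.array([1, 0])

# Turn raw text into numeric features (fit on the training set only).
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_texts)
X_test = vectorizer.transform(test_texts)

# Regression framing: train, predict, then Root Mean Squared Error.
reg = Ridge(alpha=1.0).fit(X_train, train_labels)
reg_pred = reg.predict(X_test)
rmse = np.sqrt(mean_squared_error(test_labels, reg_pred))

# Classification framing: train, predict, then accuracy.
clf = LogisticRegression().fit(X_train, train_labels)
clf_pred = clf.predict(X_test)
acc = accuracy_score(test_labels, clf_pred)

print(f"RMSE: {rmse:.3f}  accuracy: {acc:.3f}")
```

In both framings the pattern is the same: one `fit` call on the training data, one `predict` call on the test data, then a metric computed from the predictions and the gold labels.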

Part 2 (50%)

In Part 2, you are provided with a text classification dataset (named “bbc_news”). The dataset contains news articles assigned to five categories: tech, business, sport, politics and entertainment. Using this dataset, you need to preprocess the data, select features, train and evaluate a machine learning model of your choice to classify the news articles.

You should include at least three different features to train your model; one of them should be based on some sort of word frequency.

You can decide on the type of frequency (absolute or relative, normalised or not). Text preprocessing is mandatory for the word frequency feature.

The remaining two (or more) features can be chosen freely. Then, you will have to perform feature selection to reduce the dimensionality of all the features.
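
As one possible illustration of a word frequency feature followed by feature selection, the sketch below builds absolute word counts and keeps the highest-scoring columns under a chi-squared test. The documents, labels and the value of `k` are hypothetical, and `CountVectorizer`, `SelectKBest` and `chi2` are only one of several combinations that scikit-learn provides; other frequency types and selection methods are equally valid.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2

# Hypothetical toy documents standing in for the news articles.
docs = [
    "the team won the football match",
    "the striker scored a late goal",
    "shares fell as the market closed",
    "the bank raised interest rates",
    "new phone released with faster chip",
    "software update fixes security bug",
]
labels = ["sport", "sport", "business", "business", "tech", "tech"]

# Preprocessing (lowercasing, stop-word removal) plus absolute word counts.
vectorizer = CountVectorizer(lowercase=True, stop_words="english")
counts = vectorizer.fit_transform(docs)

# Keep the k columns most associated with the labels (k=5 is an arbitrary choice).
selector = SelectKBest(chi2, k=5)
reduced = selector.fit_transform(counts, labels)

print(counts.shape, "->", reduced.shape)
```

The selector should be fitted on training data only and then applied to development and test data with `selector.transform`, so that no information from held-out articles influences which features are kept.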

Note

Training, development and test sets are not provided. It is up to you to decide on the evaluation protocol and partitions (e.g., cross-validation or predefining training, development and test sets). This choice should be explained in the report.
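
A predefined partition, one of the two options mentioned above, could be sketched like this. The 80/10/10 proportions, the fixed `random_state` and the placeholder items are assumptions made for illustration; the brief leaves the protocol to you.

```python
from sklearn.model_selection import train_test_split

# Placeholder items and labels (five classes, 20 items each); not the real dataset.
items = list(range(100))
labels = [i % 5 for i in items]

# First hold out 20% of the data, keeping class proportions with stratify.
train_x, rest_x, train_y, rest_y = train_test_split(
    items, labels, test_size=0.2, stratify=labels, random_state=0
)

# Then split the held-out part in half: development set and test set.
dev_x, test_x, dev_y, test_y = train_test_split(
    rest_x, rest_y, test_size=0.5, stratify=rest_y, random_state=0
)

print(len(train_x), len(dev_x), len(test_x))
```

The development set is then available for comparing design choices, while the test set is touched only once for the final reported figures.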

Deliverables

For this part, the deliverables are Python code including all the steps and a report of up to 1,200 words.

The Python code should include the Python script and a small README file with instructions on how to run the code in Linux. Jupyter notebooks with clear execution paths are also accepted.

The code should take the dataset as input and output the results according to the chosen evaluation protocol.

20% of the marks are for the code (10 points) and 80% are for the report (40 points). The code should include all the necessary steps described above: to get the full mark for the code, it should work properly and should clearly perform all the required steps. The report should include:

1.  A description of all the steps involved in the process (preprocessing, choice of features, feature selection, training and testing of the model). This description should be such that one can understand all the steps without looking at the code.

(10 points - The quality of the preprocessing, features and algorithm will not be considered here.)

2.  A justification of all the steps. Some justifications may be numerical; in that case, a development set can be included to perform additional experiments.

(10 points - A reasonable justification is enough to get half of the marks here. The usage of the development set is required to get full marks.)

3. A report on the overall performance (accuracy, macro-averaged precision, macro-averaged recall and macro-averaged F1) of the trained model on the dataset.

(10 points - Indicating the results, even if very low, is enough to get half of the marks here. A minimum of 65% accuracy is required to get the full mark.)

4. A critical reflection on how the deliverable could be improved in the future and on the potential biases that the deployed machine learning models may have.

(10 points - The depth and correctness of the insights related to your deliverable will be assessed.)
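
The overall performance figures requested in item 3 can be obtained along the lines of the sketch below. The gold and predicted labels are made-up values for illustration, and `zero_division=0` is an assumed choice for how to score a class that is never predicted.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Made-up gold labels and predictions, for illustration only.
y_true = ["tech", "sport", "business", "politics", "entertainment", "tech"]
y_pred = ["tech", "sport", "business", "sport", "entertainment", "business"]

accuracy = accuracy_score(y_true, y_pred)

# average="macro" takes the unweighted mean of the per-class scores.
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0
)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```

Macro-averaging gives every category the same weight regardless of how many articles it contains, which is why it can differ noticeably from accuracy when the categories are unbalanced.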

The report may include tables and/or figures.

Extra credit (optional) - 10% extra marks in the second part (5 points): For the second part, students can get extra credit by writing an essay on one specific task related to Part 2 (except for option d, see instructions below). The essay must not exceed 500 words (figures/tables are allowed and encouraged) and will deal with one of the following four specific topics:

a. Error analysis: Check the types of errors made by the system submitted for Part 2 and reflect on possible solutions to the observed issues. Conducting a qualitative analysis on specific examples is encouraged.

b. Literature review: Write an essay about the state of the art of the field (i.e., text classification/categorization). Retrieve relevant articles and digest them, connecting them to your proposed solution to the problem in Part 2.

c. Model comparison: Propose and evaluate machine learning systems of a different nature from the ones taught during the course. Write a table with all the results and analyse the strengths and limitations of the proposed approaches.

d. Code release: Create a GitHub or Bitbucket repository with the data and Python code used for Part 2. Add clear instructions on how to run the code from the terminal and about its different functionalities/parameters. Include all the necessary data, provide full documentation and comment on the code. Students only need to include the link to the repository in the pdf.

Note: The maximum mark for the second part will be 50 in any case.

Criteria for assessment

Credit will be awarded based on the following criteria.

Part 1

The main criterion for the assessment is the correctness of the answers, for which an explanation of the methodology is also required. Full marks will be given to answers that include the correct answer and justification or methodology.

Part 2

This part is divided into Python code (20%) and a report (80%). The code will be evaluated based on whether it works and whether it contains at least the necessary steps required for the completion of Part 2. Four items will be evaluated in the report, whose weights and descriptions are indicated in the assessment instructions.

The main criteria to evaluate these items will be the adequacy of the answer with respect to what was asked, and the justification provided.

Assessment Criteria

High Distinction (80%+): Full understanding of all the concepts, correct answers and methodology, well-documented and working code, excellent justification and description of all steps and critical analysis.

Distinction (70-79%): Full understanding of all the concepts, correct answers and methodology, well-documented and working code, accurate justification and description of all steps and critical analysis.

Merit (60-69%): Good understanding of all the concepts, working code, justification and description of steps and analysis.

Pass (50-59%): A few errors in the answers and code; methodology with issues; description of steps and justification lacking detail or with issues.

Marginal Fail (40-49%): Code with errors, flawed methodology, incorrect solutions, and no clear description or justification of steps.

Fail (0-39%): Considerable flaws in the methodology and errors in the code, incorrect solutions, and no clear description or justification of steps.

Feedback and suggestions for future learning

Feedback on your coursework will address the above criteria. There will be an opportunity for individual feedback during an agreed time.

Feedback for this assignment will be useful for subsequent skill development, such as machine learning in general, data science, natural language processing and deep learning (which will be studied during the second part of the module).

