COMP 4211: Machine Learning
Spring 2025
Group Project
Kaggle Link: https://www.kaggle.com/competitions/comp-4211-spring-25-project/overview
Join Link: https://www.kaggle.com/t/06bfd12644594a0f988a8f5568950696
Competition due: 2 May 2025, Friday, 11:59pm
Report, Code due: 2 May 2025, Friday, 11:59pm
Guidelines
• Each project group should have exactly two students. You may form your own project group. We may also form a group for you if you need help.
• The project will be counted towards 15% of the final course grade, of which the competition part accounts for 70% and the report and code account for 30%.
• As always, please use Piazza to ask questions.
• No generative artificial intelligence (GenAI) tools for code and text generation are allowed.
— Prior approval from the course instructor is needed if you want to use any other GenAI tool for your project.
— If approved, it should be stated clearly in the project report with detailed explanation on how the GenAI tool is used.
1 Preamble
The objective of this project is to practice the hands-on skills needed for solving more realistic machine learning tasks by pursuing a proposed study that involves one or more machine learning topics studied in this course. This semester, for the first time, we adopt a Kaggle competition format to make the project more realistic. Kaggle is a very common and popular platform for benchmarking problem solving, and in reality, the products we develop often need to compete with those from others on standard metrics.
The project is expected to be more substantial and hence is a group project. More information about group formation will be announced on Piazza later if you need help.
Note that this project should not be used for earning credits in a different course to avoid double-dipping.
2 Project Ideas and Considerations
In this project, you will work on an entity tagging problem. The dataset contains 40,000 sentences (examples), each with an entity annotation for every word (e.g., organization, person, location). Your task is to develop a model, using either classical machine learning or deep learning approaches, to predict the entity type of each word in the test set. You can decide how to use the 40,000 training examples (for example, how many to hold out for validation). Your model will be evaluated on 5,000 test examples on the Kaggle platform, with 2,500 examples for the public leaderboard and another 2,500 for the private leaderboard. As in reality, you will only see the results on the public leaderboard, but we will grade based on the results on the private test set.
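To make the task concrete, below is a minimal per-word tagging baseline sketched with scikit-learn. The tag set (PER, ORG, LOC, O), the toy sentences, and the feature choices are illustrative assumptions only; the actual data format and label set are described on the Kaggle page.

# A minimal per-word entity tagging baseline (illustrative sketch only; the real
# data format and label set are specified on the Kaggle page).
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy examples: each sentence is a list of words with one tag per word.
train_sents = [["Alice", "works", "at", "HKUST"], ["Paris", "is", "lovely"]]
train_tags = [["PER", "O", "O", "ORG"], ["LOC", "O", "O"]]

def word_features(sent, i):
    """Simple hand-crafted features for the i-th word of a sentence."""
    w = sent[i]
    return {
        "word.lower": w.lower(),
        "word.istitle": w.istitle(),
        "word.isupper": w.isupper(),
        "suffix3": w[-3:],
        "prev.lower": sent[i - 1].lower() if i > 0 else "<BOS>",
        "next.lower": sent[i + 1].lower() if i < len(sent) - 1 else "<EOS>",
    }

X = [word_features(s, i) for s in train_sents for i in range(len(s))]
y = [tag for tags in train_tags for tag in tags]

# One multi-class classifier predicting the tag of each word independently.
model = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
model.fit(X, y)

# Tag a new sentence word by word.
test_sent = ["Bob", "visited", "Hong", "Kong"]
print(model.predict([word_features(test_sent, i) for i in range(len(test_sent))]))

A sequence model that uses sentence context (e.g., a BiLSTM or transformer token classifier) would normally perform much better than such an independent per-word classifier.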
This project is hosted on Kaggle, where you can submit your predictions and receive immediate feedback on the public leaderboard. Each group is allowed to submit 10 times per day. Your grade will partly depend on your rank on the Kaggle private leaderboard. Additionally, you must submit your code, your trained model checkpoint, and a concise report on Canvas.
For more details about the data, requirements, and rules, please refer to the Kaggle page.
3 Project Report and Source Code
The report should cover at least the following aspects of the project:
• Project title
• Students with full names, student IDs, and HKUST email addresses
• Description of the data processing methods, if any
• Machine learning / deep learning method(s) used for solving the task
• Experiments and results
• Any insights that helped you adjust the methods/hyper-parameters, for example, what you tried, what the results of those trials were, and what conclusions you drew
• Division of labor in teamwork
The project report should be self-contained in the sense that readers can understand what you do just by reading the report without reading your code. An important general criterion is clarity, to the extent that others can replicate your experiments based on the information provided in the report.
You are recommended to use a 12-point font with single spacing in your report.
You should state clearly the division of labor between the two group members by listing the main duties and contributions of each member. In addition, the overall contribution of each member to the project should also be given in percentage (e.g., 55% by A and 45% by B). You should try your best to ensure that the workload is shared evenly (i.e., 50% each). Grading will be done individually according to the workload distribution.
All the source code that you have written and used for this project should be submitted for grading. In case your code is modified from another source, you should acknowledge it clearly in your report and point out which parts are yours. Failure to do so is considered plagiarism.
You should name the report as report.pdf and the compressed source code and model checkpoint as code.zip.
4 Computing Facilities
Kaggle gives users free access to GPUs, subject to a usage limit (e.g., 30 hours/week), which should be enough for training the small models in this project. However, if it cannot meet your needs, you can use Colab/Colab Pro or other cloud computing services.
5 Model Restriction
To ensure fairness and accessibility for all students, trained models must have fewer than 1 billion parameters (1B). This restriction is designed to: (1) prevent resource disparity from disproportionately impacting grades, since not all students have access to powerful GPUs; and (2) encourage exploring different strategies rather than simply scaling up model size.
Along with your code, you also need to hand in your trained model; the detailed format can be checked on the Kaggle page. We will run a script to automatically check for duplicate model checkpoints (two independently trained checkpoints will normally not be identical), check your model size, and check whether the predictions made by this model are consistent with your score on the leaderboard.
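For reference, the following sketch shows one way to verify the parameter budget and save a checkpoint, assuming a PyTorch model; the placeholder model and the file name model_checkpoint.pt are illustrative, and you should follow the exact checkpoint format specified on the Kaggle page.

# Sketch: verify the 1B-parameter limit and save a checkpoint (PyTorch assumed).
import torch
import torch.nn as nn

model = nn.LSTM(input_size=128, hidden_size=256, num_layers=2)  # placeholder model

n_params = sum(p.numel() for p in model.parameters())
assert n_params < 1_000_000_000, f"{n_params:,} parameters exceeds the 1B limit"
print(f"Parameter count: {n_params:,}")

# Save the trained weights so the checkpoint can be packaged into code.zip.
torch.save(model.state_dict(), "model_checkpoint.pt")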
6 Submission
6.1 Kaggle Submission
This project is hosted on Kaggle. You will need to register for a Kaggle account using your HKUST Connect email. After registration, join the competition via the provided invitation link. Your username and team name can be chosen freely; we will track scores based on your login email. Please note that access to this competition is granted only to a private email list, so only the emails corresponding to students enrolled in COMP4211 can join.
Once registered, you can submit your results on Kaggle and view your rank on the leaderboard. Your final grade will be partially based on your ranking (see grading for details).
6.2 Other Components Submission
All other assessment components of the project except Kaggle submission must be submitted electronically in the Canvas course site, with deadlines as listed on the first page.
For the code submission, you are required to submit your code on Canvas for review. Important: we will check for code similarity and reproducibility. Please refer to the Code Requirements on Kaggle for specific guidelines on ensuring reproducibility.
Only one member of each project group should submit the report, code, and model checkpoint on behalf of the group, but the names of both members should be listed clearly in all the assessment components.
When multiple versions with the same filename are submitted, only the latest version according to the timestamp will be used for grading. Files not adhering to the naming convention above will be ignored.
Late submissions will be accepted but with a penalty. The late penalty is a deduction of one point (out of a maximum of 100 points) for every hour late after 11:59pm, up to a maximum of two days (48 hours). Being late for a fraction of an hour is counted as a full hour. For example, two points will be deducted if the submission time is 01:23:34. However, since submissions for Kaggle competitions are not accepted after the deadline, the late policy does not apply to that part, and any late submission will result in a score of 0 for the Kaggle part.
7 Grading
The grading will be based on two parts: the Kaggle competition (70 pts) and the report and code (30 pts).
1. For the Kaggle competition: We use two leaderboards on Kaggle to evaluate your models' performance and generalization ability: one is a public leaderboard visible to you, and the other is a hidden leaderboard that we will use to score the final performance. Both leaderboards are evaluated with the weighted F1 score metric over the entity classes (a short sketch of this metric is given at the end of this section). You can submit multiple times to test your model on the public benchmark; however, only one final prediction can be selected for private leaderboard evaluation. In case you are concerned about hidden test performance, we will show it to you twice, currently planned for 11:59 pm on April 11 and April 29.
To provide a baseline, we have built a simplified data preprocessing and model training pipeline. We trained models ourselves, achieving a weighted F1 score of 0.68 on the private leaderboard (and 0.675 on the public leaderboard). Based on this, we set a baseline performance threshold r0 = 0.58:
(a) Meeting the baseline (r ≥ r0) will award you a minimum score of 60% (42 points out of 70).
(b) Scoring Metric: Your final score in the Kaggle section is determined by both your hidden test F1 and your ranking on the hidden leaderboard. Here is how it works:
i. Top 10 Rankings: If your model ranks between 1st and 10th place on the hidden leaderboard, you will receive a full score of 70 points. Additionally, bonus points will be awarded to the top three ranks: 15 points for the 1st place, 10 for the 2nd place, and 5 for the 3rd place. The bonus points are added on top of the original 70 points.
ii. For Ranks Below 10: We will use the F1 score of the 10th-ranked model, r1, as a secondary reference point. For students ranked below 10, if your F1 score is r, your score will be calculated as follows:
score = 42 + 28 × (r − r0) / (r1 − r0)
This formula adjusts scores based on the degree to which your model's weighted F1 improves upon the baseline relative to the 10th-ranked model's performance.
Note that the public test F1 is visible for reference only and will not affect grading.
Remember that a higher F1 on the public leaderboard does not necessarily mean a better score on the hidden leaderboard.
2. For report and code: Your report should not only discuss the changes made to the model and hyper-parameters and the corresponding results, but also analyze the results and explain why you made those changes. The report grading will be based on the amount of work reflected in your report, the coherence of the results, and the insightfulness of the discussions. For example, you can optionally include some explorations that you tried and their results, rather than only the final solution, particularly when the performance of your method is not very good.
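As a reference for the evaluation metric mentioned above, the weighted F1 score can be computed with scikit-learn as sketched below; the toy labels are illustrative only, and the exact label set is given on the Kaggle page.

# Sketch: weighted F1 over per-word entity labels using scikit-learn.
from sklearn.metrics import f1_score

y_true = ["ORG", "O", "PER", "LOC", "O", "O"]
y_pred = ["ORG", "O", "PER", "O", "O", "LOC"]

# average="weighted" averages per-class F1 scores weighted by class support.
print(f1_score(y_true, y_pred, average="weighted"))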