Assessment 2: Project Report
This is an individual assignment and contributes 70% of the overall module mark for COMP3003.
Task (1): Literature Review (25% of the total mark of assessment 2)
You've previously studied supervised and unsupervised learning. For this assignment, delve into self-supervised learning. Examine how it differs from supervised and unsupervised methods. Detail the advantages of self-supervised learning and provide an in-depth exploration of its real-world applications, showcasing how it's been leveraged across various industries and scenarios.
Task (2): Reinforcement Learning (35% of the total mark of assessment 2)
Consider the model in Figure (2), which has five states, two actions (Left and Right), and a discount factor γ = 0.9.
Figure (2): Markov Decision Process (chosen action happens with probability 1)
The figure shows that the model can only move to S2 from S1, and can only move to S4 from S5. The reward for the transition from S4 to S5 is r = 10; the reward for every other transition is r = 1.
1. Describe the optimal policy for the MDP.
2. Express V∗(S3), V∗(S4), and V∗(S5) in terms of γ only (not in terms of other state values).
3. Consider executing Q-learning on this MDP. Assume that the Q values for all (state, action) pairs are initialized to 0, that α = 0.5, and that Q-learning uses a greedy exploration policy, meaning that it always chooses the action with the maximum Q value. The algorithm breaks ties by choosing Left. What are the first 15 (state, action) pairs if our robot learns using Q-learning and starts in state S3 (e.g., (S3,Left), (S2,Right), (S3,Right),...)?
For all of the above points, please justify your answers in detail, with the necessary explanation and analysis.
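For reference, the quantities named in questions 2 and 3 are usually written as shown here. This is a sketch in standard textbook notation, not a worked answer. The symbols r(s, a) (the reward for taking action a in state s) and s′ (the state reached after that action) are assumptions introduced here, and transitions are taken to be deterministic, as stated in the caption of Figure (2). The maximum is taken over the actions available in the state, so only one action applies at S1 and at S5.

```latex
% Bellman optimality equation for a deterministic MDP
V^{*}(s) = \max_{a} \bigl[\, r(s,a) + \gamma\, V^{*}(s') \,\bigr]

% Q-learning update with learning rate \alpha
Q(s,a) \leftarrow Q(s,a) + \alpha \bigl[\, r(s,a) + \gamma \max_{a'} Q(s',a') - Q(s,a) \,\bigr]
```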
Task (3): Classification (40% of the total mark of assessment 2)
a) Find an appropriate dataset for the task (e.g., using Google Dataset Search or any other dataset repository) in .csv format or any other similar format. Explain the content and structure of the dataset.
b) Load the dataset into your program, and prepare it by normalizing it where necessary.
c) Divide the dataset into training and test sets. Use k-fold cross-validation, and report, with justification, the best number of folds (k) for the size of the dataset (discuss this point in light of the accuracy scores).
d) Create your neural network model and train your model. Evaluate the performance of the model using different classification metrics. A detailed discussion is required (use plots whenever you can).
e) Repeat the above steps (a-d), and train a naïve Bayes classifier. You should clearly justify which probability distribution you choose for the classifier. Which other probability distributions might be used with the naïve Bayes model in this task, and why? (Justify your answer using plots of the data.) Compare its performance to that of the neural network model in detail.
f) From your observation and analysis, are there any limitations or drawbacks of the neural network and the naïve Bayes classification models?
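As a reminder of what the "classification metrics" in (d) and the naïve Bayes model in (e) commonly refer to, the definitions can be sketched as shown here. These are standard forms given under assumptions, not a prescribed set: the metrics are written for the binary case using confusion-matrix counts TP, TN, FP and FN, and the symbols c (class), x_j (the j-th feature) and d (number of features) are introduced only for illustration. The choice of the per-feature distribution p(x_j | c) is left open, since justifying it is part of the task.

```latex
% Common binary classification metrics
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
\qquad
\text{Precision} = \frac{TP}{TP + FP}
\qquad
\text{Recall} = \frac{TP}{TP + FN}

F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}

% Naive Bayes decision rule (features assumed conditionally independent given the class)
\hat{y} = \arg\max_{c} \; P(c) \prod_{j=1}^{d} p(x_j \mid c)
```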
Deliverables & Assessment Criteria (All submissions should be in one ZIP file)
• Task (1) requires an academic essay, and the style should reflect that by including references (Harvard style). References should be peer-reviewed journal papers and conference papers. The essay (in PDF format) should be structured into sections for Introduction (10% of the task mark), literature review (20% of the task mark), applications (30% of the task mark), discussion (20% of the task mark), conclusion (10% of the task mark), and references (10% of the task mark). The essay should be no more than 2,000 words.
• For tasks (2 and 3), please justify your answers in one PDF file with mathematical explanations and details to demonstrate your level of understanding of the different topics of the tasks.
• For task (2), Part (2-1) is worth 10% of the task mark, Part (2-2) is worth 30% of the task mark, and Part (2-3) is worth 60% of the task mark.
• For task (3), your answer should not be more than 2,800 words (excluding diagrams, images, tables, Matlab code/comments, and references). Any references should be appropriately cited in the report using the Harvard referencing style. The report should be organized as follows:
- Abstract (5% of the task mark): about 150 words
- Introduction (10% of the task mark): supervised learning, neural networks, and naïve Bayes models; about 650 words
- Implementation (60% of the task mark): implementation of neural network and naïve Bayes models (including implementation steps, Matlab code, results/screenshots, the performance of classification, and explanations) – about 1000 words
- Discussions and Conclusions (10% of the task mark) (including drawbacks of neural network and naïve Bayes models) – about 1000 words
- References (5% of the task mark)
- Appendix: include all your Matlab code, and attach the source code files to your submission. This code is worth 50% of the marks for the “Implementation” and “Discussions and Conclusions” parts.
• In addition, please submit a 5-minute video briefly describing your contribution and clearly showing a demo of the code working (10% of the task mark).
Threshold Criteria (these are indicative only):
< 40% Little or no analysis, and answers are largely incorrect. Little understanding of the subject. Almost no evidence of investigation and research in answering the questions. The report and essay are unclear and poorly written or structured.
40–49% Brief discussion and little analysis for the different tasks. Answers are only partially correct and/or complete, and lack detailed explanations. There is little evidence of investigation and research in answering the questions. The content and structure of the report and essay are moderately appropriate.
50–59% Adequate discussion and analysis for the different tasks. Answers are mostly correct and complete, with an acceptable level of detail providing some explanation of how results are obtained. There is some evidence of investigation and research in answering the questions. The content and structure of the report and essay meet basic standards of quality.
60–69% Detailed discussion and analysis for the different tasks. A significant majority of answers are correct and complete, with a good level of detail explaining clearly how results are obtained. There is good evidence of investigation and research in answering the questions. The report and essay are of a good standard and quality.
≥ 70% The different tasks are discussed in detail and supported by excellent arguments. Answers are correct and complete, with clear and well-justified analysis and description. There is strong evidence of investigation and research in answering the questions (e.g., through deep analysis and full investigation). The report and essay are of a high standard and quality (focused and concise).