Final Exam of MTH416
Neural Networks and Deep Learning
MSc Data Science Program
2024/2025 S2
Learning Outcomes of MTH416
• A. Demonstrate understanding of the basic concepts in neural networks, and construct simple neural networks to solve problems.
• B. Use the backpropagation algorithm, and prove basic equations in the BP algorithm.
• C. Perform the learning procedures of neural networks, including weight initialization, regularization, and activation functions.
• D. Show why neural networks can approximate any function.
• E. Demonstrate understanding of vanishing gradient phenomena, and solve this issue in the training of neural networks.
• F. Construct convolutional networks, and employ CNN to solve problems.
The Final Coursework Project
Objectives: The aim of this final coursework project is to evaluate students’ ability to deploy deep neural networks to solve practical problems, with a particular focus on image classification in computer vision.
Dataset For This Project: Please download the dataset from the box link below:
https://box.xjtlu.edu.cn/d/fc9d7a3c5489443db056/
In the folder named MTH416_Final_Project_Coursework_Dataset, you will find three sub-folders: train, val, and test. They correspond to the training, validation, and test datasets, respectively. The datasets are labelled and grouped into normal, benign, and cancer categories.
Problems:
In this project, you will build deep neural networks for cancer diagnostics using collected clinical image data. Throughout the project, you will classify image data, evaluate your models with appropriate metrics, and adjust your model based on the practical issues observed from the dataset.
Q1 [30%]: Implement a deep neural network of your choice in PyTorch for the classification of the clinical conditions: normal, benign, and cancer. Techniques you might need to consider while implementing your model include:
• Pre-processing of the dataset (centering, normalization, etc.)
• Data augmentation
• Weight initialization
• Activation functions
• Dropout
• Batch normalization
• Learning algorithms (e.g., SGD with momentum, Adam) and learning rate tuning
• Loss functions (e.g., hinge loss vs cross entropy)
• Hyper-parameter tuning
Note that transfer learning from pretrained models is not allowed for Q1. Please report the model architecture, configurations, the number of learning parameters in your model, and the performance of your model on the training, validation and test datasets in terms of classification accuracy in the final report. Finally, discuss what you found and learned from solving this problem in the report as well.
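As a rough illustration of how several of the techniques above fit together in PyTorch, the sketch below combines batch normalization, ReLU activations, dropout, Kaiming weight initialization, Adam, and the cross-entropy loss. The architecture and every hyper-parameter here are arbitrary placeholders, not a model answer.

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    """Illustrative 3-class CNN; layer sizes are assumptions, not a solution."""
    def __init__(self, num_classes=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Dropout(0.5), nn.Linear(32, num_classes),
        )
        # Kaiming (He) initialization for conv/linear weights
        for m in self.modules():
            if isinstance(m, (nn.Conv2d, nn.Linear)):
                nn.init.kaiming_normal_(m.weight, nonlinearity="relu")

    def forward(self, x):
        return self.classifier(self.features(x))

model = SmallCNN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on random stand-in data:
x, y = torch.randn(4, 3, 64, 64), torch.randint(0, 3, (4,))
optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
optimizer.step()

# The number of learnable parameters to report can be counted like this:
num_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
```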
Q2 [30%]: Implement a transfer learning model based on a pretrained model (e.g., pretrained ResNet-18) and fine-tune it in PyTorch for the classification of the clinical conditions: normal, benign, and cancer.
Please report the model architecture, configurations, the number of fine-tuning parameters in your transfer learning model, and the performance of the model on the training, validation and test datasets in terms of classification accuracy in the final report. Compare the performance of the transfer learning approach with the model you built in Q1. Finally, discuss what you found and learned from solving this problem in the report as well.
Q3 [20%]: The classification accuracy metrics used in Q1 and Q2 can lead to misleading conclusions about model performance when the class labels are highly imbalanced, e.g., when samples of patients without cancer far exceed those with cancer in the dataset. A confusion matrix and metrics like precision and recall can provide a better understanding of model performance.
In the final report, please include the confusion matrices for Q1 and Q2. Discuss your observations from the confusion matrices and plot the precision-recall curve. Hint: you can find more information on precision and recall here:
https://scikit-learn.org/stable/auto_examples/model_selection/plot_precision_recall.html
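To make the definitions concrete, here is a minimal plain-Python sketch of a confusion matrix and the per-class precision and recall derived from it; in practice scikit-learn's confusion_matrix and precision_recall_curve (linked above) provide the same functionality.

```python
def confusion_matrix(y_true, y_pred, num_classes=3):
    """cm[i][j] = number of samples with true class i predicted as class j."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm

def precision_recall(cm, cls):
    """Precision and recall for one class, read off the confusion matrix."""
    tp = cm[cls][cls]
    fp = sum(cm[r][cls] for r in range(len(cm))) - tp   # column total minus TP
    fn = sum(cm[cls]) - tp                              # row total minus TP
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```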
Q4 [20%]: The class imbalance issue in the clinical dataset can lead to problematic performance of the deep learning model. A few possible ways to mitigate this issue are to re-balance the contribution of each class, such as:
1) Reweight the loss function in the model so that each class contributes equally to the loss function;
2) Use up-sampling or down-sampling to balance the samples in each class.
Please implement one of the ideas above into your model in Q1 or Q2 in PyTorch, choose appropriate evaluation metrics, and report your findings in the final report.
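Both ideas can be sketched in PyTorch as below. The class counts are invented for illustration, and the dataset variable in the sampler sketch is a placeholder for one of the ImageFolder datasets.

```python
import torch
import torch.nn as nn

# Idea 1): weight the cross-entropy loss inversely to class frequency so
# each class contributes equally in expectation. Counts below are invented.
class_counts = torch.tensor([800.0, 150.0, 50.0])   # normal, benign, cancer (assumed)
weights = class_counts.sum() / (len(class_counts) * class_counts)
criterion = nn.CrossEntropyLoss(weight=weights)

# Idea 2): up-sample minority classes with WeightedRandomSampler so each
# mini-batch is approximately class-balanced (ds is a placeholder dataset):
# labels   = ds.targets                       # per-sample integer labels
# sample_w = [weights[l] for l in labels]     # rarer class -> larger weight
# sampler  = torch.utils.data.WeightedRandomSampler(sample_w, num_samples=len(labels))
# loader   = torch.utils.data.DataLoader(ds, batch_size=32, sampler=sampler)
```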