Module code and Title | DTS208TC Data Analytics and Visualisation
School Title | School of AI and Advanced Computing
Assignment Title | Coursework 1
Submission Deadline | 27/Mar/2025
Final Word Count | N/A
T1. Data Preprocessing (20 marks)
T2. Exploratory Data Analysis (EDA) (25 marks)
T2-1: Load the CSV file; show the dimensionality, structure and summary of the dataset.
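A minimal standard-library sketch of T2-1 follows. It assumes the file has a header row. "Structure" is read here as one inferred type per column (numeric or categorical), and "summary" as count, missing, mean, std, min and max for numeric columns. With pandas, the same steps are usually `pd.read_csv`, `df.shape`, `df.info()` and `df.describe()`.

```python
import csv
import statistics
from typing import Dict, Iterable, List


def load_rows(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Read CSV text (an open file or any iterable of lines) into dicts keyed by header."""
    return list(csv.DictReader(lines))


def _is_number(value: str) -> bool:
    try:
        float(value)
        return True
    except ValueError:
        return False


def describe(rows: List[Dict[str, str]]) -> Dict[str, object]:
    """Report shape (rows, columns), an inferred type per column and a per-column summary."""
    columns = list(rows[0].keys()) if rows else []
    structure: Dict[str, str] = {}
    summary: Dict[str, Dict[str, float]] = {}
    for col in columns:
        # Blank or absent cells are counted as missing rather than parsed.
        values = [r[col] for r in rows if r.get(col) not in (None, "")]
        missing = len(rows) - len(values)
        if values and all(_is_number(v) for v in values):
            nums = [float(v) for v in values]
            structure[col] = "numeric"
            summary[col] = {
                "count": len(nums),
                "missing": missing,
                "mean": statistics.mean(nums),
                "std": statistics.stdev(nums) if len(nums) > 1 else 0.0,
                "min": min(nums),
                "max": max(nums),
            }
        else:
            structure[col] = "categorical"
            summary[col] = {"count": len(values), "missing": missing, "unique": len(set(values))}
    return {"shape": (len(rows), len(columns)), "structure": structure, "summary": summary}
```

Usage: `with open(path, newline="") as f: rows = load_rows(f)`, then `describe(rows)`. Here `path` is a placeholder, because the brief does not name the file.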
T2-2: Calculate the number of students whose attendance is lower than 80.
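A sketch of the T2-2 count follows. It reads "lower than 80" as strictly below 80 and skips blank cells instead of treating them as zero. The header name `Attendance` is an assumption, since the brief writes "attendance" in lower case. Adjust it to the actual CSV header.

```python
from typing import Dict, Iterable


def count_below(rows: Iterable[Dict[str, str]], column: str = "Attendance",
                threshold: float = 80.0) -> int:
    """Count rows whose numeric value in `column` is strictly below `threshold`."""
    count = 0
    for row in rows:
        value = row.get(column)  # column name is an assumed header
        if value in (None, ""):
            continue  # missing attendance is not counted as low attendance
        if float(value) < threshold:
            count += 1
    return count
```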
T2-3: Visualize the distribution of previous_scores.
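A distribution plot is normally drawn with `matplotlib.pyplot.hist`. The sketch below computes the same bin counts with the standard library and renders them as a text histogram. The bin width of 10 and the `Previous_Scores` header are illustrative assumptions. Empty bins between the lowest and highest occupied bins are kept so that gaps stay visible.

```python
import math
from typing import Dict, Iterable, List, Tuple

Bin = Tuple[float, float, int]  # (lower edge, upper edge, count)


def histogram(values: Iterable[float], bin_width: float = 10.0, start: float = 0.0) -> List[Bin]:
    """Count values in half-open bins [lo, lo + bin_width), including empty inner bins."""
    counts: Dict[int, int] = {}
    for v in values:
        idx = math.floor((v - start) / bin_width)
        counts[idx] = counts.get(idx, 0) + 1
    if not counts:
        return []
    return [
        (start + i * bin_width, start + (i + 1) * bin_width, counts.get(i, 0))
        for i in range(min(counts), max(counts) + 1)
    ]


def text_histogram(bins: List[Bin], scale: int = 1) -> str:
    """Draw one '#' per `scale` observations for each bin."""
    return "\n".join(f"[{lo:g}, {hi:g}) {'#' * (n // scale)} {n}" for lo, hi, n in bins)
```

Usage: `histogram(float(r["Previous_Scores"]) for r in rows if r["Previous_Scores"])`. Note that with half-open bins a score of exactly 100 falls into [100, 110).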
T2-4: Calculate and visualize the number of students in each family income category.
Code
|
|
Result
|
|
Visualization
|
|
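A sketch of the T2-4 counts and a proportional text bar chart follows. In a report this would usually be drawn with `matplotlib.pyplot.bar`. The `Family_Income` header and category labels such as Low/Medium/High are assumptions. Passing an explicit `order` keeps ordinal categories in their natural order instead of alphabetical order.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple


def count_by_category(rows: Iterable[Dict[str, str]], column: str = "Family_Income",
                      order: Optional[List[str]] = None) -> List[Tuple[str, int]]:
    """Count non-blank values of `column`; `order` fixes the category sequence."""
    counts = Counter(row[column] for row in rows if row.get(column))
    if order is None:
        order = sorted(counts)  # alphabetical when no natural order is given
    return [(level, counts.get(level, 0)) for level in order]


def bar_chart(pairs: List[Tuple[str, int]], width: int = 40) -> str:
    """Scale bars so that the largest category spans `width` characters."""
    if not pairs:
        return ""
    top = max(n for _, n in pairs) or 1
    label_w = max(len(label) for label, _ in pairs)
    return "\n".join(
        f"{label:<{label_w}} {'#' * round(width * n / top)} {n}" for label, n in pairs
    )
```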
T2-5: Calculate and visualize the average Exam_Score of students corresponding to different Sleep_Hours.
Code
|
|
Result
|
|
Visualization
|
|
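A group-by-mean sketch for T2-5 follows. It assumes the headers are literally `Sleep_Hours` and `Exam_Score`, as the brief writes them, and it skips rows where either value is blank. The returned dict is ordered by sleep hours, so its keys and values can be passed straight to `matplotlib.pyplot.plot` or `bar`. The pandas equivalent is `df.groupby("Sleep_Hours")["Exam_Score"].mean()`.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List


def mean_by_group(rows: Iterable[Dict[str, str]], group_col: str = "Sleep_Hours",
                  value_col: str = "Exam_Score") -> Dict[float, float]:
    """Average `value_col` for each distinct value of `group_col`, sorted by group."""
    groups: Dict[float, List[float]] = defaultdict(list)
    for row in rows:
        g, v = row.get(group_col), row.get(value_col)
        if g in (None, "") or v in (None, ""):
            continue  # incomplete rows are left out of every group
        groups[float(g)].append(float(v))
    return {g: mean(vals) for g, vals in sorted(groups.items())}
```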
T2-6: Analyse the visualization results of T2-5 and summarize your findings in the report.
T3. Modelling (35 marks)
T3-1: Create a new column named ‘level’ with values 0, 1 and 2.
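The brief does not state what determines ‘level’. Purely for illustration, the sketch below assumes it is derived from `Exam_Score` using two cut-offs. `low_cut` and `high_cut` are hypothetical parameters with no defaults and should be replaced by whatever rule the assignment actually specifies.

```python
from typing import Dict, Iterable, List


def assign_level(score: float, low_cut: float, high_cut: float) -> int:
    """Return 0 below low_cut, 1 for low_cut <= score < high_cut, otherwise 2."""
    if low_cut > high_cut:
        raise ValueError("low_cut must not exceed high_cut")
    if score < low_cut:
        return 0
    if score < high_cut:
        return 1
    return 2


def add_level_column(rows: Iterable[Dict[str, str]], low_cut: float, high_cut: float,
                     source: str = "Exam_Score") -> List[Dict[str, object]]:
    """Copy each row and add a 'level' key; `source` and both cut-offs are assumptions."""
    out = []
    for row in rows:
        new_row: Dict[str, object] = dict(row)  # leave the input rows unchanged
        new_row["level"] = assign_level(float(row[source]), low_cut, high_cut)
        out.append(new_row)
    return out
```

If ‘level’ is derived from `Exam_Score`, that column should not also be one of the five factors used to predict it, or the model would simply recover the rule.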
T3-2: Choose 5 factors (with normalization) and apply one data analytics method (e.g., kNN, logistic regression, decision tree, random forest or SVM) to predict the level value.
The method you choose
|
|
The factors you choose
|
|
Code
|
|
Result
|
|
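The sketch below assumes kNN is the chosen method, which is one of the options the brief lists. It applies min-max normalization and then predicts by majority vote among the k nearest rows under Euclidean distance. The scaler is fitted on training rows only, so test data never leaks into the normalization. A constant column is given a range of 1 to avoid division by zero. scikit-learn's `MinMaxScaler` and `KNeighborsClassifier` provide the same two steps. The choice of five factors is left to the report.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence, Tuple

Row = Sequence[float]


def fit_min_max(X: Sequence[Row]) -> Tuple[List[float], List[float]]:
    """Learn per-column minima and ranges from training rows."""
    cols = list(zip(*X))
    mins = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]  # constant column -> span 1
    return mins, spans


def apply_min_max(X: Sequence[Row], mins: List[float], spans: List[float]) -> List[List[float]]:
    """Rescale each column to [0, 1] using the training minima and ranges."""
    return [[(v - m) / s for v, m, s in zip(row, mins, spans)] for row in X]


def knn_predict(X_train: Sequence[Row], y_train: Sequence[Hashable], x: Row, k: int = 5) -> Hashable:
    """Majority vote among the k nearest training rows (Euclidean distance)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    ranked = sorted(
        ((math.dist(row, x), label) for row, label in zip(X_train, y_train)),
        key=lambda pair: pair[0],
    )[:k]
    votes = Counter(label for _, label in ranked)
    top = max(votes.values())
    # Break ties in favour of the tied label whose member is closest to x.
    for _, label in ranked:
        if votes[label] == top:
            return label
```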
T3-3: Use k-fold cross validation with k = 5 folds to evaluate performance.
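A model-agnostic sketch of 5-fold cross validation follows. The indices are shuffled once with a fixed seed and dealt into five folds. Each fold is held out once, and the function returns the five test accuracies, which are then reported with their mean. `fit_predict` is a hypothetical callback that should fit the scaler and the model on the training part only. This is plain k-fold. scikit-learn's `StratifiedKFold` with `cross_val_score` would additionally keep the class proportions of ‘level’ in every fold.

```python
import random
from typing import Callable, List, Sequence


def k_fold_indices(n: int, k: int = 5, seed: int = 0) -> List[List[int]]:
    """Shuffle 0..n-1 once (reproducibly) and deal the indices into k folds."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(X: Sequence, y: Sequence, fit_predict: Callable,
                   k: int = 5, seed: int = 0) -> List[float]:
    """Hold each fold out once as the test set and return the k test accuracies.

    fit_predict(X_train, y_train, X_test) must train on the first two arguments
    (including fitting any scaler) and return one predicted label per test row.
    """
    scores = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        preds = fit_predict(
            [X[i] for i in train_idx],
            [y[i] for i in train_idx],
            [X[i] for i in test_idx],
        )
        correct = sum(p == y[i] for p, i in zip(preds, test_idx))
        scores.append(correct / len(test_idx))
    return scores
```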
T3-4: Select features (factors) and/or tune model parameters to achieve optimal performance. Show (or plot) model performance under different feature selection and/or parameter tuning settings.
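One possible feature-selection strategy is greedy forward selection, sketched below. This is a swapped-in illustration, not a method the brief prescribes. Starting from no factors, each round adds the factor that gives the highest score and stops once nothing improves. The returned history gives one row per round, which can be tabulated or plotted for the report. `score_fn` is a hypothetical callback, for example the mean of the 5-fold accuracies for a candidate subset. The same idea, looping over a list of k values instead of feature subsets, covers parameter tuning. scikit-learn offers `SequentialFeatureSelector` and `GridSearchCV` for these two tasks.

```python
from typing import Callable, Hashable, List, Optional, Sequence, Tuple


def forward_select(features: Sequence[Hashable],
                   score_fn: Callable[[List[Hashable]], float],
                   max_features: Optional[int] = None
                   ) -> Tuple[List[Hashable], List[Tuple[List[Hashable], float]]]:
    """Greedy forward selection.

    Returns (selected, history). Each history entry is the best candidate subset
    tried in that round and its score, including the final non-improving round.
    """
    remaining = list(features)
    selected: List[Hashable] = []
    best_score = float("-inf")
    history: List[Tuple[List[Hashable], float]] = []
    limit = len(remaining) if max_features is None else max_features
    while remaining and len(selected) < limit:
        scored = [(score_fn(selected + [f]), f) for f in remaining]
        score, feat = max(scored, key=lambda pair: pair[0])
        history.append((selected + [feat], score))
        if score <= best_score:
            break  # adding any further factor does not help
        selected.append(feat)
        remaining.remove(feat)
        best_score = score
    return selected, history
```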
T3-5: Report the best prediction results (i.e., Accuracy, Precision, Recall, F1-score) and the corresponding running time.
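The sketch below computes the four requested metrics from true and predicted labels and times any call with `time.perf_counter`. Because ‘level’ has three classes, precision, recall and F1 are macro-averaged here, which is one reasonable choice among several. Macro F1 is computed as the mean of the per-class F1 scores, as in scikit-learn's `f1_score(average="macro")`. Classes with an empty denominator score 0.

```python
import time
from statistics import mean
from typing import Callable, Dict, Hashable, Sequence, Tuple


def summarize_metrics(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> Dict[str, float]:
    """Accuracy plus macro-averaged precision, recall and F1."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        # A class never predicted (or never present) scores 0 instead of dividing by zero.
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": mean(precisions),
        "recall": mean(recalls),
        "f1": mean(f1s),
    }


def timed(fn: Callable, *args, **kwargs) -> Tuple[object, float]:
    """Run fn once and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start
```

Running time depends on the machine, so the report should state whether it covers training, prediction or the full cross-validation run.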
T4. Evaluation and Discussion (20 marks)
T4-1: Use one example from the given dataset and draw plots or figures to explain how your model processes the input to generate its prediction.
Example
|
|
Figure
|
|
Explanation
|
|
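If the chosen model is kNN (an assumption, since the brief leaves the method open), one example can be explained by listing its k nearest training rows, their distances and the resulting vote. The sketch below produces that trace as text. It assumes the inputs are already normalized, and the same neighbours could be highlighted in a scatter plot of two factors. For a tree-based model, the equivalent explanation would trace the decision path instead.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple

Neighbour = Tuple[int, float, Hashable]  # (training index, distance, label)


def explain_knn(X_train: Sequence[Sequence[float]], y_train: Sequence[Hashable],
                x: Sequence[float], k: int = 5) -> Tuple[List[Neighbour], Dict[Hashable, int]]:
    """Return the k nearest neighbours of x and the vote count per label."""
    ranked = sorted(
        ((i, math.dist(row, x), label)
         for i, (row, label) in enumerate(zip(X_train, y_train))),
        key=lambda t: t[1],
    )
    neighbours = ranked[:k]
    votes = Counter(label for _, _, label in neighbours)
    return neighbours, dict(votes)


def format_explanation(neighbours: List[Neighbour], votes: Dict[Hashable, int]) -> str:
    """One line per neighbour, then the vote tally that decides the predicted level."""
    lines = [f"#{i}: distance={d:.3f} level={label}" for i, d, label in neighbours]
    lines.append("votes: " + ", ".join(f"level {lab}={n}" for lab, n in sorted(votes.items())))
    return "\n".join(lines)
```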
T4-2: Discuss the advantages and disadvantages of the model you chose, and suggest future directions for further improving model performance.
Advantages
|
|
Disadvantages
|
|
Future Directions
|
|