MLE5247 AY24/25 SEM 2
Assignment 2
You have two options to choose from. Please pick the one that interests you the most. Each earns a maximum of 35 marks.
Option 1: Building and Implementing a Python Code
Task:
Use a neural network - such as a shallow MLP, a GNN, or even a GAN - to predict the band gap of a compound based on descriptors like composition or physical properties (e.g., density, formation energy).
Steps to Consider:
• You may use the materials dataset provided with this assignment.
• Feel free to modify the dataset if needed (e.g., by adding features, including additional data, or selecting a subset).
• You can use AI tools (e.g., ChatGPT, DeepSeek, Gemini, Copilot) to help generate, debug, or improve your code. However, please acknowledge their use, and remember that these tools can make mistakes or produce incomplete code. You’re responsible for verifying that each step aligns with your objectives.
• Train/test your network on the band gap regression problem.
• Visualize the distribution of predicted vs. actual band gaps, and discuss any limitations.
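As a starting point only, the train/test and predicted-vs-actual steps above might be sketched as follows. This is a minimal NumPy-only illustration, not a reference solution: the data are synthetic stand-ins (the two columns named density and e_form and the formula for the target are assumptions with no physical meaning), and the layer size, learning rate, and epoch count are arbitrary placeholders. In practice you would load the provided dataset and would probably use a library such as PyTorch or scikit-learn.

```python
import numpy as np

rng = np.random.default_rng(0)

# SYNTHETIC stand-in data (assumption): replace with the provided dataset.
n = 400
density = rng.uniform(2.0, 8.0, n)   # hypothetical descriptor
e_form = rng.uniform(-3.0, 0.0, n)   # hypothetical descriptor
X = np.column_stack([density, e_form])
# Made-up target; clipped at zero because band gaps are non-negative.
y = np.clip(0.5 - e_form - 0.2 * (density - 5.0) + rng.normal(0, 0.1, n), 0.0, None)

# 80/20 train/test split; scale features using training statistics only.
idx = rng.permutation(n)
train, test = idx[:320], idx[320:]
mu, sigma = X[train].mean(axis=0), X[train].std(axis=0)
X_tr, X_te = (X[train] - mu) / sigma, (X[test] - mu) / sigma
y_tr, y_te = y[train][:, None], y[test][:, None]

# Shallow MLP: 2 inputs -> 16 tanh units -> 1 linear output.
hidden = 16
W1 = rng.normal(0, 0.5, (2, hidden))
b1 = np.zeros(hidden)
W2 = rng.normal(0, 0.5, (hidden, 1))
b2 = np.zeros(1)

def forward(X, W1, b1, W2, b2):
    """Return hidden activations and predictions."""
    h = np.tanh(X @ W1 + b1)
    return h, h @ W2 + b2

lr = 0.02
for epoch in range(3000):
    h, pred = forward(X_tr, W1, b1, W2, b2)
    err = (pred - y_tr) / len(X_tr)       # gradient of 0.5 * MSE w.r.t. pred
    grad_W2 = h.T @ err
    grad_b2 = err.sum(axis=0)
    dh = (err @ W2.T) * (1.0 - h ** 2)    # backpropagate through tanh
    grad_W1 = X_tr.T @ dh
    grad_b1 = dh.sum(axis=0)
    W1 -= lr * grad_W1
    b1 -= lr * grad_b1
    W2 -= lr * grad_W2
    b2 -= lr * grad_b2

# Evaluate on held-out data against a predict-the-training-mean baseline.
_, pred_te = forward(X_te, W1, b1, W2, b2)
mae = float(np.mean(np.abs(pred_te - y_te)))
baseline_mae = float(np.mean(np.abs(y_tr.mean() - y_te)))
print(f"test MAE: {mae:.3f}  (mean-predictor baseline: {baseline_mae:.3f})")
```

The arrays pred_te and y_te are the predicted and actual values you would visualize, for example as a parity (predicted vs. actual) scatter plot or as overlaid histograms. Note that nothing in this sketch forces the predictions to be non-negative, which is one limitation worth discussing.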
Important Note:
Our main goal here is not to produce perfect code or a foolproof band gap predictor. Your strength lies in your materials science expertise, so focus on applying your domain knowledge to design a model that’s appropriate for this task. Treat AI tools as a technical assistant that may not fully understand what you are trying to achieve.
Comments in Your Code:
• Include thought-process comments explaining why you chose particular models or parameters, what optimization considerations and other decisions you made, and what shortcomings remain or how you might improve the approach further.
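To illustrate the kind of thought-process comment meant here, a short hypothetical example follows. The parameter names, values, and the reasoning attached to them are assumptions made up for illustration; your own choices and justifications should come from your data and domain knowledge.

```python
# Hypothetical hyperparameters for a shallow MLP (illustrative values only).
config = {
    # Why: the dataset is assumed to be small, so a single narrow hidden
    # layer limits the number of weights and the risk of overfitting.
    "hidden_units": 16,
    # Why: a modest learning rate keeps plain gradient descent stable.
    # Shortcoming: not tuned; a validation-based search would be better.
    "learning_rate": 0.02,
    # Why: hold out 20% of compounds for testing. Shortcoming: a random
    # split may place chemically similar compounds in both sets, which
    # can make the test error look better than it really is.
    "test_fraction": 0.2,
}

# Basic sanity checks on the chosen values.
assert config["hidden_units"] > 0
assert 0.0 < config["test_fraction"] < 1.0
print(config)
```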
Evaluation:
• Your reasoning and logic will be the primary basis for evaluation, so ensure your ideas and insights are clearly expressed in your comments and discussion.
o Logical reasoning (~20 marks)
o Accuracy of code & implementation (~10 marks)
o Output (~5 marks)
Option 2: AlphaFold
Background:
AlphaFold, particularly AlphaFold2 from DeepMind, is a groundbreaking deep learning system that predicts a protein’s 3D structure from its amino acid sequence. It gained major recognition in 2020–2021 by outperforming all other methods in the CASP14 (Critical Assessment of protein Structure Prediction) competition, and its developers Demis Hassabis and John Jumper shared the 2024 Nobel Prize in Chemistry for this work.
Task:
Conduct a literature survey to explore how AlphaFold actually works, covering both the ML-based and non-ML techniques that helped crack the longstanding protein folding challenge. Then, compile a report that:
• Explains the problem of protein folding and the core challenges
• Outlines how AlphaFold overcame these challenges
• Breaks down the computational methods and process
• Uses pictorials, graphs, or other visuals to clarify the mechanism
You may use AI tools to aid your research, but acknowledge their use and do not copy text directly from them. Remember, AI outputs can sometimes be inaccurate, so please double-check any information you include.
Evaluation:
• There’s no fixed page limit, but as a rough guideline, 3–4 pages should be sufficient if you effectively capture and condense the model’s complexity (which will be challenging!). However, evaluation will focus on the depth of your insight into the inner workings of the model(s) and your ability to explain or illustrate them in a meaningful, clear way.
o Background (~5 marks)
o Motivations for and behind computational approaches (~5–10 marks)
o Insight into AlphaFold (~20–25 marks)