CHME0034: HDS Applied Computational Genomics - coursework 2025
Instructions
As a Computational Genetics researcher, you have been asked to analyse a new case-control dataset to characterise the common genetic determinants of schizophrenia. You will need to run basic quality control of the data, then run a genome-wide association study (GWAS) accounting for covariates, and finally examine and discuss your findings using relevant online tools and resources. In your report, you should provide a commentary explaining why you have undertaken each analytical step. You should also present your results in detail, with the aid of tables and figures where appropriate, in the format of a formal scientific report. The report should be between 2000 and 2500 words and must include code snippets for the key commands you used.
Data
All schizophrenia cases and the control sample were recruited in the UK. In addition to raw genetic data in PLINK format, which have not been quality controlled, you have a covariate phenotype file with participants’ ages at recruitment. The data and a few helper scripts for this assignment can be found on Moodle along with this description. Make sure to copy the files over to aristotle (aristotle.rc.ucl.ac.uk) before you start working.
Quality control and data description (40 marks)
Give a short introduction to the analysis. Describe the sample you have received and perform basic quality control of the sample as described in our lectures. Make sure you report the result obtained at each step and explain the rationale behind it.
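As an illustration of what two common quality-control metrics measure, the following minimal Python sketch computes per-variant missingness and minor allele frequency (MAF) on a toy genotype matrix. The data, the function names and the thresholds (5% missingness, 1% MAF) are hypothetical and chosen only for illustration; they are not the thresholds from the lectures, and the sketch does not replace the commands you are expected to run on the real data.

```python
# Toy genotype matrix (hypothetical data): rows are individuals, columns are variants.
# Each entry counts copies of the alternate allele (0, 1 or 2); None marks a missing call.
GENOTYPES = [
    [0, 1, 2, None],
    [0, 1, 1, None],
    [1, 0, 2, 0],
    [0, None, 2, 0],
]


def variant_missingness(genotypes):
    """Return the fraction of missing calls for each variant (column)."""
    n_individuals = len(genotypes)
    n_variants = len(genotypes[0])
    return [
        sum(row[j] is None for row in genotypes) / n_individuals
        for j in range(n_variants)
    ]


def minor_allele_frequency(genotypes):
    """Return the MAF for each variant, computed from non-missing calls only."""
    mafs = []
    for j in range(len(genotypes[0])):
        calls = [row[j] for row in genotypes if row[j] is not None]
        if not calls:
            mafs.append(None)  # no data at all for this variant
            continue
        alt_freq = sum(calls) / (2 * len(calls))  # two alleles per individual
        mafs.append(min(alt_freq, 1 - alt_freq))
    return mafs


def passing_variants(genotypes, max_missing=0.05, min_maf=0.01):
    """Return indices of variants passing both filters (illustrative thresholds)."""
    missing = variant_missingness(genotypes)
    mafs = minor_allele_frequency(genotypes)
    return [
        j
        for j in range(len(missing))
        if missing[j] <= max_missing and mafs[j] is not None and mafs[j] >= min_maf
    ]


if __name__ == "__main__":
    print("missingness:", variant_missingness(GENOTYPES))
    print("MAF:", minor_allele_frequency(GENOTYPES))
    print("variants kept:", passing_variants(GENOTYPES))
```

In this toy example the second and fourth variants are removed because of missing calls, which shows why the number of variants remaining should be reported after each filter.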
Genome-wide association study (30 marks)
Perform a genome-wide association study of schizophrenia. Visualise the results using Manhattan and QQ plots. Repeat the analysis with adjustment for covariates. Report the findings of both analyses and discuss any differences observed.
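When comparing the unadjusted and covariate-adjusted analyses, a QQ plot is often summarised by the genomic inflation factor (lambda), the ratio of the median observed 1-df chi-square statistic to its expected median under the null. The sketch below is one way to compute lambda and the QQ plot coordinates from a list of p-values using only the Python standard library. The function names are hypothetical, and the expected quantiles use the (i - 0.5)/n convention, which is one common choice among several.

```python
import math
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and standard deviation 1


def chi2_from_p(p):
    """Return the 1-df chi-square statistic matching a two-sided p-value (0 < p <= 1)."""
    z = _STD_NORMAL.inv_cdf(p / 2)  # lower-tail quantile; its square is the statistic
    return z * z


def genomic_inflation(p_values):
    """Return lambda: median observed chi-square over the expected null median."""
    observed_median = median(chi2_from_p(p) for p in p_values)
    expected_median = chi2_from_p(0.5)  # approximately 0.455
    return observed_median / expected_median


def qq_points(p_values):
    """Return (expected, observed) -log10(p) pairs, sorted in ascending order."""
    n = len(p_values)
    observed = sorted(-math.log10(p) for p in p_values)
    expected = sorted(-math.log10((i - 0.5) / n) for i in range(1, n + 1))
    return list(zip(expected, observed))


if __name__ == "__main__":
    # Hypothetical p-values spread evenly over (0, 1), as expected under the null.
    null_p = [(i - 0.5) / 1001 for i in range(1, 1002)]
    print("lambda under the null:", round(genomic_inflation(null_p), 3))
    # Squaring shrinks every p-value, mimicking systematic inflation.
    inflated_p = [p * p for p in null_p]
    print("lambda when inflated:", round(genomic_inflation(inflated_p), 3))
```

A lambda close to 1 suggests little systematic inflation, whereas values well above 1 can indicate confounding such as population stratification, although true polygenic signal can also raise it.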
Interpretation and discussion of findings (30 marks)
Use the genome-wide association study (GWAS) summary statistics you have generated to identify candidate variants and/or genes by applying at least two online tools introduced during the course. Justify your choice of these tools and provide a step-by-step outline of how you implemented them.
Present an analysis of your findings and discuss their potential biological relevance to schizophrenia. Additionally, explain the challenges associated with identifying causal variants and genes from GWAS data, both in general and in relation to your specific findings.