A Data Programming Project
Now that you have had a chance to explore some techniques and tools in Python, it is time to start working on your own exploratory data analysis project. This is a chance for you to explore a research area of your choosing. You will identify a clear agenda for research and explore this topic at a high level.
Expectations:
- Identify your own research area and questions, drawing on knowledge from external sources.
- Acquire a dataset that is fit for purpose.
- Explore the dataset through different lenses, identifying key features and potential flaws in the data.
- Produce a systematic, rigorous and well-reasoned report on how you work through the dataset.
- Describe, at both a technical and an analytical level, how and why you are approaching the problem space in a particular way.
- Identify gaps in your approach, the dataset and any techniques, tools, libraries or data structures that you choose to utilise.
- Consider the ownership (provenance) of data as it moves through a data processing pipeline, and how this might manifest.
- Consider how data can be prepared, refined and explored for further analysis, e.g. for a final-year project.
- Critically analyse, evaluate and summarise findings from a mini-research project.
- Reflect on both processes and outcomes of your project, including any missing steps or stages.
- Give a clear account of how your analysis provides useful and interesting insights into your chosen dataset.
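As an illustration of what "identifying potential flaws in the data" might look like in practice, the following is a minimal sketch that counts missing values per column and duplicated rows. It uses only the Python standard library so that it runs anywhere; in a notebook you may well prefer a data analysis library instead. The function name `profile_rows`, the sample data and the rule that an empty string counts as missing are all assumptions made for this example, not requirements of the brief.

```python
import csv
import io
from collections import Counter


def profile_rows(rows):
    """Summarise missing values and duplicate rows for dict-style records."""
    rows = list(rows)
    columns = list(rows[0].keys()) if rows else []
    # Assumption: an empty or whitespace-only field counts as missing.
    missing = {
        col: sum(1 for r in rows if (r[col] or "").strip() == "")
        for col in columns
    }
    # A row counts as a duplicate if an identical row appeared earlier.
    counts = Counter(tuple(r[col] for col in columns) for r in rows)
    duplicate_rows = sum(n - 1 for n in counts.values())
    return {
        "n_rows": len(rows),
        "columns": columns,
        "missing": missing,
        "duplicate_rows": duplicate_rows,
    }


# Hypothetical sample data, invented purely for illustration.
SAMPLE_CSV = """city,year,temperature
Leeds,2020,11.2
Leeds,2020,11.2
York,2021,
Hull,,9.8
"""

summary = profile_rows(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(summary)
```

A summary like this is only a starting point: the report should go on to explain why values are missing or duplicated and what that means for the research questions.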
You should present your work in a single Jupyter Notebook (.ipynb file) as part of a larger ZIP archive of files. Include any data that you use in the ZIP archive so that it is readily accessible for checking. The ZIP archive should not exceed 30MB in total, including your .ipynb file and any data that you choose to utilise. The dataset itself should not exceed 10MB in total.
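Before submitting, it may be worth checking the archive against these limits. The sketch below is one possible way to do so and rests on several assumptions that the brief does not state: 1MB is taken as 1,048,576 bytes, the 30MB limit is applied to the size of the ZIP file on disk, and every file in the archive other than the notebook is counted (uncompressed) as part of the dataset. The function name `check_submission` and the file names in the demonstration are hypothetical.

```python
import os
import tempfile
import zipfile

MB = 1024 * 1024  # assumption: the brief does not define the unit precisely
ARCHIVE_LIMIT = 30 * MB
DATASET_LIMIT = 10 * MB


def check_submission(zip_path, notebook_suffix=".ipynb"):
    """Report whether a ZIP archive appears to meet the size limits."""
    archive_bytes = os.path.getsize(zip_path)  # size of the ZIP file on disk
    with zipfile.ZipFile(zip_path) as zf:
        infos = [i for i in zf.infolist() if not i.is_dir()]
    notebooks = [i.filename for i in infos if i.filename.endswith(notebook_suffix)]
    # Assumption: every non-notebook file is data, measured uncompressed.
    data_bytes = sum(
        i.file_size for i in infos if not i.filename.endswith(notebook_suffix)
    )
    return {
        "notebooks": notebooks,
        "single_notebook": len(notebooks) == 1,
        "archive_bytes": archive_bytes,
        "archive_ok": archive_bytes <= ARCHIVE_LIMIT,
        "data_bytes": data_bytes,
        "data_ok": data_bytes <= DATASET_LIMIT,
    }


# Demonstration with a tiny throwaway archive (hypothetical file names).
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = os.path.join(tmp, "submission.zip")
    with zipfile.ZipFile(demo_zip, "w") as zf:
        zf.writestr("analysis.ipynb", "{}")
        zf.writestr("data/sample.csv", "a,b\n1,2\n")
    report = check_submission(demo_zip)

print(report)
```

If your module uses a different definition of a megabyte, or measures the dataset differently, adjust the constants and the data rule accordingly.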
The marking rubric describes the expectations and deliverables. Sections a-j are each worth 5 marks.