Assignment 3a: Individual Code
Assignment Overview
Assignment 3 consists of two main deliverables: runnable code and a report.
1) Part 3a: Individual Code
weight: 5% (of course total)
due: end of week 11
mode: individual
2) Part 3b: Group Report
weight: 25% (of course total)
due: end of week 12
mode: group
To successfully accomplish this task, you need to demonstrate good coding and analytical skills as well as professional communication and writing skills.
You will work in groups of three. Equal contribution and engagement of each group member is expected.
Business Scenario
Your work on this task is based on the following scenario:
You are working in a team of developers for a grocery store. The store manager has noticed that some items are often bought together and wants to find out exactly which items customers most often buy together in one basket (we call such groups of items itemsets). This information will be used to place the items of each itemset close together so that customers can find them quickly, which in turn may increase sales.
After analysing the problem, your team has discovered that once frequent itemsets are identified, it is also possible to recommend products from these itemsets to customers on the store website.
Your team, being knowledgeable about both frequent itemset mining and recommendation systems, wants to go even further: you want to test other well-known recommendation methods, such as collaborative filtering, to see which method works better.
The store collects details about customers’ buying habits through a loyalty programme and your team is given access to the representative dataset. The system you build, however, should scale to around one million customer transactions.
The project has been approved by the store management, so you are ready to start building the system which can help to significantly increase sales.
Weighting & Due Dates
This assessment is worth 5% of your overall grade.
Submissions are due Sunday night at 23:59 (end of week 11).
Note:
Even though you get an individual mark for this assignment, the work you do will affect Assignment 3b group work. Do high-quality work to achieve the best results as a group!
Course Learning Outcomes
CLO 2: Apply suitable algorithms for particular data mining problems.
CLO 3: Design and develop processes and products to solve business problems related to data mining.
CLO 4: Resolve data mining problems in collaboration with others.
CLO 5: Communicate effectively in a variety of forms using appropriate terminology.
Task Description
Purpose:
To practise using association rule mining and recommender system methods, and to apply pattern mining and recommendation methods to solve a practical problem.
Instructions
There are two main parts of this assignment. The first part is an individual code contribution. Although it is assessed individually, it is part 1 of a two-part assignment. The second part is a group assignment (https://myuni.adelaide.edu.au/courses/101178/assignments/424671).
Datasets for the assignment: training set (https://myuni.adelaide.edu.au/courses/101178/files/16531438/download), test set (https://myuni.adelaide.edu.au/courses/101178/files/16531336/download).
Part 0: Teamwork
To start with, as a group you should decide on the work distribution and task allocation. There are three main tasks, detailed below. In general, it is suggested that each member of the group 'mostly works' on one of the three tasks. However, keep in mind that the assignment has a strong integration element, so the work should be shared and redistributed according to the complexity of each task.
Task 1: Patterns
Write code in Python (or R, Python preferred) to mine frequent patterns from the training dataset. You can use any pattern mining algorithm discussed in the course (i.e., Apriori or FP-growth). You can also use a method not discussed in the course, but if you do, ensure you reference your source and describe why it is required and how it works.
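As a starting point, the sketch below shows the level-wise Apriori idea in plain Python: count single items, keep those meeting a minimum support, join frequent k-itemsets into (k+1)-candidates, and prune any candidate with an infrequent subset. It assumes each transaction has already been loaded as a set of item names (the actual file format of the training set is not specified here), and the function name `apriori` is hypothetical. It is written for clarity, not speed; at around one million transactions an FP-growth implementation or a vectorised library is likely to be more appropriate.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset with support >= min_support.

    transactions: list of sets of item names (assumed input format).
    min_support: minimum fraction of transactions that must contain the itemset.
    """
    n = len(transactions)
    baskets = [frozenset(t) for t in transactions]

    # Level 1: candidates are all single items seen in the data.
    candidates = {frozenset([item]) for basket in baskets for item in basket}
    frequent = {}
    k = 1
    while candidates:
        # Count how many baskets contain each candidate (one pass over the data).
        counts = {c: 0 for c in candidates}
        for basket in baskets:
            for c in candidates:
                if c <= basket:
                    counts[c] += 1
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)

        # Join step: union pairs of frequent k-itemsets into (k+1)-itemsets.
        k += 1
        prev = list(level)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                # Prune step (Apriori property): every (k-1)-subset must be frequent.
                if len(union) == k and all(
                    frozenset(sub) in level for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)
    return frequent


# Toy example (not taken from the assignment data).
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]
patterns = apriori(baskets, min_support=0.5)
```

Here every single item and every pair reaches 50% support, while the triple `{bread, butter, milk}` appears in only one of four baskets and is dropped.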
At the end of this task, you should have a system that takes raw input data and produces a series of patterns, together with some measure of each pattern's quality (e.g., support, confidence, or lift).
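One common way to attach a quality measure to patterns is to turn frequent itemsets into association rules X -> Y and score them with confidence and lift. The sketch below assumes a dictionary mapping every frequent itemset (as a `frozenset`) to its support, which is what a level-wise miner naturally produces; the function name `association_rules` and the tuple layout of each rule are hypothetical choices for illustration.

```python
from itertools import combinations


def association_rules(supports, min_confidence):
    """Derive rules X -> Y from {frozenset: support}, scored by confidence and lift.

    confidence(X -> Y) = support(X | Y) / support(X)
    lift(X -> Y)       = confidence(X -> Y) / support(Y)
    Assumes `supports` holds ALL frequent itemsets, so every subset lookup succeeds
    (subsets of a frequent itemset are themselves frequent).
    Returns tuples (antecedent, consequent, support, confidence, lift).
    """
    rules = []
    for itemset, sup in supports.items():
        if len(itemset) < 2:
            continue
        # Try every non-empty proper subset of the itemset as the antecedent X.
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                consequent = itemset - antecedent
                confidence = sup / supports[antecedent]
                lift = confidence / supports[consequent]
                if confidence >= min_confidence:
                    rules.append((antecedent, consequent, sup, confidence, lift))
    # Strongest rules first: by confidence, then by lift.
    rules.sort(key=lambda rule: (rule[3], rule[4]), reverse=True)
    return rules


# Toy supports (illustrative values only).
supports = {
    frozenset({"bread"}): 0.75,
    frozenset({"milk"}): 0.75,
    frozenset({"butter"}): 0.5,
    frozenset({"bread", "milk"}): 0.5,
    frozenset({"butter", "milk"}): 0.5,
}
rules = association_rules(supports, min_confidence=0.6)
```

A lift above 1 suggests the items co-occur more often than if they were independent, which is often a more informative quality signal than confidence alone.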
Task 2: Collaborative Filtering
Write code in Python (or R, Python preferred) to implement collaborative filtering on the training dataset. Your method should score candidate items for each user so that the top-scoring items can be recommended. Select a metric to measure performance, and write code to evaluate the recommendations on the test set.
You should also integrate the code from Task 1 so that recommendations can also be made from the patterns found.
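A minimal sketch of one collaborative filtering variant, user-based CF on implicit (bought / not bought) data, is shown below together with precision@k as one possible evaluation metric. It assumes the data can be reduced to `(user, item)` purchase pairs; all function names and the parameters `n_neighbours` and `top_n` are hypothetical. Item-based CF or matrix factorisation are equally valid choices, and a real system would need sparse matrices or an index to scale.

```python
import math
from collections import defaultdict


def build_user_items(purchases):
    """Group (user, item) purchase pairs into {user: set(items)} (binary implicit feedback)."""
    user_items = defaultdict(set)
    for user, item in purchases:
        user_items[user].add(item)
    return dict(user_items)


def cosine(a, b):
    """Cosine similarity of two binary item vectors represented as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def recommend_cf(user, user_items, n_neighbours=20, top_n=5):
    """User-based CF: score each unseen item by the summed similarity of the
    most similar users who bought it, then return the top_n items."""
    mine = user_items.get(user, set())
    neighbours = sorted(
        ((cosine(mine, items), other) for other, items in user_items.items() if other != user),
        key=lambda pair: pair[0],
        reverse=True,
    )[:n_neighbours]
    scores = defaultdict(float)
    for sim, other in neighbours:
        if sim <= 0:
            continue  # users with nothing in common contribute nothing
        for item in user_items[other] - mine:
            scores[item] += sim
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k slots filled by items the user actually bought in the test set."""
    if k == 0:
        return 0.0
    return len(set(recommended[:k]) & relevant) / k


# Toy training pairs and held-out test items (illustrative only).
train = [
    ("u1", "bread"), ("u1", "milk"),
    ("u2", "bread"), ("u2", "milk"), ("u2", "butter"),
    ("u3", "butter"), ("u3", "eggs"),
]
user_items = build_user_items(train)
recs = recommend_cf("u1", user_items, top_n=2)
score = precision_at_k(recs, {"butter"}, k=1)
```

Averaging `precision_at_k` (or recall, hit rate, etc.) over all test users gives a single number that can be compared against the pattern-based recommender.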
At the end of this task, you should have a system which can take either:
Raw input data and produce a series of recommendations; or
Pattern input (i.e., the output of Task 1) and produce a series of recommendations.
Task 3: Research Methods
Read the provided references and conduct additional research to find at least one more credible academic source on:
"The use of frequent patterns to generate recommendations."
Based on this research and your own ideas, write code that integrates Tasks 1 and 2 to produce results for the report.
End Game
In the end, you should have something like the following. Task 1 contributes a system that turns data into patterns. Task 2 implements collaborative filtering and turns the results of both the pattern list and the collaborative filtering table into recommendations (the pattern-based part is very small: pick the best-matching pattern in the list, if there is one).
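"Pick the best-matching pattern" can be read in several ways; the sketch below takes one simple interpretation. A rule matches a user when its whole antecedent is already in the user's basket and it suggests at least one new item; the best match is the one with the highest confidence, with ties broken in favour of the longer (more specific) antecedent. The `(antecedent, consequent, confidence)` tuple format and the function name are assumptions; adapt them to whatever Task 1 actually outputs.

```python
def recommend_from_patterns(basket, rules):
    """Return the new items suggested by the single best-matching rule, or [] if none match.

    basket: set of items the user has already bought.
    rules: iterable of (antecedent, consequent, confidence) with frozenset antecedent/consequent
           (assumed format).
    """
    best = None
    for antecedent, consequent, confidence in rules:
        new_items = consequent - basket
        # A rule matches if the user already has everything in its antecedent
        # and it would recommend something the user does not have yet.
        if antecedent <= basket and new_items:
            key = (confidence, len(antecedent))
            if best is None or key > best[0]:
                best = (key, new_items)
    return sorted(best[1]) if best else []


# Toy rule list (illustrative only).
rules = [
    (frozenset({"butter"}), frozenset({"milk"}), 1.0),
    (frozenset({"bread"}), frozenset({"milk"}), 0.67),
    (frozenset({"milk"}), frozenset({"butter"}), 0.67),
]
```

For example, a basket of `{"bread", "butter"}` matches two rules and the higher-confidence `butter -> milk` wins, while a basket of `{"eggs"}` matches nothing and yields no recommendation.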
Task 3 takes the outputs of Tasks 1 and 2 (the collaborative filtering table and the pattern list) and produces a kind of 'combined recommendation' that takes into account both methods of matching users together.
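There are many ways to build such a combined recommendation (weighted, switching, cascade hybrids, and so on), and the research in Task 3 should guide the choice. As one hedged illustration, the sketch below shows a simple weighted hybrid: each method's item scores are rescaled to [0, 1] by dividing by that method's maximum, then blended with a weight `alpha`. The function name, the score dictionaries, and `alpha` are all hypothetical; `alpha` would need to be tuned, for example against precision@k on the test set.

```python
def combine_scores(cf_scores, pattern_scores, alpha=0.5, top_n=5):
    """Weighted hybrid: alpha * normalised CF score + (1 - alpha) * normalised pattern score.

    cf_scores, pattern_scores: {item: non-negative score} from each method (assumed format).
    Items missing from one method receive 0 for that method.
    """
    def normalise(scores):
        # Divide by the maximum so both methods are on a comparable 0..1 scale.
        top = max(scores.values(), default=0.0)
        return {item: s / top for item, s in scores.items()} if top > 0 else {}

    cf, pat = normalise(cf_scores), normalise(pattern_scores)
    combined = {
        item: alpha * cf.get(item, 0.0) + (1 - alpha) * pat.get(item, 0.0)
        for item in set(cf) | set(pat)
    }
    # Highest combined score first; ties broken alphabetically for determinism.
    ranked = sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


# Toy scores (illustrative only).
cf_scores = {"butter": 0.8, "eggs": 0.4}
pattern_scores = {"milk": 1.0, "butter": 0.5}
balanced = combine_scores(cf_scores, pattern_scores, alpha=0.5)
cf_only = combine_scores(cf_scores, pattern_scores, alpha=1.0)
```

With `alpha=0.5`, `butter` ranks first because both methods support it; setting `alpha=1.0` reduces the hybrid to pure collaborative filtering, which makes it easy to compare the methods within one code path.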