ENT608 Manufacturing Informatics AY24/25
Assignment – Mini KDD Cup
Main Objectives
• Enrich and consolidate the understanding of data mining, including its basic concepts, methodology and process, the categorization of several prevailing tasks, performance evaluation and visualization.
• Learn and practice problem formulation and problem solving through a real-world industrial case study, using data mining as a technical tool.
• Understand and reflect on how data mining, as a methodology and a tool, can be deployed to tackle prevailing problems in Manufacturing Informatics.
Case Study – Industrial Background and Data
The steel industry is one of the major contributors to the European and world economies. About one-third of the world's steel is produced by recycling ferrous scrap in an Electric Arc Furnace (EAF):
Industrial Efficiency Technology Database, 2016. Section and Plan View of Electric Arc Furnace [Online]. Available at: http://ietd.iipnetwork.org/content/electric-arc-furnace
The process of producing recycled steel is as follows. The combined scrap is fed into the furnace through the furnace door, and an electric current is delivered to the scrap via electrodes suspended inside the furnace. The high currents supplied to these electrodes are transferred via an arc to the metal scrap, heating it and generating temperatures that can exceed 3000°C.
Conditions inside the furnace are maintained and optimized by adding various reactive and inert gases and solid materials. The molten steel is then tapped from the furnace into a mobile crucible and cast into fresh steel billets.
You will be given a dataset representing the process of producing recycled steel billets from a variety of metal scrap types. The dataset contains approximately 3,500 instances and the attributes described in the table below:
Attribute | Unit | Data Type | Description
Inputs
Heat Number | - | Nominal | Each heat number corresponds to a batch
Clean Bales ½ | tons | Numeric | Mass of clean bales of steel (manufacturing scrap)
Steel Turnings | tons | Numeric | Mass of steel turnings (machining scrap)
Tin Can | tons | Numeric | Mass of tin cans
Estructural | tons | Numeric | Mass of structural steel scrap
Fragmentized Scrap | tons | Numeric | Mass of miscellaneous steel scrap
Merchant 1/2 | tons | Numeric | Mass of scrap steel from merchants
Recovered Scrap | tons | Numeric | Mass of scrap recovered after the melt process
Total Scrap Mix | tons | Numeric | Total mass of the mixed scrap added to the furnace
Outputs
Billet Tons | tons | Numeric | Mass of recycled steel produced from the batch
EAF | MWh | Numeric | Energy consumed during melting of the scrap
Parameters
Power On Time | minutes | Numeric | Duration of the melting process
Secondary Oxygen | kg | Numeric | Mass of secondary oxygen added
Main Oxygen | kg | Numeric | Mass of main oxygen added
Natural Gas | kg | Numeric | Mass of natural gas added
Argon | kg | Numeric | Mass of argon added
Carbon Injected | kg | Numeric | Mass of carbon added
Lime and Dolomite | kg | Numeric | Mass of inert lime and dolomite added
Dolomite | kg | Numeric | Mass of additional inert dolomite added
This KDD exercise aims to predict the energy consumption (EAF, in MWh) of an EAF steel production operation. Energy consumption is a key optimization factor in such a process, and being able to predict it reliably is a valuable skill.
The ferrous scrap used as the primary raw material for steel production comes in a variety of forms, each with its own chemical composition, and a range of steel grades can be produced by combining different types. Because their chemical compositions differ, each scrap type also requires a different amount of energy to melt, so each type has a distinct effect on the final grade and on energy consumption. Scrap that is rich in ferrous content and has minimal impurities gives a better yield and consumes less energy, but costs more to obtain.
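The link between scrap choice, yield and energy can be expressed per heat as two ratios. The sketch below is illustrative only: it assumes each heat is held as a Python dictionary keyed by the attribute names in the table (the real file layout is not specified here), and it takes yield as Billet Tons / Total Scrap Mix and specific energy as EAF / Billet Tons (MWh per ton of billet). Both definitions are assumptions, not ones prescribed by this brief.

```python
def heat_ratios(heat):
    """Return (yield, specific_energy) for one heat.

    ASSUMED definitions (illustrative, not from the brief):
      yield           = Billet Tons / Total Scrap Mix   (dimensionless)
      specific_energy = EAF / Billet Tons               (MWh per ton of billet)
    A ratio is None when its denominator is missing or zero.
    """
    scrap = heat.get("Total Scrap Mix")
    billets = heat.get("Billet Tons")
    energy = heat.get("EAF")

    yield_ = billets / scrap if scrap and billets is not None else None
    specific_energy = energy / billets if billets and energy is not None else None
    return yield_, specific_energy


# Hypothetical heat; the values are invented purely for illustration.
example = {"Heat Number": "H0001", "Total Scrap Mix": 100.0,
           "Billet Tons": 90.0, "EAF": 45.0}
print(heat_ratios(example))  # (0.9, 0.5)
```

Comparing these ratios across heats with different scrap mixes is one way to check, on the real data, whether ferrous-rich scrap really shows higher yield and lower energy per ton.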
You must decide how to use this data so that your models reflect the process; pay attention to which attributes you use and for what purpose.
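One way to make attribute roles explicit is to group them as the table does and derive a predictor list for EAF. This is a sketch of one possible choice, not the required one: it excludes the nominal Heat Number, both outputs (on the assumption that Billet Tons, like EAF, is only known after the heat), and optionally Total Scrap Mix, on the assumption that it largely duplicates the individual scrap masses. The function name `candidate_features` is hypothetical.

```python
# Attribute groups as listed in the attribute table of this brief.
IDENTIFIER = "Heat Number"  # nominal batch label, not a physical quantity
INPUTS = [IDENTIFIER, "Clean Bales ½", "Steel Turnings", "Tin Can", "Estructural",
          "Fragmentized Scrap", "Merchant 1/2", "Recovered Scrap", "Total Scrap Mix"]
OUTPUTS = ["Billet Tons", "EAF"]
PARAMETERS = ["Power On Time", "Secondary Oxygen", "Main Oxygen", "Natural Gas",
              "Argon", "Carbon Injected", "Lime and Dolomite", "Dolomite"]


def candidate_features(use_parameters=True, drop_total=True):
    """Return attribute names that could serve as predictors of EAF.

    The identifier and both outputs are never returned. Treating Billet Tons
    as unavailable at prediction time is an ASSUMPTION the team should revisit.
    """
    features = [a for a in INPUTS if a != IDENTIFIER]
    if drop_total:
        # ASSUMPTION: Total Scrap Mix is (close to) the sum of the scrap masses,
        # so keeping it would add a redundant, collinear feature.
        features.remove("Total Scrap Mix")
    if use_parameters:
        features += PARAMETERS
    return features


print(len(candidate_features()))                      # 15
print(len(candidate_features(use_parameters=False)))  # 7
```

Training once with `use_parameters=False` (scrap composition only) and once with the process parameters included is a simple way to show how much the parameters add beyond the scrap mix.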
Pay attention to the quality of the data: it has been sourced from the real world and may not be as clean as the examples you have seen so far.
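The data-quality warning above can be turned into concrete per-heat checks. This is a minimal sketch, assuming heats are dictionaries keyed by attribute name with missing values stored as None. The consistency rule (Total Scrap Mix equal to the sum of the seven scrap columns) and the 1-ton tolerance are hypothetical and should be verified against the real data before being relied on.

```python
# The seven individual scrap attributes from the Inputs group of the table.
SCRAP_COLUMNS = ["Clean Bales ½", "Steel Turnings", "Tin Can", "Estructural",
                 "Fragmentized Scrap", "Merchant 1/2", "Recovered Scrap"]


def quality_issues(heat, tolerance=1.0):
    """Return a list of suspected problems in one heat (a dict keyed by attribute)."""
    issues = []
    # 1. Missing or negative values (masses, energy and time cannot be below 0).
    for name, value in heat.items():
        if name == "Heat Number":
            continue  # nominal identifier, not a quantity
        if value is None:
            issues.append(f"{name}: missing")
        elif value < 0:
            issues.append(f"{name}: negative value {value}")
    # 2. ASSUMPTION: Total Scrap Mix should match the sum of the scrap columns
    #    to within `tolerance` tons (hypothetical threshold).
    parts = [heat.get(c) for c in SCRAP_COLUMNS]
    total = heat.get("Total Scrap Mix")
    if total is not None and None not in parts:
        if abs(sum(parts) - total) > tolerance:
            issues.append(f"Total Scrap Mix {total} differs from sum of parts {sum(parts)}")
    # 3. More billet than scrap charged (yield above 1) is treated as suspicious.
    billets = heat.get("Billet Tons")
    if total and billets is not None and billets > total:
        issues.append("Billet Tons exceeds Total Scrap Mix (yield above 1)")
    return issues


print(quality_issues({"Heat Number": "H2", "EAF": None, "Argon": -3.0}))
# ['EAF: missing', 'Argon: negative value -3.0']
```

Counting how many heats trigger each check gives quantitative evidence for the Data Understanding section and a basis for deciding whether to drop, correct or impute those heats.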
Guidelines
• This is a group project. You are required to form a group of two members to carry out this exercise.
• You may use Weka or any other data mining resources and packages to help with problem formulation, computing, evaluation and so on.
• Your team's performance will be assessed entirely on your group's outcome and your report.
• As a team, you must decide what the nature of the given problem is; which data mining problem formulation is suitable for tackling it; what basic data mining process your team will follow; which algorithms your team wishes to try; how you can further improve or maximize your performance; and so on.
• Your team should aim to cover the areas included in the lectures and tutorials: pre-processing, feature selection, a variety of algorithms, and validation, testing and performance estimation methods.
• For performance benchmarking, you can use accuracy and/or error rate, or any other metrics you consider the best fit for the evaluation.
• The best-performing team will be declared the Champion of this Mini KDD Cup and will receive a 10% bonus mark, capped at a total of 100%.
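Two points in the guidelines above, validation and the choice of metric, can be sketched together. Accuracy and error rate are defined for classification; if EAF is kept as a numeric target, error measures such as mean absolute error (MAE), root mean squared error (RMSE) and R² are common alternatives (Weka itself reports MAE, RMSE and a correlation coefficient for numeric classes). The sketch below is a plain-Python k-fold cross-validation, assuming features `X` as lists of numbers and targets `y` as EAF values; `fit` and `predict` are hypothetical hooks for any model, and the mean-of-training-EAF predictor is only a baseline that a useful model should beat.

```python
import random


def regression_metrics(actual, predicted):
    """MAE, RMSE and R^2 for a numeric target such as EAF (MWh)."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    rmse = (sum(e * e for e in errors) / n) ** 0.5
    mean_a = sum(actual) / n
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    ss_res = sum(e * e for e in errors)
    r2 = 1 - ss_res / ss_tot if ss_tot else float("nan")
    return {"MAE": mae, "RMSE": rmse, "R2": r2}


def k_fold_indices(n, k=10, seed=0):
    """Shuffle row indices 0..n-1 and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(X, y, fit, predict, k=10, seed=0):
    """Train on k-1 folds, score the held-out fold; one metrics dict per fold.

    fit(X_train, y_train) -> model; predict(model, X_test) -> list of numbers.
    Assumes len(y) >= k so that no fold is empty.
    """
    results = []
    for test_idx in k_fold_indices(len(y), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = predict(model, [X[i] for i in test_idx])
        results.append(regression_metrics([y[i] for i in test_idx], preds))
    return results


# Baseline model: always predict the mean EAF of the training folds.
def fit_mean(X_train, y_train):
    return sum(y_train) / len(y_train)


def predict_mean(model, X_test):
    return [model] * len(X_test)


print(regression_metrics([40.0, 42.0, 44.0], [41.0, 42.0, 43.0])["R2"])  # 0.75
```

Reporting the mean and spread of each metric across folds, alongside the baseline's figures, makes it clear whether a model's improvement is larger than the fold-to-fold variation.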
Report and Submission
This section gives instructions on the structure and content expected in your report, together with the mark breakdown. Please follow this structure strictly when preparing your report. Any of the following sections missing from the final report will receive a mark of zero.
1. Background Understanding (10%)
In this section, your team describes its understanding of the case study background, its challenges and difficulties, and any potential opportunities that lead directly to your problem formulation, idea generation, technical blueprint, etc.
2. Data Understanding (30%)
In this section, your team is expected to describe how well you have understood the given data. This could include, but is not limited to: an initial analysis of the data set; its characteristics (size, number of features, data types, missing data, etc.); how your team intends to pre-process the data; your thoughts on data dimensionality, feature selection and data reduction; and any other insights your team has uncovered.
Your team is expected to supply sufficient evidence (such as screenshots of performance matrices or graphics produced in Weka) to support any statements, findings and conclusions.
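One way to gather the data-characteristics evidence described above is a per-attribute profile. This is a minimal sketch, assuming rows are dictionaries keyed by attribute name, with missing entries stored as None or NaN or absent altogether (how the real file encodes missing values is not stated). Weka's Preprocess tab shows comparable statistics for each attribute.

```python
import math


def profile(rows, attribute):
    """Summarise one numeric attribute across all heats (rows are dicts)."""
    values = [r.get(attribute) for r in rows]  # absent key counts as missing
    present = [v for v in values if v is not None and not math.isnan(v)]
    summary = {"count": len(present), "missing": len(values) - len(present)}
    if present:
        summary.update(min=min(present), max=max(present),
                       mean=sum(present) / len(present))
    return summary


rows = [{"EAF": 40.0}, {"EAF": None}, {"EAF": float("nan")}, {"EAF": 50.0}, {}]
print(profile(rows, "EAF"))
```

Running `profile` over every numeric attribute in the table gives a compact overview of missing data and value ranges, which can then be cross-checked against Weka's own summary.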
3. Methodology (10%)
Following the generic data mining methodology and process, explain in sufficient detail how your team intends to carry out the project, what concerns your team has, what ideas you have put forward, and so on.
4. Model Building (30%)
In this section, your team is expected to detail how the intended data mining model, whether built through a supervised or an unsupervised approach, is explored, formulated and established. Your team needs to justify its choice of algorithms, parameters, performance evaluation, performance tuning and so on.
Similarly, your team is expected to supply sufficient evidence to support any statements, findings and conclusions.
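Parameter tuning, as mentioned above, can be illustrated with k-nearest-neighbour regression, where the number of neighbours k is chosen by validation error. This is a sketch of one algorithm among many, not a recommended model: it assumes numeric feature rows and EAF targets already split into hypothetical training and validation sets, and the candidate values of k are arbitrary.

```python
import math


def knn_predict(train_X, train_y, x, k):
    """Predict by averaging the targets of the k nearest training rows.

    Distances are Euclidean, so features on very different scales
    (tons of scrap vs. kg of gas) should be standardised first.
    """
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    nearest = order[:k]  # fewer than k rows if the training set is small
    return sum(train_y[i] for i in nearest) / len(nearest)


def tune_k(train_X, train_y, val_X, val_y, candidates=(1, 3, 5, 7, 9)):
    """Return (best_k, errors): errors maps each candidate k to validation MAE."""
    errors = {}
    for k in candidates:
        preds = [knn_predict(train_X, train_y, x, k) for x in val_X]
        errors[k] = sum(abs(p - t) for p, t in zip(preds, val_y)) / len(val_y)
    best_k = min(errors, key=errors.get)  # ties go to the candidate listed first
    return best_k, errors
```

Because k is selected on the validation set, that set's error is optimistically biased; the chosen model should be reported on a separate test set or under cross-validation. Weka's IBk classifier implements the same k-nearest-neighbour idea.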
5. Results, Performance and Evaluation (10%)
Present the performance results or evaluation outcomes in a systematic and meaningful manner. Explain the rationale for the performance metrics adopted and, again, supply sufficient evidence to support any statements, findings and conclusions.
6. Conclusion and Reflection (10%)
Draw appropriate conclusions and reflect on the project as a whole. Identify its strengths and weaknesses, and address any problems your team encountered. If your team were given the opportunity to conduct the project again, what different approach would you pursue?
7. Declaration
If any team member has made a significant contribution to the project and the other team members consider it appropriate to recognize that effort, you are welcome to declare it here. Otherwise, all team members are assumed to have contributed equally.
Your final submission is the report only, of up to 10 A4 pages (excluding the cover page, references and appendix), in Times New Roman, 11-point font, single line spacing, with reasonable side margins.
Your team must upload a digital copy of your report to Learning Central before 12:00 noon on Thursday, 21st Nov 2024 (Academic Week 8). A submission folder will be made available under ENT608 in Learning Central; it will close automatically at the deadline. Remember to write in your own words. You must NOT use any AI tools in writing the report, including for polishing the language. A penalty (10% deduction) applies to late submissions.