SCHOOL OF COMPUTER SCIENCE
MASTER OF APPLIED COMPUTING (MAC)
ASSIGNMENT 3 (Weightage 30%)
SEPTEMBER 2024 SEMESTER (Block 2)
MODULE NAME : Principles of AI
MODULE CODE : ITS70304
Scenario and Task Description
Spam email is unsolicited and unwanted junk email sent out in bulk to an indiscriminate recipient list. Typically, spam is sent for commercial purposes. It can be sent in massive volume by botnets, networks of infected computers.
While some people view it as unethical, many businesses still use spam because the cost per email is very low and large quantities can be sent out consistently. Spam email can also be a malicious attempt to gain access to your computer.
Spam email can be difficult to stop because it can be sent from botnets, which are networks of previously infected computers. As a result, the original spammer can be difficult to trace and stop.
If you receive a message that appears to be spam (for example, if you don't recognize the sender), mark the message as spam in your email application. Don't click any links or open any attached files, including opt-out or unsubscribe links. Spammers sometimes include these links to confirm that your email address is legitimate, or the links may trigger malicious webpages or downloads.
Anti-spam solutions should address a wide range of known threats beyond spam, phishing and botnet attacks, including hard-to-detect, short-lived and low-volume email threats.
Spam email can be dangerous. It can include malicious links that can infect your computer with malware. Do not click links in spam. Dangerous spam emails often sound urgent, so you feel the need to act.
Whether an email message is spam or a legitimate advertisement, in the United States it is subject to the guidelines in the CAN-SPAM Act.
When businesses capture your email address, they often subscribe you to their newsletter by default, as a low-cost way to sell their products. Whenever you fill out an online form, look for a checkbox to opt into or out of marketing email. While these emails can be pesky, most are harmless, and by law they must have a visible opt-out or unsubscribe option.
If you unsubscribe and continue to receive spam, update your email settings to filter messages from the sender's address out of your inbox.
To identify whether an email is spam or not spam, we can use machine learning techniques for binary classification.
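As a minimal sketch of what binary classification means here, the snippet below trains a tiny word-count Naive Bayes classifier in plain Python. Everything in it is an assumption made for illustration: the four training messages are invented, the label convention (1 = spam, 0 = not spam) must be checked against the actual dataset, and Naive Bayes is used only as an example and is not necessarily one of the models covered in the module.

```python
import math
from collections import Counter

# Invented training messages; the label convention (1 = spam, 0 = not spam)
# is an assumption and must be checked against the real dataset.
train = [
    ("win cash prize now", 1),
    ("claim your free prize", 1),
    ("meeting at noon tomorrow", 0),
    ("your invoice is attached", 0),
]

def tokenize(message):
    """Lowercase the message and split it on whitespace."""
    return message.lower().split()

# Count words per class and messages per class.
word_counts = {0: Counter(), 1: Counter()}
class_counts = Counter()
for message, label in train:
    class_counts[label] += 1
    word_counts[label].update(tokenize(message))

vocabulary = set(word_counts[0]) | set(word_counts[1])

def log_score(message, label):
    """Log prior plus Laplace-smoothed log likelihood of each word."""
    total = sum(word_counts[label].values())
    score = math.log(class_counts[label] / len(train))
    for word in tokenize(message):
        score += math.log((word_counts[label][word] + 1) / (total + len(vocabulary)))
    return score

def predict(message):
    """Return the label (0 or 1) with the higher log score."""
    return max((0, 1), key=lambda label: log_score(message, label))

print(predict("free cash prize"))
print(predict("invoice for the meeting"))
```

The classifier outputs one of exactly two labels for any message, which is what makes the task binary.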
Practical Skills: Part I
1. Regular expressions, also called regex, are a very powerful programming tool used for a variety of purposes such as feature extraction from text, string replacement and other string manipulations. Explain in detail how regex works. List and explain 5 examples of regex functions. Show your code and your sample output. (7 marks)
2. Carry out a simple Exploratory Data Analysis (EDA) on the spam_dataset.csv dataset. The Predicted column has 2 values (0 and 1, which indicate spam or not spam). Explain with an example how the message_content was identified as 0 or 1. (3 marks)
3. When your data is ready for modelling, you can start building your prediction model. Build TWO (2) models (please use only the modelling techniques covered in the module). Justify the selected Machine Learning models and describe them. (5 marks)
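To illustrate the kind of regex functions question 1 refers to, the sketch below applies five functions from Python's standard `re` module to a single message. The sample string and the variable names are invented for illustration and are not taken from spam_dataset.csv.

```python
import re

# Hypothetical sample text, invented for illustration only.
text = "WIN RM 500 now! Call 012-3456789 or visit http://example.com today."

# re.search: first match anywhere in the string (returns a Match or None).
amount = re.search(r"RM\s*(\d+)", text)
amount_value = amount.group(1) if amount else None

# re.findall: every non-overlapping match, returned as a list of strings.
numbers = re.findall(r"\d+", text)

# re.sub: replace every match with a placeholder token.
masked = re.sub(r"https?://\S+", "<URL>", text)

# re.split: split on runs of non-word characters, dropping empty pieces.
tokens = [t for t in re.split(r"\W+", text.lower()) if t]

# re.match: match only at the very start of the string.
starts_with_caps = re.match(r"[A-Z]{2,}", text) is not None

print(amount_value, numbers, masked, tokens, starts_with_caps, sep="\n")
```

Outputs such as the masked text or the token list are the sort of features that can later be fed to a classifier.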
Practical Skills: Part II
4. There are TWO (2) basic matrices for measuring the performance of a Machine Learning model. Describe those TWO (2) matrices. (4 marks)
5. Evaluate the performance of both Machine Learning models you built in Part I. Show their accuracy, precision and recall values. Explain the results you obtained for both models. (8 marks)
6. Show how the best model you built predicts whether each test message is spam or not spam. Please use the examples of message_content below as your test data. Show your code and your output. (3 marks)
Message number 1:
Bungkusan anda kini keluar untuk penghantaran
Pesanan anda 2409189402GYMP baru sahaja dikeluarkan oleh POS Malaysia dan kini dalam perjalanan kepada anda.
Message number 2:
Hi AFIZAN BIN AZMAN,
You have received RM 481.25 in your account ***3050 on 17 Sep 2024 16:50:08. Please log in to your Maybank2u and check your account for more info on this fund transfer.
Message number 3:
You have successfully checked-in
Dear Afizan Azman, Your check-in is successful. Print your confirmation pass and present it at the airport's check-in counter to get your boarding pass.
Booking Reference: 5LAUWK
To demonstrate broad and coherent theoretical and technical knowledge and comprehension, add comments where necessary throughout the program. Please make sure you copy and paste the respective code into your PDF file and explain each part.
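For reference, the accuracy, precision and recall values named in question 5 can be computed from prediction counts as sketched below. This is a plain-Python illustration under stated assumptions: the `evaluate` helper and the label lists are invented, and which label counts as the positive class (here 1) must match the convention of the dataset actually used.

```python
def evaluate(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) for two lists of binary labels."""
    pairs = list(zip(y_true, y_pred))
    # True positives, false positives and false negatives for the positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

# Invented labels for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
accuracy, precision, recall = evaluate(y_true, y_pred)
print(accuracy, precision, recall)
```

Precision answers "of the messages flagged as positive, how many really were?", while recall answers "of the truly positive messages, how many were flagged?".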
Marking Rubrics (lecturer’s use only)
Attach as second page in the report.
The purpose of this learning assignment is based on the following module learning outcomes (MLO):
MLO3 - Propose and select a suitable AI or Machine Learning algorithm for a given application.
MLO4 - Analyze an AI-based solution for a given application.
Type of activity: Practical
Question | Weight | Outstanding (80 – 100) | Mastering (65 – 79) | Developing (0 – 64)
Part I | | | |
Q1 | _____/7 | Accurately describes in detail how regex works and shows clear understanding of it. Correctly lists and explains 5 examples of regex functions with code and sample output. The similarity is less than 2%. | Almost accurately describes in detail how regex works and shows clear understanding of it. Correctly lists and explains 5 examples of regex functions with code and sample output, but without comprehensive explanation. The similarity is between 2% and 4%. | Does not accurately describe in detail how regex works or show clear understanding of it. Incorrectly lists and explains 5 examples of regex functions with code and sample output. The similarity is greater than or equal to 5%.
Q2 | _____/3 | Correctly identifies how the message_content was identified as 0 or 1 from the binary classification conversion. The similarity is less than 2%. | Adequately identifies how the message_content was identified as 0 or 1 from the binary classification. The similarity is between 2% and 4%. | Incorrectly identifies how the message_content was identified as 0 or 1 from the binary classification. The similarity is greater than or equal to 5%.
Q3 | _____/5 | Demonstrates comprehensive steps to build the two models with correct justification. The similarity is less than 2%. | Demonstrates the steps to build the two models adequately with adequate justification. The similarity is between 2% and 4%. | Does not demonstrate comprehensive steps to build the two models and incorrectly justifies the steps. The similarity is greater than or equal to 5%.
Part II | | | |
Q4 | _____/4 | Describes those TWO (2) matrices correctly. The similarity is less than 2%. | Describes those TWO (2) matrices adequately. The similarity is between 2% and 4%. | Describes those TWO (2) matrices incorrectly. The similarity is greater than or equal to 5%.
Q5 | _____/8 | The accuracy, precision and recall values are calculated correctly and the comparison between the two models' performance is explained precisely. The similarity is less than 2%. | The accuracy, precision and recall values are calculated correctly but the comparison between the two models' performance is explained only adequately. The similarity is between 2% and 4%. | The accuracy, precision and recall values are calculated incorrectly and the comparison between the two models' performance is explained wrongly. The similarity is greater than or equal to 5%.
Q6 | _____/3 | Correctly demonstrates a sample of test data and shows how the best model built can predict whether the test data gives 0 or 1 (spam or not spam). The similarity is less than 2%. | Adequately demonstrates a sample of test data and adequately shows how the best model built can predict whether the test data gives 0 or 1 (spam or not spam). The similarity is between 2% and 4%. | Incorrectly demonstrates a sample of test data and cannot show how the best model built can predict whether the test data gives 0 or 1 (spam or not spam). The similarity is greater than or equal to 5%.
Submission Requirements
1. Font type : Times New Roman
2. Font size : 12
3. Line spacing : 1.5
4. Alignment : Justify Text
5. Document type : .pdf, .ipynb
6. Number of pages : 5 – 12 pages (do not exceed the page limit)
7. Your full report should consist of the following:
a) Cover page (Name, ID, Date, Signature, Score)
b) Marking Rubrics & Declaration (attach as second page in the report)
c) Report of your answer script.
d) Appendixes (line spacing = 1.0)
· List of references (APA format)
· Python script.
· Report of similarity score (percentage of similarity score from each source needs to be shown)
8. Start each question on a separate page.
9. All figures and tables are labelled properly.
10. File naming convention: StudentName_Assignment3