代写INFO20003 Semester 2, 2025 Assignment 2: SQL代做数据库编程

INFO20003 Semester 2, 2025

Assignment 2 : SQL

Due: Week 8 - Friday 19th September 2025, 6 :00PM Melbourne Time.

Case: The Best Movie Recommendations Ever! (BMRE!)

Inspired by the Netflix Challenge x Community Ratings x Streaming Platforms

Introduction

In the 2000s, Netflix had a contest (‘The Netflix Prize’) to find the best ratings prediction algorithm to improve  recommendations for its users. This was one of the early releases of a large real-world dataset for data scientists and machine learning/AI researchers. Since then, there are many other movie-related datasets that have been released, for both researchers, as well as for everyday movie enthusiasts (like us)!

Inspired by the Netflix Prize, we present – “The Best Movie Recommendations Ever!” – a new challenge.

Assume you are part of a team of movie fans and AI experts looking to make the best recommendation algorithm. Because you are the database expert in the team, you are placed in charge of the database management and querying.

IMPORTANT: You do not have to do any AI/ML stuff for this assignment, just the SQL. For the purposes of anonymity of Internet users, and respecting the intellectual property of real-world movie platforms, note that the data given for this assignment are synthetic.

Description

The following description explains what data is made available to you. Be careful as many tables share identical attribute names, to simulate real-world datasets. Assume that there are no discrepancies between the different tables (e.g., Netflix year of release is the same as IMDB’s year of release – see below).

Netflix records, for each movie, a unique Netflix Movie ID (e.g., 80234304), movie title, year of release. Each movie is given ratings, where a rating record consists of the numeric rating (0-5 inclusive, integers), the timestamp of the rating, and a unique anonymous user ID. For privacy, no other user data is given.

Now, in this new challenge, we also obtain data from the comprehensive Internet Movie Database (IMDB), with a record for each movie. Each IMDB movie record has an IMDB Movie ID (e.g., tt12584954), the movie title, year of release, averageIMDB rating (decimal from 0.0 to 10.0 inclusive), number of people who have rated, and classification (e.g., PG-13, or R). Each movie record is associated with a lead director and up to five main actor(s)/actress(es).  Finally, each movie record has at least one genre (e.g., only Comedy, or Comedy + Drama, etc)., and one language (e.g., ‘EN’ for English, using the 2-letter ISO code for languages).

IMDB also uses some data from MetaCritic, which is a collection of critics scores for each movie. MetaCritic data records comprise of the IMDB Movie ID, Source, and Score (0-100 inclusive, integer); a movie can have any number of scores (e.g., one from a newspaper, the other from a film blogger).

Another reputable data source is RottenTomatoes, a movie review database. Each RottenTomatoes movie review record has a unique RottenTomatoes ID (e.g., ‘the_lord_of_the_rings_the_fellowship_of_the_ring’), a ‘Tomatometer’ critics score (0-100 inclusive, integer), and a ‘Popcornmeter’ audience score (0-100 inclusive, integer).

Finally, we also draw upon ‘tag’ data (like hashtags on social media) from the MovieLens recommendation service (inspired by Harper & Konstan, 2015). Anonymised users contribute tags to movies, in the form. of records containing the MovieLens User ID, MovieLens Movie ID, the tag, and a timestamp.

Luckily, your teammates in this challenge have also supplied mappings in the following tables: imdb_to_netflix, imdb_to_rottentomatoes, and imdb_to_movielens (all self-explanatory) to link movie records across all data sources/platforms. Note that a movie *may or may not* have corresponding IDs in all tables (e.g., a new movie in cinemas may only have an IMDB record and no other corresponding records).

The Data Model

The Data Model from MySQL Workbench is provided in Figure 1.

FIGURE 1. DATA MODEL FOR BMRE!

Assignment 2 Setup

Please pay special attention to the penalties listed [].

A dataset is provided which you can use when developing your solutions. To set up the dataset, download the file BMRE.sql from the Assignment link on Canvas and run it in Workbench. This script creates the database tables and populates them with data.

The sample dataset provided is a basic, synthetic, extract and not necessarily the ‘full data‘. You may find that you may need to add some more sample data in Workbench to fully test edge cases for queries.

Note that this dataset is provided for you to experiment with, but it is NOT the same dataset as what your queries will be tested against (the schema will stay the same, but the data itself may be different). This means when designing your queries, you must consider edge cases even if they are not represented in this particular data set.

The script is designed to run against your account on the Engineering IT server (info20003db.eng.unimelb.edu.au). If you want to install the schema on your own MySQL Server installation, uncomment the lines at the beginning of the script.

WARNING: Do NOT disable only_full_group_by mode when completing this assignment. This mode is the default and is turned on in all default installs of MySQL workbench, and we’ve added a line to the top of slarc.sql to turn it on every time you run the script in case you disable it! You can check whether it is turned on using the command `SELECT  @@sql_mode 、; The command should return a string containing ONLY_FULL_GROUP_BY or ANSI. When testing, our test server WILL have this mode turned on, and if your query fails due to this, you will lose marks.

The SQL Tasks

Please pay special attention to the penalties listed [].

In this section are listed 10 questions for you to answer. Write one (single) SQL statement per question. Each statement must end with a semicolon (;). Subqueries and nesting are allowed within a single SQL statement – however, you may be penalised for writing overly complicated SQL statements.

WARNING: DO NOT USE VIEWS (or ‘WITH’ statements/common table expressions) OR VARIABLES to answer questions. Penalties apply.

? The Questions

1.    List aII IMDB movies which contain no MetaCritic reviews. Your query shouId return resuIts of the form (IMDBMovieID, MovieTitle). (1 mark)

2.    Find the NetfIix movie with the most recent review. Assume there are no ties: onIy one is the most recent. Your query shouId return one row (NetflixMovieID, MovieTitle, TimeOfMostRecentRating). (1 mark)

3.    List aII movies rated by MetaCritic source (The Washington Post’that have at Ieast 5 NetfIix user ratings. Your query shouId return resuIts of the form. (IMDBMovieID, NetflixRatingCount). (1 mark)

4.    Find the genre whose movies has the highest average Tomatometer score. If there are ties, then you must return aII genres with the highest average. Avg score must be rounded to 1 decimaI. Your query shouId

return resuIts of form. (genre, TomatometerAvgScore), with one row per genre in case of a tie. (2 marks)

5.    List aII MetaCritic scores and sources for fiIms that feature an actor with more than 2 words in their fuII

name (e.g.,“James EarI Jones”, but not“Anya TayIor-Joy”). Do not dupIicate resuIts if muItipIe such actors acted in the same fiIm. Your query shouId return resuIts of the form. (Score, Source, IMDBTitle). (2 marks)

6.    Find which year has the highest number of (pg’rated movies that have at Ieast one MovieLens tag of

(action_thriIIer’. If there are ties, then you must return aII resuIts. Your query shouId return resuIts of the form. (Year, MovieCount), with one row per Year in case of a tie. (2 marks)

7.    Find the totaI number of movies that are in IMDB but not in NetfIix (defined as X). SimiIarIy, find the totaI number of movies that are in IMDB but not in MovieLens (defined as Y). Your query shouId return one

row: (X, Y). (2 marks)

8.    We'II refer to a Metacritic review source that has reviewed at Ieast one movie in each Ianguage that

currentIy exists in the IMDB movies Iist as a 'gIobaI-reviewsource'. For each gIobaI-reviewsource, evaIuate how their review count, average score, and score standard-deviation (hint: use the STDDEV operation)

varies based on the Ianguage of the movie the source reviewed. Average and StdDev must be rounded to

1 decimaI pIace. Your query shouId return (globalReviewSource, language, countReviewsForLanguage, avgScoreForLanguage, popStdDevScoreForLanguage) (3 marks)

As an exampIe, consider the foIIowing dataset:

imdb_movie

id

language

1

(EN’

2

(ES’

3

(EN’

metacritic_review

source

imdb_movie_id

score

NYT

1

3

NYT

2

5

Variety

1

2

CurrentIy, there are 2 unique Ianguages in the imdb_movie tabIe. Source (NYT’has reviewed at Ieast one movie in each of these Ianguages (1x (EN’movie and 1x (ES’movie). Thus, NYT is a (gIobaI-reviewsource’. (Variety’is NOT a gIobaI-reviewsource, since they have not reviewed any (ES’movies.

The returned resuIt from running our query on the given dataset shouId be:

Source

Language

CountReviews

AvgScore

StddevScore

NYT

EN

1

3

0

NYT

ES

1

5

0

9.    A (muIti-taIented actor’is one who has acted in movies with 5 or more genres. This incIudes acting in a singIe movie that has 5+ genres, or across muItipIe movies with 5+ distinct genres across aII of them. Find muIti-taIented actors, and how many times peopIe have tagged any of their movies.  Your query shouId return resuIts of the form. (ActorName, NumberOfUniqueGenres, TotalNumberOfTags). (3 marks)

10.  A (consistent quaIity’movie is one where the average ratings by NetfIix users (converted to a percentage from 0-100%) matches the average Metacritic rating across aII sources, within 15 percentage points. For exampIe, movie X has 3 ratings of {3,4,5} on NetfIix, which converts to an average of 80.0%. Movie X has 3 MetaCritic reviews with scores {70%, 75%, 80%}, giving an average of 75.0%. Since  I80.0 一 75.0I <= 15, movie X is (consistent quaIity’. If a movie doesn’t have a Metacritic review, it is not eIigibIe to be consistent.

Of the (consistent quaIity’movies reIeased in 2021 or 2022, find the 3 with the highest avg NetfIix review score. If muItipIe movies tie, they shouId aII be incIuded. For exampIe, if 4 movies aII tie for 1st pIace, the query shouId return 4 rows. If 1st and 2nd were distinct, but 2 movies tied for 3rd pIace, the query shouId aIso return 4 rows. The finaI Metacritic and NetfIix average scores must be rounded to 1 decimaI pIace.

Return as (NetflixMovieID, RoundedAvgMetacriticScore, RoundedAvgNetflixScoreAsPercent). (3 marks)




热门主题

课程名

mktg2509 csci 2600 38170 lng302 csse3010 phas3226 77938 arch1162 engn4536/engn6536 acx5903 comp151101 phl245 cse12 comp9312 stat3016/6016 phas0038 comp2140 6qqmb312 xjco3011 rest0005 ematm0051 5qqmn219 lubs5062m eee8155 cege0100 eap033 artd1109 mat246 etc3430 ecmm462 mis102 inft6800 ddes9903 comp6521 comp9517 comp3331/9331 comp4337 comp6008 comp9414 bu.231.790.81 man00150m csb352h math1041 eengm4100 isys1002 08 6057cem mktg3504 mthm036 mtrx1701 mth3241 eeee3086 cmp-7038b cmp-7000a ints4010 econ2151 infs5710 fins5516 fin3309 fins5510 gsoe9340 math2007 math2036 soee5010 mark3088 infs3605 elec9714 comp2271 ma214 comp2211 infs3604 600426 sit254 acct3091 bbt405 msin0116 com107/com113 mark5826 sit120 comp9021 eco2101 eeen40700 cs253 ece3114 ecmm447 chns3000 math377 itd102 comp9444 comp(2041|9044) econ0060 econ7230 mgt001371 ecs-323 cs6250 mgdi60012 mdia2012 comm221001 comm5000 ma1008 engl642 econ241 com333 math367 mis201 nbs-7041x meek16104 econ2003 comm1190 mbas902 comp-1027 dpst1091 comp7315 eppd1033 m06 ee3025 msci231 bb113/bbs1063 fc709 comp3425 comp9417 econ42915 cb9101 math1102e chme0017 fc307 mkt60104 5522usst litr1-uc6201.200 ee1102 cosc2803 math39512 omp9727 int2067/int5051 bsb151 mgt253 fc021 babs2202 mis2002s phya21 18-213 cege0012 mdia1002 math38032 mech5125 07 cisc102 mgx3110 cs240 11175 fin3020s eco3420 ictten622 comp9727 cpt111 de114102d mgm320h5s bafi1019 math21112 efim20036 mn-3503 fins5568 110.807 bcpm000028 info6030 bma0092 bcpm0054 math20212 ce335 cs365 cenv6141 ftec5580 math2010 ec3450 comm1170 ecmt1010 csci-ua.0480-003 econ12-200 ib3960 ectb60h3f cs247—assignment tk3163 ics3u ib3j80 comp20008 comp9334 eppd1063 acct2343 cct109 isys1055/3412 math350-real math2014 eec180 stat141b econ2101 msinm014/msing014/msing014b fit2004 comp643 bu1002 cm2030
联系我们
EMail: 99515681@qq.com
QQ: 99515681
留学生作业帮-留学生的知心伴侣!
工作时间:08:00-21:00
python代写
微信客服:codinghelp
站长地图