
ISIT312 Big Data Management

Assignment 3

Spring 2025

Scope

This assignment includes tasks related to querying a data cube, the design and implementation of an HBase table, querying and manipulating data in an HBase table, data processing with Pig, and data processing with Spark.

This assignment is due on Saturday, 01 November 2025, 7:00pm (sharp).

This assignment is worth 20% of the total evaluation in the subject.

The assignment consists of 4 tasks, and the specification of each task starts on a new page.

Only electronic submission through Moodle at:

https://moodle.uowplatform.edu.au/login/index.php

will be accepted. The submission procedure is explained at the end of this specification.

A policy regarding late submissions is included in the subject outline.

Only one submission of Assignment 3 is allowed, and only one submission per student is accepted.

A submission marked by Moodle as "late" is always treated as a late submission, no matter how many seconds late it is.

A submission that has an incorrect file attached is treated as a valid submission, with all consequences that follow from the evaluation of the attached file.

All files left on Moodle in the state "Draft (not submitted)" will not be evaluated.

Submission of compressed files (zipped, gzipped, rared, tarred, 7-zipped, lha-ed, etc.) is not allowed. Compressed files will not be evaluated.

An implementation that does not compile due to one or more syntax and/or runtime errors scores no marks.

Using any sort of Generative Artificial Intelligence (GenAI) for this assignment is NOT allowed!

It is expected that all tasks included in Assignment 3 will be solved individually, without any cooperation with other students. If you have any doubts or questions, please consult your lecturer or tutor during lab classes or office hours. Plagiarism will result in a FAIL grade being recorded for the assessment task.

Task 1 (5 marks)

Querying a data cube

Use Hive to create an internal table TRIP with records of individual trips; each record contains the driver's license, the truck's registration, the trip length in kilometers, and the date of the trip.

create table TRIP (
  registration  char(7),
  license       char(7),
  kilometers    decimal(2),
  tday          decimal(2),
  tmonth        decimal(2),
  tyear         decimal(4) )
row format delimited fields terminated by ','
stored as textfile;

Remove the header line from the file task1.csv and save the file.

Populate the table by loading data from the file task1.csv.
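A minimal sketch of the load step, assuming the cleaned file sits on the local file system at /home/bigdata/task1.csv (the path is an assumption; adjust it to where you saved the file):

load data local inpath '/home/bigdata/task1.csv' overwrite into table trip;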

(1) 0.5 mark

Implement the following query using the GROUP BY clause with the CUBE operator. (Syntax sketches for all three grouping operators follow item (3).)

Find the total number of trips per driver (license), per truck (registration), per driver and truck (license, registration), and the total number of all trips.

(2) 0.5 mark

Implement the following query using the GROUP BY clause with the ROLLUP operator.

Find the longest trip (kilometers) per driver (license), per driver and truck (license, registration), and the longest trip of all.

(3) 0.5 mark

Implement the following query using the GROUP BY clause with the GROUPING SETS operator.

Find the shortest trip (kilometers) per driver (license), per truck (registration), and per driver and year (license, tyear).
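The three grouping operators above share a common HiveQL shape. A minimal sketch against the TRIP table; the column lists and aggregates below are illustrative, so match them to the exact queries:

-- with cube aggregates over every subset of the grouped columns
select license, registration, count(*) as trips
from trip
group by license, registration with cube;

-- with rollup aggregates over the prefixes: (license), (license, registration), ()
select license, registration, max(kilometers) as longest
from trip
group by license, registration with rollup;

-- grouping sets aggregates over exactly the listed combinations
select license, registration, tyear, min(kilometers) as shortest
from trip
group by license, registration, tyear
grouping sets ((license), (registration), (license, tyear));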

Implement the following SQL queries as SELECT statements using the window partitioning technique. (Syntax sketches for the window shapes involved follow item (10).)

(4) 0.5 mark

For each truck, list its registration number, the length of its longest and shortest trips (in kilometers), the total number of trips, and the average trip length (in kilometers).

(5) 0.5 mark

For each truck, list its registration (registration) and all its trips (license, tday, tmonth, tyear, kilometers) sorted in descending order of trip length (kilometers), together with a rank (position number) for each trip. Use the analytic function ROW_NUMBER().

(6) 0.5 mark

For each driver, list the license number (license), the total length of all his/her trips (kilometers), and the average length of all trips (kilometers).

(7) 0.5 mark

For each driver (license) and truck (registration), and for each trip length (kilometers), list the longest trip length (kilometers) aggregated per driver (license).

(8) 0.5 mark

For each truck (registration), find how the total trip length (kilometers) changed year by year (tyear). Order the results in ascending order of years (tyear).

(9) 0.5 mark

For each truck (registration), list the average length of the current and the previous trip (kilometers). Order the results in ascending order of trip length (kilometers).

(10) 0.5 mark

For each truck (registration), list the average length of the current, the previous, and the next trip (kilometers). Order the results in ascending order of trip length (kilometers).
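The window shapes behind queries (4) to (10) are sketched below, assuming the TRIP table above; the exact column lists and orderings must follow the individual queries:

-- an aggregate over a partition: one value per row, no group by needed
select registration,
       max(kilometers) over (partition by registration) as longest,
       min(kilometers) over (partition by registration) as shortest
from trip;

-- row_number() ranks the rows inside each partition
select registration, license, tday, tmonth, tyear, kilometers,
       row_number() over (partition by registration
                          order by kilometers desc) as trip_rank
from trip;

-- a rows frame combines the current row with its neighbours
select registration, kilometers,
       avg(kilometers) over (partition by registration
                             order by kilometers
                             rows between 1 preceding and current row)
         as avg_with_previous
from trip;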

When ready, save your SELECT statements in a file solution1.hql. Then process the script file solution1.hql and save the results in a report solution1.txt.

Deliverables

A file solution1.txt that contains a report from processing the SELECT statements implementing the queries listed above.

Task 2 (5 marks)

Design and implementation of an HBase table (3 marks)

(1) Consider the following conceptual schema of a sample data cube designed to analyze vehicle repairs performed by mechanics for vehicle owners.

Design a single HBase table to store the data described by the conceptual schema above.

Create an HBase script solution2-1.hb with HBase shell commands that create the HBase table and load sample data into it. Load into the table information about at least two vehicles, two owners, two mechanics, and three repairs.

When ready, use the HBase shell to process the script file solution2-1.hb and save a report from the processing in a file solution2-1.txt.
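The command shapes expected in solution2-1.hb are sketched below. The column families VEHICLE, OWNER, MECHANIC and REPAIR, their qualifiers, and the row key R001 are assumptions for illustration only; the real design must follow the conceptual schema:

create 'task2-1', {NAME => 'VEHICLE'}, {NAME => 'OWNER'}, {NAME => 'MECHANIC'}, {NAME => 'REPAIR', VERSIONS => 3}
put 'task2-1', 'R001', 'VEHICLE:registration', 'ABC123'
put 'task2-1', 'R001', 'OWNER:name', 'James Bond'
put 'task2-1', 'R001', 'MECHANIC:name', 'Q'
put 'task2-1', 'R001', 'REPAIR:cost', '350'
scan 'task2-1'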

Querying an HBase table (2 marks)

(2) Consider the conceptual schema given below. The schema represents a data cube where students submit assignments; each submission consists of several files and is related to one subject.

Download the file task2-2.hb with HBase shell commands. Process the script task2-2.hb. Processing the script creates an HBase table task2-2 and loads some data into it.

Use the HBase shell to implement the queries and data manipulations listed below. The implementation of each step is worth 0.4 of a mark.

Save the queries and data manipulations in a file solution2-2.hb. Note that the implementation of the queries and data manipulations listed below may need more than one HBase shell command. (Generic command shapes are sketched after step (5).)

(1) Find all information included in the column family SUBJECT qualified by code and in the column family FILES qualified by fnumber1 and fnumber2.

(2) Find all information about a subject that has the code 312; list two versions per cell.

(3) Find all information about a submission of assignment 1 performed by a student 007 in a subject 312; list one version per cell.

(4) Replace the submission date of assignment 1 performed by a student 007 in a subject 312 with the date 02-APR-2019, and then list the column family SUBMISSION to verify the results.

(5) Add a column family DEGREE that contains information about the titles of the degrees the students are enrolled in. Assume that a student can enrol in only one degree. Then add information about the title of the degree of the student with number 007. The degree title is up to you. List all information about the student with number 007.
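The HBase shell shapes these five steps rely on are sketched below. The row key 312-007-1 and the qualifiers sdate and title are hypothetical placeholders, since the real keys and qualifiers depend on how task2-2.hb loads the table:

# selected columns from two families
scan 'task2-2', {COLUMNS => ['SUBJECT:code', 'FILES:fnumber1', 'FILES:fnumber2']}
# two versions per cell
scan 'task2-2', {COLUMNS => ['SUBJECT'], VERSIONS => 2}
# one row, one version per cell
get 'task2-2', '312-007-1', {VERSIONS => 1}
# a put with a new value acts as the replacement; the old value remains as an older version
put 'task2-2', '312-007-1', 'SUBMISSION:sdate', '02-APR-2019'
scan 'task2-2', {COLUMNS => ['SUBMISSION']}
# add a new column family, then populate and inspect it
alter 'task2-2', {NAME => 'DEGREE'}
put 'task2-2', '312-007-1', 'DEGREE:title', 'BCompSc'
get 'task2-2', '312-007-1'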

When ready, start the HBase shell and process the script file solution2-2.hb with the HBase shell commands. Save the report from processing the script in a file solution2-2.txt.

Deliverables

A file solution2-1.txt with a listing from processing of the script file solution2-1.hb.

A file solution2-2.txt with a listing from processing of the script file solution2-2.hb.

Task 3 (5 marks)

Data processing with Pig Latin

Consider the following logical schema of a two-dimensional data cube.

Download the file task3.zip published on Moodle together with the specification of Assignment 3 and unzip it. You should obtain a folder task3 with the following files: driver.csv, truck.csv, and trip.csv.

Use a text editor to examine the contents of the files.

Upload the files into HDFS.

Open a Terminal window and start the pig command line interface to Pig. Use the pig command line interface to implement the following actions. The implementation of each step is worth 1 mark.

(1) Use the load command to load the files truck.csv, driver.csv, and trip.csv from HDFS into Pig storage.
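A minimal sketch of the load step, assuming the files were uploaded to /user/bigdata/task3 in HDFS and assuming hypothetical field lists; take the real paths and fields from your upload location and from examining the CSV files:

drivers = LOAD '/user/bigdata/task3/driver.csv' USING PigStorage(',')
          AS (license:chararray, first_name:chararray, last_name:chararray);
trucks  = LOAD '/user/bigdata/task3/truck.csv' USING PigStorage(',')
          AS (registration:chararray, make:chararray);
trips   = LOAD '/user/bigdata/task3/trip.csv' USING PigStorage(',')
          AS (license:chararray, registration:chararray,
              destination:chararray, kilometers:int);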

Use Pig Latin and the Pig Grunt command line interface to implement and process the following queries. (Generic operator patterns are sketched after query (5).)

(2) Find the full names (first name, last name) of drivers who used trucks manufactured (make) either by DAF or by MAN.

(3) Find the full names (first name, last name) of drivers who used trucks manufactured (make) by DAF and, on another occasion, trucks manufactured by MAN.

(4) Find the full names (first name, last name) of drivers who never travelled to Albany.

(5) Find the total number of times each truck (registration) was used on a trip to Albany. There is no need to list trucks that were never used on any trip to Albany.
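Queries (2) to (5) combine the same few Pig Latin operators. A sketch of the filter-join-project and group-count patterns, under the field-name assumptions made in the load sketch above:

-- drivers of DAF or MAN trucks: filter, join twice, project, deduplicate
daf_man  = FILTER trucks BY make == 'DAF' OR make == 'MAN';
used     = JOIN trips BY registration, daf_man BY registration;
with_drv = JOIN used BY trips::license, drivers BY license;
names    = FOREACH with_drv GENERATE drivers::first_name, drivers::last_name;
result   = DISTINCT names;
DUMP result;

-- trips to Albany counted per truck: filter, group, count
albany    = FILTER trips BY destination == 'Albany';
per_truck = FOREACH (GROUP albany BY registration)
            GENERATE group AS registration, COUNT(albany) AS trips;
DUMP per_truck;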

Once completed, copy the entire contents of the Terminal window, including the data loading outputs, the processed queries, ALL messages, and ALL results, to the clipboard. Then paste these contents into a text file named solution3.txt.

Deliverables

A file solution3.txt that contains a listing of the data loadings and queries performed above, ALL messages, and the results of the operations. The file solution3.txt must be created by copying and pasting the entire contents of the Terminal window into the file solution3.txt. No screen dumps are allowed, and no screen dumps will be evaluated. Solutions that do not include the query processing messages will not receive any marks.

Task 4 (5 marks)

Data processing with Spark

Consider the following logical schema of a two-dimensional data cube.

Download the file task4.zip published on Moodle together with the specification of Assignment 3 and unzip it. You should obtain a folder task4 with the following files: driver.csv, truck.csv, and trip.csv.

Use a text editor to examine the contents of the files.

Upload the files into HDFS.

Open a Terminal window and start the pyspark command line interface to Spark. Use the pyspark command line interface to implement the following actions. The implementation of each step is worth 1 mark. (A generic sketch covering all five steps follows item (5).)

(1) Create the schemas for the files truck.csv, driver.csv, and trip.csv.

(2) Create the data frames with the contents of the files truck.csv, driver.csv, and trip.csv using the schemas created in the previous step.

Count the total number of rows in each frame and then list the contents of each frame.

(3) Create and process the following query directly on the trip data frame, without creating a temporary view.

Find the total number of times each driver (license, first name, last name) travelled to Albany. There is no need to list the drivers who never travelled to Albany.

(4) Create a temporary view over a data frame with information about the trips and drivers.

(5) Execute the following query on the temporary view containing information about the trips and drivers.

Find the total number of times each driver (license, first name, last name) travelled to Albany. There is no need to list the drivers who never travelled to Albany.
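A minimal pyspark sketch of steps (1) to (5), assuming the files sit under /user/bigdata/task4 in HDFS and assuming hypothetical field lists (take the real schemas from examining the CSV files); only the trip and driver frames are shown, and truck.csv follows the same pattern:

from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# (1) schemas declared by hand rather than inferred
tripSchema = StructType([
    StructField('license', StringType()),
    StructField('registration', StringType()),
    StructField('destination', StringType()),
    StructField('kilometers', IntegerType())])
driverSchema = StructType([
    StructField('license', StringType()),
    StructField('first_name', StringType()),
    StructField('last_name', StringType())])

# (2) data frames loaded with the explicit schemas, then counted and listed
trip = spark.read.csv('/user/bigdata/task4/trip.csv', schema=tripSchema)
driver = spark.read.csv('/user/bigdata/task4/driver.csv', schema=driverSchema)
trip.count()
trip.show()
driver.count()
driver.show()

# (3) the query phrased with data frame operations only, no temporary view
(trip.filter(trip.destination == 'Albany')
     .join(driver, 'license')
     .groupBy('license', 'first_name', 'last_name')
     .count()
     .show())

# (4) a temporary view over the joined trip and driver data
trip.join(driver, 'license').createOrReplaceTempView('trip_driver')

# (5) the same query phrased in SQL against the view
spark.sql("""select license, first_name, last_name, count(*) as trips
             from trip_driver
             where destination = 'Albany'
             group by license, first_name, last_name""").show()

In the pyspark shell the spark session object is predefined, so no SparkSession boilerplate is needed.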

When ready, copy to the clipboard the contents of the Terminal window with the operations processed above and the results listed in the window, and paste these contents from the clipboard into a text file solution4.txt.

Deliverables

A file solution4.txt that contains a listing of the operations performed above and the results of the operations. The file solution4.txt must be created by copying and pasting the contents of the Terminal window into the file solution4.txt. No screen dumps are allowed, and no screen dumps will be evaluated.

Submission of Assignment 3

Note that you have only one submission, so make absolutely sure that you submit the correct files with the correct contents. No other submission is possible!

Submit the files solution1.txt, solution2-1.txt, solution2-2.txt, solution3.txt, and solution4.txt through Moodle in the following way:

(1) Access Moodle at http://moodle.uowplatform.edu.au/

(2) To log in, use the Login link located in the upper right corner of the Web page or in the middle of the bottom of the Web page.

(3) When logged in, select the site ISIT912/312 (S225) Big Data Management.

(4) Scroll down to the section Assessment items (Assignments).

(5) Click the link In this place you can submit the outcomes of your work on the tasks included in Assignment 3 for ISIT312 students.

(6) Click the button Add Submission.

(7) Move the file solution1.txt into the area File submissions. You can also use the link Add…

(8) Repeat step (7) for the files solution2-1.txt, solution2-2.txt, solution3.txt, and solution4.txt.

(9) Click the button Submit assignment.

(10) Click the checkbox with the attached text: By checking this box, I confirm that this submission is my own work, I accept responsibility for any copyright infringement that may occur as a result of this submission, and I acknowledge that this submission may be forwarded to a text-matching service.

(11) Click the button Continue.

(12) Check that the Submission status is Submitted for grading.



