COMP9444 Neural Networks and Deep Learning

Term 2, 2020
Project 2 - Rating Prediction
Due: Sunday 9 August, 23:59
Marks: 30% of final assessment
 
Note: hw2main.py was updated at 9:40am on Friday 24 July. Specifically, line 104 was changed to:
   outputs = student.convertNetOutput(net(inputs, length)).flatten()
Please either download the new version, or edit your own copy of hw2main.py by appending .flatten() to that line.
Introduction
For this assignment you will be writing a PyTorch program that learns to read product reviews in text format and predict an integer rating from 1 to 5 stars associated with each review.
Getting Started
Copy the archive hw2.zip into your own filespace and unzip it. This should create an hw2 directory containing the main file hw2main.py, the skeleton file student.py and the data file train.json. Your task is to complete the file student.py in such a way that it can be run in conjunction with hw2main.py by typing
   python3 hw2main.py
You must NOT modify hw2main.py in any way. You should ONLY modify student.py
The provided file hw2main.py handles the following:
 
Loading the data from train.json
Splitting the data into training and validation sets (in the ratio specified by trainValSplit)
Data Processing: strings are converted to lower case, and lengths of the reviews are calculated and added to the dataset (this allows for dynamic padding). You can optionally add your own preprocessing, postprocessing and stop_words (Note that none of this is necessarily required, but it is possible).
Vectorization, using torchtext GloVe vectors 6B.
Batching, using the BucketIterator() provided by torchtext so as to batch together reviews of similar length. This is not necessary for accuracy but will speed up training, since the total sequence length can be reduced for some batches.
The code is structured to be backend-agnostic. That is, if a GPU is present, it will automatically be used; otherwise, the CPU will be used. This is the purpose of the .to(device) function being called on several operations.
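The benefit of batching reviews of similar length can be illustrated without torchtext. The following is a minimal sketch in plain Python, not the code used in hw2main.py; the helper names make_batches and padded_tokens and the review lengths are made up for illustration. It counts how many token slots are needed when every batch is padded to its longest review, first for batches taken in arbitrary order and then for batches formed after sorting by length. BucketIterator() groups similar lengths rather than sorting the whole dataset, so the fully sorted case here is a simplification.

```python
def make_batches(lengths, batch_size):
    """Split a list of review lengths into consecutive batches."""
    return [lengths[i:i + batch_size] for i in range(0, len(lengths), batch_size)]


def padded_tokens(batches):
    """Total token slots when each batch is padded to its longest review."""
    return sum(max(batch) * len(batch) for batch in batches)


# Hypothetical review lengths (in tokens), in arbitrary order.
lengths = [5, 120, 8, 95, 12, 110, 7, 100]

# Batches taken in the order the reviews happen to arrive.
unsorted_cost = padded_tokens(make_batches(lengths, batch_size=4))

# Batches formed after grouping reviews of similar length together.
bucketed_cost = padded_tokens(make_batches(sorted(lengths), batch_size=4))

print(unsorted_cost, bucketed_cost)  # 920 528
```

With these made-up lengths, the grouped batches need 528 token slots instead of 920. The reviews are the same in both cases; only the amount of padding changes, which is why this affects speed rather than accuracy.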
Please take some time to read through hw2main.py and understand what it does.
Constraints
We have tried to structure hw2main.py so as to allow as much flexibility as possible in the design of your student.py code. You are free to create additional variables, functions, classes, etc., so long as your code runs correctly with hw2main.py unmodified, and you are only using the approved packages. You must adhere to these constraints:
Your model must be defined in a class named network.
The savedModel.pth file you submit must be generated by the student.py file you submit.
Your submission (including savedModel.pth) must be under 50MB and you cannot load any external assets in the network class.
While you may train on a GPU, you must ensure your model is able to be evaluated on a CPU.
The GloVe vectors are stored in a subdirectory called .vector_cache. You are restricted to using GloVe vectors 6B, but you are free to specify the value of dim (50, 100, 200 or 300).
You must ensure that we can load your code and test it. This will involve importing your student.py file, creating an instance of your network class, restoring the parameters from your savedModel.pth, loading our own test dataset, processing according to what you specified in your student.py file, and calculating accuracy and score.
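The size constraint can be checked locally before submitting. The following is a rough sketch using only the standard library; the function names are hypothetical, and because the specification does not say how "50MB" is measured, the sketch assumes the stricter decimal reading of 50,000,000 bytes so that passing the check is safe under either interpretation.

```python
import os

# Assumption: "50MB" is read as 50,000,000 bytes, the stricter interpretation.
SIZE_LIMIT_BYTES = 50 * 1000 * 1000


def submission_size(paths):
    """Return the combined size in bytes of the given files."""
    return sum(os.path.getsize(p) for p in paths)


def within_limit(paths, limit=SIZE_LIMIT_BYTES):
    """Return True if the files together are strictly under the limit."""
    return submission_size(paths) < limit


if __name__ == "__main__":
    files = ["student.py", "savedModel.pth"]
    # Only measure files that exist, so the sketch runs in any directory.
    existing = [f for f in files if os.path.exists(f)]
    print(f"{submission_size(existing)} bytes, under limit: {within_limit(existing)}")
```

Run from inside the hw2 directory, this reports the combined size of student.py and savedModel.pth. It does not check the other constraints, such as CPU evaluation or the use of external assets.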
 
You may NOT download or load data other than what we have provided. If we find your submitted model has been trained on external data you will receive zero marks for the assignment.
 
Question
At the top of your code, in a block of comments, you must provide a brief answer (one or two paragraphs) to this Question:
Briefly describe how your program works, and explain any design and training decisions you made along the way.
Marking Scheme
After submissions have closed, your code will be run on a holdout test set (i.e. a set of reviews and ratings that we do not make available to you, but which we will use to test your model). Marks will be allocated as follows:
12 marks for Algorithms, Style, Comments and Answer to the Question
18 marks based on performance on the (unseen) test set
The performance mark will be a function of the Weighted score, which is:
(1.0 × Correct predictions percentage) + (0.4 × One star away percentage)
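As an illustration of the formula above, the following sketch computes the weighted score for a small list of ratings. It is not the marking code; the function name weighted_score and the example ratings are made up, and it assumes "one star away" means a prediction that differs from the true rating by exactly 1.

```python
def weighted_score(predicted, actual):
    """Weighted score in percent: exact matches count 1.0, one-star misses count 0.4."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    total = len(actual)
    # Predictions that match the true rating exactly.
    correct = sum(p == a for p, a in zip(predicted, actual))
    # Predictions that are off by exactly one star.
    one_away = sum(abs(p - a) == 1 for p, a in zip(predicted, actual))
    return 1.0 * (100 * correct / total) + 0.4 * (100 * one_away / total)


# Hypothetical ratings: 3 exact, 1 one star away, 1 further off.
predicted = [5, 4, 1, 3, 2]
actual = [5, 4, 1, 4, 5]
print(weighted_score(predicted, actual))  # 68.0
```

Here 60% of predictions are correct and 20% are one star away, giving 60 + 0.4 × 20 = 68. A prediction that is two or more stars away contributes nothing, so a model that is often nearly right can still score reasonably.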
Groups
This assignment may be done individually, or in groups of two students. Groups are determined by an SMS field called hw2group. Every student has initially been assigned a unique hw2group which is "h" followed by their studentID number, e.g. h1234567. If you plan to complete the assignment individually, you don't need to do anything (but, if you do create a group with only you as a member, that's ok too). If you wish to form a group, go to the COMP9444 WebCMS page and click on "Groups" in the left hand column, then click "Create". Leave the "Group Type" as "Default". After creating a Group, click "Edit", search for the other member, and click "Add". WebCMS assigns a unique group ID to each group, in the form of "g" followed by six digits (e.g. g012345). We will periodically run a script to load these values into SMS. You must ensure there are no more than two members in your group, and no-one is a member of two different groups.
 
Submission
You should submit your trained model and Python code by typing
give cs9444 hw2 student.py savedModel.pth
You must submit your trained model savedModel.pth as well as the Python code student.py
You can submit as many times as you like - later submissions by either group member will overwrite previous submissions by either group member. You can check that your submission has been received by using the following command:
 
9444 classrun -check
The submission deadline is Sunday 9 August, 23:59. A 15% penalty will be applied to the (maximum) mark for every 24 hours late after the deadline.
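The effect of the late penalty can be worked through with a short calculation. This is only a sketch of one reading of the rule: it assumes the maximum attainable mark (30) is reduced by 15% of 30 per period, and that a partial 24-hour period counts as a full one. Neither detail is spelled out in the specification, and the function name is hypothetical.

```python
import math
from datetime import datetime

DEADLINE = datetime(2020, 8, 9, 23, 59)  # Sunday 9 August 2020, 23:59
MAX_MARK = 30.0                          # 12 + 18 marks
PENALTY_PER_PERIOD = 0.15                # 15% of the maximum per 24 hours


def max_mark_available(submitted_at):
    """Maximum mark still attainable for a submission made at the given time."""
    late_seconds = (submitted_at - DEADLINE).total_seconds()
    if late_seconds <= 0:
        return MAX_MARK
    # Assumption: a partial 24-hour period counts as a full one.
    periods = math.ceil(late_seconds / 86400)
    return max(0.0, MAX_MARK * (1 - PENALTY_PER_PERIOD * periods))


print(max_mark_available(datetime(2020, 8, 10, 9, 0)))  # one period late
print(max_mark_available(datetime(2020, 8, 12, 0, 0)))  # three periods late
```

Under these assumptions, a submission a few hours late can earn at most about 25.5 marks, and one just over two days late at most about 16.5.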
 
Additional information may be found in the FAQ and will be considered as part of the specification for the project. You should check this page regularly.
 
When you submit, the system will check that your model can be successfully loaded, and evaluate it on data randomly chosen from a third dataset (disjoint from train.json and also disjoint from the holdout test set).
Common Questions:
Can I train on the full dataset if I find it? No. You should NOT attempt to reconstruct the test set by searching the Internet. We will retrain a random selection of submissions, as well as those achieving high accuracy. If your code attempts to search or load external assets, or we find a mismatch between your submitted code and saved model, you will receive zero marks.
 
My model is only slightly larger than 50MB, can you still accept it? No, the 50MB limit is part of the assignment specification and is quite generous. You should be able to get away with much less.
 
Can we assume you will call net.eval() on our model prior to testing? Yes.
 
Can we assume a max length on the reviews? No. But nothing will be significantly longer than what is present in the training set.
 
General Advice:
You have been provided only rudimentary skeleton code that saves your model and prints the loss and accuracy at various inputs. You will almost certainly need to expand on this code so as to have a clearer understanding of what your model is doing.
 
If you find your training accuracy is high, but the submission accuracy is low, you are overfitting to the training data.
 
Try to be methodical in your development. Blindly modifying code, looking at the output, then modifying again can cause you to go around in circles. A better approach is to keep a record of what you have tried, and what outcome you observed. Decide on a hypothesis you want to test, run an experiment and record the result. Then move on to the next idea.
You should consider the submission test script to be the final arbiter with regard to whether a certain approach is valid. If you try something, and the submission test runs and you get a good accuracy then the approach is valid. If it causes errors then it is not valid.
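One possible way to keep such a record is a small CSV log that is appended to after every run. The sketch below uses only the standard library; the file layout, the field names and both function names are illustrative assumptions and are not part of the provided skeleton. The gap between training and validation accuracy is included because a large gap is the usual sign of the overfitting described above.

```python
import csv
import os

FIELDS = ["experiment", "change", "train_acc", "val_acc"]


def log_experiment(path, experiment, change, train_acc, val_acc):
    """Append one experiment record to a CSV file, writing a header if the file is new."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow({"experiment": experiment, "change": change,
                         "train_acc": train_acc, "val_acc": val_acc})


def overfitting_gap(train_acc, val_acc):
    """Training accuracy minus validation accuracy, in percentage points."""
    return train_acc - val_acc


# Example call (placeholder values, not real results):
# log_experiment("experiments.csv", 1, "baseline", 80.0, 60.0)
```

Recording one row per hypothesis makes it easy to see later which changes helped, and which only raised training accuracy.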
 
Do NOT leave this assignment to the last minute. Get started early, and submit early in order to ensure your code runs correctly. Marks from automated testing are final. You should aim to upload your final submission at least two hours before the deadline. Close to the deadline, the wait time for submission test results is likely to increase.
Plagiarism Policy
Your program must be entirely your own work. Plagiarism detection software will be used to compare all submissions pairwise and serious penalties will be applied, particularly in the case of repeat offences.
DO NOT COPY FROM OTHERS; DO NOT ALLOW ANYONE TO SEE YOUR CODE
 
Please refer to the UNSW Policy on Academic Integrity and Plagiarism if you require further clarification on this matter.
 
Good luck!
 
