COMP9414 23T2
Artificial Intelligence
Assignment 1 - Reward-based learning agents
Due: Week 5, Friday, 30 June 2023, 11:55 PM
1 Activities
In this assignment, you are asked to implement a modified version of the
temporal-difference method Q-learning and SARSA. Additionally, you are
asked to implement a modified version of the action selection methods softmax
and ε-greedy.
To run your experiments and test your code you should make use of the
example gridworld used for Tutorial 3 (see Fig. 1). The modification of the
method includes the following two aspects:
- Random numbers will be obtained sequentially from a file.
- The initial Q-values will be obtained from a file as well.
The random numbers are available in the file random_numbers.txt.
The file contains 100k random numbers between 0 and 1 with seed = 9999
created with numpy.random.random as follows:
import numpy as np
np.random.seed(9999)
random_numbers=np.random.random(100000)
np.savetxt("random_numbers.txt", random_numbers)
Figure 1: 3×4 gridworld (states 0-11) with one goal state and one fear state.
1.1 Implementing modified SARSA and ε-greedy
For the modified SARSA you must use the code reviewed during Tutorial 3 as
a base. Consider the following:
- The method will use a given set of initial Q-values, i.e., instead of initialising them with random values, the initial Q-values should be obtained from the file initial_Q_values.txt. You must load the values using np.loadtxt("initial_Q_values.txt").
- The initial state for the agent before training will always be 0.
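The SARSA update itself is on-policy: the target uses the action actually selected in the next state. A minimal sketch of the update step (ALPHA and GAMMA are assumed hyperparameters; use the values from the tutorial code):

```python
import numpy as np

ALPHA = 0.5   # learning rate (assumed; take the value from the tutorial code)
GAMMA = 0.9   # discount factor (assumed)


def sarsa_update(Q, state, action, reward, next_state, next_action):
    """On-policy TD update: the target uses the action actually taken
    in the next state (hence State-Action-Reward-State-Action)."""
    target = reward + GAMMA * Q[next_state, next_action]
    Q[state, action] += ALPHA * (target - Q[state, action])
    return Q
```

The Q-table here is the 12×4 array loaded from initial_Q_values.txt (one row per state, one column per action).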
For the modified ε-greedy, create an action selection method that receives
the state as an argument and returns the action. Consider the following:
- The method must use one random number from the provided file sequentially each time, i.e., a random number is used only once.
- If the random number rnd <= ε, the method returns an exploratory action. The next random number is then used to decide which action to return, as shown in Table 1.
- You should keep a counter for the random numbers, as you will need it to access the numbers sequentially, i.e., increase the counter every time after using a random number.
Random number (r)   Action   Action code
r <= 0.25           down     0
0.25 < r <= 0.5     up       1
0.5 < r <= 0.75     right    2
0.75 < r <= 1       left     3

Table 1: Exploratory action selection given the random number.
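Putting the rules above together, the modified ε-greedy can be sketched as follows. The value of EPSILON and the function signature are assumptions; use whatever your tutorial code prescribes. Returning the updated counter is one way to keep the sequential position without global state:

```python
import numpy as np

EPSILON = 0.05  # assumed value; use the epsilon from your tutorial code


def epsilon_greedy(state, Q, random_numbers, counter):
    """Modified epsilon-greedy: consumes random numbers sequentially.

    Returns (action, counter) so the caller keeps the updated position
    in the random-number sequence.
    """
    rnd = random_numbers[counter]
    counter += 1
    if rnd <= EPSILON:
        # Exploratory action: map the *next* random number to an action
        # using the thresholds of Table 1 (0=down, 1=up, 2=right, 3=left).
        r = random_numbers[counter]
        counter += 1
        if r <= 0.25:
            action = 0
        elif r <= 0.5:
            action = 1
        elif r <= 0.75:
            action = 2
        else:
            action = 3
    else:
        # Greedy action: the highest-valued action in this state.
        action = int(np.argmax(Q[state]))
    return action, counter
```

Note that an exploratory step consumes two random numbers (one for the ε test, one for the action), while a greedy step consumes only one.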
1.2 Implementing Q-learning and softmax
You should implement the temporal-difference method Q-learning. Consider
the following for the implementation:
- For Q-learning, the same set of initial Q-values will be used (provided in the file initial_Q_values.txt).
- Update the Q-values according to the method. Remember this is an off-policy method.
- As in the previous case, the initial state before training is also 0.
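The off-policy update can be sketched as follows: the target maximises over the next state's actions regardless of which action the behaviour policy actually selects next. ALPHA and GAMMA are assumed hyperparameters; use the values from the tutorial code:

```python
import numpy as np

ALPHA = 0.5   # learning rate (assumed; take the value from the tutorial code)
GAMMA = 0.9   # discount factor (assumed)


def q_learning_update(Q, state, action, reward, next_state):
    """Off-policy TD update: the target uses the max over next-state
    actions, independent of the action the agent takes next."""
    target = reward + GAMMA * np.max(Q[next_state])
    Q[state, action] += ALPHA * (target - Q[state, action])
    return Q
```

Compared with SARSA, the next action is not an argument here; that is exactly what makes the method off-policy.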
For the softmax action selection method, consider the following:
- Use a temperature parameter τ = 0.1.
- Use a random number from the provided file to compare with the cumulative probabilities to select an action. Hint: np.searchsorted returns the position where a number should be inserted in a sorted array to keep it sorted; this is equivalent to the action selected by softmax.
- Remember to use and increment a counter every time you use a random number.
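The steps above can be sketched as follows; the function name and returned counter are illustrative, and the max-subtraction is a standard numerical-stability trick rather than part of the assignment:

```python
import numpy as np

TAU = 0.1  # temperature, as specified in the assignment


def softmax_action(state, Q, random_numbers, counter):
    """Softmax selection using one sequential random number.

    The random number is compared against the cumulative action
    probabilities; np.searchsorted gives the selected action's index.
    """
    prefs = Q[state] / TAU
    prefs = prefs - np.max(prefs)        # stabilise the exponentials
    probs = np.exp(prefs)
    probs /= probs.sum()
    cumulative = np.cumsum(probs)        # e.g. [0.25, 0.5, 0.75, 1.0]
    rnd = random_numbers[counter]
    counter += 1
    action = int(np.searchsorted(cumulative, rnd))
    return action, counter
```

With all Q-values equal, the cumulative probabilities are [0.25, 0.5, 0.75, 1.0], so the random number falls into one of four equal bins, mirroring the uniform exploratory case.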
1.3 Testing and plotting the results
You should plot a heatmap with the final Q-values after 1,000 learning
episodes. Additionally, you should plot the accumulated reward per episode
and the number of steps taken by the agent in each episode.
For instance, if you want to test your code, you can use the gridworld
shown in Fig. 1; you will obtain the rewards shown in Fig. 2 and the steps
per episode shown in Fig. 3, with one panel per method and action-selection
combination (e.g. (d) SARSA + softmax).

Figure 3: Steps per episode.
2 Submitting your assignment
You can do the assignment either individually or in a pair with a
classmate. If you decide to work with a classmate from another tutorial
section, you need the approval of one of the two tutors, who will conduct
the discussion with you and your classmate. However, the other tutor still
needs to be informed by the students.
Your submission should be made through Moodle and consist of a single .py
file. If you work in a pair, only one person needs to submit the file.
However, the file should indicate at the top, as a comment, the full name
and zID of each student. It is your responsibility to include the names; we
will not add people to a submission after the deadline if you forget.
You can submit as many times as you like before the deadline – later
submissions overwrite earlier ones. After submitting your file, it is good
practice to take a screenshot of it for future reference.
Late submission penalty: UNSW has a standard late submission penalty of 5%
per day of your mark, capped at five days from the assessment deadline;
after that, students cannot submit the assignment.
3 Deadline and questions
Deadline: Week 5, Friday 30th of June 2023, 11:55pm. Please use the forum
on Moodle to ask questions related to the project. However, you should not
share your code there, to avoid making it public and enabling plagiarism.
Although we try to answer questions as quickly as possible, it might take
up to one or two business days to reply; therefore, last-minute questions
might not be answered in time.
4 Plagiarism policy
Your program must be entirely your own work. Plagiarism detection software
might be used to compare submissions pairwise (including submissions for
any similar projects from previous years) and serious penalties will be applied,
particularly in the case of repeat offences.
Do not copy from others. Do not allow anyone to see your code.
Please refer to the UNSW Policy on Academic Honesty and Plagiarism if you
require further clarification on this matter.
