SEMTM0016 Artificial Intelligence for Robotics
SEMTM0016 Coursework - Part B
Task Overview
You are the mighty HeroBot traversing the MazeDungeon where you will encounter many different entities.
You can find the MazeDungeon environment repository via this link: https://github.com/uobcll/SEMTM0016_DungeonMazeWorld
• You can follow the README of the repository for the basic components of the MazeDungeon environment.
• manual_control.py: The code in this file shows how to load the dungeon maze environment, how to reset it, and how to check the state, action, and reward from the environment. A minimal interaction sketch is given after this list.
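The snippet below is a minimal interaction sketch assuming a Gymnasium-style API; the import path, the class name DungeonMazeEnv, and the constructor arguments are placeholders rather than the repository's actual names, so follow manual_control.py and the README for the real calls.

# Minimal interaction sketch, assuming a Gymnasium-style API.
# The import path, class name and arguments below are placeholders;
# see manual_control.py in the repository for the actual interface.
from dungeon_maze_world import DungeonMazeEnv  # hypothetical import path

env = DungeonMazeEnv(grid_size=6, max_steps=100)  # hypothetical arguments
obs, info = env.reset()                           # reset the environment
action = env.action_space.sample()                # sample a random action
obs, reward, terminated, truncated, info = env.step(action)
print(obs, reward, terminated, truncated)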
You have three tasks to complete:
Q1: Load the environment and implement a rollout function and some simple policies.
Q2: Implement the model-based methods: Policy iteration and value iteration.
Q3: Implement the model-free methods: Monte Carlo and temporal-difference learning.
Question 1 - Simulation in the environment and policies (4 marks)
(1.1) Implement a rollout function with which you can sample a complete trajectory in the environment for a given policy. The function takes in the environment and a policy, and returns the full rollout trajectory (e.g., a list of transition tuples where each tuple is (state, action, reward)). The maximum number of environment steps is set to 100 and the grid size to 6.
(1.2) Implement the random policy (i.e. a uniform distribution over the action space for every state) to obtain the action the agent will take, and sample a trajectory using the rollout function from part (1.1) and the random policy.
(1.3) Implement the all-forward policy: take only the move-forward action in every state, and sample a trajectory using the rollout function from part (1.1) and the all-forward policy.
(1.4) Implement a customized policy: any policy of your choice in this environment, and sample a trajectory using the rollout function and your customized policy. (An illustrative sketch of the rollout function and the simple policies follows this list.)
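As a starting point, one possible shape for the rollout function and the simple policies is sketched below. The Gymnasium-style reset/step signatures, the number of actions, and the index of the move-forward action are assumptions; check the repository and manual_control.py for the actual interface.

import numpy as np

def rollout(env, policy, max_steps=100, seed=None):
    """Sample one trajectory and return a list of (state, action, reward) tuples.

    `policy` maps a state/observation to an action; a Gymnasium-style
    reset/step interface is assumed here.
    """
    trajectory = []
    state, _ = env.reset(seed=seed)
    for _ in range(max_steps):
        action = policy(state)
        next_state, reward, terminated, truncated, _ = env.step(action)
        trajectory.append((state, action, reward))
        state = next_state
        if terminated or truncated:
            break
    return trajectory

def random_policy(state, n_actions=3):
    # Uniform distribution over the action space for every state
    # (n_actions=3 is an assumption; read it from env.action_space instead).
    return np.random.randint(n_actions)

def all_forward_policy(state, forward_action=2):
    # Always take the move-forward action (the index 2 is an assumption).
    return forward_action

# Example usage (environment construction omitted; see manual_control.py):
# trajectory = rollout(env, random_policy, max_steps=100)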
Question 2 - Model-based Methods (6 marks)
Let us set the grid size to 8, the discount factor to 1, and the initial policy to the random policy. Implement the policy iteration and value iteration algorithms to obtain the optimal policy for the given environment. Specifically, you need to complete the following sub-tasks (an illustrative sketch of both algorithms is given after the list):
(2.1) Initialize the value table. Implement policy iteration's policy evaluation process, conducting repeated Bellman equation updates until the value table converges.
(2.2) Implement the policy improvement process of policy iteration.
(2.3) Iteratively apply the policy evaluation from (2.1) and the policy improvement from (2.2) until the optimal policy and value table converge.
(2.4) Initialize the value table. Implement the value iteration process until the optimal value table converges.
(2.5) Derive the optimal policy from the optimal value table obtained in (2.4).
(2.6) Analyse how these two algorithms perform differently (e.g., convergence speed, numerical stability, sensitivity to the initial value table, and how the policy changes).
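A minimal sketch of one possible implementation of both algorithms is given below. It assumes you have first built a tabular model of the environment in the classic toy-text format P[s][a] = [(prob, next_state, reward, done), ...]; the MazeDungeon repository is not guaranteed to expose such a model directly, so treat the model format and function names as assumptions.

def policy_evaluation(P, policy, V, gamma=1.0, theta=1e-8):
    # Repeated Bellman expectation updates until the value table converges.
    while True:
        delta = 0.0
        for s in P:
            v = sum(policy[s][a] * sum(p * (r + gamma * (0.0 if done else V[ns]))
                                       for p, ns, r, done in P[s][a])
                    for a in P[s])
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < theta:
            return V

def policy_improvement(P, V, gamma=1.0):
    # Greedy (deterministic) policy with respect to the current value table.
    policy = {}
    for s in P:
        q = {a: sum(p * (r + gamma * (0.0 if done else V[ns]))
                    for p, ns, r, done in P[s][a])
             for a in P[s]}
        best = max(q, key=q.get)
        policy[s] = {a: float(a == best) for a in P[s]}
    return policy

def policy_iteration(P, gamma=1.0):
    V = {s: 0.0 for s in P}
    policy = {s: {a: 1.0 / len(P[s]) for a in P[s]} for s in P}  # random policy
    while True:
        V = policy_evaluation(P, policy, V, gamma)
        new_policy = policy_improvement(P, V, gamma)
        if new_policy == policy:      # policy is stable: stop
            return policy, V
        policy = new_policy

def value_iteration(P, gamma=1.0, theta=1e-8):
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            v = max(sum(p * (r + gamma * (0.0 if done else V[ns]))
                        for p, ns, r, done in P[s][a])
                    for a in P[s])
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < theta:
            # Extract the greedy policy from the converged value table (sub-task 2.5).
            return policy_improvement(P, V, gamma), V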
Question 3 - Model-free Methods (7 marks)
Let us set the grid size to 10, the discount factor to 0.99, and the initial policy to the random policy. The maximum number of environment steps is set to 100. Suppose we do not have access to an explicit environment model and can only sample trajectories from it. Implement the Monte Carlo and temporal-difference learning algorithms. Specifically, you need to complete the following sub-tasks (an illustrative sketch of both algorithms is given after the list).
(3.1) Given the rollout function output, implement a cumulative reward calculation function to calculate the cumulative reward for one sampled trajectory.
(3.2) Implement the Monte Carlo sampling algorithm: sample some trajectories using the initial random policy and use the trajectories to obtain the Monte Carlo value estimate for each grid cell. (Initialise each grid cell's value to 0.)
(3.3) Implement the temporal-difference learning algorithm: initialise the value table and sample some trajectories using the initial random policy, then use the trajectories to conduct temporal-difference learning until the value table converges. (Note that terminal-state values are 0.)
(3.4) Justify your design choices when setting the hyperparameters for these two algorithms (e.g., number of trajectory samples, TD learning rate, value table initialisation, etc.) and analyse how these two algorithms perform differently (e.g., convergence speed, numerical stability, and hyperparameter sensitivity).
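The sketch below shows one possible shape for the cumulative reward calculation, an every-visit Monte Carlo estimator, and TD(0) evaluation from stored trajectories. It assumes states are hashable (e.g. (row, column, direction) tuples) and that trajectories are lists of (state, action, reward) tuples as produced by the Question 1 rollout function; it also treats the state after a trajectory's final transition as terminal, which is only approximate if the episode was cut off at the step limit.

from collections import defaultdict

def cumulative_reward(trajectory, gamma=0.99):
    # Discounted cumulative reward of one trajectory of (state, action, reward) tuples.
    return sum((gamma ** t) * r for t, (_, _, r) in enumerate(trajectory))

def monte_carlo_evaluation(trajectories, gamma=0.99):
    # Every-visit Monte Carlo value estimate; all values start at 0.
    V = defaultdict(float)
    visit_counts = defaultdict(int)
    for trajectory in trajectories:
        G = 0.0
        for state, _, reward in reversed(trajectory):
            G = reward + gamma * G
            visit_counts[state] += 1
            # Incremental mean of the returns observed from this state.
            V[state] += (G - V[state]) / visit_counts[state]
    return V

def td_zero_evaluation(trajectories, gamma=0.99, alpha=0.1, n_sweeps=50):
    # TD(0) evaluation; values start at 0 and terminal states stay at 0.
    V = defaultdict(float)
    for _ in range(n_sweeps):
        for trajectory in trajectories:
            for t, (state, _, reward) in enumerate(trajectory):
                # Bootstrap from the next state in the trajectory, or 0 at the end.
                next_value = V[trajectory[t + 1][0]] if t + 1 < len(trajectory) else 0.0
                V[state] += alpha * (reward + gamma * next_value - V[state])
    return V

# Example usage (rollout and random_policy as in Question 1):
# trajectories = [rollout(env, random_policy, max_steps=100) for _ in range(500)]
# V_mc = monte_carlo_evaluation(trajectories)
# V_td = td_zero_evaluation(trajectories)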
Report
There are 3 marks for overall presentation of the report.
Your report should be no longer than six pages; shorter is fine. Use an 11 or 12 pt font and do not try tricks like shrinking the margins to fit in more text; shorter is better than longer.
Your report must be submitted as a PDF and should be prepared in LaTeX (Overleaf is a good approach), MS Word, or a similar text editor, then exported as a PDF document.
Your code will not be marked for elegance, but it should run correctly; it is expected you will use Python. Do not include screenshots of graphs; they should be imported directly. Resize figures to the correct size before importing them; if the labels are tiny, the graphs will not be marked. Make sure figure captions are descriptive: it is better to have some overlap between figure captions and the main text than to have figure captions that are not reasonably self-contained.
Avoid code snippets in the report unless that feels like the best way to illustrate some subtle aspect of an algorithm; always consider a mathematical description instead if possible. You will be asked to submit your code, and it will be tested to make sure it works and matches your report. It will not, however, itself be marked for quality.
The teaching assistants (TAs) are unable to answer questions about how to solve an exercise or what methods to use beyond what has been specified in the coursework document. However, if you need to know more about a method from a particular lab/worksheet in order to solve an exercise, do ask the TAs for help with that method.