
FIT5226 - Individual Assignment

Multi-agent learning of a coordination problem

This document describes the FIT5226 project, which is your individual assignment. It is worth 50% and is due on May 5, 2025. Note that there is an option to submit part of the assignment early that makes variations of the task accessible (see below; read the instructions carefully and completely now).

“Give me six hours to chop down a tree and I will spend the first four sharpening the axe.”

— Commonly misattributed to Abraham Lincoln

Tasks

You will write code for multiple agents in a square grid world to learn a simple transport task that requires coordination between the agents. You are free to use Deep-Q or Tabular-Q as you see fit. You are not allowed to use any learning methods beyond Q (e.g. heuristics).
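For orientation only (this is not the provided skeleton, and the names and hyperparameter values below are placeholders), the core of a tabular Q-learning agent for this kind of task might look as follows; a DQN replaces the table with a network but uses the same update target:

```python
import numpy as np

# Generic tabular Q-learning sketch (hypothetical names, placeholder hyperparameters).
N_STATES = 25 * 25 * 25 * 2   # own cell x location of A x location of B x carrying flag (5x5 grid)
N_ACTIONS = 4                 # north, south, west, east
Q = np.zeros((N_STATES, N_ACTIONS))

ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1
rng = np.random.default_rng(0)

def choose_action(s):
    """Epsilon-greedy selection over the (possibly shared) Q-table."""
    if rng.random() < EPSILON:
        return int(rng.integers(N_ACTIONS))
    return int(np.argmax(Q[s]))

def q_update(s, a, reward, s_next):
    """One Q-learning backup: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    td_target = reward + GAMMA * np.max(Q[s_next])
    Q[s, a] += ALPHA * (td_target - Q[s, a])
```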

Your agents still have the same four actions that they can execute: “move north/south/west/east”. A “wait” action in which the agent remains on the same grid square is not allowed. As before, there are two distinct cells A and B. The cells A and B can be in any location, and agents are allowed to observe both locations. Each agent starts at one of the two locations A or B (during training, you are free to choose or randomise which one). As before, items need to be transported from A to B. Pickup at A and dropoff at B happen automatically when the respective location is reached, without the agent having to take a specific action. A and B are the same for all agents (but allocated randomly for each new training and test run).
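A minimal sketch of such a grid world, purely for illustration (the class and method names are hypothetical and not part of the provided skeleton; how off-grid moves are handled is not prescribed here, so the sketch simply clamps them to the border):

```python
import random

ACTIONS = {"north": (-1, 0), "south": (1, 0), "west": (0, -1), "east": (0, 1)}

class GridWorld:
    """5x5 world with random A and B; pickup at A and dropoff at B are automatic."""

    def __init__(self, size=5, n_agents=4):
        self.size = size
        cells = [(r, c) for r in range(size) for c in range(size)]
        self.A, self.B = random.sample(cells, 2)          # two distinct random cells
        self.pos = [random.choice([self.A, self.B]) for _ in range(n_agents)]
        self.loaded = [p == self.A for p in self.pos]     # carrying an item iff starting at A

    def step(self, agent, action):
        """Move one agent; moves off the grid are clamped to the border (an assumption)."""
        dr, dc = ACTIONS[action]
        r, c = self.pos[agent]
        r = min(max(r + dr, 0), self.size - 1)
        c = min(max(c + dc, 0), self.size - 1)
        self.pos[agent] = (r, c)
        if (r, c) == self.A:
            self.loaded[agent] = True                     # automatic pickup
        elif (r, c) == self.B:
            self.loaded[agent] = False                    # automatic dropoff, shuttle continues
```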

Instead of performing a single delivery, agents now need to learn to shuttle indefinitely between A and B, continuing to deliver new items (the supply at A and the demand at B are unlimited). In other words, the task is no longer episodic; the agents are learning an infinite behaviour.

Most importantly, the agents now have to learn to coordinate their paths in order to avoid collisions!  More specifically, they have to avoid head-on collisions between different agents. To simplify matters, we define a head-on collision as one that occurs between an agent moving from A to B and another agent moving from B to A (see red cell in the diagram). It is permissible for multiple agents to step on the same cell if and only if all move from A to B (or all move from B to A). Collisions in locations A and B are disregarded.
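Because the direction of travel is fully determined by whether an agent is currently carrying an item (loaded means moving A→B, empty means moving B→A), the head-on collision check can be expressed directly in those terms. A minimal sketch, assuming per-agent positions and loaded flags as in the environment sketch above:

```python
def count_head_on_collisions(pos, loaded, A, B):
    """Count cells, other than A and B, occupied by at least one A->B agent and one B->A agent.

    pos    : list of (row, col) agent positions after the current moves
    loaded : list of bools, True if the agent carries an item (i.e. is travelling A -> B)
    """
    collisions = 0
    for cell in set(pos):
        if cell in (A, B):
            continue                      # collisions in A and B are disregarded
        here = [i for i, p in enumerate(pos) if p == cell]
        if any(loaded[i] for i in here) and any(not loaded[i] for i in here):
            collisions += 1
    return collisions
```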

To keep the learning time within limits we work on a small 5x5 grid as before. We use 4 agents (you must not vary these numbers as otherwise the collision counts in Table 2 will change).

All agents are allowed to share the same DQN or Q-table, or use their individual Q-table/network, as you see fit.

The agents may observe their own location, the locations of A and B, and, of course, whether they are carrying an item. Note that you can “buy” additional sensors to observe more information if you find this helpful (see below).
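For a tabular learner, one possible way (an illustrative assumption, not a requirement) to pack this observation into a single Q-table index is:

```python
def state_index(own, A, B, loaded, size=5):
    """Encode (own position, position of A, position of B, carrying flag) as one integer.

    Positions are (row, col) tuples; the result lies in [0, (size*size)**3 * 2).
    """
    def flat(p):
        return p[0] * size + p[1]
    idx = flat(own)
    idx = idx * (size * size) + flat(A)
    idx = idx * (size * size) + flat(B)
    return idx * 2 + int(loaded)
```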

All agents normally have to perform their actions sequentially, in a random order.
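Concretely, “sequentially in a random order” can be implemented by reshuffling the agent order at every global time step. A sketch under the same hypothetical names as above (observe, choose_action, q_update and reward_fn stand in for your own code):

```python
import random

def global_step(env, n_agents, observe, choose_action, q_update, reward_fn):
    """One global time step: every agent acts once, in a freshly shuffled order."""
    order = list(range(n_agents))
    random.shuffle(order)                    # random order; the "central clock" option relaxes this
    for i in order:
        s = observe(env, i)
        a = choose_action(s)
        env.step(i, a)                       # later agents already see this agent's new position
        s_next = observe(env, i)
        q_update(s, a, reward_fn(env, i), s_next)
```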

Simplifying the task by “purchasing” options

You have the option to trade off the performance of your agents against the complexity of the problem structure. The performance your agents reach influences the marks you receive for your solution, and you can “purchase” options that simplify the task structure so that it becomes easier to train high-performing agents. The catch, however, is that you have to “pay” for each option you select with points. You can compensate for the “purchase price” by training a better-performing agent that is rewarded with “performance points”, or you can solve the unmodified task and settle for a lower performance. The table below details the options available and their impact on marks received.

Your raw mark for the implementation categories “Solution and Functionality” and “Implementation and Coding” of the rubric will be scaled by a factor of

    1 − max(0, C − B) / 6

where C is the sum of costs for your chosen options and B is the sum of points for the substantiated performance. In other words, for each 2 points of purchase cost that are not compensated for by outstanding learning performance you will lose ⅓ of the marks in these categories.
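For example, purchasing the central clock (cost 1) and staged training (cost 3) gives C = 4; substantiating the >85% performance level from Table 2 gives B = 1, so the scaling factor is 1 − 3/6 = 0.5, i.e. half of the marks in these two categories.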

Table 1: Options available

| Option Type | Item | Description | Cost |
|---|---|---|---|
| Sensors | State of neighbouring cells | You may augment the observation space for all agents with the occupancy state of all cells, or chosen cells, in the agent’s immediate 8-neighbourhood (unoccupied, occupied). | 2 |
| Sensors | State of neighbouring cells, checked for agents of opposite type | As per the previous entry, but only cells that contain an agent going in the opposite direction (as defined above) will be marked as occupied. | 3 |
| Coordination | Central clock | This allows you to coordinate the update schedule of your agents. Instead of having to update all agents in random order (as described above), you can update them in round-robin fashion or any other order that you determine. | 1 |
| Training conditions | Off-the-job training | Instead of having to learn with each episode starting from random locations for all agents and A, B, you can define a schedule for the configuration at the start of training episodes. | 2 |
| Training conditions | Staged training | Instead of letting the agents learn everything in a single type of training run, you may break the training into different phases (e.g. with different grid sizes, different numbers of agents, different penalty schemata, etc.). Q-tables or Q-networks may be passed between the stages. | 3 |
| Setup | Fixed delivery location B | Instead of having to service an (observable) random delivery location, the target location B is always in the bottom-right corner of the grid. However, if you choose this option, the agents can no longer observe the location of B; they have to discover it. | 4 |

Table 2: Performance Points

You can receive up to 2 performance points for two categories, so that a maximum total of 4 is possible. To receive these you must provide code that measures the claimed performance!

| Performance level after training (percentage of scenarios solved in less than 20 steps and without any collisions) | Total number of collisions occurring during training to the claimed performance level | Performance Points |
|---|---|---|
| >95% | <500 | 2 |
| >85% | <1,000 | 1 |
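A sketch of how such a measurement could be provided (run_scenario is a hypothetical placeholder for your own greedy-policy rollout, assumed to return the number of steps used and the number of head-on collisions observed in one random scenario):

```python
def measure_performance(run_scenario, n_scenarios=1000, step_limit=20):
    """Fraction of random scenarios solved in fewer than step_limit steps with zero collisions."""
    solved = 0
    for _ in range(n_scenarios):
        steps, collisions = run_scenario()            # one random scenario, greedy policy
        if steps < step_limit and collisions == 0:
            solved += 1
    return solved / n_scenarios
```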

Required minimum performance

Any state-of-the-art LLM will willingly solve the above task for you out of the box. However, it will solve it in a fairly naive, mediocre way. We act in the spirit of some major retail chains: “If you see the same item somewhere else, we’ll beat it by 20%”. So your job is to beat the performance of these naive LLM products. Thus, your training must be completed within the following limits to achieve full marks:

Step budget: 1,500,000 (Maximum number of agent steps allowed during training)

Collision budget: 4,000 (Maximum number of head-on collisions allowed during training)

Walltime budget: 10 minutes (Maximum runtime allowed for training)

Final performance: after training, your agents must be able to perform a single delivery (starting at B) in at least 75% of all scenarios, within at most 25 steps and collision-free.
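These budgets are straightforward to track during training; a minimal bookkeeping sketch (hypothetical class, constants taken from the limits above):

```python
import time

class TrainingBudget:
    """Tracks agent steps, head-on collisions and walltime against the stated limits."""
    STEP_BUDGET = 1_500_000       # maximum agent steps during training
    COLLISION_BUDGET = 4_000      # maximum head-on collisions during training
    WALLTIME_BUDGET = 10 * 60     # maximum training walltime in seconds

    def __init__(self):
        self.steps = 0
        self.collisions = 0
        self.start = time.time()

    def record(self, n_steps=1, n_collisions=0):
        self.steps += n_steps
        self.collisions += n_collisions

    def within_budget(self):
        return (self.steps <= self.STEP_BUDGET
                and self.collisions <= self.COLLISION_BUDGET
                and time.time() - self.start <= self.WALLTIME_BUDGET)
```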

Task Details

You are free to use any work that was produced by you in the tutorials. You are even free to use the example solutions!

1. Contract: Determine your “shopping list”, i.e. the list of options you want to use.

During the first week of the assignment, the options will be on sale. Each will only “cost” half the points listed in Table 1. You must submit your list of chosen options on Moodle by April 17, 2025 if you want to obtain them at the early bird cost. You can still pick other additional options after this date, but they will be “charged” at the full price. If you submit a list of early bird option purchases, this is deemed final and binding, ie. you will be “charged” for these options even if you later decide not to use them. Late submissions of the early bird purchase list will not be accepted unless formal special consideration for Assignment 1 applies and an unsubmitted list will be taken to mean “no options selected.” You are not allowed to confer with other students in the class about your choices before the submission. This constitutes collusion. Please fill in the contract form available in the Moodle Assignment section and submit it through the Moodle portal.

2. Implementation: Implement the Q-Learning scenario in Python as a Jupyter notebook, using any chosen options. If you want to use any additional libraries apart from the provided Skeleton, NumPy and Matplotlib, check with the unit coordinator beforehand whether these are admissible. Train and test your Q-Learning environment. To obtain performance points you must provide code so that the claimed performance can be tested directly. Observe the minimum requirements listed above.

3. Documentation: Explain your training approach in full detail. Devise a test procedure and use appropriate metrics and visualisations to show convincingly that your agents learn the task successfully and that they learn to coordinate effectively. Document this in full detail in writing.

Please be aware that you will have to participate in an interview where you have to demonstrate and explain your submission. Even if you use AI to solve the assignment (see below) you have to be able to fully explain every aspect of your submission to receive full marks, including the Python code.



