
Fundamental AI and Data Analytic (EIE1005)

Workshop 4: Developing Game AI with OpenAI Gym

A. Purpose

This workshop provides students with the opportunity to train and test a game AI model based on reinforcement learning using OpenAI’s Gym library [1,2].

B. Things to do

1. Follow the instructions in the worksheet to install the required software modules.

2. Run the programs provided to train and test the agent in the game FrozenLake, a game environment provided by OpenAI’s Gym library [1,2].

3. Analyze the programs to understand the important parameters of reinforcement learning.

4. Answer the questions in this worksheet and submit it to the Blackboard.

C. Equipment

PC with the following software:

· Windows 10 or above

· Anaconda Navigator (version 2.0.3 or above)

· Spyder version 4.2.5 or above

D. Introduction - FrozenLake

FrozenLake is part of the Toy Text environment of Gym [1,2]. It involves crossing a frozen lake from Start(S) to Goal(G) without falling into any Holes(H) by walking over the Frozen(F) lake. The agent may not always move in the intended direction due to the slippery nature of the frozen lake. For each move, the agent can make one of the following four actions:

0: LEFT

1: DOWN

2: RIGHT

3: UP

The state of the agent is represented by an integer from 0 to 15, calculated by the following equation:

row × total number of columns + col (where both the row and col start at 0).

For example, the Goal position in a 4×4 map can be calculated as follows: 3 × 4 + 3 = 15. The number of possible states depends on the size of the map; for example, a 4×4 map has 16 possible states. The numbering of all the states of a 4×4 map is shown in the diagram below, where the corresponding built-in game environment is also shown.
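As an illustration of this numbering, the short sketch below converts between (row, col) and the state number for a 4×4 map. The helper names to_state and to_row_col are hypothetical and are not part of the worksheet programs.

# Hypothetical helpers for illustration only (not part of the worksheet programs).
N_COLS = 4  # number of columns of the 4x4 map

def to_state(row, col):
    # state number = row * total number of columns + col
    return row * N_COLS + col

def to_row_col(state):
    # inverse mapping: state number -> (row, col)
    return divmod(state, N_COLS)

print(to_state(3, 3))   # 15, the Goal of the 4x4 map
print(to_row_col(5))    # (1, 1), one of the Holes of the built-in 4x4 map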

The agent can be trained by reinforcement learning using the standard application program interface (API) provided by Gym. The built-in reward scheme is as follows:

Reach goal(G): +1

Reach hole(H): 0

Reach frozen(F): 0

Refer to the course notes [3] on the meaning of action, state, and reward in reinforcement learning.
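As a bridge to the programs used later in this worksheet, the minimal sketch below shows how one interaction step looks with the Gym API: the agent takes an action and the environment returns the new state and the reward. It is only a sketch of a single step, assuming the built-in 4×4 FrozenLake map; the full programs are given in the Workshop section.

import gymnasium as gym

# One interaction step: the agent takes an action, the environment returns
# the new state and the reward (see the course notes [3] for the concepts).
env = gym.make('FrozenLake-v1', is_slippery=False)
state, info = env.reset()          # the initial state is 0 (the Start position)
state, reward, terminated, truncated, info = env.step(1)   # action 1 = DOWN
print("New state:", state, "Reward:", reward)
env.close()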

E. Workshop

Part I: Preparation

1. Login to your computer.

2. Launch Anaconda Navigator by clicking search and typing Anaconda Navigator (Anaconda3). Then press Enter (see the figure below).

3. In the Anaconda Navigator window, launch Spyder by clicking the button Launch.

4. Spyder is an integrated development environment (IDE), particularly for Python programming [4]. It includes advanced features for Python program editing, testing, and debugging. The Spyder IDE mainly contains three windows: Editor, Debugger, and Console. The Editor window is for editing Python program codes. The Debugger window provides detailed information on the program execution, including the values of the variables. The Console window allows the user to interact with the IDE. This workshop does not require you to do programming. You will mainly input your commands in the Console window and see the results.

5. OpenAI’s Gym library is used in this workshop. Gym includes a standard API focused on reinforcement learning. It contains a diverse collection of reference environments, each of which is presented in the form of a computer game. To install Gym, type the following command in the Console window and press Enter.

pip install gymnasium

6. Please enter the following commands, pressing Enter after each line. Wait approximately 20 seconds after each command:

pip install gym[toy_text]

pip install gymnasium[toy-text]

7. Restart the kernel by clicking Console -> Restart kernel in the top menu (see the figure below). Note that if your program hangs for any reason, you can also restart the kernel to make the program run again.

The system is now ready for program development.

Part II: Play FrozenLake without training

1. Copy and paste the following Python program codes to the Editor window. Avoid making any changes, such as adding spaces or tabs.

#####################################################
#     EIE1005 - Workshop
# Reinforcement Learning for Game AI
#
#####################################################
#  Baseline Program
#####################################################
import gymnasium as gym

# 1. Load Environment
env = gym.make('FrozenLake-v1', render_mode='human', is_slippery=False)

# 2. Parameters setting
rev_list = [] # rewards per episode calculate

# 3. Initialize the environment
init_state = env.reset()  # Reset environment
s = init_state[0]
rAll = 0
j = 0
while j < 99:
    j+=1
    # Randomly generate action and get the reward
    state, reward, terminated, truncated, info = env.step(env.action_space.sample())
    env.render()
    print("Current state: ", state, end = "   ")
    print("Reward = ", reward)
    if state == 5 or state == 7 or state == 11 or state == 12:
        break
    input("Press Enter to continue:")
    print()
    rAll += reward
    s = state
    if terminated or truncated == True:
        break
rev_list.append(rAll)
env.render()

In the above Python program, all lines that start with the # symbol are not program code but comments. Comments are usually used for explaining the program code. Pay attention that all indentation in the program code needs to be followed exactly. The program cannot be executed correctly if any indentation is modified.

2. Save the program with the filename FrozenLake_baseline.py to your Desktop folder by pressing File in the top menu and then selecting Save as... A window will be opened to let you select the folder and enter the filename. Then run the program by pressing the Run button under the top menu. A game board will be generated (If the game board does not appear, click its icon on the Windows taskbar. Alternatively, avoid maximizing the Spyder window; instead, resize it and drag the game board to a corner so that both windows do not overlap.).

3. Once the Run button is pressed, the game will run and the agent will make its first move. The current state after the move and the reward obtained for it will be shown in the Console window. The program will then ask and wait for you to press Enter to continue. Pressing Enter in the Console window lets the game continue and the agent makes another move. You will find that sometimes the agent does not move after you press Enter. This happens when the randomly generated action would move the agent beyond the boundary of the game board; in this case, the agent stays in its original state. (The program is not hanging; each move completes quickly and the program then waits for your next input. Consider moving the game board so that it does not overlap the Spyder window and you can observe the moves.)

4. Play the game until the agent reaches a Hole such as the following:

Then capture the screen of Spyder at that time (you can first click the Spyder window and press the Windows+Shift+S keys together on your keyboard; alternatively, type "Snipping" in the Windows search bar to open the Snipping Tool). Select the region. Then paste the screen capture by pressing Ctrl+V in the box below. Your screen capture needs to show the current state and reward values in the Console window.

Question: What are the current state and reward values as shown in the Console window? Compared with the game board, do they match the expected result? Comment on whether the displayed state matches the current position of the agent.

Part III: Training the agent

1. You will find that the agent can never reach the goal no matter how many times you play, since the agent has not been trained; its movement is randomly generated by the program. We will now start training the agent. Open a new file by clicking File in the top menu and then clicking New File…. A new tab will be opened in the Editor window. Copy and paste the following Python program codes into it. Avoid making any changes, such as adding spaces or tabs.

#####################################################
#     EIE1005 - Workshop
# Reinforcement Learning for Game AI
#
#####################################################
#  Training Program
#####################################################
import gymnasium as gym
import numpy as np

# 1. Load Environment
env = gym.make('FrozenLake-v1', is_slippery=False)
epis = int(input("Enter the number of games to play for training: "))
if epis <= 10:
    env = gym.make('FrozenLake-v1', render_mode='human', is_slippery=False)
else:
    env = gym.make('FrozenLake-v1', render_mode='rgb_array', is_slippery=False)

# 2. Parameters of Q-learning
# Construct the Q-table
Q = np.zeros([env.observation_space.n,env.action_space.n])
# observation_space.n and action_space.n give no. of states and actions
eta = .628  # learning rate
gma = .9    # discount factor
rev_list = [] # rewards per episode calculate

# 3. Q-learning Algorithm
for i in range(epis):
    init_state = env.reset() # Reset environment
    s = init_state[0]
    rAll = 0
    d = False
    j = 0
    # The Q-Table learning algorithm
    while j < 99:
        j+=1
        # Choose action from Q table (with decaying random noise for exploration)
        a = np.argmax(Q[s,:] + np.random.randn(1,env.action_space.n)*(1./(i+1)))
        # Get new state & reward from environment
        state, reward, terminated, truncated, info = env.step(a)
        # Update Q-Table with new knowledge
        Q[s,a] = Q[s,a] + eta*(reward + gma*np.max(Q[state,:]) - Q[s,a])
        env.render()
        print("Current state: ", state, end = "   ")
        print("Reward = ", reward)
        #input("Press Enter to continue:")
        print()
        rAll += reward
        s = state
        if terminated or truncated == True:
            break
    rev_list.append(rAll)
    env.render()
    print("Training Episode = ", i+1)

print("Reward Sum on all episodes " + str(sum(rev_list)/epis))
print("Final Values Q-Table")
print(Q)

2. Save the program with the filename FrozenLake_training.py to your Desktop folder.

3. In the program, the agent is trained using the Q-learning method discussed in class. For every move of the agent, the Q-table kept in the program is updated using the Bellman equation. The Q-table is implemented as a two-dimensional array containing 16 rows and 4 columns. Each row represents a state and the row number is the state number. Each column represents an action and the column number is the action number. Recall that, in a given state, the agent acts according to the action with the largest value in that row of the Q-table. So, for example, the following Q-table shows that if the agent is at state 0 (row 0), it will move right (action 2) since column 2 (column numbers start from 0) has the largest value (0.59049) in that row. And if the agent is at state 2 (row 2), it will move down (action 1) since column 1 has the largest value (0.729) in that row. A short sketch of how such a greedy action can be read from the table is given after the Q-table values below. The action number is encoded as follows:

0: LEFT

1: DOWN

2: RIGHT

3: UP

The values of a Q-table:

[[0.      0.      0.59049 0.     ]
 [0.      0.      0.6561  0.     ]
 [0.      0.729   0.      0.     ]
 [0.      0.      0.      0.     ]
 [0.      0.      0.      0.     ]
 [0.      0.      0.      0.     ]
 [0.      0.81    0.      0.     ]
 [0.      0.      0.      0.     ]
 [0.      0.      0.      0.     ]
 [0.      0.      0.      0.     ]
 [0.      0.9     0.      0.     ]
 [0.      0.      0.      0.     ]
 [0.      0.      0.      0.     ]
 [0.      0.      0.      0.     ]
 [0.      0.      1.      0.     ]
 [0.      0.      0.      0.     ]]
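For reference, each move in the training program applies the update Q[s,a] = Q[s,a] + eta*(reward + gma*np.max(Q[state,:]) - Q[s,a]), which is the Bellman-equation update mentioned above. The minimal sketch below, using only the two example rows discussed above, shows how the greedy action is read from a trained table with np.argmax. The values are copied from the example Q-table and are for illustration only.

import numpy as np

# Rebuild only the two example rows discussed above (illustration only).
Q = np.zeros((16, 4))
Q[0, 2] = 0.59049   # state 0: RIGHT has the largest value
Q[2, 1] = 0.729     # state 2: DOWN has the largest value

action_names = {0: "LEFT", 1: "DOWN", 2: "RIGHT", 3: "UP"}

for s in (0, 2):
    a = int(np.argmax(Q[s, :]))   # the greedy action is the column with the largest value
    print("State", s, "-> action", a, "(", action_names[a], ")")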

4. The training is now ready to proceed. Press the Run button under the top menu. A game board will be generated (if the game board does not pop up, click its icon on the Windows taskbar). The program first asks in the Console window how many games you want to play for training the agent as follows:

Enter the number of games to play for training:

Let’s enter 10 to let the agent be trained by playing 10 games. In each game, the agent goes from the starting point to the Goal or a Hole. See how the agent moves on the game board. After all the games have been played, copy the final values of the Q-table as shown in the Console window and paste them in the box below (you may find that they are all zeros):

5. There is a good chance that the agent still cannot reach the goal during the whole training (that is why the Q-table contains mostly zeros). This is because we have not played enough games to train the agent sufficiently. Now, we change to play the game 500 times. This time, the graphics will not be shown, to speed up the operation (the graphics are shown only when you train with 10 games or fewer).

Now, repeat step 4 and enter 500 when you are asked to enter the number of games to play. You will find that the training is completed almost immediately. Paste the resulting Q-table in the box below:

Question: If the agent follows your Q-table to find the path, draw on the diagram below the path the agent will use to reach the goal (you may add some arrows in the diagram below to indicate the path). You may need to refer to your course notes to understand how the Q-table is read. (You may also refer to the Q-table example in (4) to understand the meaning of the values.)

6. Now we are going to evaluate the performance of the trained Q-table. First, save your program to another file FrozenLake_train_n_play.py in your Desktop folder.

7. Then, append the following codes to the end of the training program (that is, paste the codes after the last statement of the original program).

#####################################################
#  Testing Program
#####################################################
env = gym.make('FrozenLake-v1', render_mode = 'human', is_slippery=False)
epis = 10
goal_step = 0
goal = 0
for i in range(epis):
    # Reset environment
    init_state = env.reset()
    s = init_state[0]
    j = 0
    print("\nPlaying game ", i+1)
    # Use the trained Q-Table to determine the action
    while j < 99:
        j+=1
        # Choose the action from the Q table
        a = np.argmax(Q[s,:])
        # Get new state & reward from environment
        state, reward, terminated, truncated, info = env.step(a)
        s = state
        if terminated or truncated == True:
            break
    env.render()
    if state == 15:
        goal += 1
        goal_step += j

print("\nNumber of times reaching the goal: ", goal)
if goal == 0:
    print("Average number of steps to reach the goal: infinity")
else:
    print("Average number of steps to reach the goal: ", goal_step/goal)

This program makes use of the Q-table trained in the first part of the program to inform the agent of the path to reach the Goal. By counting the number of times the agent successfully reaches the Goal, we can see how well the Q-table has been trained.

8. Run the program by pressing the Run button under the top menu. Also, enter 500 games to play for training the agent. The program will start to train the agent as in part (5). Then, the game board will be launched and show how the agent moves according to the trained Q-table. The game will be played ten times to test the performance of the trained Q-table. You will find that the agent reaches the Goal every time. Copy and paste in the box below the resulting Q-table, the number of times the agent reaches the Goal, and the average number of steps the agent used to reach the Goal, as shown in the Console window.

Part IV: To introduce randomness

1. Since the game environment is always the same, the agent must be able to reach the Goal once the Q-table is well-trained. To have more fun, let us introduce some randomness to the game to increase its difficulty. First, download the program FrozenLake_random.py from the Blackboard to your Desktop folder. Click File and Open… in Spyder to open the file. Run the program by pressing the Run button under the top menu. The program first asks in the Console window how many games you want to play for training the agent as follows:

Enter the number of games to play for training:

Let’s enter 500 to let the agent be trained by playing 500 games.

Then, the program asks

Enter the number of games to play for testing:

Enter 10 to let the game be played 10 times for testing. Finally, the program asks whether you want to play in random mode. Enter Y to introduce randomness to the game. The program will then train the agent by playing 500 games and test the Q-table by playing 10 games.

2. When random play is set to Y, the agent will not follow exactly the action given by the program during training and playing. It will slip into one of three directions with equal probability (since the frozen lake is slippery), so there is only a 1/3 probability that the agent moves in the direction given by the action. Click the game board to see how much harder it is for the agent to reach the Goal. (A short note on the built-in slippery mode is given after this step.)

After the game finishes (it will take some time), copy and paste in the box below the resulting Q-table, the number of times the agent reaches the Goal, and the average number of steps the agent used to reach the Goal, as shown in the Console window.
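FrozenLake_random.py is provided on the Blackboard, so its implementation is not reproduced here. For comparison, the built-in FrozenLake environment offers similar behaviour through the is_slippery option: with is_slippery=True, the agent moves in the intended direction with probability 1/3 and slips to either perpendicular direction with probability 1/3 each. The minimal sketch below shows how to enable it for your own experimentation; it is not required by the worksheet.

import gymnasium as gym

# Built-in slippery mode: the intended action succeeds with probability 1/3.
env = gym.make('FrozenLake-v1', render_mode='rgb_array', is_slippery=True)
state, info = env.reset()
state, reward, terminated, truncated, info = env.step(2)   # try to move RIGHT
print("New state after trying to move right:", state)      # may differ because of slipping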

3. In fact, the performance evaluation in (2) is not very accurate since the agent plays only 10 games in the testing program, so the statistics are not reliable. The agent should play more games in the testing program. Run the program again. This time enter 500 games for training and 100 games for testing. The agent will then play 100 games in the testing phase. Note that the game board will not be shown if you play more than 10 games, to speed up the testing process.

4. Run the program 10 times (press the Run button under the top menu each time you run the program). Record down in the table below, for each time you run the program, the number of times (T) the agent reaches the Goal, and the average number of steps (S) the agent used to reach the Goal (round to integer), as shown in the Console window.

n | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
T |   |   |   |   |   |   |   |   |   |
S |   |   |   |   |   |   |   |   |   |

S should be rounded to an integer. If T is 0, there is no need to record the value of S. You may find that the variation among different runs is quite large, so it is better to report the results using their mean value and standard deviation (a short sketch for computing these is given below).
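A minimal sketch for computing the mean and standard deviation of T with NumPy is shown below. The ten values listed are placeholders only; replace them with your own recorded results.

import numpy as np

# Replace the placeholder zeros with the ten values of T recorded in the table above.
T = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

print("Mean of T:", np.mean(T))
print("Standard deviation of T:", np.std(T, ddof=1))   # sample standard deviation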

Question: What are the mean and standard deviation of T?

