An Introduction to Unity ML-Agents


Train a reinforcement learning agent to jump over walls.


This article is part of a new free course on Deep Reinforcement Learning with Unity, where we'll create agents with TensorFlow that learn to play video games using the Unity game engine. Check the syllabus here.

If you have never studied Deep Reinforcement Learning before, you should check out the free course Deep Reinforcement Learning with Tensorflow.

The past few years have witnessed breakthroughs in reinforcement learning (RL). From the first successful use of deep learning to learn a policy from pixel input in 2013 to the OpenAI Dexterity project in 2019, we live in an exciting moment in RL research.

Consequently, as RL researchers we need to create more and more complex environments, and Unity helps us do that. The Unity ML-Agents Toolkit is a plugin for the Unity game engine that lets us use Unity as an environment builder to train agents.


Source: Unity ML-Agents Toolkit Github Repository

From playing football and learning to walk, to jumping over big walls and training a cute doggy to catch sticks, the Unity ML-Agents Toolkit provides a ton of amazing pre-made environments.

Furthermore, during this free course, we will also create new learning environments.

For now, we'll learn how Unity ML-Agents works, and by the end of the article, you'll have trained an agent to jump over walls.


But first, there are some requirements:

  • This is not an introductory reinforcement learning course. If you don't already have Deep Reinforcement Learning skills, you need to check the free course Deep Reinforcement Learning with Tensorflow.
  • Moreover, this is not a course about Unity, so you need some basic Unity skills. If that's not the case, you should definitely check out their amazing course for beginners: Create with Code.

So let’s get started!

How does Unity ML-Agents work?

What’s Unity ML-Agents?

Unity ML-Agents is a new plugin for the game engine Unity that allows us to create or use pre-made environments to train our agents.

It's developed by Unity Technologies, the makers of Unity, one of the best game engines ever. Unity is used by the creators of Firewatch, Gone Home, and Cuphead, as well as a lot of AAA games.


Firewatch was made with Unity

The three components

With Unity ML-Agents, you have three important components.


Source: Unity ML-Agents Documentation

The first is the Learning Component (in Unity), which contains the Unity scene and the environment elements.

The second is the Python API, which contains the RL algorithms (such as PPO and SAC). We use this API to launch training, run tests, and so on. The third is the External Communicator, which connects the Learning Environment with the Python API.
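To make this concrete, here is a minimal sketch of how the Python side talks to the Learning Environment through the External Communicator. It uses the low-level mlagents_envs package; the exact API varies between ML-Agents releases, so treat this as an illustration rather than the exact code run by the trainer:

    # Minimal sketch, assuming a recent mlagents_envs release.
    from mlagents_envs.environment import UnityEnvironment

    # file_name=None means "connect to the Unity Editor": press Play when asked.
    env = UnityEnvironment(file_name=None)
    env.reset()

    # Each Behavior (Brain) registered in the scene appears here by name.
    for behavior_name, spec in env.behavior_specs.items():
        print(behavior_name, spec)

    env.close()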

Inside the Learning Component

Inside the Learning Component, we have different elements:


Source: Unity ML-Agents Documentation

The first is the Agent, the actor of the scene. It's the Agent that we're going to train by optimizing its policy (which tells us what action to take in each state), called the Brain.

Finally, there is the Academy. This element orchestrates the agents and their decision-making process. Think of the Academy as a maestro that handles requests from the Python API.

To better understand its role, let's recall the RL process. It can be modeled as a loop that works like this:


Source: Sutton’s Book

Now, let’s imagine an agent learning to play a platform game. The RL process looks like this:

  • Our agent receives state S0 from the environment — we receive the first frame of our game (environment).
  • Based on the state S0, the agent takes an action A0 — our agent will move to the right.
  • The environment transitions to a new state S1.
  • The environment gives a reward R1 to the agent — we're not dead (positive reward +1).

This RL loop outputs a sequence of state, action, and reward. The goal of the agent is to maximize the expected cumulative reward.
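Formally, if γ (gamma) is a discount factor between 0 and 1, this expected cumulative reward (the return) from time step t can be written as:

    G_t = R_{t+1} + γ R_{t+2} + γ² R_{t+3} + … = Σ_{k=0}^{∞} γ^k R_{t+k+1}

We'll come back to the discount factor γ when we experiment with it at the end of the article.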


In fact, the Academy is the component that sends the orders to our Agents and ensures they stay in sync (a Python-side sketch of this loop follows the list):

  • Collect Observations
  • Select your action using your policy
  • Take the Action
  • Reset if you reached the max step or if you’re done.
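Seen from the Python side, this collect → act → step → reset cycle looks roughly like the sketch below. It again uses mlagents_envs, takes random actions instead of a trained policy, and the variable names and episode count are illustrative assumptions:

    import numpy as np
    from mlagents_envs.base_env import ActionTuple
    from mlagents_envs.environment import UnityEnvironment

    env = UnityEnvironment(file_name=None)
    env.reset()
    behavior_name = list(env.behavior_specs)[0]
    spec = env.behavior_specs[behavior_name]

    for episode in range(3):                      # a few illustrative episodes
        env.reset()
        done = False
        while not done:
            # 1. Collect observations for the agents that request a decision
            decision_steps, terminal_steps = env.get_steps(behavior_name)
            done = len(terminal_steps) > 0        # at least one agent finished
            if len(decision_steps) > 0:
                # 2. Select an action using your policy (here: random actions)
                branches = spec.action_spec.discrete_branches
                discrete = np.column_stack(
                    [np.random.randint(0, b, size=len(decision_steps)) for b in branches]
                ).astype(np.int32)
                env.set_actions(behavior_name, ActionTuple(discrete=discrete))
            # 3. Take the action and advance the simulation
            env.step()
            # 4. The environment resets an agent when it reaches the max step
            #    count or finishes its episode; it then shows up in terminal_steps.
    env.close()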

Train an agent to jump over walls

So now that we understand how Unity ML-Agents works, let's train an agent to jump over walls.

We published our trained models on GitHub; you can download them here.

The Wall Jump Environment

The goal in this environment is to train our agent to reach the green tile.

However, there are 3 situations:

  • In the first, there is no wall, and our agent just needs to reach the green tile.


No Wall situation
  • In the second situation, the agent needs to learn to jump in order to reach the green tile.


Small Wall Situation
  • Finally, in the hardest situation, our agent can't jump as high as the wall, so it needs to push the white block, climb on top of it, and then jump over the wall.


Big Wall Situation

We'll learn two different policies depending on the height of the wall:

  • The first, SmallWallJump, will be learned during the no wall and low wall situations.
  • The second, BigWallJump, will be learned during the high wall situations.

The reward system (as described in the Unity ML-Agents documentation) is:

  • -0.0005 for every step (a small existential penalty),
  • +1 if the agent reaches the green tile,
  • -1 if the agent falls off the platform.

In terms of observations, we don't use normal vision (frames) but 14 raycasts that can each detect 4 possible objects. Think of raycasts as lasers that detect whether they pass through an object.

We also use the global position of the agent and whether or not it is grounded.


Source: Unity ML-Agents Documentation

The action space is discrete, with 4 branches (as described in the Unity ML-Agents documentation):

  • forward motion: forward, backward, or no action,
  • rotation: rotate left, rotate right, or no action,
  • side motion: left, right, or no action,
  • jump: jump or no action.

Our goal is to hit the benchmark with a mean reward of 0.8.

Let’s jump!

First of all, let’s open the UnitySDK project.

In the Examples folder, search for WallJump and open the scene.

In the scene, you can see a lot of Agents; each of them comes from the same Prefab, and they all share the same Brain.


Multiple copies of the same Agent Prefab.

In fact, just as in classical Deep Reinforcement Learning we launch multiple instances of a game (for instance 128 parallel environments), here we do the same by copying and pasting the agents, in order to collect more varied states.

So, first, because we want to train our agent from scratch, we need to remove the brains from the agent. To do so, go to the Prefabs folder and open the Prefab.

Now in the Prefab hierarchy, select the Agent and go into the inspector.

First, in Behavior Parameters, we need to remove the Model. If you have a GPU, you can change Inference Device from CPU to GPU.


Then in Wall Jump Agent Component, we need to remove the brains for No Wall Brain, Small Wall Brain, and Big Wall Brain situations.


Now that you’ve done that you’re ready to train your agent from scratch.

For this first training, we’ll just modify the total training steps for the two policies (SmallWallJump and BigWallJump) because we can hit the benchmark in only 300k training steps.

To do that, open config/trainer_config.yaml and change max_steps to 3e5 for both the SmallWallJump and BigWallJump sections.
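The relevant entries then look roughly like this (the surrounding hyperparameters are omitted, and the exact layout depends on your ML-Agents version):

    SmallWallJump:
        max_steps: 3.0e5

    BigWallJump:
        max_steps: 3.0e5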


To train this agent, we will use PPO (Proximal Policy Optimization). If you don't know about it or need to refresh your knowledge, check my article.

We saw that to train this agent, we use the Python API, which talks to the External Communicator. The External Communicator will then ask the Academy to start the agents.

So, open your terminal, go to where ml-agents-master is located, and type this:

    mlagents-learn config/trainer_config.yaml --run-id="WallJump_FirstTrain" --train

It will ask you to run the Unity scene.

Press the ▶ (Play) button at the top of the Editor.


You can monitor your training by launching Tensorboard using this command:

    tensorboard --logdir=summaries

Watching your agent jump over walls

You can watch your agent during the training by looking at the game window.

When the training is finished, you need to move the saved model files contained in ml-agents-master/models to UnitySDK/Assets/ML-Agents/Examples/WallJump/TFModels.

Then, open the Unity Editor again and select the WallJump scene.

Select the WallJumpArea prefab object and open it.

Select the Agent.

In the Agent's Behavior Parameters, drag the SmallWallJump.nn file into the Model placeholder.


Drag the SmallWallJump.nn file to No Wall Brain Placeholder.

Drag the SmallWallJump.nn file to the Small Wall Brain Placeholder.

Drag the BigWallJump.nn file to the Big Wall Brain Placeholder.


Then, press the ▶ (Play) button at the top of the Editor and voilà!


🎉

Time for some experiments

We've just trained our agents to jump over walls. Now that we have good results, we can run some experiments.

Remember that the best way to learn is to be active by experimenting. So you should try to make some hypotheses and verify them.

Reducing the discount rate to 0.95

We know that:

  • The larger the gamma, the smaller the discount. This means the learning agent cares more about the long term reward.
  • On the other hand, the smaller the gamma, the bigger the discount. This means our agent cares more about the short term reward.

The idea behind this experiment was that if we increase the discount by decreasing gamma from 0.99 to 0.95, our agent will care more about the short-term reward, and maybe this will help it converge faster to an optimal policy.
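To see what this changes concretely, recall from the return formula above that a reward received k steps in the future is weighted by γ^k. For a reward that is 50 steps away:

    0.99^50 ≈ 0.61    (γ = 0.99)
    0.95^50 ≈ 0.08    (γ = 0.95)

So with γ = 0.95, a distant reward, such as finally reaching the green tile after pushing the block, contributes far less to the value the agent is optimizing.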


Something interesting to see is that our agent performs about the same in the Small Wall Jump case. We can explain that by the fact that this situation is quite easy: the agent just needs to move towards the green tile and jump if there is a small wall.


On the other hand, it performed really badly in the Big Wall Jump case. We can explain this by the fact that our new agent cares more about the short-term reward: it was unable to think long term, and thus didn't really understand that it needed to push the white block in order to jump over the wall.

Increasing the complexity of the Neural Network

For this third and last training, the hypothesis was: will our agent become smarter if we increase the network complexity?

What we did was increase the number of hidden units from 256 to 512.
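Assuming the same config/trainer_config.yaml sections as before, that change corresponds roughly to:

    SmallWallJump:
        max_steps: 3.0e5
        hidden_units: 512

    BigWallJump:
        max_steps: 3.0e5
        hidden_units: 512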

But we found that this new agent performed worse than our first agent.

This implies that we don't need to increase the complexity of our network for this type of simple problem, because doing so increases the training time needed to converge.


That’s all for today! You’ve just trained an agent that learns to jump over walls. Awesome!

Don't forget to experiment: change some hyperparameters, try new things. Have fun!

If you want to compare with our experiments, we published our trained models here.

In the next article, we'll train a smarter agent that needs to press a button to spawn a pyramid, then navigate to the pyramid, knock it over, and move to the gold brick at the top. To do that, we'll use curiosity as an intrinsic reward in addition to the extrinsic rewards.


If you have any thoughts, comments, questions, feel free to comment below or send me an email: hello@simoninithomas.com, or tweet me @ThomasSimonini .

Keep learning, stay awesome!

