An Introduction to Unity ML-Agents


Train a reinforcement learning agent to jump over walls.


This article is part of a new free course on Deep Reinforcement Learning with Unity, where we'll create agents with TensorFlow that learn to play video games using the Unity game engine. Check the syllabus here.

If you have never studied Deep Reinforcement Learning before, you should check out the free course Deep Reinforcement Learning with Tensorflow.

The past few years have witnessed breakthroughs in reinforcement learning (RL). From the first successful use of deep learning to learn a policy from pixel input in 2013 to the OpenAI Dexterity project in 2019, we live in an exciting moment in RL research.

Consequently, as RL researchers we need to create more and more complex environments, and Unity helps us do that. The Unity ML-Agents Toolkit is a plugin for the Unity game engine that lets us use Unity as an environment builder to train agents.


Source: Unity ML-Agents Toolkit Github Repository

From playing football and learning to walk, to jumping over big walls and training a cute doggy to catch sticks, the Unity ML-Agents Toolkit provides a ton of amazing pre-made environments.

Furthermore, during this free course, we will also create new learning environments.

For now, we'll learn how Unity ML-Agents works, and by the end of the article, you'll have trained an agent to jump over walls.


But first, there are some requirements:

  • This is not an introductory reinforcement learning course. If you don't already have Deep Reinforcement Learning skills, you need to check the free course Deep Reinforcement Learning with Tensorflow.
  • Moreover, this is not a course about Unity, so you need some basic Unity skills. If that's not the case, you should definitely check out their amazing course for beginners: Create with Code.

So let’s get started!

How does Unity ML-Agents work?

What’s Unity ML-Agents?

Unity ML-Agents is a new plugin for the game engine Unity that allows us to create or use pre-made environments to train our agents.

It's developed by Unity Technologies, the makers of Unity, one of the best game engines ever. Unity is used by the creators of Firewatch, Gone Home, and Cuphead, as well as a lot of AAA games.


Firewatch was made with Unity

The three components

With Unity ML-Agents, you have three important components.


Source: Unity ML-Agents Documentation

The first is the Learning Component (in Unity), which contains the Unity scene and the environment elements.

The second is the Python API, which contains the RL algorithms (such as PPO and SAC). We use this API to launch training, run tests, and so on. The third is the External Communicator, which connects the Learning Environment with the Python API.
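To make this concrete, here is a minimal sketch of how the Python side talks to the Learning Environment through the External Communicator. It uses the low-level mlagents_envs package; the exact API varies between ML-Agents releases, so treat this as an illustration rather than the exact code run by the trainer:

    # Minimal sketch, assuming a recent mlagents_envs release.
    from mlagents_envs.environment import UnityEnvironment

    # file_name=None means "connect to the Unity Editor": press Play when asked.
    env = UnityEnvironment(file_name=None)
    env.reset()

    # Each Behavior (Brain) registered in the scene appears here by name.
    for behavior_name, spec in env.behavior_specs.items():
        print(behavior_name, spec)

    env.close()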

Inside the Learning Component

Inside the Learning Component, we have different elements:


Source: Unity ML-Agents Documentation

The first is the Agent, the actor of the scene. It's the Agent that we're going to train by optimizing its policy (which tells us what action to take in each state), called the Brain.

Finally, there is the Academy. This element orchestrates the agents and their decision-making process. Think of the Academy as a maestro that handles requests from the Python API.

To better understand its role, let's recall the RL process. It can be modeled as a loop that works like this:


Source: Sutton’s Book

Now, let’s imagine an agent learning to play a platform game. The RL process looks like this:

  • Our agent receives state S0 from the environment — we receive the first frame of our game (environment).
  • Based on the state S0, the agent takes an action A0 — our agent will move to the right.
  • The environment transitions to a new state S1.
  • The environment gives a reward R1 to the agent — we're not dead (positive reward +1).

This RL loop outputs a sequence of state, action, and reward. The goal of the agent is to maximize the expected cumulative reward.
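Formally, if γ (gamma) is a discount factor between 0 and 1, this expected cumulative reward (the return) from time step t can be written as:

    G_t = R_{t+1} + γ R_{t+2} + γ² R_{t+3} + … = Σ_{k=0}^{∞} γ^k R_{t+k+1}

We'll come back to the discount factor γ when we experiment with it at the end of the article.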


In fact, the Academy is the component that sends the orders to our Agents and ensures they stay in sync (a Python-side sketch of this loop follows the list):

  • Collect Observations
  • Select your action using your policy
  • Take the Action
  • Reset if you reached the max step or if you’re done.
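Seen from the Python side, this collect → act → step → reset cycle looks roughly like the sketch below. It again uses mlagents_envs, takes random actions instead of a trained policy, and the variable names and episode count are illustrative assumptions:

    import numpy as np
    from mlagents_envs.base_env import ActionTuple
    from mlagents_envs.environment import UnityEnvironment

    env = UnityEnvironment(file_name=None)
    env.reset()
    behavior_name = list(env.behavior_specs)[0]
    spec = env.behavior_specs[behavior_name]

    for episode in range(3):                      # a few illustrative episodes
        env.reset()
        done = False
        while not done:
            # 1. Collect observations for the agents that request a decision
            decision_steps, terminal_steps = env.get_steps(behavior_name)
            done = len(terminal_steps) > 0        # at least one agent finished
            if len(decision_steps) > 0:
                # 2. Select an action using your policy (here: random actions)
                branches = spec.action_spec.discrete_branches
                discrete = np.column_stack(
                    [np.random.randint(0, b, size=len(decision_steps)) for b in branches]
                ).astype(np.int32)
                env.set_actions(behavior_name, ActionTuple(discrete=discrete))
            # 3. Take the action and advance the simulation
            env.step()
            # 4. The environment resets an agent when it reaches the max step
            #    count or finishes its episode; it then shows up in terminal_steps.
    env.close()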

Train an agent to jump over walls

So now that we understand how Unity ML-Agents works, let's train an agent to jump over walls.

We published our trained models on GitHub; you can download them here.

The Wall Jump Environment

The goal in this environment is to train our agent to reach the green tile.

However, there are 3 situations:

  • In the first, there is no wall, and our agent just needs to reach the green tile.


No Wall situation
  • In the second situation, the agent needs to learn to jump in order to reach the green tile.


Small Wall Situation
  • Finally, in the hardest situation, our agent can't jump as high as the wall, so it needs to push the white block, climb on top of it, and then jump over the wall.


Big Wall Situation

We'll learn two different policies depending on the height of the wall:

  • The first, SmallWallJump, will be learned during the no wall and low wall situations.
  • The second, BigWallJump, will be learned during the high wall situations.

The reward system (as described in the Unity ML-Agents documentation) is:

  • -0.0005 for every step (a small existential penalty),
  • +1 if the agent reaches the green tile,
  • -1 if the agent falls off the platform.

In terms of observations, we don't use normal vision (frames) but 14 raycasts that can each detect 4 possible objects. Think of raycasts as lasers that detect whether they pass through an object.

We also use the global position of the agent and whether or not it is grounded.


Source: Unity ML-Agents Documentation

The action space is discrete, with 4 branches (as described in the Unity ML-Agents documentation):

  • forward motion: forward, backward, or no action,
  • rotation: rotate left, rotate right, or no action,
  • side motion: left, right, or no action,
  • jump: jump or no action.

Our goal is to hit the benchmark with a mean reward of 0.8.

Let’s jump!

First of all, let’s open the UnitySDK project.

In the Examples folder, search for WallJump and open the scene.

In the scene, you can see a lot of Agents; each of them comes from the same Prefab, and they all share the same Brain.


Multiple copies of the same Agent Prefab.

In fact, just as in classical Deep Reinforcement Learning we launch multiple instances of a game (for instance 128 parallel environments), here we do the same by copying and pasting the agents, in order to collect more varied states.

So, first, because we want to train our agent from scratch, we need to remove the brains from the agent. To do so, go to the Prefabs folder and open the Prefab.

Now in the Prefab hierarchy, select the Agent and go into the inspector.

First, in Behavior Parameters, we need to remove the Model. If you have a GPU, you can change Inference Device from CPU to GPU.


Then in Wall Jump Agent Component, we need to remove the brains for No Wall Brain, Small Wall Brain, and Big Wall Brain situations.


Now that you’ve done that you’re ready to train your agent from scratch.

For this first training, we’ll just modify the total training steps for the two policies (SmallWallJump and BigWallJump) because we can hit the benchmark in only 300k training steps.

To do that, open config/trainer_config.yaml and change max_steps to 3e5 for both the SmallWallJump and BigWallJump sections.
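The relevant entries then look roughly like this (the surrounding hyperparameters are omitted, and the exact layout depends on your ML-Agents version):

    SmallWallJump:
        max_steps: 3.0e5

    BigWallJump:
        max_steps: 3.0e5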


To train this agent, we will use PPO (Proximal Policy Optimization). If you don't know about it or need to refresh your knowledge, check my article.

We saw that to train this agent, we use the Python API, which talks to the External Communicator. The External Communicator will then ask the Academy to start the agents.

So, open your terminal, go to where ml-agents-master is located, and type this:

    mlagents-learn config/trainer_config.yaml --run-id="WallJump_FirstTrain" --train

It will ask you to run the Unity scene.

Press the ▶ (Play) button at the top of the Editor.


You can monitor your training by launching Tensorboard using this command:

    tensorboard --logdir=summaries

Watching your agent jump over walls

You can watch your agent during the training by looking at the game window.

When the training is finished, you need to move the saved model files contained in ml-agents-master/models to UnitySDK/Assets/ML-Agents/Examples/WallJump/TFModels.

Then, open the Unity Editor again and select the WallJump scene.

Select the WallJumpArea prefab object and open it.

Select the Agent.

In the Agent's Behavior Parameters, drag the SmallWallJump.nn file into the Model placeholder.


Drag the SmallWallJump.nn file to No Wall Brain Placeholder.

Drag the SmallWallJump.nn file to the Small Wall Brain Placeholder.

Drag the BigWallJump.nn file to the Big Wall Brain Placeholder.


Then, press the ▶ (Play) button at the top of the Editor and voilà!


🎉

Time for some experiments

We've just trained our agents to jump over walls. Now that we have good results, we can run some experiments.

Remember that the best way to learn is to be active by experimenting. So you should try to make some hypotheses and verify them.

Reducing the discount rate to 0.95

We know that:

  • The larger the gamma, the smaller the discount. This means the learning agent cares more about the long term reward.
  • On the other hand, the smaller the gamma, the bigger the discount. This means our agent cares more about the short term reward.

The idea behind this experiment was that if we increase the discount by decreasing gamma from 0.99 to 0.95, our agent will care more about the short-term reward, and maybe this will help it converge faster to an optimal policy.
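To see what this changes concretely, recall from the return formula above that a reward received k steps in the future is weighted by γ^k. For a reward that is 50 steps away:

    0.99^50 ≈ 0.61    (γ = 0.99)
    0.95^50 ≈ 0.08    (γ = 0.95)

So with γ = 0.95, a distant reward, such as finally reaching the green tile after pushing the block, contributes far less to the value the agent is optimizing.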


Something interesting to see is that our agent performs about the same in the Small Wall Jump case. We can explain that by the fact that this situation is quite easy: the agent just needs to move towards the green tile and jump if there is a small wall.


On the other hand, it performed really badly in the Big Wall Jump case. We can explain this by the fact that our new agent cares more about the short-term reward: it was unable to think long term, and thus didn't really understand that it needed to push the white block in order to jump over the wall.

Increasing the complexity of the Neural Network

For this third and last training, the hypothesis was: will our agent become smarter if we increase the network complexity?

What we did was increase the number of hidden units from 256 to 512.
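Assuming the same config/trainer_config.yaml sections as before, that change corresponds roughly to:

    SmallWallJump:
        max_steps: 3.0e5
        hidden_units: 512

    BigWallJump:
        max_steps: 3.0e5
        hidden_units: 512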

But we found that this new agent performed worse than our first agent.

This implies that we don't need to increase the complexity of our network for this type of simple problem, because doing so increases the training time needed to converge.


That’s all for today! You’ve just trained an agent that learns to jump over walls. Awesome!

Don't forget to experiment: change some hyperparameters, try new things. Have fun!

If you want to compare with our experiments, we published our trained models here.

In the next article, we'll train a smarter agent that needs to press a button to spawn a pyramid, then navigate to the pyramid, knock it over, and move to the gold brick at the top. To do that, we'll use curiosity as an intrinsic reward in addition to the extrinsic rewards.


If you have any thoughts, comments, questions, feel free to comment below or send me an email: hello@simoninithomas.com, or tweet me @ThomasSimonini .

Keep learning, stay awesome!

