Google Open Sources Dreamer: A Reinforcement Learning Agent that can Solve Long-Horizon Tasks…


Deep reinforcement learning (DRL) has been at the center of some of the most important artificial intelligence (AI) breakthroughs of the last decade. Given its dependency on interactions with an environment, DRL is regularly applied to real-world scenarios such as self-driving vehicles that operate in highly complex environments. Those requirements are pushing DRL research toward agents that can generalize knowledge of their environment without extensive trial and error. DeepMind and Google recently open sourced Dreamer, a reinforcement learning agent that learns a world model from images and uses it to learn long-sighted behaviors.

The world of DRL can be divided into two main groups: model-free and model-based. In its most basic form, model-free reinforcement learning focuses on mastering specific tasks by mapping rewards to a given action. This approach has been the foundation behind systems such as DeepMind’s DQN, which mastered Atari games. Model-free reinforcement learning typically requires a large number of simulated training sessions to map actions to sensory inputs, which often proves limiting for long-term planning strategies.
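To make the model-free idea concrete, here is a minimal tabular Q-learning sketch in Python: rewards are mapped directly onto state-action values through repeated trial and error, with no model of the environment anywhere. The toy `step` function and all the sizes are hypothetical stand-ins, not anything from DQN.

```python
import numpy as np

n_states, n_actions = 16, 4          # hypothetical toy environment sizes
Q = np.zeros((n_states, n_actions))  # value of each action in each state
alpha, gamma, epsilon = 0.1, 0.99, 0.1

def step(state, action):
    """Hypothetical environment: returns (next_state, reward, done)."""
    next_state = (state + action) % n_states
    done = next_state == n_states - 1
    return next_state, float(done), done

state = 0
for _ in range(10_000):  # many trial-and-error interactions are required
    # epsilon-greedy action selection
    if np.random.rand() < epsilon:
        action = np.random.randint(n_actions)
    else:
        action = int(Q[state].argmax())
    next_state, reward, done = step(state, action)
    # model-free update: nudge Q(s, a) toward the bootstrapped target
    Q[state, action] += alpha * (reward + gamma * Q[next_state].max()
                                 - Q[state, action])
    state = 0 if done else next_state
```

Note how many interactions the loop burns through just to shape the value table; that sample hunger is exactly the limitation described above.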

Model-based reinforcement learning is the best-known alternative to model-free architectures and has been the foundation behind major breakthroughs in reinforcement learning such as OpenAI’s Dota 2 agents as well as DeepMind’s Quake III, AlphaGo, and AlphaStar systems. In contrast with model-free approaches, model-based reinforcement learning attempts to have agents learn how the world behaves in general and select actions based on long-term outcomes. These learned representations are known as “world models” and are a fundamental element of model-based DRL. Not surprisingly, model-based reinforcement learning agents have proven more efficient at the kind of longer-term planning required in multi-player strategy games.
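A hedged sketch of the contrast: a model-based agent first fits a one-step dynamics model from logged experience and then scores candidate action sequences by rolling that model forward. The linear model, random-shooting planner, and `reward` function below are illustrative simplifications, not the method behind any of the systems named above.

```python
import numpy as np

rng = np.random.default_rng(0)
state_dim, action_dim, horizon = 4, 2, 10

# Fit a linear one-step world model s' ~ [s, a] @ W from logged transitions.
# S, U, S1 stand in for states, actions, next states from past experience.
S = rng.normal(size=(1000, state_dim))
U = rng.normal(size=(1000, action_dim))
S1 = S @ rng.normal(size=(state_dim, state_dim)) * 0.1 \
     + U @ rng.normal(size=(action_dim, state_dim))
X = np.hstack([S, U])
W, *_ = np.linalg.lstsq(X, S1, rcond=None)    # learned world model

def reward(s):
    return -np.sum(s ** 2)                    # hypothetical: stay near origin

def plan(s0, n_candidates=256):
    """Random-shooting planner: imagine rollouts, keep the best first action."""
    best_ret, best_a0 = -np.inf, None
    for _ in range(n_candidates):
        actions = rng.normal(size=(horizon, action_dim))
        s, ret = s0, 0.0
        for a in actions:                     # judge long-term outcomes via
            s = np.concatenate([s, a]) @ W    # the model, not the real world
            ret += reward(s)
        if ret > best_ret:
            best_ret, best_a0 = ret, actions[0]
    return best_a0

first_action = plan(np.zeros(state_dim))
```

The key contrast with the Q-learning sketch: here the agent evaluates long-term outcomes inside its own model before committing to an action in the real environment.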


One of the main challenges for the mainstream adoption of model-based DRL has been the ability to generalize to long-term tasks. In real-world scenarios, DRL agents regularly interact with complex environments and face situations they have never seen before. Coping with that requires building representations of the world from past experience that enable generalization to novel situations. Although there has been some notable progress in this area, long-term planning in model-based DRL remains incredibly expensive from a computational standpoint.

Enter Dreamer

Google’s Dreamer is a DRL agent that can learn long-horizon behaviors in a given environment. One of the main innovations of Dreamer is that the agent learns a world model from images and uses it to learn long-sighted behaviors. From there, Dreamer leverages its world model to efficiently learn behaviors via backpropagation through model predictions. I know this all sounds a bit surreal, so let’s dig into the details.

From an architecture standpoint, Dreamer is no different from other model-based DRL methods. Functionally, the Dreamer architecture is based on three fundamental steps, sketched in the loop below. In the first step, the agent learns a world model from a dataset of past experience, learning to encode observations and actions into compact latent states. The second step focuses on learning value and actor networks: Dreamer predicts state values and actions that maximize future value predictions by propagating gradients back through imagined trajectories. Finally, the third step covers environment interaction: the agent encodes the history of the episode to compute the current model state and predict the next action to execute in the environment.
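A schematic skeleton of that three-step loop, with stub classes standing in for the real networks; every name, shape, and body here is a hypothetical placeholder rather than Dreamer’s actual API.

```python
import numpy as np

rng = np.random.default_rng(0)

class WorldModel:
    """Stub: encodes 64x64 images into 32-dim latent states (placeholder math)."""
    def update(self, sequences):               # Step 1: fit the world model
        pass                                   # (real version trains on batches)
    def encode(self, image):
        return rng.normal(size=32)             # compact latent state
    def imagine(self, latent, actor, horizon=15):
        trajectory = []
        for _ in range(horizon):               # roll forward in latent space
            latent = rng.normal(size=32)       # (real version: learned dynamics)
            trajectory.append((latent, actor.act(latent)))
        return trajectory

class Actor:
    def act(self, latent):                     # action from the model state
        return np.tanh(rng.normal(size=2))
    def update(self, trajectory):              # Step 2: gradients through rollouts
        pass

world_model, actor, buffer = WorldModel(), Actor(), []
obs = rng.random((64, 64, 3))
for step in range(1000):
    world_model.update(buffer)                                         # Step 1
    actor.update(world_model.imagine(world_model.encode(obs), actor))  # Step 2
    action = actor.act(world_model.encode(obs))                        # Step 3
    buffer.append((obs, action))
    obs = rng.random((64, 64, 3))              # dummy next observation
```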

Image source: https://ai.googleblog.com/2020/03/introducing-dreamer-scalable.html

Let’s dive deeper into each of these steps.

Step 1: Learning the World Model

To build an accurate world model, Dreamer leverages another innovative project from Google and DeepMind. Google’s Deep Planning Network (PlaNet) is a purely model-based reinforcement learning algorithm that solves control tasks from images by efficient planning in a learned latent space. In other words, PlaNet learns about an environment using images and uses that knowledge for long-term planning in image-based control tasks. To plan long-term tasks efficiently from images, PlaNet introduces the notion of a latent dynamics model: a compact representation of “latent states” that capture properties such as the velocities and positions of objects. Instead of predicting the next image from a given image like other image-based planning models, PlaNet predicts the next latent state, and that information is used to predict future images.
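As a rough PyTorch sketch of that idea: an encoder compresses the image into a small latent state, a transition network predicts the *next latent* from the current latent and action, and a decoder reconstructs images from latents only when needed. Sizes and layers are illustrative assumptions; PlaNet’s actual model is a recurrent state-space model with stochastic and deterministic components, not this plain MLP.

```python
import torch
import torch.nn as nn

latent_dim, action_dim = 30, 2

encoder = nn.Sequential(                     # image -> compact latent state
    nn.Conv2d(3, 32, 4, stride=2), nn.ReLU(),
    nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
    nn.Flatten(), nn.LazyLinear(latent_dim))

transition = nn.Sequential(                  # (latent, action) -> next latent
    nn.Linear(latent_dim + action_dim, 200), nn.ELU(),
    nn.Linear(200, latent_dim))

decoder = nn.Sequential(                     # latent -> reconstructed image
    nn.Linear(latent_dim, 64 * 64 * 3), nn.Unflatten(1, (3, 64, 64)))

image = torch.rand(1, 3, 64, 64)             # dummy observation
action = torch.rand(1, action_dim)
z = encoder(image)                           # current latent state
z_next = transition(torch.cat([z, action], dim=-1))  # predict next *latent*,
recon = decoder(z_next)                      # decode an image only if needed
```

Predicting in this small latent space, rather than pixel space, is what makes long rollouts cheap.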

Dreamer leverages PlaNet to predict outcomes based on a sequence of compact model states computed from the input images, instead of predicting directly from one image to the next. It automatically learns to produce model states that represent concepts helpful for predicting future outcomes, such as object types, positions of objects, and the interaction of the objects with their surroundings. Given a sequence of images, actions, and rewards from the agent’s dataset of past experience, Dreamer learns the world model.
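A simplified training step over such (image, action, reward) sequences might look as follows. The modules and the plain mean-squared losses are assumptions for illustration; the real Dreamer objective also includes a KL regularizer over stochastic latent states.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

B, T, latent_dim, action_dim = 8, 10, 30, 2
encode = nn.Linear(64 * 64 * 3, latent_dim)        # stand-in image encoder
predict_next = nn.Linear(latent_dim + action_dim, latent_dim)
predict_reward = nn.Linear(latent_dim, 1)
decode = nn.Linear(latent_dim, 64 * 64 * 3)
params = [*encode.parameters(), *predict_next.parameters(),
          *predict_reward.parameters(), *decode.parameters()]
opt = torch.optim.Adam(params, lr=3e-4)

images = torch.rand(B, T, 64 * 64 * 3)             # dummy past experience
actions = torch.rand(B, T, action_dim)
rewards = torch.rand(B, T, 1)

z = encode(images)                                 # latent state sequence
z_pred = predict_next(torch.cat([z[:, :-1], actions[:, :-1]], -1))
loss = (F.mse_loss(decode(z), images)              # reconstruct observations
        + F.mse_loss(predict_reward(z), rewards)   # predict rewards
        + F.mse_loss(z_pred, z[:, 1:].detach()))   # predict next latent state
opt.zero_grad(); loss.backward(); opt.step()
```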

Image source: https://ai.googleblog.com/2020/03/introducing-dreamer-scalable.html

One of the benefits of building on PlaNet is computational efficiency: Dreamer is able to predict thousands of images using a single GPU, which facilitates generalization.

Image source: https://ai.googleblog.com/2020/03/introducing-dreamer-scalable.html

Step 2: Behavioral Learning

One of the challenges of model-based DRL is predicting long-term outcomes without incurring heavy computational costs. Dreamer overcomes this challenge by learning a value network and an actor network via backpropagation through predictions of its world model. The agent efficiently learns the actor network to predict successful actions by propagating gradients of rewards backwards through predicted state sequences. This lets Dreamer see how small changes to its actions affect the rewards predicted in the future, allowing it to refine the actor network in the direction that increases the rewards the most.
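A hedged sketch of that mechanism: because the rollout below happens entirely inside a differentiable latent model, the gradient of the predicted return flows through every imagined state back into the actor’s weights. The frozen linear dynamics and single-step bootstrap are stand-ins; Dreamer actually uses its learned world model and multi-step value estimates.

```python
import torch
import torch.nn as nn

latent_dim, action_dim, horizon, gamma = 30, 2, 15, 0.99

# Frozen stand-in world model (in Dreamer: the learned latent dynamics).
dynamics = nn.Linear(latent_dim + action_dim, latent_dim)
reward_head = nn.Linear(latent_dim, 1)
for p in [*dynamics.parameters(), *reward_head.parameters()]:
    p.requires_grad_(False)

actor = nn.Sequential(nn.Linear(latent_dim, 64), nn.ELU(),
                      nn.Linear(64, action_dim), nn.Tanh())
critic = nn.Linear(latent_dim, 1)            # trained separately in practice
actor_opt = torch.optim.Adam(actor.parameters(), lr=3e-4)

z = torch.randn(16, latent_dim)              # batch of starting latent states
ret = torch.zeros(16, 1)
for t in range(horizon):                     # imagined rollout, differentiable
    a = actor(z)                             # end to end through every step
    z = dynamics(torch.cat([z, a], -1))
    ret = ret + gamma ** t * reward_head(z)
ret = ret + gamma ** horizon * critic(z)     # bootstrap with the value network

actor_loss = -ret.mean()                     # ascend predicted returns: reward
actor_opt.zero_grad()                        # gradients flow backwards through
actor_loss.backward()                        # the predicted state sequence
actor_opt.step()
```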

Image source: https://ai.googleblog.com/2020/03/introducing-dreamer-scalable.html

Step 3: Environment Interaction

Dreamer was evaluated on a benchmark of diverse tasks with continuous actions. The benchmark included challenges such as difficult-to-predict collisions, sparse rewards, chaotic dynamics, small but relevant objects, high degrees of freedom, and 3D perspectives.

The results were compared against other state-of-the-art model-based DRL models, including PlaNet. Dreamer outperformed all the alternatives while requiring fewer environment interactions.

Image source: https://ai.googleblog.com/2020/03/introducing-dreamer-scalable.html

Dreamer is a very intriguing project that provides some perspective on how model-based DRL agents can master long-term tasks. The new agent provides tangible improvements over competitors and showed strong performance mastering control tasks from image inputs. Google and DeepMind open sourced the initial implementation of Dreamer on GitHub.

