Neural Fictitious Self-Play in Practice



This article describes an implementation of Neural Fictitious Self-Play (NFSP) for the Leduc Hold’em poker game, based on code by Eric Steinberger. The full source code can be found in his GitHub repository.

If you are new to the topic, it is better to start with these articles first:

Introduction to Fictitious Play

Fictitious Self Play

Neural Fictitious Self-Play

Disclaimer: this article does not aim to explain every detail of the implementation, but rather to highlight its main lines and to map the code onto the academic algorithm.

The implementation uses distributed computation, which adds a level of complexity to the code. In this article, however, we focus on the algorithm itself and set the distributed aspect aside.

To that end, we will draw a parallel with the academic algorithm shown below.

Figure: NFSP algorithm, from the Heinrich and Silver paper
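For readers who cannot see the figure, the acting and data-collection part of the academic algorithm can be sketched in a few lines. This is a minimal illustration and not code from the repository: the class names, the placeholder networks, and the default values of eta and epsilon are all assumptions made for the example. It shows the two ideas that matter most: at the start of each episode the agent picks either its epsilon-greedy best response (with probability eta) or its average policy, and it records (state, action) pairs in the supervised-learning memory only while it is playing the best response.

```python
import random
from collections import deque


class ReservoirBuffer:
    """Fixed-size memory holding a uniform sample of every item ever added."""

    def __init__(self, capacity, rng):
        self.capacity = capacity
        self.rng = rng
        self.items = []
        self.n_seen = 0

    def add(self, item):
        self.n_seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
        else:
            # Reservoir sampling: keep the new item with probability capacity / n_seen.
            slot = self.rng.randrange(self.n_seen)
            if slot < self.capacity:
                self.items[slot] = item


class NFSPAgentSketch:
    """Hypothetical agent; the two 'networks' are placeholders, not trained models."""

    def __init__(self, n_actions, eta=0.1, epsilon=0.06,
                 rl_capacity=10_000, sl_capacity=10_000, seed=0):
        self.n_actions = n_actions
        self.eta = eta          # anticipatory parameter (illustrative value)
        self.epsilon = epsilon  # exploration rate (illustrative value)
        self.rng = random.Random(seed)
        self.rl_memory = deque(maxlen=rl_capacity)               # circular buffer M_RL
        self.sl_memory = ReservoirBuffer(sl_capacity, self.rng)  # reservoir M_SL
        self.use_best_response = False

    def q_values(self, state):
        return [0.0] * self.n_actions  # placeholder for the Q-network

    def average_policy(self, state):
        return [1.0 / self.n_actions] * self.n_actions  # placeholder for the policy network

    def start_episode(self):
        # Choose which policy to follow for the whole episode.
        self.use_best_response = self.rng.random() < self.eta

    def act(self, state):
        if self.use_best_response:
            if self.rng.random() < self.epsilon:
                action = self.rng.randrange(self.n_actions)
            else:
                q = self.q_values(state)
                action = max(range(self.n_actions), key=q.__getitem__)
            # Only best-response behaviour feeds the average-policy data set.
            self.sl_memory.add((state, action))
        else:
            probs = self.average_policy(state)
            action = self.rng.choices(range(self.n_actions), weights=probs)[0]
        return action

    def observe(self, state, action, reward, next_state, done):
        # Every transition is stored for reinforcement learning.
        self.rl_memory.append((state, action, reward, next_state, done))
```

Training would then fit the average-policy network to the contents of sl_memory and the Q-network to the contents of rl_memory; those steps are omitted here.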

Leduc Hold’em

First, let’s define the game of Leduc Hold’em.

Here is a definition taken from DeepStack-Leduc:

Leduc Hold’em is a toy poker game sometimes used in academic research (first introduced in Bayes’ Bluff: Opponent Modeling in Poker ). It is played with a deck of six cards, comprising two suits of three ranks each (often the king, queen, and jack — in our implementation, the ace, king, and queen). The game begins with each player being dealt one card privately, followed by a betting round. Then, another card is dealt faceup as a community (or board) card, and there is another betting round. Finally, the players reveal their private cards. If one player’s private card is the same rank as the board card, he or she wins the game; otherwise, the player whose private card has the higher rank wins.
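The showdown rule in that definition is compact enough to write down directly. The sketch below is only an illustration of the quoted rules, not code from the repository: the card encoding (a rank letter followed by a suit letter), the suit labels, and the function names are assumptions, and the tie case (both players holding the same rank) is not covered by the quoted text, so returning None for it is also an assumption.

```python
from itertools import product

RANKS = ("Q", "K", "A")  # ordered from lowest to highest
SUITS = ("h", "s")       # hypothetical suit labels


def make_deck():
    """Six cards: two suits of three ranks each."""
    return [rank + suit for rank, suit in product(RANKS, SUITS)]


def showdown(private_cards, board_card):
    """Return the index of the winning player (0 or 1), or None on a tie."""
    board_rank = board_card[0]
    strengths = []
    for card in private_cards:
        rank = card[0]
        # Pairing the board beats any unpaired hand; otherwise rank decides.
        strengths.append((rank == board_rank, RANKS.index(rank)))
    if strengths[0] == strengths[1]:
        return None
    return 0 if strengths[0] > strengths[1] else 1
```

For example, showdown(("Qh", "As"), "Qs") picks player 0, because a queen that pairs the board beats an unpaired ace.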

Global View

The main class lives in workers\driver\Driver.py; its run() method sets everything in motion.

It sets up the main loop and executes the algorithm at each iteration, as seen in the following image.

Figure: Driver.py
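As a rough picture of what such a driver does, here is a minimal sketch of a run() method that owns the main loop. It is not the code from Driver.py: the class names, the method names run_one_iteration and evaluate, and the idea of evaluating every few iterations are all hypothetical, chosen only to show the shape of a loop that hands each iteration to an algorithm object.

```python
class CountingAlgo:
    """Hypothetical stand-in for the algorithm object; it only counts calls."""

    def __init__(self):
        self.iterations_run = 0
        self.evaluations = []

    def run_one_iteration(self, iteration):
        # A real implementation would play games and train the networks here.
        self.iterations_run += 1

    def evaluate(self, iteration):
        # A real implementation would measure the quality of the current strategy.
        self.evaluations.append(iteration)


class DriverSketch:
    """Hypothetical driver: owns the loop, delegates the work to the algorithm."""

    def __init__(self, algo, n_iterations, eval_every):
        self.algo = algo
        self.n_iterations = n_iterations
        self.eval_every = eval_every

    def run(self):
        for iteration in range(self.n_iterations):
            # Periodic evaluation (an assumption of this sketch).
            if iteration % self.eval_every == 0:
                self.algo.evaluate(iteration)
            self.algo.run_one_iteration(iteration)
```

Keeping the loop in the driver and the learning logic in a separate object is what lets the rest of this article look at the algorithm without the distributed machinery around it.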

The Algorithm

The bulk of the action happens in _HighLevelAlgo.py, where it is easy to distinguish the different parts of the academic solution.

