Neural Fictitious Self-Play in Practice

Category: IT Technology · Published: 6 years ago


This article describes an implementation of Neural Fictitious Self-Play (NFSP) for the Leduc Hold’em poker game, based on code by Eric Steinberger. The full source code can be found in his GitHub repository.

If you are new to the topic, it is better to start with these articles first:

Introduction to Fictitious Play

Fictitious Self Play

Neural Fictitious Self-Play

Disclaimer: this article does not aim to explain every detail of the implementation code, but rather to highlight its main lines and show how the implementation maps onto the academic algorithm.

The implementation involves distributed computation, which adds a level of complexity to the code. In this article, however, we focus on the algorithm itself and leave the distributed computation aspect aside.

To that end, we will draw a parallel with the academic algorithm below.

[Image: NFSP algorithm from the Heinrich/Silver paper]
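To make two ingredients of that algorithm concrete, here is a minimal sketch, not taken from the repository. It assumes the usual NFSP setup: at the start of each episode the agent follows its best-response policy with probability eta (the anticipatory parameter) and its average policy otherwise, and the state-action pairs played under the best response are kept in a reservoir buffer for supervised learning. The names pick_episode_policy and ReservoirBuffer are hypothetical, and the value of eta is only illustrative.

```python
import random


def pick_episode_policy(eta, rng):
    """Pick which policy drives the next episode.

    With probability eta the agent plays its best response,
    otherwise it plays its average policy.
    """
    # rng.random() lies in [0, 1), so eta=1.0 always yields the best response.
    return "best_response" if rng.random() < eta else "average"


class ReservoirBuffer:
    """Fixed-size memory that keeps a uniform sample of everything added so far."""

    def __init__(self, capacity, rng):
        self.capacity = capacity
        self.rng = rng
        self.items = []
        self.n_seen = 0  # total number of items ever offered to the buffer

    def add(self, item):
        self.n_seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
            return
        # Classic reservoir sampling: the new item replaces a stored one
        # with probability capacity / n_seen.
        slot = self.rng.randrange(self.n_seen)
        if slot < self.capacity:
            self.items[slot] = item


rng = random.Random(0)
sl_memory = ReservoirBuffer(capacity=100, rng=rng)
for episode in range(1000):
    if pick_episode_policy(eta=0.1, rng=rng) == "best_response":
        # Placeholder (state, action) pair; a real agent would store what it played.
        sl_memory.add(("state_%d" % episode, "action"))
```

The reservoir buffer matters because the average policy should imitate the agent's whole history of best responses, not only the most recent ones, which a plain circular buffer would favour.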

Leduc Hold’em

First, let’s define the Leduc Hold’em game.

Here is a definition taken from DeepStack-Leduc:

Leduc Hold’em is a toy poker game sometimes used in academic research (first introduced in Bayes’ Bluff: Opponent Modeling in Poker ). It is played with a deck of six cards, comprising two suits of three ranks each (often the king, queen, and jack — in our implementation, the ace, king, and queen). The game begins with each player being dealt one card privately, followed by a betting round. Then, another card is dealt faceup as a community (or board) card, and there is another betting round. Finally, the players reveal their private cards. If one player’s private card is the same rank as the board card, he or she wins the game; otherwise, the player whose private card has the higher rank wins.
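The showdown rule in that definition can be sketched in a few lines. This is only an illustration, not code from the repository: the function name showdown_winner and the rank encoding are hypothetical, cards are reduced to their rank because suits do not affect the outcome, and the tie for equal private ranks is an assumption that the quoted definition does not spell out.

```python
# Rank order used for comparison (ace high, as in the quoted variant).
RANK_VALUE = {"Q": 0, "K": 1, "A": 2}


def showdown_winner(private_0, private_1, board):
    """Return 0 or 1 for the winning player, or None for a tie.

    Each argument is a rank string: "Q", "K" or "A".
    """
    # A private card pairing the board card wins outright. With only two
    # cards of each rank in the deck, both players cannot pair the board.
    if private_0 == board:
        return 0
    if private_1 == board:
        return 1
    # Otherwise the higher private rank wins.
    if RANK_VALUE[private_0] > RANK_VALUE[private_1]:
        return 0
    if RANK_VALUE[private_1] > RANK_VALUE[private_0]:
        return 1
    # Assumption: equal private ranks split the pot.
    return None


print(showdown_winner("Q", "A", "Q"))  # player 0 pairs the board
print(showdown_winner("K", "A", "Q"))  # nobody pairs, the ace is higher
```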

Global View

The main class is in workers\driver\Driver.py; its run() method sets everything in motion.

It sets up the main loop and executes the algorithm at each iteration, as seen in the following image.

[Image: Driver.py]

The Algorithm

The bulk of the action happens in _HighLevelAlgo.py, where it is easy to distinguish the different parts of the academic solution.
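For orientation, the parts of the academic algorithm can be laid out as one iteration of a generic loop. The sketch below is not the content of _HighLevelAlgo.py: every name in it is hypothetical, and the four steps are passed in as plain callables so that only the order of operations from the paper is shown (generate data, supervised update of the average policy, reinforcement-learning update of the best response, periodic target-network refresh).

```python
def run_nfsp_iteration(play_episodes, update_average_policy,
                       update_best_response, sync_target_network,
                       iteration, target_update_every):
    """Run one generic NFSP iteration using caller-supplied steps."""
    # 1. Self-play: act in the game and fill the RL and SL memories.
    play_episodes()
    # 2. Supervised learning: fit the average policy to past best responses.
    update_average_policy()
    # 3. Reinforcement learning: improve the best response (DQN-style update).
    update_best_response()
    # 4. Periodically copy the Q-network weights into the target network.
    if iteration % target_update_every == 0:
        sync_target_network()


# Usage with stand-in steps that only record the order in which they run.
calls = []
for it in range(1, 5):
    run_nfsp_iteration(
        play_episodes=lambda: calls.append("play"),
        update_average_policy=lambda: calls.append("sl"),
        update_best_response=lambda: calls.append("rl"),
        sync_target_network=lambda: calls.append("sync"),
        iteration=it,
        target_update_every=2,
    )
print(calls)
```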

