FPGA: Why So Few Open Source Drivers for Open Hardware?

Category: IT Technology · Published: 3 years ago

Field-Programmable Gate Arrays (FPGAs) have been an interest of mine for well over a decade now. Being able to generate complex signals in the tens of MHz range with nanosecond accuracy, dealing with fast data streams, and doing all of this at a fraction of the power consumption of fast CPUs, they really have a lot of potential for fun. However, their prohibitive cost, proprietary toolchains (some running only on Windows), and insanely long bitstream generation times made them look more like a curiosity to me than a practical solution. Finally, writing Verilog / VHDL directly felt like the equivalent of writing an OS in assembly, and thus felt more like torture than fun for the young C/C++ developer that I was. Little did I know that 10+ years later, I would find HW development to be the most amazing thing ever!

The first thing that changed is me: I got involved in reverse engineering NVIDIA GPUs’ power management in order to write an open source driver, wrote code in a reverse-engineered assembly language to implement automatic power management for this driver, created my own smart wireless modems which detect the PHY parameters of incoming transmissions on the fly (modulation, center frequency) using software-defined radio, and had fun with Arduinos, single-board computers, and designing my own custom PCBs.

The second thing that changed is that Moore’s law has ground to a halt, leading to a world that is more architecture-centric than fab-oriented. This reduced the advantage ASICs had over FPGAs by creating a software ecosystem that is geared more towards parallelism than towards high-frequency single-thread performance.

Finally, FPGAs along with their community have gotten a whole lot more attractive! From the FPGAs themselves to their toolchains, let’s review what changed, and then ask ourselves why this has not translated to upstream Linux drivers for FPGA-based open source designs.

Even hobbyists can make useful HW designs

Programmable logic elements have gone through multiple ages throughout their life. Since their humble beginnings, they have always excelled at low-volume designs by spreading the cost of creating a new ASIC across as many customers as possible. This has enabled start-ups and hobbyists to create their own niche and get into the market without breaking the bank.

Nowadays, FPGAs are all based around Lookup Tables (LUTs) rather than a set of logic gates, as LUTs can re-create any logic function of their inputs and are paired with flip-flops (memory elements). Let’s have a quick look at what changed throughout the “stack” that makes creating FPGA-based HW designs so approachable even to hobbyists.
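As a rough illustration of why a lookup table can reproduce any logic function of its inputs, here is a minimal Python sketch that models a 4-input LUT as a 16-bit truth table. The function names (`make_lut`, `lut_eval`) and the bit ordering are assumptions made for this example; they do not come from any vendor toolchain.

```python
from itertools import product

def make_lut(func, n_inputs=4):
    """Build the LUT configuration word: one bit per input combination."""
    init = 0
    for index, bits in enumerate(product((0, 1), repeat=n_inputs)):
        # product() counts in binary, with bits[0] as the most significant bit,
        # so `index` is the address formed by the inputs.
        if func(*bits):
            init |= 1 << index
    return init

def lut_eval(init, *inputs):
    """Evaluate the LUT: the inputs form an address into the stored table."""
    address = 0
    for bit in inputs:
        address = (address << 1) | (bit & 1)
    return (init >> address) & 1

# "Programming" the LUT only means choosing the 16 stored bits.
xor4 = make_lut(lambda a, b, c, d: a ^ b ^ c ^ d)
majority = make_lut(lambda a, b, c, d: int(a + b + c + d >= 3))

print(hex(xor4))
print(lut_eval(majority, 1, 1, 0, 1))
```

The same "hardware" (`lut_eval`) computes XOR or a majority vote depending only on the configuration word, which is the essence of what the bitstream does for each LUT.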

Price per LUT

Historically, FPGAs have compared negatively to ASICs due to their increased latency (which limits the maximum frequency of the design) and their lower power efficiency. However, just like CPUs and GPUs, one can compensate for these limitations by making a wider/parallel design operating at a lower frequency. Wider designs, however, require more logic elements / LUTs.
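The trade-off between width and frequency is simple arithmetic. The sketch below assumes an idealized design where every parallel lane completes a fixed number of operations per cycle; the clock figures are made up for the example and do not describe any specific part.

```python
def throughput(lanes, clock_hz, ops_per_lane_per_cycle=1):
    """Operations per second for a design with `lanes` parallel units."""
    return lanes * ops_per_lane_per_cycle * clock_hz

def lanes_needed(target_ops_per_s, clock_hz):
    """Smallest number of lanes reaching the target at a given clock."""
    return -(-target_ops_per_s // clock_hz)  # ceiling division on integers

narrow_and_fast = throughput(lanes=1, clock_hz=1_000_000_000)
wide_and_slow = throughput(lanes=8, clock_hz=125_000_000)
print(narrow_and_fast == wide_and_slow)  # True: same throughput, 8x the logic
```

In other words, dividing the clock by eight costs roughly eight times the logic for the same throughput, which is why the price per LUT matters so much.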

Fortunately, the price per LUT has fallen dramatically since the introduction of FPGAs, to the point that pretty much all but the biggest designs would fit in them. Since then, the focus has shifted to providing hard IPs (fixed functions) instead. This enables a $37 part (XC7A12T) to fit over 3 Linux-worthy RISC-V processors running at 180 MHz, with 720 kB of block RAM available for caches, FIFOs, or anything else. By raising the budget to the $100 mark, the specs improve dramatically, with an FPGA capable of running 40 Linux-worthy RISC-V CPUs and almost 5 MB of block RAM available for caches!

And just in case this would not be enough for you, you could consider the Alveo lineup, such as the Alveo U250, which has 1.3M LUTs, a peak INT8 throughput of 33 TOPS, and 64 GB of DDR4 memory (77 GB/s bandwidth). For memory-bandwidth-hungry designs, the Alveo U280 brings 8 GB of HBM2 memory to the table (460 GB/s bandwidth) along with 32 GB of DDR4 memory (38 GB/s of bandwidth), at the expense of having “only” 24.5 INT8 TOPS and 1M LUTs. Both models can be found used for ~$3000 on eBay. What a bargain :D !

Toolchains

Proprietary toolchains

Linux is now well supported by the major players of the industry. Xilinx’s support came first (2005), while Altera joined the club in 2009. Both toolchains are however the definition of bloated, weighing multiple GB (~6 GB for Altera, while Xilinx is at a whopping 27 GB)!

Open source toolchains for a few FPGAs

Project IceStorm created a fully functional, fully open source toolchain for Lattice’s iCE40 FPGAs. The regular structure of these FPGAs made reverse engineering them and writing the toolchain easier. Since then, the more complex Lattice ECP5 FPGA got full support, and support for Xilinx’s 7-series is under way. All these projects now work under the SymbiFlow umbrella, which aims to become the GCC of FPGAs.

Languages:

Migen / LiteX

VHDL/Verilog are error-prone and do not lend themselves to complex parametrization. This reduces the re-usability of modules. By contrast, the Python language excels at meta-programming, and Migen provides a way to generate Verilog from relatively simple Python constructs.
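To give a feel for what generating Verilog from Python buys you, here is a deliberately naive sketch that derives a family of counter modules from a plain Python loop. It does not use Migen and is not how Migen works internally (Migen builds a structured description of the design rather than pasting strings); `make_counter` and the module names are invented for this example.

```python
def make_counter(name, width):
    """Return Verilog source for a free-running counter `width` bits wide."""
    if width < 1:
        raise ValueError("width must be at least 1")
    return "\n".join([
        f"module {name} (",
        "    input  wire clk,",
        "    input  wire rst,",
        f"    output reg  [{width - 1}:0] count",
        ");",
        "    always @(posedge clk) begin",
        "        if (rst)",
        f"            count <= {width}'d0;",
        "        else",
        f"            count <= count + {width}'d1;",
        "    end",
        "endmodule",
    ])

# Meta-programming: the set of modules is ordinary Python data.
modules = {f"counter{w}": make_counter(f"counter{w}", w) for w in (8, 16, 24)}
print(modules["counter16"])
```

Once the hardware description is just Python objects, loops, dictionaries, and functions become the parametrization mechanism, which is the property that makes Migen-style modules easy to reuse.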

On top of Migen, LiteX provides easy-to-use and space-efficient modules to create your own System on Chip (SoC) in less than an hour! It already supports 16+ popular boards, generates Verilog, builds the bitstream, and loads it for you. Documentation is quite sparse, however, so I would suggest you read the LiteX for Hardware Engineers guide if you want to learn more.

High-level Synthesis (HLS)

For complex algorithms, Migen/VHDL/Verilog are not the most efficient languages as they are too low-level and are akin to writing image recognition applications in assembly.

Instead, high-level synthesis enables writing an untimed model of the design in C and converting it into an efficient Verilog/VHDL module. This makes it easy to validate the model, and to target multiple FPGA vendors with the same code. Moreover, changes in the algorithm or latency requirements will not require an expensive rewrite and re-validation of the module. Sounds amazing to me!

The bad part is that most C/C++-compatible HLS tools are proprietary, and the open source ones are either Scala-based (Chisel, SpinalHDL) or seem to be academic toy projects. I hope I am wrong though, so I’ll need to look more into them as the prospects are just too good to pass up! Let me know in the comments which projects are your favourite!

Hard IPs (Fixed functions)

Initially, FPGAs were only made of a ton of gates / LUTs, and designs would be fully implemented using them. However, some functions are better implemented as fast and efficient fixed functions: block memory, serializers/deserializers (parallel to serial and vice versa, often called SERDES), PLLs (clock generators), memory controllers, PCIe, …
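For readers unfamiliar with the term, the following behavioural sketch shows what a SERDES does to the data, ignoring everything that makes the real thing hard (clock recovery, line encoding, multi-gigabit signalling). The function names and the most-significant-bit-first ordering are assumptions made for this example.

```python
def serialize(words, width):
    """Parallel to serial: emit each word's bits, most significant bit first."""
    for word in words:
        for position in reversed(range(width)):
            yield (word >> position) & 1

def deserialize(bits, width):
    """Serial to parallel: regroup the bit stream into `width`-bit words."""
    word, count = 0, 0
    for bit in bits:
        word = (word << 1) | bit
        count += 1
        if count == width:
            yield word
            word, count = 0, 0

sent = [0xA5, 0x3C]
received = list(deserialize(serialize(sent, 8), 8))
print(received == sent)  # True: the round trip preserves the words
```

The point of the hard IP is that the serial side runs `width` times faster than the parallel side, a speed that general-purpose LUT fabric usually cannot reach.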

These fixed-function blocks are called Hard IPs, while the part implemented using the programmable part of the FPGA is by extension called a soft IP. Hard IPs used to be reserved for higher-end parts, but they are nowadays found on most FPGAs, save for the cheapest and smallest ones, which are designed for low power and self-reliance.

For example, the $100 part mentioned earlier includes multiple SERDES that are sufficient to achieve HDMI 1.4 compliance, a PCIe 2.0 x4 block, and a DDR3 memory controller. This makes it sufficient for implementing display controllers with multiple outputs and inputs, as seen on the NeTV2 open hardware board.

Hard IPs can also be the basis of proprietary soft IPs. For instance, Xilinx sells HDMI 1.4/2.0 receiver IPs that use the SERDES hard IPs to achieve the 18 Gb/s bandwidth needed for HDMI compliance.

Soft-CPUs

One might wonder why one would use an FPGA to implement a CPU. Indeed, physical CPUs, which are dirt-cheap and better-performing, could simply be installed alongside the FPGA! So, why waste LUTs on a CPU? This article addresses it better than I could, but the gist of it is that soft CPUs really complement fixed logic well for the less latency-sensitive parts of a design and provide a lot of value. The drawback is that additional firmware is needed for the SoC, but that is no different from having external CPUs.

There have been quite a few open source toy soft-CPUs for FPGAs, and some proprietary vendor-provided ones. The problem has been that their toolchains were often out of tree, and/or Linux couldn’t run on them. This really changed with the introduction of RISC-V, which is pretty efficient, is supported in mainline Linux and GCC, and can fit comfortably in even the smallest FPGAs from Altera and Xilinx. What’s there not to love?

Open hardware boards

So, all of these nice improvements in FPGAs and their community are great, but they wouldn’t be as attractive if not for all the cheap open hardware boards that use them in innovative designs:

  • Fomu ($50): an iCE40-based FPGA board that fits in your USB port and is sufficient to play with RISC-V and a couple of IOs using a fully open source toolchain!
  • iCEBreaker ($69): a more traditional, low-cost iCE40-based board that is oriented towards IOs, with a fully open source toolchain.
  • ULX3S ($115-200): the ultimate ECP5-based board? It can be used as a complete handheld or stationary game console (including wireless controllers) with over-the-air updates, a USB/wireless display controller, or an Arduino-compatible home-automation gateway including surveillance cameras. All of that with a fully open source toolchain.
  • NeTV2: a video-oriented platform with 2 HDMI inputs and 2 HDMI outputs which can run as a standalone device with USB and Ethernet connectivity, or as an accelerator using the PCIe 2.0 x4 connector. The most expensive board has enough gates to provide serious computing power, which could be used to create a slow GPU with a pretty decent display controller! Since it is based on Xilinx’s Artix-7, its open source toolchain is not yet complete, but by the time you are done implementing your design, I am sure the toolchain will be ready!

Ultimately, these boards provide a good platform for any sort of project, further reducing the cost of entry into the hobby / market, and providing ready-made designs to be incorporated in your projects. Everything seems pretty good on the hardware side, so why don’t we have a huge community around a board that would provide the flexibility of Arduinos with a Raspberry-Pi-like feature set?

Open source hardware blocks exist

We have seen that neither board availability, toolchains, languages, speed, nor price is keeping even hobbyists from getting into hardware design. So, there must be open blocks that could be incorporated in designs, right?

The answer is a resounding YES! The first project I would like to talk about is LiteX, which is an HDL with batteries included (like Python). Here is a trimmed-down version of the different blocks it provides:

  • LiteX
    • Soft CPUs: blackparrot, cv32e40p, lm32, microwatt, minerva, mor1kx, picorv32, rocket, serv, and vexriscv
    • Input/Outputs: GPIO, I2C, SPI, I2S, UART, JTAG, PWM, XADC, …
    • Wishbone bus: enables MMIO access to the different IPs for the soft-CPUs, or through different buses (PCIe, USB, Ethernet, …)
    • Clock domains, ECC, random number generation, …
  • LiteDRAM: an SDRAM controller soft IP, or a wrapper for the DDR/LPDDR/DDR2/DDR3/DDR4 hard IPs of Xilinx or the DDR3 hard IP of the ECP5.
  • LiteEth: a 10/100/1000 Ethernet soft IP which also allows you to access the Wishbone bus through it!
  • LitePCIe: a wrapper for the PCIe Gen2 x4 hard IPs of Xilinx and Intel.
  • LiteSATA / LiteSDCard: soft IPs to access SATA drives / SD cards, providing extensive storage capabilities to your soft CPU.
  • LiteVideo: HDMI input/output soft IPs, with DMA, triple buffering, and color space conversion.

Using LiteX, one may create a complete System on Chip in a matter of hours. Adding a block is as simple as adding two lines of code to the SoC: one line to instantiate the block (like one would instantiate an object), and one to expose it through the Wishbone bus. And if this isn’t enough, check out the new Open WiFi project, or the OpenCores project, which seems to have pretty much everything one could hope for.

So… where are the drivers for open source blocks?

We have seen that open hardware boards with capable FPGAs and useful IOs are affordable even to hobbyists. We have also seen that creating SoCs can be done in a matter of hours, so why don’t we have drivers for all of them?

I mean, we have an FPGA subsystem that is focused on loading bitstreams at boot, and even supports on-the-fly FPGA reconfiguration. We have support for most hard IPs, but only when accessed through the integrated ARM processor of some FPGAs. So, why don’t we have drivers for soft IPs? Could it be that their developers do not want to upstream drivers for them because the interface and the base address of each block are subject to change? It certainly looks like it!

But what if we could create an interface that would allow listing these blocks, the current version of their interface, and their base address? This would basically be akin to the Device Tree, but without the need to ship the netlist of the SoC you created to every single user. This would enable the creation of a generic upstream driver for all the versions of a soft IP and all the boards using it, and thus make open source soft IPs more usable.
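To make the idea more tangible, here is a minimal sketch of what enumerating such a table could look like from the software side. Everything in it is invented for illustration: the table layout, the magic value, the block IDs, the addresses, and the driver names are assumptions of this example, not the format used by any existing project.

```python
import struct
from typing import NamedTuple

MAGIC = 0x4C444950                # hypothetical marker identifying the table
HEADER = struct.Struct("<II")     # magic, number of entries
ENTRY = struct.Struct("<HHI")     # block id, interface version, base address

class Block(NamedTuple):
    block_id: int
    version: int
    base: int

def parse_table(raw):
    """Decode a discovery table read from a well-known address."""
    magic, count = HEADER.unpack_from(raw, 0)
    if magic != MAGIC:
        raise ValueError("no discovery table found")
    return [
        Block(*ENTRY.unpack_from(raw, HEADER.size + i * ENTRY.size))
        for i in range(count)
    ]

# Hypothetical driver registry: (block id, interface version) -> driver name.
DRIVERS = {(0x0001, 1): "gpio-v1", (0x0002, 1): "uart-v1", (0x0002, 2): "uart-v2"}

def bind_drivers(blocks):
    """Pick a driver for each block from what the hardware reports."""
    return {
        hex(b.base): DRIVERS.get((b.block_id, b.version), "unsupported")
        for b in blocks
    }

# A made-up table, as the gateware might expose it.
raw = HEADER.pack(MAGIC, 2) + ENTRY.pack(1, 1, 0x82000000) + ENTRY.pack(2, 2, 0x82001000)
print(bind_drivers(parse_table(raw)))
```

Because the driver asks the hardware what is present instead of assuming fixed addresses, moving a block or bumping its interface version changes the table, not the driver.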

Removing the fear of ABI instability in open cores is at the core of my new project, LiteDIP. To demonstrate its effectiveness, I would like to expose all the hardware available on the NeTV2 (HDMI IN/OUT, 10/100 Ethernet, SD card reader, fan, temperature, voltages) and the ULX3S (HDMI IN/OUT, WiFi, Bluetooth, SD card reader, LEDs, GPIOs, ADC, buttons, audio, FM/AM radio, …) using the same driver. Users could pick and choose modules, configure them to their liking, and no driver changes would be necessary. It sounds ambitious, but also seems like a worthy challenge! Not only do I get to enjoy a new hobby, but it would bring together software and hardware developers, enabling the creation of modern-ish computers or accelerators using one-size-fits-all open hardware boards.

Am I the only one excited by the prospect? Stay tuned for updates on the project!

