PHP 8: JIT performance in real-life web applications

栏目: IT技术 · 发布时间: 3年前

内容简介:For those interested inthe JIT in PHP 8, I did some benchmarks for you in real-world web application scenario. Be aware that these benchmarks don't say anything about whether the JIT is useful or not, they only show whether it can improve the performance o

For those interested inthe JIT in PHP 8, I did some benchmarks for you in real-world web application scenario. Be aware that these benchmarks don't say anything about whether the JIT is useful or not, they only show whether it can improve the performance of your average web application, or not.

These benchmarks are run on my local machine. As so, they don't say anything about absolute performance gains, we're only able to make conclusions what kind of relative impact the JIT has on our code.

I'll be using one of my hobby projects , written in Laravel. Since these benchmarks were run on the first alpha version of PHP 8, I had to manually fix some deprecation warnings in Laravel's source code, all locally.

Finally: I'll be running PHP FPM, configured to spawn 20 child processes, and I'll always make sure to only run 20 concurrent requests at once, just to eliminate any extra performance hits on the FPM level. Sending these requests is done using the following command, with ApacheBench:

ab -n 100 -c 20 -l http://aggregate.stitcher.io.test:8081/discover

The JIT setup requires a section on its own. Honestly, this is one of the most confusing ways of configuring a PHP extension I've ever seen, and I'm afraid the syntax is here to stay, since we're too close to PHP 8's feature freeze for another RFC to make changes to it.

So here goes:

The JIT is enabled by specifying opcache.jit_buffer_size in php.ini . If this directive is excluded, the default value is set to 0, and the JIT won't run.

Next, there are several JIT control options, they are all stored in a single directive called opcache.jit and could, for example, look like this:

opcache.jit_buffer_size=100M
opcache.jit=1235

The RFC lists the meaning of each number. Mind you: this is not a bit mask, each number simply represents another configuration option. The RFC lists the following options:

O — Optimization level

0 don't JIT
1 minimal JIT (call standard VM handlers)
2 selective VM handler inlining
3 optimized JIT based on static type inference of individual function
4 optimized JIT based on static type inference and call tree
5 optimized JIT based on static type inference and inner procedure analyses
0 JIT all functions on first script load
1 JIT function on first execution
2 Profile on first request and compile hot functions on second request
3 Profile on the fly and compile hot functions
4 Compile functions with @jit tag in doc-comments

R — register allocation

0 don't perform register allocation
1 use local liner-scan register allocator
2 use global liner-scan register allocator

C — CPU specific optimization flags

0 none
1 enable AVX instruction generation

One small gotcha: the RFC lists these options in reverse order, so the first digit represents the C value, the second the R , and so on.

Anyways, the RFC proposes 1235 as the best default, it will do maximum jitting, profile on the fly, use a global liner-scan register allocator — whatever that might be — and enables AVX instruction generation.

In my benchmarks, I'll use several variations of JIT configuration, in order to compare the differences.

So let's start benchmarking!

Establishing a baseline

First it's best to establish whether the JIT is working properly or not. We know from the RFC that it does have a significant impact on calculating a Mandelbrot — something most of us probably don't do in our web apps.

So let's start with that example. I copied some mandelbrot code and accessed it via the same HTTP application I'll run the next benchmarks on. These are the results:

requests/second (more is better)
Mandelbrot without JIT 15.24
Mandelbrot with JIT 38.99

Great, it looks like the JIT is working! That's more than a two time performance increase. Let's more on to our first real-life comparison. We're going to start slow: the JIT configured with 1231 , and 100 MB of buffer size.

The page we're benchmarking shows an overview of posts, so there's some recursion happening, and were touching several core parts of the Laravel as well: routing, DI, ORM, authentication.

requests/second (more is better)
No JIT 6.48
JIT enabled ( 1231 , 100M buffer) 6.33

Hm. A decrease in performance enabling the JIT? Sure that's possible! What theJIT does is look at the code when its executing, discover "hot" parts of the code, and optimise those for the next run as machine code.

With the current configuration, analysing the code will happen on the fly, on every request. If there's little or no code to optimise, it's natural that there will be a performance price to pay.

So let's test a different setup, the one the RFC proposes as the most optimal one: 1235

requests/second (more is better)
No JIT 6.48
JIT enabled ( 1235 , 100M buffer) 6.75

Here we see an increase, albeit a teeny-tiny one. Turns out there were some parts that could be optimised, and their performance gain outweighed the performance cost.

There's two more things to test: what if we don't profile on every request, but instead only at the start, that's what the T option is for: 2 — Profile on first request and compile hot functions on second request .

In other words, let's use 1225 as the JIT option.

requests/second (more is better)
No JIT 6.48
JIT enabled ( 1235 , 100M buffer) 6.75
JIT enabled ( 1225 , 100M buffer) 6.78

Once again a — small is an understatement — increase of performance!

One thing I'm wondering though: if we're only profiling on the first request, there probably are some parts of the code that will be missed out on optimisations; that's something someone will probably need to do some more research on.

So I suspect using 1225 in benchmarks has a positive impact because we're always requesting the same page, but in practice this probably will be a less optimal approach.

Finally, let's bump the buffer limit. Let's give the JIT a little more room to breath with 500MB of memory:

requests/second (more is better)
No JIT 6.48
JIT enabled ( 1235 , 100M buffer) 6.75
JIT enabled ( 1235 , 500M buffer) 6.52

A slight decrease in performance. One I cannot explain to be honest. I'm sure someone smarter than me can provide us the answer though!

So, that concludes my JIT testing. As expected: the JIT probably won't have a significant impact on web applications, at least not right now.

I'm won't discuss my thoughts on whether the JIT itself is a good addition or not in this post, let's have those discussions together on social media !


以上就是本文的全部内容,希望对大家的学习有所帮助,也希望大家多多支持 码农网

查看所有标签

猜你喜欢:

本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们

C语言程序设计现代方法

C语言程序设计现代方法

K. N. King / 人民邮电出版社 / 2007-11 / 55.00元

《C语言程序设计现代方法》最主要的一个目的就是通过一种“现代方法”来介绍C语言,实现客观评价C语言、强调标准化C语言、强调软件工程、不再强调“手工优化”、强调与c++语言的兼容性的目标。《C语言程序设计现代方法》分为C语言的基础特性。C语言的高级特性、C语言标准库和参考资料4个部分。每章都有“问与答”小节,给出一系列与本章内容相关的问题及其答案,此外还包含适量的习题。一起来看看 《C语言程序设计现代方法》 这本书的介绍吧!

Base64 编码/解码
Base64 编码/解码

Base64 编码/解码

html转js在线工具
html转js在线工具

html转js在线工具

RGB CMYK 转换工具
RGB CMYK 转换工具

RGB CMYK 互转工具