How would you like a 1000x speed increase

栏目: IT技术 · 发布时间: 3年前

内容简介:Good, that's the click-baity title out of the way. Sorry for taking such a long time to write again! There really has been everything going on.To get back into blogging, I've decided to quickly write about a change I made some time ago already.This change

Good, that's the click-baity title out of the way. Sorry for taking such a long time to write again! There really has been everything going on.

To get back into blogging, I've decided to quickly write about a change I made some time ago already.

This change was for the "instrumented profiler", i.e. the one that will at run-time change all the code of the user's program, in order to measure execution times and count up calls and allocations.

In order to get everything right, the instrumented profiler keeps an entire call graph in memory. If you haven't seen something like it yet, imagine taking stack traces at every point in your program's life, and all these stack traces put together make all the paths in the tree that point at the root.

This means, among other things, that the same function can come up multiple times. With recursion, the same function can in fact come up a few hundred times "in a row". In general, if your call tree can become both deep and wide, you can end up with a whole truckload of nodes in your tree.

How would you like a 1000x speed increase
Photo by Gabriel Garcia Marengo / Unsplash

Is it a bad thing to have many nodes? Of course, it uses up memory. Only a single path on the tree is ever interesting at any one moment, though. Memory that's not read from or written to is not quite as "expensive". It never has to go into the CPU cache, and is even free to be swapped out to disk and/or compressed. But hold on, is this really actually the case?

It turns out that when you're compiling the Core Setting, which is a code file almost 2½ megabytes big with about 71½ thousand lines, and you're profiling during the parsing process, the tree gets enormous. At the same time, the parsing process slows to a crawl. What on earth is wrong here?

Well, looking at what MoarVM spends most of its time doing while the profiler runs gives you a good hint: It's spending almost all of its time going through the entirety of the tree for garbage collection purposes. Why would it do that, you ask? Well, in order to count allocated objects at every node, you have to match the count with the type you're allocating, and that means you need to hold on to a pointer to the type, and that in turn has to be kept up to date if anything moves (which the GC does to recently-born things) and to make sure types aren't considered unused and thrown out.

That's bad, right? Isn't there anything we can do? Well, we have to know at every node which counter belongs to which type, and we need to give all the types we have to the garbage collector to manage. But nothing forces us to have the types right next to the counter. And that's already the solution to the problem:

Holding on to all types is now the job of a little array kept just once per tree, and next to every counter there's just a little number that tells you where in the array to look.

This increases the cost of recording an allocation, as you'll now have to go to a separate memory location to match types to counters. On the other hand, the "little number" can be much smaller than before, and that saves memory in every node of the tree.

More importantly, the time cost of going through the profiler data is now independent of how big the tree is, since the individual nodes don't have to be looked at at all.

With a task as big as parsing the core setting, which is where almost every type, exception, operator, or sub lives, the difference is a factor of at least a thousand. Well, to be honest I didn't actually calculate the difference, but I'm sure it's somewhere between 100x faster and 10000x faster, and going from "ten microseconds per tree node" to "ten microseconds per tree" isn't a matter of a single factor increase, it's a complexity improvement from O(n) to O(1). As long as you can find a bigger tree, you can come up with a higher improvement factor. Very useful for writing that blog post you've always wanted to put at the center of a heated discussion about dishonest article titles!

Anyway, on testing my patch, esteemed colleague MasterDuke had this to say on IRC:

timotimo: hot damn, what did you do?!?! stage parse only took almost twice as long (i.e., 60s instead of the normal 37s) instead of the 930s last time i did the profile

(psst, don't check what 930 divided by 60 is, or else you'll expose my blog post title for the fraud that it is!)

Well, that's already all I had for this post. Thanks for your attention, stay safe, wear a mask (if by the time you're reading this the covid19 pandemic is still A Thing, or maybe something new has come up), and stay safe!

How would you like a 1000x speed increase
Photo by Macau Photo Agency / Unsplash

以上就是本文的全部内容,希望对大家的学习有所帮助,也希望大家多多支持 码农网

查看所有标签

猜你喜欢:

本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们

增长黑客实战

增长黑客实战

范冰、张溪梦 / 电子工业出版社 / 2017-6 / 59.00

《增长黑客实战》围绕硅谷前沿的增长黑客职业,讲解增长理念的树立、增长团队的组建、流程制度的创立、技术营销的运用等团队运营成功实战经验。作者以自身创业经验为蓝本,结合真实案例,并融入一些伟大创业者的智慧,创建了一套思考、验证和追求卓越增长的理论体系。那些想要验证自己的创意、解决实际增长问题和拥有成功事业的人,可以将《增长黑客实战》当成一套清晰的实践指南、一幅组建增长团队的指导蓝图,或者一套值得反复玩......一起来看看 《增长黑客实战》 这本书的介绍吧!

HTML 压缩/解压工具
HTML 压缩/解压工具

在线压缩/解压 HTML 代码

在线进制转换器
在线进制转换器

各进制数互转换器

正则表达式在线测试
正则表达式在线测试

正则表达式在线测试