How would you like a 1000x speed increase

Category: IT Technology · Published: 5 years ago


Good, that's the click-baity title out of the way. Sorry for taking such a long time to write again! There really has been everything going on.

To get back into blogging, I've decided to quickly write about a change I made some time ago already.

This change was for the "instrumented profiler", i.e. the one that will at run-time change all the code of the user's program, in order to measure execution times and count up calls and allocations.

In order to get everything right, the instrumented profiler keeps an entire call graph in memory. If you haven't seen something like it yet, imagine taking stack traces at every point in your program's life. Put together, all these stack traces form the paths of a tree, each one leading back to the root.

This means, among other things, that the same function can come up multiple times. With recursion, the same function can in fact come up a few hundred times "in a row". In general, if your call tree can become both deep and wide, you can end up with a whole truckload of nodes in your tree.
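To make the shape of such a tree concrete, here is a minimal sketch in Python. It is not MoarVM's actual code; the names `CallNode`, `CallTree`, `enter`, and `leave` are made up for illustration. Every distinct call path gets its own node, so a recursive function shows up once per recursion depth.

```python
class CallNode:
    """One node per distinct call path (hypothetical structure, not MoarVM's)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = {}  # callee name -> CallNode
        self.calls = 0


class CallTree:
    def __init__(self):
        self.root = CallNode("<root>")
        self.current = self.root
        self.node_count = 1

    def enter(self, name):
        # Reuse the child node if this exact call path was seen before,
        # otherwise grow the tree by one node.
        child = self.current.children.get(name)
        if child is None:
            child = CallNode(name, self.current)
            self.current.children[name] = child
            self.node_count += 1
        child.calls += 1
        self.current = child

    def leave(self):
        self.current = self.current.parent


def fact(tree, n):
    """Recursive factorial, instrumented by hand with enter/leave calls."""
    tree.enter("fact")
    result = 1 if n <= 1 else n * fact(tree, n - 1)
    tree.leave()
    return result


tree = CallTree()
fact(tree, 200)
print(tree.node_count)  # 201: the root plus one "fact" node per recursion level
```

Calling `fact(tree, 200)` a second time would not add any nodes, since every call path already exists; only the `calls` counters would go up.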

Photo by Gabriel Garcia Marengo / Unsplash

Is it a bad thing to have many nodes? Of course, it uses up memory. Only a single path on the tree is ever interesting at any one moment, though. Memory that's not read from or written to is not quite as "expensive". It never has to go into the CPU cache, and is even free to be swapped out to disk and/or compressed. But hold on, is this really actually the case?

It turns out that when you're compiling the Core Setting, which is a code file almost 2½ megabytes big with about 71½ thousand lines, and you're profiling during the parsing process, the tree gets enormous. At the same time, the parsing process slows to a crawl. What on earth is wrong here?

Well, looking at what MoarVM spends most of its time doing while the profiler runs gives you a good hint: it's spending almost all of its time going through the entirety of the tree for garbage collection purposes. Why would it do that, you ask? Well, in order to count allocated objects at every node, you have to match each count with the type you're allocating, and that means you need to hold on to a pointer to the type. That pointer in turn has to be kept up to date if anything moves (which the GC does to recently-born things), and the GC has to see it so that types aren't considered unused and thrown out.
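Here is a rough Python sketch of that layout and of the work it causes. It is an illustration under assumptions, not the profiler's real code: `Node`, `gc_visit_tree`, and `build_chain` are hypothetical names, and the built-in `int` and `str` merely stand in for type objects. Because every node holds its own references to types, a collector has to walk every node to find them all.

```python
class Node:
    """A call-tree node that stores type references right next to its counters."""

    def __init__(self):
        self.children = []
        self.alloc_counts = []  # [type_object, count] pairs; each holds a reference

    def record_allocation(self, type_obj):
        for entry in self.alloc_counts:
            if entry[0] is type_obj:
                entry[1] += 1
                return
        self.alloc_counts.append([type_obj, 1])


def gc_visit_tree(root):
    """Count the nodes and type references a collector would have to walk."""
    nodes_visited = 0
    refs_visited = 0
    stack = [root]
    while stack:
        node = stack.pop()
        nodes_visited += 1
        refs_visited += len(node.alloc_counts)
        stack.extend(node.children)
    return nodes_visited, refs_visited


def build_chain(depth, types):
    """Build a tree that is one long chain, allocating each type once per node."""
    root = node = Node()
    for level in range(depth):
        for t in types:
            node.record_allocation(t)
        if level < depth - 1:
            child = Node()
            node.children.append(child)
            node = child
    return root


print(gc_visit_tree(build_chain(1000, [int, str])))  # (1000, 2000)
```

The work grows with the size of the tree, and it has to be repeated on every collection.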

That's bad, right? Isn't there anything we can do? Well, we have to know at every node which counter belongs to which type, and we need to give all the types we have to the garbage collector to manage. But nothing forces us to have the types right next to the counter. And that's already the solution to the problem:

Holding on to all types is now the job of a little array kept just once per tree, and next to every counter there's just a little number that tells you where in the array to look.

This increases the cost of recording an allocation, as you'll now have to go to a separate memory location to match types to counters. On the other hand, the "little number" can be much smaller than the pointer it replaces, and that saves memory in every node of the tree.

More importantly, the time cost of going through the profiler data is now independent of how big the tree is, since the individual nodes don't have to be looked at at all.
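The indexed layout can be sketched in Python like this. Again, this is a sketch under assumptions rather than the actual patch: `IndexedNode`, `IndexedTree`, `type_index`, and `gc_visit` are hypothetical names, and `int` and `str` stand in for type objects. The tree owns a single list of types, the nodes store only integer indexes into it, and the collector only needs to look at that one list.

```python
class IndexedNode:
    """A call-tree node whose counters refer to types by index only."""

    def __init__(self):
        self.children = []
        self.alloc_counts = []  # [type_index, count] pairs; plain integers


class IndexedTree:
    def __init__(self):
        self.root = IndexedNode()
        self.types = []  # the single place where type references live

    def type_index(self, type_obj):
        # Linear scan keeps the sketch simple; the extra lookup is the
        # added cost of recording an allocation.
        for i, known in enumerate(self.types):
            if known is type_obj:
                return i
        self.types.append(type_obj)
        return len(self.types) - 1

    def record_allocation(self, node, type_obj):
        idx = self.type_index(type_obj)
        for entry in node.alloc_counts:
            if entry[0] == idx:
                entry[1] += 1
                return
        node.alloc_counts.append([idx, 1])

    def gc_visit(self):
        """Number of references a collector has to look at: just the type list."""
        return len(self.types)


def build_indexed_chain(depth, types):
    """Build a chain-shaped tree, allocating each type once per node."""
    tree = IndexedTree()
    node = tree.root
    for level in range(depth):
        for t in types:
            tree.record_allocation(node, t)
        if level < depth - 1:
            child = IndexedNode()
            node.children.append(child)
            node = child
    return tree


small = build_indexed_chain(10, [int, str])
large = build_indexed_chain(10_000, [int, str])
print(small.gc_visit(), large.gc_visit())  # 2 2
```

In this sketch the collector's work depends on how many distinct types were seen, not on how many nodes the tree has.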

With a task as big as parsing the Core Setting, which is where almost every type, exception, operator, or sub lives, the difference is a factor of at least a thousand. Well, to be honest I didn't actually calculate the difference, but I'm sure it's somewhere between 100x faster and 10000x faster. Going from "ten microseconds per tree node" to "ten microseconds per tree" isn't a matter of a single constant factor anyway; it's a complexity improvement from O(n) to O(1) in the number of tree nodes. As long as you can find a bigger tree, you can come up with a higher improvement factor. Very useful for writing that blog post you've always wanted to put at the center of a heated discussion about dishonest article titles!

Anyway, on testing my patch, esteemed colleague MasterDuke had this to say on IRC:

timotimo: hot damn, what did you do?!?! stage parse only took almost twice as long (i.e., 60s instead of the normal 37s) instead of the 930s last time i did the profile

(psst, don't check what 930 divided by 60 is, or else you'll expose my blog post title for the fraud that it is!)

Well, that's already all I had for this post. Thanks for your attention, wear a mask (if by the time you're reading this the covid19 pandemic is still A Thing, or maybe something new has come up), and stay safe!
