Making Rust as Fast as Go

Category: IT · Published: 4 years ago

Go is garbage collected, rust is not. That means rust is faster than go, right? No! Not always.

Let’s take an example that I stumbled across while playing around with an algorithm that calculates Levenshtein edit distances. I wanted to compare the performance of the same algorithm in a bunch of different languages. Two of these languages were rust and go.

To my surprise, the go version was faster than the rust version. A lot faster. My initial reaction was that I must have implemented the rust version incorrectly. Maybe I was doing some unsafe (but fast) things in go that rust wouldn’t let me do. To account for this, I laid out some ground rules:

  1. The more idiomatic the better. Rust, for example, promises zero-cost abstractions, so we should lean on this & write safe code
  2. No static global variables. This means that containers need to be heap allocated & dynamically sized. We don’t know how big the inputs will be!
  3. Memory access should be safe. Don’t eliminate bounds checks
  4. Assume that text is utf8 encoded

In short, this should be code that you’d happily ship to prod. Here’s what I ended up with:

edit_distance.go
func EditDistance(source, target string) int {
	// Measure length in runes, not bytes: the two differ for
	// multi-byte utf8 text.
	if len(source) == 0 {
		return len([]rune(target))
	}

	if len(target) == 0 {
		return len([]rune(source))
	}

	sourceChars := []rune(source)
	targetChars := []rune(target)

	cache := make([]int, len(targetChars)+1)
	for i := range cache {
		cache[i] = i
	}

	for i, sourceChar := range sourceChars {
		nextDist := i + 1
		for j, targetChar := range targetChars {
			currentDist := nextDist

			distIfSubstitute := cache[j]
			if sourceChar != targetChar {
				distIfSubstitute++
			}

			distIfInsert := currentDist + 1
			distIfDelete := cache[j+1] + 1

			// min is a builtin as of Go 1.21; on older versions,
			// define a two-argument helper.
			nextDist = min(distIfDelete, min(distIfInsert, distIfSubstitute))

			cache[j] = currentDist
		}

		cache[len(targetChars)] = nextDist
	}

	return cache[len(targetChars)]
}
edit_distance.rs
pub fn levenshtein_distance(source: &str, target: &str) -> usize {
    if source.is_empty() {
        return target.chars().count();
    }

    if target.is_empty() {
        return source.chars().count();
    }

    // Index by character count, not byte length: `target.len()` is the
    // byte length, which differs for multi-byte utf8 text and would
    // index out of bounds below.
    let target_char_count = target.chars().count();
    let mut cache: Vec<usize> = (0..=target_char_count).collect();

    for (i, source_char) in source.chars().enumerate() {
        let mut next_dist = i + 1;

        for (j, target_char) in target.chars().enumerate() {
            let current_dist = next_dist;

            let mut dist_if_substitute = cache[j];
            if source_char != target_char {
                dist_if_substitute += 1;
            }

            let dist_if_insert = current_dist + 1;
            let dist_if_delete = cache[j + 1] + 1;

            next_dist = std::cmp::min(
                dist_if_substitute,
                std::cmp::min(dist_if_insert, dist_if_delete),
            );

            cache[j] = current_dist;
        }

        cache[target_char_count] = next_dist;
    }

    cache[target_char_count]
}
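
Since the implementation now has tests, it's worth showing what a couple of quick sanity checks look like. This is a self-contained sketch (the function is repeated so the example runs on its own), using textbook word pairs with well-known distances:

```rust
// One-row dynamic-programming Levenshtein distance, indexed by character
// count so that multi-byte utf8 input stays in bounds.
fn levenshtein_distance(source: &str, target: &str) -> usize {
    if source.is_empty() {
        return target.chars().count();
    }
    if target.is_empty() {
        return source.chars().count();
    }

    let target_char_count = target.chars().count();
    let mut cache: Vec<usize> = (0..=target_char_count).collect();

    for (i, source_char) in source.chars().enumerate() {
        let mut next_dist = i + 1;
        for (j, target_char) in target.chars().enumerate() {
            let current_dist = next_dist;
            let mut dist_if_substitute = cache[j];
            if source_char != target_char {
                dist_if_substitute += 1;
            }
            let dist_if_insert = current_dist + 1;
            let dist_if_delete = cache[j + 1] + 1;
            next_dist = dist_if_substitute.min(dist_if_insert).min(dist_if_delete);
            cache[j] = current_dist;
        }
        cache[target_char_count] = next_dist;
    }

    cache[target_char_count]
}

fn main() {
    // Classic textbook pairs.
    assert_eq!(levenshtein_distance("kitten", "sitting"), 3);
    assert_eq!(levenshtein_distance("saturday", "sunday"), 3);
    // Edge cases: empty input and a multi-byte character.
    assert_eq!(levenshtein_distance("", "abc"), 3);
    assert_eq!(levenshtein_distance("héllo", "hello"), 1);
    println!("all distances check out");
}
```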

Even with the playing field levelled, go still took half the time of rust. For the dataset I was using to benchmark the programs, the go version took 1.5 seconds and the rust version took 3 seconds.

This was bizarre. As far as I could tell, these programs were identical besides the fact that the go runtime needs to spend precious cycles collecting garbage. That means it should be slower, right?

I took the question to my coworkers, who had some good suggestions. Theories included escape analysis, string allocation, and the rust implementation being wrong. The last one was true, but the performance gap remained once I fixed it (I have tests now!).

The winning suggestion ended up being to switch the allocator in the rust program to jemalloc. This was the default allocator used by rust binaries in the past, but it was removed in favour of the system allocator in late 2018. Read #36963 to get the full rationale for this change.

To change the allocator, you simply add the following to the start of your program:

extern crate jemallocator;

#[global_allocator]
static ALLOC: jemallocator::Jemalloc = jemallocator::Jemalloc;
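
For this to compile, the jemallocator crate also needs to be declared as a dependency. A Cargo.toml entry along these lines works (the version number here is illustrative; check crates.io for the current one):

```toml
[dependencies]
jemallocator = "0.3"
```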

This made a huge difference. On my machine, this dropped the execution time from 3 seconds to about 1.8 seconds. Let’s take a look at the flamegraphs (generated with flamegraph-rs/flamegraph) to see the change:

[Flamegraph: benchmark with the system allocator]

[Flamegraph: benchmark with jemalloc]

This means that the time spent allocating has dropped from about 40% to 20%. Keep in mind this is for the full benchmark, including all the setup code, but it gives us a good sense of what changed.

I’m not sure why the change was so severe. I tried searching for things like “macos allocator slow” but didn’t find anything. If you have some information here, please let me know!

Why doesn’t go suffer from the slow system allocator on macos? Two things come to mind. The first is that go uses a custom allocator. I assume that this is also faster than the macos system allocator. The second is that while this program does spend a lot of time allocating and freeing memory, there are barely any objects in the heap at any given moment. EditDistance only allocates a handful of objects on the heap (the rune slices and cache), meaning that the time spent garbage collecting is probably negligible.

So the answer is:

  1. The macos allocator is slow
  2. Go uses a custom allocator, which is faster than the one that ships with macos

What’s the lesson here? If you’re writing a rust program that does a lot of allocation, consider using a non-system allocator if you need some more performance. Don’t make the mistake of extrapolating beyond that simple point, though. This is a “microbenchmark”, and the results are tightly coupled to the very contrived scenario I’ve concocted.

Check out the whole github repo. It has implementations in several languages, as well as scripts to benchmark + test them.

