Wolfsort is a stable adaptive hybrid radix sort

栏目: IT技术 · 发布时间: 4年前

内容简介:This document describes a stable adaptive hybrid radix / merge sort named wolfsort. It is likely the fastest sort written so far for sorting a mixture of random and ordered data.While an adaptive merge sort is very fast at sorting ordered data, its inabili

Intro

This document describes a stable adaptive hybrid radix / merge sort named wolfsort. It is likely the fastest sort written so far for sorting a mixture of random and ordered data.

Why a hybrid?

While an adaptive merge sort is very fast at sorting ordered data, its inability to effectively partition is its greatest weakness. Radix sort on the other hand is unable to take advantage of sorted data. Wolfsort tries to avoid the worst case of each algorithm.

Adaptive partitioning

Wolfsort uses quadsort as its adaptive merge sort since it's faster than Timsort. The radix sort used by wolfsort has two primary functions.

  1. Detecting whether the array is worth partitioning with radix sort.
  2. Partition in a way that is beneficial to quadsort.

Detecting whether the array is worth partitioning

Wolfsort operates like a typical radix sort by creating 256 buckets and dividing each unsigned 32 bit integer by 16777216. Without any optimizations this would multiply the memory overhead by 256 times. Being a hybrid sort wolfsort solves this problem by giving each bucket 1/8th of the maximum size.

If the maximum bucket size is reached the radix sort is aborted and quadsort is ran instead. While this may seem wasteful it means in practice that the worst case of each algorithm is avoided at a relatively small cost.

Partition in a way that is beneficial to merge sort

Because this approach is the equivalent of an in-place MSD Radix sort the 256 buckets are in order once partitioning completes. The next step is to sort the content of each bucket usinge quadsort and wolfsort will be finished.

Memory overhead

Wolfsort requires 8 times the array size in swap memory for the O(n) partitioning process. The sorting process that follows requires 1/32nd the array size in swap memory.

If not enough memory is available wolfsort falls back on quadsort which requires 1/2 the array size in swap memory.

In ideal conditions there is no computational difference between allocating and freeing 10 MB of memory and allocating and freeing 80 MB of memory. Historically software was always low on memory, and subsequently sorting algorithms were developed that bent over backwards in order to use as little additional memory as possible. Nowadays most modern systems have several gigabytes of memory that are not used and are just sitting idle.

During the partitioning process each bucket becomes akin to Schrödinger's cat, it may be used almost fully, not at all, or somewhere in between. Based on probability we know exactly 1/8th will be actually used. The only overhead is to keep track of the size of each bucket, and to check that the bucket is not overflowing upon each addition.

Proof of concept

Wolfsort is primarily a proof of concept, currently it only properly supports unsigned 32 bit integers.

Memory allocation with malloc() is a bottleneck on many platforms. Using a better memory allocator or permanent swap memory is a possible solution.

Benchmark

The following benchmark was on WSL gcc version 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1). The source code was compiled using g++ -O3 -fpermissive bench.c. Each test was ran 100 times and only the best run is reported.

Benchmark: quadsort vs std::stable_sort vs timsort vs pdqsort vs wolfsort

The following benchmark was on WSL gcc version 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1). The source code was compiled using g++ -O3 quadsort.cpp. Each test was ran 100 times and only the best run is reported. It should be noted that pdqsort is not a stable sort which is the reason it's faster on generic order data.

The X axis of the bar graph below is the execution time.

Wolfsort is a stable adaptive hybrid radix sort

quadsort vs std::stable_sort vs timsort vs pdqsort vs wolfsort data table
Name Items Type Best Average Comparisons Distribution
quadsort 1000000 i32 0.070469 0.070635 random order
stablesort 1000000 i32 0.073865 0.074078 random order
timsort 1000000 i32 0.089192 0.089301 random order
pdqsort 1000000 i32 0.029911 0.029948 random order
wolfsort 1000000 i32 0.017359 0.017744 random order
quadsort 1000000 i32 0.000485 0.000511 ascending
stablesort 1000000 i32 0.010188 0.010261 ascending
timsort 1000000 i32 0.000331 0.000342 ascending
pdqsort 1000000 i32 0.000863 0.000875 ascending
wolfsort 1000000 i32 0.000539 0.000579 ascending
quadsort 1000000 i32 0.008901 0.008934 ascending saw
stablesort 1000000 i32 0.017093 0.017275 ascending saw
timsort 1000000 i32 0.008615 0.008674 ascending saw
pdqsort 1000000 i32 0.047785 0.047921 ascending saw
wolfsort 1000000 i32 0.012490 0.012554 ascending saw
quadsort 1000000 i32 0.038260 0.038343 generic order
stablesort 1000000 i32 0.042292 0.042388 generic order
timsort 1000000 i32 0.055855 0.055967 generic order
pdqsort 1000000 i32 0.008093 0.008168 generic order
wolfsort 1000000 i32 0.038320 0.038417 generic order
quadsort 1000000 i32 0.000559 0.000576 descending order
stablesort 1000000 i32 0.010343 0.010438 descending order
timsort 1000000 i32 0.000891 0.000900 descending order
pdqsort 1000000 i32 0.001882 0.001897 descending order
wolfsort 1000000 i32 0.000604 0.000625 descending order
quadsort 1000000 i32 0.006174 0.006245 descending saw
stablesort 1000000 i32 0.014679 0.014767 descending saw
timsort 1000000 i32 0.006419 0.006468 descending saw
pdqsort 1000000 i32 0.016603 0.016629 descending saw
wolfsort 1000000 i32 0.006264 0.006329 descending saw
quadsort 1000000 i32 0.018675 0.018729 random tail
stablesort 1000000 i32 0.026384 0.026508 random tail
timsort 1000000 i32 0.023226 0.023395 random tail
pdqsort 1000000 i32 0.028599 0.028674 random tail
wolfsort 1000000 i32 0.009517 0.009631 random tail
quadsort 1000000 i32 0.037593 0.037679 random half
stablesort 1000000 i32 0.043755 0.043899 random half
timsort 1000000 i32 0.047008 0.047112 random half
pdqsort 1000000 i32 0.029800 0.029847 random half
wolfsort 1000000 i32 0.013238 0.013355 random half
quadsort 1000000 i32 0.009605 0.009673 wave order
stablesort 1000000 i32 0.013667 0.013785 wave order
timsort 1000000 i32 0.007994 0.008138 wave order
pdqsort 1000000 i32 0.024683 0.024727 wave order
wolfsort 1000000 i32 0.009642 0.009709 wave order

The X axis of the graph below is the number of elements, the Y axis is the execution time. Wolfsort is a stable adaptive hybrid radix sort

quadsort vs std::stable_sort vs timsort vs pdqsort vs wolfsort data table
Name Items Type Best Average Comparisons Distribution
quadsort 100000 i32 0.005800 0.005835 random order
stablesort 100000 i32 0.006092 0.006122 random order
timsort 100000 i32 0.007605 0.007656 random order
pdqsort 100000 i32 0.002648 0.002670 random order
wolfsort 100000 i32 0.001148 0.001168 random order
quadsort 10000 i32 0.004568 0.004593 random order
stablesort 10000 i32 0.004813 0.004923 random order
timsort 10000 i32 0.006326 0.006376 random order
pdqsort 10000 i32 0.002345 0.002373 random order
wolfsort 10000 i32 0.001256 0.001274 random order
quadsort 5000 i32 0.004076 0.004086 random order
stablesort 5000 i32 0.004394 0.004420 random order
timsort 5000 i32 0.005901 0.005938 random order
pdqsort 5000 i32 0.002093 0.002107 random order
wolfsort 5000 i32 0.000968 0.001086 random order
quadsort 2500 i32 0.003196 0.003303 random order
stablesort 2500 i32 0.003801 0.003942 random order
timsort 2500 i32 0.005296 0.005322 random order
pdqsort 2500 i32 0.001606 0.001661 random order
wolfsort 2500 i32 0.000509 0.000520 random order
quadsort 1250 i32 0.001767 0.001823 random order
stablesort 1250 i32 0.002812 0.002887 random order
timsort 1250 i32 0.003789 0.003865 random order
pdqsort 1250 i32 0.001036 0.001325 random order
wolfsort 1250 i32 0.000534 0.000655 random order
quadsort 675 i32 0.000368 0.000592 random order
stablesort 675 i32 0.000974 0.001160 random order
timsort 675 i32 0.000896 0.000948 random order
pdqsort 675 i32 0.000489 0.000531 random order
wolfsort 675 i32 0.000283 0.000308 random order

以上所述就是小编给大家介绍的《Wolfsort is a stable adaptive hybrid radix sort》,希望对大家有所帮助,如果大家有任何疑问请给我留言,小编会及时回复大家的。在此也非常感谢大家对 码农网 的支持!

查看所有标签

猜你喜欢:

本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们

松本行弘的程序世界

松本行弘的程序世界

松本行弘 / 柳德燕、李黎明、夏倩、张文旭 / 人民邮电出版社 / 2011-8 / 75.00元

《松本行弘的程序世界》是探索程序设计思想和方法的经典之作。作者从全局的角度,利用大量的程序示例及图表,深刻阐述了Ruby编程语言的设计理念,并以独特的视角考察了与编程相关的各种技术。阅读《松本行弘的程序世界》不仅可以深入了解编程领域各个要素之间的关系,而且能够学到大师的思考方法。 《松本行弘的程序世界》面向各层次程序设计人员和编程爱好者,也可以供相关技术人员参考。一起来看看 《松本行弘的程序世界》 这本书的介绍吧!

在线进制转换器
在线进制转换器

各进制数互转换器

XML、JSON 在线转换
XML、JSON 在线转换

在线XML、JSON转换工具

Markdown 在线编辑器
Markdown 在线编辑器

Markdown 在线编辑器