Self-balancing binary search trees comparison

In this post I will try to review some of the main kinds of self-balancing binary search trees, provide Java implementations and micro-benchmark their performance under various conditions.

Why do we need binary search trees?
Let's say we have a list as an input and we need to find some element in it. What is the best way to do it? A naive implementation could simply iterate through all the elements and check each one. That may be fine for small lists and simple use cases. But can we do better? Probably not with a plain unsorted array. However, if we sort it beforehand, it is possible to "divide and conquer". That is, we check the element in the middle and, if the one we are looking for is larger, we repeat the same step on the sub-list from n/2 to n (where n is the number of elements in the list). This cuts the search space in half at every step (O(log n) complexity instead of O(n)). That is fine, but we need to sort the input first with some sorting algorithm. And if we are constantly modifying (inserting/deleting) the input and searching through it, it's not practical to sort it each time before a search. Here is where binary search trees step in. We change the way the data is kept: not in a plain list/array but in a tree-like data structure. Looking at the picture you probably already see why it is easier to search in such a structure.

[Figure: an example of a balanced binary search tree (AVL tree)]
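
To make the "divide and conquer" lookup on a sorted array concrete, here is a minimal binary search sketch (an illustration added for this write-up, not code from the repo):

    // Minimal binary search over a sorted int array (illustration only)
    public static int binarySearch(int[] sorted, int key) {
        int low = 0;
        int high = sorted.length - 1;
        while (low <= high) {
            int mid = (low + high) >>> 1;  // middle of the current range
            if (sorted[mid] < key) {
                low = mid + 1;             // the key can only be in the upper half
            } else if (sorted[mid] > key) {
                high = mid - 1;            // the key can only be in the lower half
            } else {
                return mid;                // found it
            }
        }
        return -1;                         // not present
    }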

However, a binary search tree can become unbalanced and then lose its efficiency. To solve that problem, self-balancing binary search trees were invented. I am not going to explain in detail how they work here, just provide general information and some ideas on how they can be used. I will also provide my implementations in Java and some benchmarks so you can start playing with them if you want.

AVL tree

One of the oldest, most well known and most popular tree data structures. It stores additional height information in each node and rebalances the tree whenever the heights of a node's two subtrees differ by more than one. That keeps the tree rigidly balanced, so search is extremely fast on an AVL tree. However, that comes at the cost of rebalancing, which can make insert/delete operations slightly more expensive than in other trees.
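
As a rough illustration of the idea (hypothetical Node class and helper names, not the actual code from my repository), the AVL balance check boils down to comparing the stored subtree heights:

    // Hypothetical sketch of the per-node data and the AVL balance check
    static class Node {
        int key;
        int height;          // the extra information an AVL tree keeps in every node
        Node left, right;
    }

    static int height(Node n) {
        return n == null ? -1 : n.height;
    }

    // AVL invariant: the heights of the two subtrees differ by at most 1;
    // if this returns true, rotations are needed to restore balance
    static boolean needsRebalance(Node n) {
        return Math.abs(height(n.left) - height(n.right)) > 1;
    }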

Quick link to my implementation on GitHub

Red-Black tree

The most widely used self-balancing binary search tree. It also stores additional information on nodes (each node is colored black or red) and preserves the tree's balance based on certain rules about how nodes may be colored. It can be less balanced than an AVL tree (which means slower search) but with more efficient inserts and deletes. Also, because only two colors are required, a Red-Black tree needs only one bit of additional information per node. All of this makes it the most popular choice for general use cases. Java's TreeSet is implemented as a Red-Black tree.

Quick link to my implementation on GitHub

Scapegoat tree

This tree has the advantage that it doesn't need any additional memory per node. Once it finds out that the tree has to be rebalanced, it finds a scapegoat node and performs the rebalance operation on it. So it might be a good choice when memory is an issue. To find out whether the tree is out of balance it uses a parameter alpha. This allows a scapegoat tree to be parametrised with different values of alpha, which means different degrees of balance, which in turn means a choice between faster search and faster inserts. Imagine doing that dynamically: when there is a heavy load on searches and not much on inserts/deletes you can lower alpha, and increase it when the amount of inserts/deletes grows.
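
To give a feel for the alpha parameter, here is a hypothetical sketch (made-up helper names, not the code from my repository) of the weight-balance check that spots a scapegoat:

    // Hypothetical sketch of the alpha-weight-balance check
    static class Node {
        int key;
        Node left, right;    // note: no extra per-node balance information is needed
    }

    static int size(Node n) {
        return n == null ? 0 : 1 + size(n.left) + size(n.right);
    }

    // A node is alpha-weight-balanced if neither subtree holds more than
    // alpha * size(node) elements; a node violating this is a scapegoat candidate.
    static boolean isAlphaWeightBalanced(Node n, double alpha) {
        int total = size(n);
        return size(n.left) <= alpha * total && size(n.right) <= alpha * total;
    }

A smaller alpha (closer to 0.5) means stricter balance and faster searches; a larger alpha tolerates more imbalance in exchange for cheaper inserts.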

Quick link to my implementation on GitHub

Treap

Basically a hybrid of a randomized binary search tree and a heap. For each node it randomly generates and stores an additional parameter, the priority. The tree is then balanced so that the priority values satisfy the heap property. It is easy to implement and I think it allows fast set operations (I did not implement those in my code). Other than that I don't know if it has any advantage over the other trees. Maybe instead of a random priority we could use a non-random (somehow calculated) importance value which would keep more important nodes near the top?
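
As a small sketch of the idea (hypothetical names, not my actual implementation), a treap node just carries a randomly generated priority next to the key; after a normal BST insert, the new node is rotated up while its priority is larger than its parent's:

    // Hypothetical treap node: keys follow the BST ordering,
    // priorities (assigned randomly on insert) follow the max-heap property
    static class TreapNode {
        int key;
        double priority = Math.random();
        TreapNode left, right;
    }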

Quick link to my implementation on GitHub

Splay tree

The Splay tree is pretty interesting. It performs an operation called splaying not only on inserts and deletes, but also when searching. This operation performs tree rotations and brings the element up to the top. It means that recently accessed elements are always near the top. You can see how this can be very useful in certain use cases, like some cache implementations. However, the additional rotations on search add overhead, and if each element is equally likely to be searched, splaying doesn't do much good.
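
Splaying is built out of ordinary tree rotations; as a minimal sketch (hypothetical Node class, not my actual code), a single "zig" step that moves a left child one level up looks like this:

    // One right rotation: x moves up over its parent p and is returned as the new subtree root
    static Node rotateRight(Node p) {
        Node x = p.left;
        p.left = x.right;
        x.right = p;
        return x;
    }

A full splay repeats such steps (in zig, zig-zig and zig-zag combinations) until the accessed node becomes the root.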

Quick link to my implementation on GitHub

Short summary

The first choice for a BST is a Red-Black tree, as it is the most universal data structure. However, if your tree will be used mostly for search, take a look at the AVL tree. If mostly the same elements are retrieved again and again, a Splay tree can be even better. If the load is sometimes read-heavy and sometimes write-heavy, a Scapegoat tree with different alpha parameters might be helpful. If for some reason you need to implement the tree yourself and don't yet feel very confident about your coding skills, a Treap might be your choice.

Benchmarks

Here I have prepared some simple benchmarks for my own implementations of each tree. Please keep in mind that I might have made some mistakes or my implementations may be inefficient, so let me know if you find that to be the case. Note that ms/op in the units column means the whole 100K-element test in milliseconds. I've used the JMH framework for benchmarking.
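
For reference, a JMH benchmark for one of these cases looks roughly like the sketch below (the class and tree names are made up for illustration; this is not the exact harness from the repo):

    import java.util.Random;
    import java.util.concurrent.TimeUnit;

    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.BenchmarkMode;
    import org.openjdk.jmh.annotations.Mode;
    import org.openjdk.jmh.annotations.OutputTimeUnit;
    import org.openjdk.jmh.annotations.Scope;
    import org.openjdk.jmh.annotations.Setup;
    import org.openjdk.jmh.annotations.State;

    @State(Scope.Thread)
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.MILLISECONDS)
    public class TreeInsertBenchmark {

        int[] values;

        @Setup
        public void prepare() {
            // 100K random integers prepared once, outside the measured code
            Random random = new Random(42);
            values = new int[100000];
            for (int i = 0; i < values.length; i++) {
                values[i] = random.nextInt();
            }
        }

        @Benchmark
        public AvlTree insertAll() {
            AvlTree tree = new AvlTree(); // hypothetical tree class standing in for any of the implementations
            for (int value : values) {
                tree.insert(value);
            }
            return tree; // return the tree so JMH doesn't dead-code-eliminate the work
        }
    }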

This first benchmark inserts elements from 1 to 100K in order. We see that all Scapegoat trees performed very poorly; I am not sure why. The Splay tree performed phenomenally, however that is not so impressive once you realize the resulting tree is not balanced at all and is in fact basically a linked list. It is interesting that the AVL tree was much better than the Red-Black tree. The Treap was quite good too (maybe even the best choice, as it is much simpler than the only slightly faster AVL tree). Another interesting observation is that the less balanced Scapegoat tree (alpha 0.9) had slower inserts than the Scapegoat tree balanced with alpha 0.75. In this case it is no surprise, because we insert sorted elements: rebalancing too rarely in this situation increases the cost of finding the next element's place once the tree grows larger.

Insert sorted elements benchmark Score Score error Units
AVL tree 9.420 0.270 ms/op
JDK TreeSet 13.981 0.479 ms/op
Red-Black tree 17.430 0.917 ms/op
Scapegoat tree (alpha 0.6) 131.432 4.159 ms/op
Scapegoat tree (alpha 0.75) 76.769 2.538 ms/op
Scapegoat tree (alpha 0.9) 77.999 1.775 ms/op
Splay tree 1.229 0.039 ms/op
Treap 9.576 1.102 ms/op

Once again let's insert 100K elements, but this time they are random and not sorted. My AVL tree implementation did quite well, and the Red-Black tree implementation was much slower than we would expect (remember, it is supposed to be faster on inserts). Maybe my implementation is not very efficient or I made a mistake somewhere. The Red-Black tree implementation from the JDK was the fastest here (but only barely faster than my non-optimized AVL tree) and the Splay tree was the slowest, so no more surprises.

Insert random elements benchmark Score Score error Units
AVL tree 39.051 1.998 ms/op
JDK TreeSet 38.811 4.356 ms/op
Red-Black tree 59.052 7.871 ms/op
Scapegoat tree (alpha 0.6) 50.738 4.400 ms/op
Scapegoat tree (alpha 0.75) 44.077 1.192 ms/op
Scapegoat tree (alpha 0.9) 44.862 2.168 ms/op
Splay tree 76.457 4.114 ms/op
Treap 47.978 1.656 ms/op

This next benchmark inserts 100K elements and then deletes them one by one. One more time my implementation of the Red-Black tree underperforms (any help with why that is would be appreciated!). It is interesting that the more rigidly balanced Scapegoat trees couldn't even complete this test (even when I rewrote the tree flattening algorithm from recursive to iterative). My explanation is that this is not a good scenario for a Scapegoat tree, as constant deletion triggers rebalancing at the root too often (a scapegoat tree rebalances at the root node on deletion), and that is an expensive operation, since it has to flatten and then rebuild the whole tree (not just a scapegoat subtree as on inserts). However, it should do OK when there are not too many deletes in a row. The loosely balanced Scapegoat tree did quite well, though. The JDK Red-Black tree was the fastest and the Splay tree was the slowest once again.

Delete elements benchmark Score Score error Units
AVL tree 88.266 4.713 ms/op
JDK TreeSet 67.010 5.138 ms/op
Red-Black tree 114.527 12.326 ms/op
Scapegoat tree (alpha 0.9) 75.266 1.758 ms/op
Splay tree 151.600 8.451 ms/op
Treap 97.886 7.516 ms/op

Now let's check searching in 100K-element trees. Surprisingly, the JDK Red-Black tree slightly outperformed the also very fast AVL tree. The low-alpha Scapegoat tree was just slightly slower than the AVL tree. Because of the splaying overhead on each search operation, the Splay tree performs very poorly in this situation.

Search elements benchmark Score Score error Units
AVL tree 29.360 1.719 ms/op
JDK TreeSet 26.630 4.192 ms/op
Red-Black tree 45.811 4.770 ms/op
Scapegoat tree (alpha 0.6) 30.151 1.285 ms/op
Scapegoat tree (alpha 0.75) 34.001 2.171 ms/op
Scapegoat tree (alpha 0.9) 31.732 1.892 ms/op
Splay tree 91.072 5.093 ms/op
Treap 31.564 1.688 ms/op

This benchmark was created specially for the Splay tree. It is the same as the previous search benchmark, except now we are constantly searching for only the same 100 elements. The Splay tree splays those to the top after each search, and that should have given it superior performance! However, probably my benchmark or implementation is flawed in some way, because the Splay tree underperformed.

Search same few elements benchmark Score Score error Units
AVL tree 2.301 0.192 ms/op
JDK TreeSet 2.667 0.182 ms/op
Red-Black tree 15.921 4.166 ms/op
Scapegoat tree (alpha 0.6) 1.993 0.192 ms/op
Scapegoat tree (alpha 0.75) 2.029 0.110 ms/op
Scapegoat tree (alpha 0.9) 1.870 0.196 ms/op
Splay tree 19.690 0.761 ms/op
Treap 7.887 0.392 ms/op

That is it. As always, you can check out all the source code from the GitHub repo. It is pretty clear my Red-Black tree is not implemented correctly and is inefficient, and probably the Splay tree too. It would be great if someone could take a look at it (and the other implementations). I hope you liked this little benchmark and that it helped you choose your favorite binary search tree 🙂 Happy coding!


Sorting algorithms

My first post will hardly be original. In fact, it is probably the most unoriginal post ever 🙂 I will implement and compare several sorting algorithms in Java. I'm not going into deep details of how each algorithm works, but I will try to explain the big picture and provide nice clean Java code and some illustrations or links to more information. I will also try to micro-benchmark some of these algorithms.

First algorithm – Insertion sort.

It's probably the simplest (and least efficient, O(n^2)) algorithm of the bunch I will discuss. We start from the second element and compare it with the first one. If it is smaller we swap them. Now the first one is smaller than the second one. Then we take element #3 and compare it with the second and first ones. If element #3 is smaller than #2 but bigger than #1, its new place is between #1 and #2. If it is smaller than both, its new place is first. You probably see where this is going: we iterate through the list and find each element its new place. This GIF from Wikipedia is a very clear illustration of how it works:

And here is a Java implementation of this algorithm with some comments:

    public static int[] insertionSort(int[] data) {
        // run from second element to last
        for (int i = 1; i < data.length; i++) {
            int key = data[i];
            int j = i - 1;
            // take element and run back until smaller is found
            while (j >= 0 && data[j] > key) {
                data[j + 1] = data[j]; // move elements to make place for insert
                j--;
            }
            // insert element in position after a smaller element
            data[j + 1] = key;
        }
        return data;
    }

Quicksort

This is probably the most used sorting algorithm, as in practice it tends to be the fastest one. It is a divide and conquer algorithm which selects some element called the pivot and moves all smaller elements before it and all greater ones after it. Then it takes those two sub-lists and does the same (recursion). Now we have four sub-lists, and the first one contains elements smaller than those in the second one, which contains elements smaller than those in the third one, which contains elements smaller than those in the fourth one. We divide and sort until we reach single elements and the whole list is sorted! Here is the image worth a thousand words:

And the Java code:

    public static int[] quickSort(int[] data) {
        Sorter.quickSort(data, 0, data.length - 1);
        return data;
    }

    private static void quickSort(int[] data, int sublistFirstIndex, int sublistLastIndex) {
        if (sublistFirstIndex < sublistLastIndex) {
            // move smaller elements before pivot and larger after
            int pivotIndex = partition(data, sublistFirstIndex, sublistLastIndex);
            // apply recursively to sub lists
            Sorter.quickSort(data, sublistFirstIndex, pivotIndex - 1);
            Sorter.quickSort(data, pivotIndex + 1, sublistLastIndex);
        }
    }

    private static int partition(int[] data, int sublistFirstIndex, int sublistLastIndex) {
        // differently from the gif example, we take the last element of the list as the pivot
        int pivotElement = data[sublistLastIndex];
        int pivotIndex = sublistFirstIndex - 1;
        for (int i = sublistFirstIndex; i < sublistLastIndex; i++) {
            if (data[i] <= pivotElement) {
                pivotIndex++;
                ArrayUtils.swap(data, pivotIndex, i);
            }
        }
        ArrayUtils.swap(data, pivotIndex + 1, sublistLastIndex);
        return pivotIndex + 1; // return index of pivot element
    }

Merge sort

If quicksort divides the list into smaller and smaller sub-lists, merge sort takes the opposite approach. It merges sorted sub-lists into larger and larger lists until it reaches the fully sorted result. The smallest possible sub-list is a single element (which can be considered sorted). So we take the first and second elements, compare them and swap if the second one is smaller than the first one. We do the same with #3 and #4, #5 and #6, etc. Now we have sorted sub-lists each containing two elements. We do the same with those: the sub-list with the now sorted elements #1 and #2 is merged with the sub-list with elements #3 and #4, and the result is a sorted four-element sub-list. The same operation is repeated until we have only a single sorted list.

Java code:

    public static int[] mergeSort(int[] data) {
        mergeSort(data, 0, data.length - 1);
        return data;
    }
    private static void mergeSort(int[] data, int firstSublistStartIndex, int secondSublistEndIndex) {
        if (firstSublistStartIndex < secondSublistEndIndex) {
            int firstSublistLastElementIndex = (firstSublistStartIndex + secondSublistEndIndex) / 2;
            // recursively reach sub lists that has only one element
            Sorter.mergeSort(data, firstSublistStartIndex, firstSublistLastElementIndex);
            Sorter.mergeSort(data, firstSublistLastElementIndex + 1, secondSublistEndIndex);
            // then start merging them
            Sorter.merge(data, firstSublistStartIndex, firstSublistLastElementIndex, secondSublistEndIndex);
        }
    }
    private static void merge(int[] data, int firstSublistStartIndex, int firstSublistLastElementIndex,
            int secondSublistEndIndex) {
        int firstArrayLength = firstSublistLastElementIndex - firstSublistStartIndex + 1;
        int secondArrayLength = secondSublistEndIndex - firstSublistLastElementIndex;
        int[] firstArray = new int[firstArrayLength + 1];
        int[] secondArray = new int[secondArrayLength + 1];
        // copy first sub array to separate array
        for (int i = 0; i < firstArrayLength; i++) {
            firstArray[i] = data[firstSublistStartIndex + i];
        }
        // copy second sub array to separate array
        for (int i = 0; i < secondArrayLength; i++) {
            secondArray[i] = data[firstSublistLastElementIndex + 1 + i];
        }
        // optimization: because the last element is a sentinel maximum, we don't need to check
        // whether the current index is out of bounds
        firstArray[firstArrayLength] = Integer.MAX_VALUE;
        secondArray[secondArrayLength] = Integer.MAX_VALUE;

        int firstCurrentIndex = 0;
        int secondCurrentIndex = 0;

        // merge firstArray and secondArray back into data[firstSublistStartIndex..secondSublistEndIndex]
        for (int i = firstSublistStartIndex; i <= secondSublistEndIndex; i++) {
            // compare the current values from both sub arrays (each is the smallest
            // remaining in its array) and move the smaller one into data
            if (firstArray[firstCurrentIndex] <= secondArray[secondCurrentIndex]) {
                data[i] = firstArray[firstCurrentIndex];
                firstCurrentIndex++;
            } else {
                data[i] = secondArray[secondCurrentIndex];
                secondCurrentIndex++;
            }
        }
    }

Heap sort

What would be a naive sorting algorithm, the first idea that comes to mind for an inexperienced programmer? I guess it would be something like: run through the list, find the smallest element and put it first, then run from the second element to the end, again find the smallest element and put it in second place, and so on until everything is sorted. This not-very-efficient algorithm (O(n^2)) is called selection sort. Heap sort is a more efficient variation of selection sort. It too selects the smallest (largest) element and puts it at the beginning (end) of the list. However, to select the smallest/largest element it uses a data structure called a heap, which makes the algorithm much more efficient. A heap is a binary-tree-like data structure in which every node is always smaller (larger) than its child nodes. So in order to select the smallest/largest element we just take it from the root node. The algorithm is basically a two-step process. First, a heap is built from the data. A simple array is enough for that (element #1 becomes the root node, #2 and #3 its children, elements #4 and #5 become #2's children, etc.). The second step is to remove the smallest (largest) element from the root node of the heap and place it first (last). Of course, after removal the heap must be updated so its root is again the smallest (largest). Continue until the whole list is sorted. Here is a GIF which hopefully makes it clearer:

Java code:

    public static int[] heapSort(int[] data) {
        // build heap
        HeapUtils.buildMaxHeap(data);
        int heapSize = data.length;
        for (int i = data.length - 1; i > 0; i--) {
            // move largest element to the end
            ArrayUtils.swap(data, 0, i);
            // rebuild heap with smaller size by 1 (last element already sorted)
            HeapUtils.maxHeapify(data, 0, --heapSize);
        }
        return data;
    }
    public static void buildMaxHeap(int[] data) {
        for (int i = data.length / 2; i >= 0; i--) {
            maxHeapify(data, i);
        }
    }
    public static void maxHeapify(int[] data, int index) {
        maxHeapify(data, index, data.length);
    }
    // build heap structure "in place" where index is root node and heapSize is number of elements
    public static void maxHeapify(int[] data, int index, int heapSize) {
        int leftLeaf = getLeftLeaf(index);
        int rightLeaf = getRightLeaf(index);
        int largest = index;
        if (leftLeaf < heapSize && data[leftLeaf] > data[largest]) {
            largest = leftLeaf;
        }
        if (rightLeaf < heapSize && data[rightLeaf] > data[largest]) {
            largest = rightLeaf;
        }
        if (largest != index) {
            ArrayUtils.swap(data, index, largest);
            HeapUtils.maxHeapify(data, largest, heapSize);
        }
    }

OK, time to do some benchmarks. For that I used the JMH micro-benchmark framework. I will simply measure sorting of 100,000 random integers. Of course, you can clone the project from GitHub (link at the end of the article) and try different variations. How do the algorithms behave with fewer elements, or with more? Is there a problem when the array is already sorted (hint: see quicksort)? For now, however, I will simply post the results of sorting 100,000 random elements. Here we go:

Benchmark Score Score error Units
Heap sort 20.789 0.657 ms/op
Insertion sort 2381.962 38.257 ms/op
Arrays.sort 11.203 0.218 ms/op
Merge sort 27.680 0.713 ms/op
Quicksort 12.726 0.207 ms/op

We could have expected somewhat similar results. The slowest, of course, was insertion sort; the fastest was the optimized and well-implemented JDK sort. The JDK sort for primitives is a variation of quicksort, so it is no surprise that the custom quicksort implementation was just a little bit slower than the JDK version. Heapsort and mergesort showed somewhat similar performance, with heapsort being a bit faster.

Those are the main and most popular sorting methods. There are various variations and hybrid methods as well. For example Timsort, a hybrid of merge and insertion sort, which is now used in JDK 7 for sorting non-primitive types (take a look at the source code of Arrays.java). For primitive types the JDK seems to use a sophisticated dual-pivot quicksort. Why the difference? I don't know, but I guess the reason is that merge sort (and Timsort) is a stable algorithm, meaning that equal elements keep the same relative order they had in the input (for example, when sorting people by age, if both John and Peter are 25 and John came before Peter in the input, John is still before Peter after sorting; see the small stability sketch after the comparison table below). Quicksort does not give this guarantee, but it is probably faster in practice and is therefore the preferred method for primitive types. So we can see that the library developers are already thinking for us, and it's probably best just to use the standard tools and not reinvent the wheel unless you really know what you are doing and really can get better performance. So let's review the algorithms and compare them.

Algorithm Average Worst case Best case Stable Memory Best use cases
Insertion sort O(n^2) O(n^2) O(n) Yes O(1) (in place) Quite efficient for small lists. Sometimes used in hybrid implementation to sort very small sub lists. It can also sort a list at the same time as it receives it.
Quicksort O(n log(n)) O(n^2) O(n log(n)) No O(log n) In practice tends to be faster than the other algorithms.
Merge sort O(n log(n)) O(n log(n)) O(n log(n)) Yes O(n) With RAM based arrays tends to be slower than quicksort, but can be more efficient with slower media. Needs more memory, but is stable.
Heap sort O(n log(n)) O(n log(n)) O(n log(n)) No O(1) It’s performed in place and has very close performance to quicksort, so its good choice when memory is the issue.
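
Here is the small stability sketch mentioned above: sorting objects with Arrays.sort (which uses a stable merge sort / Timsort for objects) keeps elements with equal keys in their original order. The Person class and the data are made up for illustration:

    import java.util.Arrays;
    import java.util.Comparator;

    public class StableSortDemo {
        static class Person {
            final String name;
            final int age;
            Person(String name, int age) { this.name = name; this.age = age; }
            public String toString() { return name + "(" + age + ")"; }
        }

        public static void main(String[] args) {
            Person[] people = {
                new Person("John", 25), new Person("Alice", 30), new Person("Peter", 25)
            };
            // stable sort by age: John stays before Peter even though both are 25
            Arrays.sort(people, new Comparator<Person>() {
                public int compare(Person a, Person b) { return Integer.compare(a.age, b.age); }
            });
            System.out.println(Arrays.toString(people)); // [John(25), Peter(25), Alice(30)]
        }
    }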

OK, so those were the so-called comparison sorting algorithms, and we can't really do much better than O(n log n) with them. Of course there are various improvements, as I mentioned, for example quicksort with dual pivots (interesting read here), but they are still no better than O(n log n). But are we sure we can't do better? The answer is: it depends! Under some circumstances we can. Imagine we know that we need to sort some numbers in the range 0 to 99 and each number can appear only once. Yes, we could use insertion sort or quicksort for this task. But if you think about it a bit, a much quicker version would be to simply create a new boolean array with 100 elements, run through the input and, for each input element, set the value at that index in the helper array to true. Then we simply print all the indexes that are true in the helper array. And that is very quick. That kind of sorting is called distribution sorting. It can be really fast, but you need to know the range of possible values of the elements and of course you need additional memory for the helper array. Let's review some distribution sorting algorithms.
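
A quick sketch of that boolean-array idea (hypothetical method name, assuming distinct values in the known range 0..99):

    // Distribution-style "sort" for distinct values in a known range
    public static int[] flagSort(int[] data) {
        boolean[] seen = new boolean[100];
        for (int value : data) {
            seen[value] = true;            // mark that this value occurs in the input
        }
        int[] sorted = new int[data.length];
        int next = 0;
        for (int value = 0; value < seen.length; value++) {
            if (seen[value]) {
                sorted[next++] = value;    // emit the values in increasing order
            }
        }
        return sorted;
    }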

Counting sort

This is a slightly more sophisticated algorithm than the one I described above. The main difference is that we use integers instead of booleans in the helper array and count the number of occurrences of each element. This allows the input to contain equal elements, because we are "counting" them.

Java code:

    public static int[] countingSort(int[] data, int upperBound) {
        int dataLength = data.length;
        int tempArrayLength = upperBound + 1;
        int[] tempArr = new int[tempArrayLength];
        // count occurrences of each element (index - element, value - count)
        for (int i = 0; i < dataLength; i++) {
            tempArr[data[i]]++;
        }
        // turn counts into the number of elements less than or equal to each value
        for (int i = 1; i < tempArrayLength; i++) {
            tempArr[i] += tempArr[i - 1];
        }

        int[] sortedData = new int[dataLength];
        // put each element into its place in the sorted array
        for (int i = dataLength - 1; i >= 0; i--) {
            int place = tempArr[data[i]] - 1; // element's place in the result array
            sortedData[place] = data[i];
            tempArr[data[i]]--; // one fewer element less than or equal to this value
        }

        return sortedData;
    }

Radix sort
In radix sort we sort by individual digits, from the last one to the first one. In essence radix sort works like this: sort the elements by the last digit, then sort them by the second-to-last digit, and so on until we reach the first digit, after which all elements are in sorted order. For this to work, the per-digit sort must be stable, which is why counting sort fits well here.
The sorting can be done by having a bucket for each digit and putting the elements into those buckets (like in this explanation), or, as in the implementation provided here, by reusing counting sort.
As you can see, for this method to work you need to know the largest number of digits that elements in your data set might contain (or you will need to determine that dynamically, which costs time too). The complexity of the algorithm is O(kn), where k is the number of digits and n is the number of elements. If k is very big this sort might not be as efficient as even the comparison sorts.

Java code for radix sort:

    public static int[] radixSort(int[] data, int numberOfDigits) {
        int[] sortedData = data;
        for (int i = 0; i < numberOfDigits; i++) {
            sortedData = Sorter.countingSortByDigit(sortedData, i, 9);
        }
        return sortedData;
    }
    private static int[] countingSortByDigit(int[] data, int digitIndex, int upperBound) {
        int dataLength = data.length;
        int tempArrayLength = upperBound + 1;
        int[] tempArr = new int[tempArrayLength];
        // count occurrences of each digit (index - digit, value - count)
        for (int i = 0; i < dataLength; i++) {
            int digitFromNumber = MathUtils.getDigitFromNumber(data[i], digitIndex);
            tempArr[digitFromNumber]++;
        }
        // turn counts into the number of elements with a smaller or equal digit
        for (int i = 1; i < tempArrayLength; i++) {
            tempArr[i] += tempArr[i - 1];
        }

        int[] sortedData = new int[dataLength];
        // put each element into its place, iterating backwards to keep the sort stable
        for (int i = dataLength - 1; i >= 0; i--) {
            int digitFromNumber = MathUtils.getDigitFromNumber(data[i], digitIndex);
            int place = tempArr[digitFromNumber] - 1; // element's place in the result array
            sortedData[place] = data[i]; // store the whole number, not just its digit
            tempArr[digitFromNumber]--;
        }

        return sortedData;
    }

Bucket sort
Bucket sort is a method that takes advantage of the fact that small lists are faster to sort than very big ones. It simply splits the input into "buckets" and then uses another sorting algorithm to sort those buckets. It is also possible to use bucket sort recursively. The value ranges of the buckets do not overlap, so after each bucket is sorted they can simply be concatenated into the sorted result list. For example, let's say we have 3 buckets and put elements 3 and 1 into the first, elements 5 and 8 into the second and 11, 9 and 12 into the third. After sorting each bucket and joining them into a list we get the sorted output 1 3 5 8 9 11 12. In fact, bucket sort can be viewed as a generalized version of counting sort; take a look at the explanation here. The implementation of bucket sort below is an efficient way to sort double values in the range 0 to 1.

Java code:

    public static double[] bucketSort(double[] data) {
        int numberOfBuckets = data.length;
        double[][] buckets = new double[numberOfBuckets][];
        int[] bucketListIndices = new int[numberOfBuckets];
        for (int i = 0; i < data.length; i++) {
            // assumes values in the range [0, 1), so bucketIndex stays within bounds
            int bucketIndex = (int) ((double) numberOfBuckets * data[i]);
            if (buckets[bucketIndex] == null) {
                buckets[bucketIndex] = new double[1];
            } else {
                // increase bucket size by 1 (maybe better redone with a linked list)
                double[] bucket = buckets[bucketIndex];
                buckets[bucketIndex] = Arrays.copyOf(bucket, bucket.length + 1);
            }
            // append the element to its bucket, tracking the next free slot per bucket
            buckets[bucketIndex][bucketListIndices[bucketIndex]++] = data[i];
        }

        double[] sortedData = new double[data.length];
        int nextIndex = 0;
        // sort each bucket (insertionSort is assumed to have a double[] overload) and copy it into the result
        for (int bucketIndex = 0; bucketIndex < numberOfBuckets; bucketIndex++) {
            if (buckets[bucketIndex] != null) {
                int bucketLength = buckets[bucketIndex].length;
                System.arraycopy(Sorter.insertionSort(buckets[bucketIndex]), 0, sortedData, nextIndex, bucketLength);
                nextIndex = nextIndex + bucketLength;
            }
        }

        return sortedData;
    }

So these are all the algorithms I wanted to review here. Keep in mind these are only introductions, and my benchmarks are quite naive. Wikipedia is a great starting point if you are interested in a deeper understanding of the algorithms and in learning all the subtleties and tricks that can improve performance. Hope you enjoyed this article!
You can find all the algorithms and benchmarks in the GitHub project here.