Sorting algorithms
When sorting data structures, there are many possible approaches we can take. Let’s take a look at some of the most used options and compare them.
Bubble sort
Bubble sort iterates through the data structure and compares one pair of values at a time. If the order of those values is incorrect, it swaps its positions to correct it. The iteration is repeated until the data is ordered. This algorithm makes bigger values to “bubble” to the end of the array. This algorithm has a cuadratic (O(n²)) complexity since it will compare each value with the rest of the values one time.
A possible implementation could be the following:
Selection sort
Selection sort is similar to bubble sort but instead of placing the bigger values at the end of the data structure, it focuses on placing the smaller values at the begining. The steps it takes are the following:
- Store the first item of the data structure as the minimum value.
- Iterate through the data structure comparing each value with the minimum value. If a smaller value is found, it identifies this value as the new minimum value.
- If the minimum value isn’t the first value of the data structure, it swaps the positions of the minimum value and the first value.
- It repeats this iteration until the data structure is ordered.
This algorithm has a cuadratic (O(n²)) complexity.
A possible implementation could be the following:
Insertion sort
Insertion sort orders the data structure by creating an “ordered half” that is always correctly sorted, and iterates through the data structure picking each value and inserting it in the ordered half exactly in the place it should be.
The steps it takes are the following:
- It starts by picking the second element in the data structure.
- It compares this element with the one before it and swap its positions if necessary.
- It continues to the next element and if it’s not in it’s right position, it iterates through the “ordered half” to find it’s right position and inserts it there.
- It repeats the same process until the data structure is sorted.
This algorithm has a cuadratic (O(n²)) complexity.
A possible implementation could be the following:
The problem with bubble sort, selection sort and insertion sort is that these algortihms don’t scale well. There’re much better options we can take when we’ll work with big datasets. Some of them are merge sort, quick sort and radix sort.
Merge sort
Merge sort is an algorithm that recursively decomposes the data structure into individual values, and then composes it again in a sorted way.
The steps it takes are the following:
- Recursively break up the data structure into halves until each “piece” has only one value.
- Then, recursively merge the pieces in a sorted way until it gets back to the lenght of the original data structure.
This algorithm has a O(n log n) complexity, since the decomposition part of it has a complexity of log n and the comparison part of it has a complexity of n.
A possible implementation could be the following:
Quick sort
Quick sort works by selecting one element (called “the pivot”) and finding the index where the pivot should end up in the sorted array. The runtime of quicksort depends in part on how the pivot is selected. Ideally, it should be roughly the median value of the dataset being sorted.
The steps the algorithm takes are the following:
- Identify the pivot value and place it in the index it should be.
- Recursively execute the same process on each “half” of the data structure.
This algorithm has a O(n log n) complexity.
A possible implementation could be the following:
Radix sort
Radix is an algorithm that works in a different way than the ones seen before, in the sense that it doesn’t compare values between each other. Radix is used to sort lists of numbers and to do so it exploits the fact that the size of a number is defined but the ammount of digits it has (the most digits, the bigger the number). What radix does is to sort values by it’s digits in order. It first sorts all values by it’s first digit, then again by its second, then by its third… This process is repeated as many times as the number of digits the biggest number in the list has. And by the end of this process, the algorithm returns the fully sorted list.
The steps it takes are the following:
- Figure how many digits the largest number has.
- Loop through the list up to the largest number of digits. In every iteration:
- Create “buckets” for each digit (from 0 to 9) and place each value in its corresponding bucket according to the digit being evaluated.
- Replace the existing list with the values sorted in the buckets, starting from 0 and going up to 9.
This algorithm has a O(n*k) complexity, being k the number of digits the largest number has. Given that it doesn’t compare values with each other this algorithm has a better runtime than the ones seen before, but will only work on lists of numbers. If we want a data agnostic sorting algorithm, we would probably go with any of the previous ones.
A possible implementation could be the following: