Algorithms

This article is part of my personal wiki where I write personal notes while I am learning new technologies. You are welcome to use it for your own learning!

Big O Notation

The performance of most algorithms depends on the size of the input described by n. Algorithms can be classified from best to worse performance as follows:

  • O(log n) or logarithmic complexity: It means that its running time increases logarithmically in proportion to its input size (example for n=10 => log 10 = 1)
  • O(n) or linear complexity: Its running time increases in direct proportion to input size (example for n=10 => 10)
  • O(n*log n) or superlinear complexity: Between linear and polynomial (example for n=10 => 10)
  • O(n^c) or polynomial: Its running time grows quickly in relation to input size (example for O(n^2) and n=10 => 100)
  • O(c^n) or exponential: Its running time grows even faster than the polynomial algorithm (example for O(2^n) and n = 10 => 1024)
  • O(n!) or factorial: Its running time grows super fast and becomes unusable even for small n values (example for n=10 => 2628800) (sick!)

Sorting Algorithms

  • Linear Sorting: they treat the problem of sorting as one large problem
    • Bubble sort: (go through every pair of items, compare and swap if needed, repeat until no swaps)
    • Insertion sort (for each item in the array, INSERT and sort it within the first part of the array)
    • Selection sort (for each index in the array, SELECT the smallest item to the right and place it at that index)
  • Divide and Conquer: they treat the problem of sorting as many small problems
    • Merge sort (cut array into two sub arrays by the middle, sort subarrays, merge)
    • Quick sort (sub arrays based on value of a pivot, left all smaller, right all bigger, sort subarray, merge them)

Bubble sort:

Bubble sort is the simplest sorting algorithm and it consists on iterating over an array and swap items. Iterate until whole array is ordered (no swaps). In this manner, the biggest items bubble to the end of the array.

1
2
3
4
- while items are swapped
    - for each item
        - compare each item to its neighbor
        - if the right neighbor is smaller then swap it (right item to the left, left item to the right)

Complexity

You can measure the performance of any sorting algorithm by measuring the number of comparisons or the number of swaps.

  • Worst case scenario: O(n^2) (Bad for big unordered datasets)
  • Average case scenario: O(n^2) (Bad for big unordered datasets)
  • Best case scenario : O(n) (Good for sorted or nearly sorted datasets)
  • Space complexity: O(n) (Good memory profile, in-place algorithm, no extra allocations)

Insertion sort:

Insertion sort consists in sorting each item in the array as they are encountered. As the current item works from left to right we divide the array into two conceptual sub-arrays, to the left everything is sorted, to the right everything is unsorted. The current item, the one being iterated over, is inserted into place within the sorted section.

1
2
3
- for each item in the array
    - for each previous item in the array
        - if item is smaller than previous item swap it

In opposition to BubbleSort, insertion sort only needs only pass to sort an array.

Complexity

  • Worst case scenario: O(n^2) (Bad for big unordered datasets)
  • Average case scenario: O(n^2) (Bad for big unordered datasets)
  • Best case scenario: O(n) (sorted or nearly sorted)
  • Space complexity: O(n) (good memory profile, in-place algorithm, no extra allocations)

Selection sort

In selection sort, we sort the data by finding the smallest item and swapping it into the array in the first unsorted location. That is we:

1
2
3
1. Enumerate the array from the first unsorted item to the end
2. Identify smallest item
3. Swap smallest item with the first unsorted item 

It only requires n-1 swaps. In situations where moving a data elements is more expensive than comparing them, selection sort may perform better than other algorithms.

Complexity:

  • Worst case scenario: O(n^2) (not good for large unsorted data sets)
  • Average case scenario: O(n^2) (in practice it performs better than bubble but worse than insertion sort)
  • Best case scenario: O(n^2) (not good for large unsorted data)
  • Space complexity: O(n) (no extra allocations)

Merge Sort

In merge sort the array is recursively split in half. When the array is in groups of 1, it is reconstructed in sort order. Each reconstructed array is merged with the other half.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15

1. divide in half
2. when array divided in subarrays of one element
3. reconstruct (merge) in sort order

[3,1,4,2]

// 1. divide
[2,1] [4,2]
// 2. divide
[3] [1] [4] [2]
// 3. reconstruct and sort
[1,3] [2,4]
// 4. reconstruct and sort
[1,2,3,4]

Complexity:

  • Worst case scenario: O(n logn) (appropriate for large data sets, it can be performed in parallel)
  • Average case scenario: O(n logn) (appropriate for large data sets, it can be performed in parallel)
  • Best case scenario: O(n logn) (even if sorted, they still needs to do the whole process, very predictable)
  • Space complexity: O(n)
    • Merge can be performed in place but often is not. Extra allocations of arrays increase the memory footprint required to sort the data

Quick sort

The quick sort algorithm consists in picking a pivot value within the array and partitioning the array based on the pivot. All values smaller than the pivot are placed to the left of the pivot and all values that are bigger are moved to the right of the pivot. The algorithms consits on applying this pivot selection and partition process until the array is fully sorted.

1
2
3
4
5
6
7
8
9
10
11
12
13

[5,3,1,4,2]

// pick 4 as pivot
[5,3,1,*4*,2]
// move elements so everything to the left is smaller than 4, everything to the right is larger than 4
[3,1,2,*4*,5]
// now we continue with the [3,1,2] subarray and pick 2 as pivot
[3,1,*2*]
[1,*2*,3]
// continue with the [5] subarray (it's sorted!)
// in fact all the array is sorted!!
[1,2,3,4,5]

Complexity:

  • Worst case scenario: O(n^2)
    • inversely sorted
  • Average case scenario: O(n logn)
  • Best case scenario: O(n logn)
  • Space complexity: O(n)
    • As a recursive algorithm additional space is used in the call stack

References

Comments