Big O Notation, How it Works and What It's Used For

We explain several Big O notations and illustrate them by example

Posted by Stacy on 25-08-2018

By reading scientific or technical articles, you probably noticed that algorithms are often compared by using notations like \(O(n)\) or \(O(log\:n)\). These are examples of the famous Big O Notation. What exactly does this mean and how do different notations compare? Let's look at the three most common Big O notations and compare them.

Let's start with a definition of what an algorithm is.

Algorithm

An algorithm is a formal prescription of simple operations a computer (or a human) can easily follow to solve a problem. For a given problem, multiple solutions can exist, so multiple algorithms can be developed to solve one problem.

Which begs the question: if there could be multiple different algorithms capable of solving a problem, how can we find out which one is better? (A better algorithm in computer science is also called more efficient.)

To represent the efficiency of an algorithm, computer scientists invented the Big O notation: \(O(n)\), \(O(1)\), \(O(log\:n)\) are all examples of Big O notation and can be used to compare different algorithms that solve the same problem:

  • \(O(1)\) means that the algorithm takes constant time to solve the problem (the execution time is independent of the problem size);

  • \(O(n)\) means that algorithm takes linear time to solve the problem;

  • \(O(log\:n)\) means that the algorithm takes a logarithmic time to solve the problem.

To understand the Big O notation, one has to understand first what a constant time execution, linear time execution, and logarithmic time execution mean.

Let's look more closely at each of the three most common Big O notations.

O(n)

\(O(n)\) means that the algorithm takes linear time to solve the problem.

Consider the following problem. We are given a box with 100 random numbers, for example, \((10, 18, 3, 15,..., 197)\) and we are asked if the number \(18\) is in the box. How can we solve this problem?

We can decide to pick one number at a time out of the box and compare it to \(18\). If the number we just picked is the one we look for, then we are good and the algorithm can stop. Otherwise, we have to pick another number, compare it to \(18\), return True if the number we picked is indeed \(18\), or continue otherwise. In the worst case we will need to pick all \(100\) numbers to find out that either the last picked number is indeed \(18\) or that \(18\) is not in the box. So in the worst case, we will need to execute \(100\) unitary operations. If the box contains \(1000\) numbers we will need, in the worst case, do \(1000\) unitary operations to solve the problem. So the number of unitary operations need to solve the problem is linear in the size of the problem (the size of our box). In this case, we say that the execution time of our algorithm is \(O(n)\), meaning the time is linear in size of the problem.

Here's how our algorithm looks like in Python:

numbers = [5, 14, 123, 4, 76, 33, 75, 18, 9, 7];
number_to_find = 18;
def find(number_to_find, numbers):
    for number in numbers:
        if number = number_to_find:
            return True #found

return False; #not found

O(1)

\(O(1)\) means that the algorithm takes constant time to solve the problem, that is the execution time is independent of the size of the problem.

Consider the following problem. One has a box with \(20\) random numbers but now the values of the numbers are between \(1\) and \(20\), the box has \(20\) sections enumerated from \(1\) to \(20\) and each number is placed in the corresponding section. The question you have to answer is the same: is the number \(18\) in the box? Now, to answer the question, you can follow a different algorithm: open the box, look at the section with number \(18\); if the section contains a number, then return \(True\); otherwise, return \(False\). In this case the time it takes to answer the question is independent of the size of the box, in other words, the time is constant. Then the following notation describes the execution time of our algorithm: \(O(1)\).

The Python code for our algorithm woud look like this:

numbers = [1,None,3,4,5,None,None,None,9,10,11,None,None,14,None,16,17,18,None,20]
def find(number_to_find, numbers):
    return numbers[number_to_find - 1] != None

O(log n)

Now consider the previous problem, but with a slight modification: the box doesn't come with sections for every number, but the numbers are still sorted in the increasing order. The question is the same: whether the box contains the number \(18\) or not. One possible algorithm that we could propose to solve this problem is like this:

  1. Split the box into two halves and look at the left-most element at each half. If the left-most element of some half is bigger than \(18\) than we know that the whole half doesn't contain \(18\). Now, look at the right-most element at each half. If the right-most element of some half is smaller than \(18\) than we know that the whole half doesn't contain \(18\). Discard the halves that don't contain \(18\). If both halves were discarded, then return \(False\), otherwise, keep examining the remaining part of the box.
  2. Split the remaining part of the box into two halves and look at the left-most element at each half. If the left-most element of some half is bigger than \(18\) than we know that the whole half doesn't contain \(18\). Now, look at the right-most element at each half. If the right-most element of some half is smaller than \(18\) than we know that the whole half doesn't contain \(18\). Discard the halves that don't contain \(18\). If both halves were discarded, then return \(False\), otherwise, keep examining the remaining part of the box.
  3. Repeat the Step 2 until the remaining part of the box contains just one element. If it's \(18\) then return True otherwise return \(False\).

This is how it would look like for a small example:

An example of an O(log n) algorithm.
An example of an \(O(log\:n)\) algorithm.

By looking at the above picture, it's easy to see that for the box size of \(16\) it took \(4\) steps to find out whether \(18\) is in the box.

In math, if \(n = 2^a\) then \(log_2 n = a.\)

Hence we can rewrite \(16 = 2^4\) as \(log_2 16 = 4.\)

So, the time (the number of steps) our algorithm takes to solve the problem (in the worst case) is \(log_2 n\), where \(n\) is the size of the box (the size of our problem).

Using the Big-O notation, it becomes \(O(log\:n)\).

This is how our \(O(log\:n)\) algorithm to check if a number is in the box would look like in Python:

    def find(number_to_find, numbers):
        left = 0
        right = len(numbers) - 1
        while left <= right:
            mid = left + (right - left) / 2
            if number_to_find < numbers[mid]:
                right = mid - 1
          elif number_to_find > numbers[mid]:
                left = mid + 1
          else:
            return True #number found
        return False #number not found

All algorithms that halve the size of the problem at every iteration have the time complexity of \(O(log\:n)\).

In this post, we explained in simple terms the Big-O notation and gave examples of three algorithms that have time complexity expressed in Big-O notations as \(O(n)\), \(O(1)\), \(O(log\:n)\).


Read our previous post "What Machine Learning Can Do" or subscribe to our RSS feed.

Found a mistyping or an inconsistency in the text? Let us know and we will improve it.


Like it? Share it!