I am a programmer and have just started reading Algorithms. I am not completely convinced by the notations, namely Big Oh, Big Omega and Big Theta. The reason is that, by the definition of Big Oh, there should be a function g(n) such that c.g(n) is always greater than or equal to f(n); that is, f(n) <= c.g(n) for all values of n > n0.
Why don't we mention the constant value in the definition? For example, take the function 6n + 4: we denote it as O(n). But the definition does not hold for every constant value. With n0 = 1 it holds only when c >= 10, and for smaller values of c (still greater than 6) the required n0 increases. So why do we not mention the constant value as a part of the definition?
There are several reasons, but probably the most important one is that constants are a function of the implementation of the algorithm, not the algorithm itself. The order of an algorithm is useful for comparing algorithms irrespective of their implementation.
The actual runtime of a quicksort will typically change if it's implemented in C or Python or Scala or PostScript. The same applies to bubble sort: the runtime will vary widely based on the implementation.
However, what will not change is that, all else being equal, as the dataset gets larger, the time required to run a bubble sort will increase faster than the time required to run a quicksort in the typical case, no matter what language or machine they're implemented with, assuming a reasonably correct implementation. This simple fact allows you to make intelligent inferences about the algorithms themselves when concrete details are not available.
The order of an algorithm filters out factors that, while important in actual real-world measurements, tend to just be noise when comparing algorithms in the abstract.
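To make that concrete, here is a rough, informal sketch in Python (the naive bubble sort and the input sizes are just illustrative, and the absolute timings will differ from machine to machine): doubling n roughly quadruples the bubble sort time, while the built-in O(n log n) sort grows far more slowly.

import random
import time

def bubble_sort(a):
    # Deliberately naive O(n^2) bubble sort, for comparison only.
    a = a[:]
    for i in range(len(a)):
        for j in range(len(a) - 1 - i):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return a

for n in (1_000, 2_000, 4_000, 8_000):
    data = [random.random() for _ in range(n)]
    t0 = time.perf_counter(); bubble_sort(data); t1 = time.perf_counter()
    t2 = time.perf_counter(); sorted(data); t3 = time.perf_counter()
    print(f"n={n:>5}  bubble sort: {t1 - t0:7.3f}s   built-in sort: {t3 - t2:7.3f}s")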
O(n) and other order notation is (typically) not concerned with the behavior of functions for small values. It is concerned with the behavior of functions for very large values, namely in the limit as n tends toward infinity.
The constants technically matter, but they are typically abstracted away, because once n becomes large enough, c's value is entirely irrelevant. If c's value is important, we can include it in the analysis, but unless the functions being compared have very large constant factors, or efficiency is an especially important concern, it typically is not.
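To illustrate with the function from the question (a small Python sketch; the candidate constants are arbitrarily chosen): picking a different c only changes the threshold n0, and once n is past that threshold the inequality holds either way.

def f(n):
    return 6 * n + 4   # the function from the question

# For each candidate constant c > 6, find the smallest n0 with f(n) <= c * n.
# Once the inequality holds it keeps holding, so one point is enough here.
for c in (7, 8, 10, 100):
    n0 = next(n for n in range(1, 10_000) if f(n) <= c * n)
    print(f"c = {c:>3}: 6n + 4 <= {c}n for all n >= {n0}")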
Big O notation is defined as follows:
For a given function g(n), we denote by O(g(n)) the set of functions:
O(g(n)) = { f(n) : there exist positive constants c and n' such that 0 <= f(n) <= c.g(n) for all n > n' }
The Big O notation is built on the intuition that for all values n at and to the right of n’, the value of f(n) is on or below c.g(n).
The constants also stop mattering once higher-order (variable) factors like n-squared or n-cubed are involved, since constants stay fixed while those factors can grow without bound.
The standard graph of Big-O notation illustrates this: the essence of the notation is that f(n) stays at or below c.g(n), not the point n' at which it starts doing so.
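If it helps to see the definition in executable form, here is a small Python sketch (the helper name and the finite bound are my own, and a finite check like this can only illustrate the definition, not prove it):

def looks_like_witness(f, g, c, n0, up_to=1_000_000):
    # Checks 0 <= f(n) <= c*g(n) for every n with n0 < n <= up_to.
    return all(0 <= f(n) <= c * g(n) for n in range(n0 + 1, up_to + 1))

# The question's example: f(n) = 6n + 4 is in O(n), witnessed by c = 10, n' = 1.
print(looks_like_witness(lambda n: 6 * n + 4, lambda n: n, c=10, n0=1))   # True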
In algorithm analysis, order of growth is the key abstraction: it gives the rate at which the running time changes as the input size changes. Let's say an algorithm has a running time f(n) = 2n + 3. Now we plug in some input sizes:
n = 10: 2 * 10 + 3 = 23
n = 100: 2 * 100 + 3 = 203
n = 10000: 2 * 10000 + 3 = 20003
n = 1000000: 2 * 1000000 + 3 = 2000003
n = 100000000 : 2 * 100000000 + 3 = 200000003
As can be seen, the order of growth is mainly determined by the variable n; the constants 2 and 3 are less significant, and as the input size grows they become even less significant in determining it. This is why, in algorithm analysis, constants are set aside in favor of the variable that determines the order of growth of a function.
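The same point can be seen numerically with a short Python sketch (it just re-computes the example function above): the ratio f(n)/n settles toward the leading constant 2, and the trailing + 3 contributes a vanishing share of the total.

def f(n):
    return 2 * n + 3   # the example running time from above

for n in (10, 100, 10_000, 1_000_000, 100_000_000):
    # f(n)/n approaches 2, and the "+ 3" term becomes a negligible fraction.
    print(f"n = {n:>11,}   f(n) = {f(n):>13,}   f(n)/n = {f(n) / n:.6f}")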
Let’s take your example and walk through it step-by-step, understanding the purpose behind what we are doing. We start with your function and the goal of finding its Big Oh notation:
f(n) = 6n+4
First, let O(g(n)) be the Big Oh notation we are trying to find for f(n). From the definition of Big Oh, we need to find a simplified g(n) for which there exist some constants c and n0 such that c*g(n) >= f(n) is true for all n greater than n0.
First, let's choose g(n) = 6n + 4 (which would yield O(6n+4) in Big Oh). In this case we see that c = 1 and any value of n0 will meet the mathematical requirements from our definition of Big Oh, since g(n) always equals f(n):
c*g(n) >= f(n)
1*(6n + 4) >= 6n + 4 //True for all n's, so we don't need to pick an n0
At this point we've met the mathematical requirements. If we stopped at O(6n+4), it's clear that this is no more helpful than writing f(n), so it would miss the true purpose of Big Oh notation: to understand the general time-complexity of an algorithm! Thus, let's move on to the next step: simplification.
First, can we simplify out the 6n so the Big Oh is O(4)? No! (Exercise for the reader if they don't understand why.)
Second, can we simplify out the 4 so that the Big Oh is O(6n)? Yes! In that case, g(n) = 6n, so:
c*g(n) >= f(n)
c*6n >= 6n + 4
At this point, let's choose c = 2, since then the left side will increase faster (by 12) than the right side (by 6) for each increment of n.
2*6n >= 6n + 4
Now we need to find a positive n0 where the above inequality is true for all n greater than that value. Since we already know that the left side is increasing faster than the right, all we have to do is find one positive solution. Thus, since n0 = 2 makes the above true, we know that g(n) = 6n, or O(6n), is a potential Big Oh notation for f(n).
Now, can we simplify out the 6 so that the Big Oh is O(n)? Yes! In that case, g(n) = n, so:
c*g(n) >= f(n)
c*n >= 6n + 4
Let's pick c = 7, since the left side would then increase faster than the right.
7*n >= 6n + 4
We see that the above will be true for all n greater than or equal to n0 = 4. Thus, O(n) is a potential Big Oh notation for f(n). Can we simplify g(n) any more? Nope!
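As a quick sanity check of both steps (a small Python brute-test over a finite range; it illustrates, rather than proves, the inequalities):

f = lambda n: 6 * n + 4

# Step with g(n) = 6n, c = 2, n0 = 2:  2*6n >= 6n + 4 for all n > 2
print(all(2 * (6 * n) >= f(n) for n in range(3, 1_000_000)))   # True

# Step with g(n) = n, c = 7, n0 = 4:  7n >= 6n + 4 for all n >= 4
print(all(7 * n >= f(n) for n in range(4, 1_000_000)))          # True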
Finally, we've found that the simplest Big Oh notation for f(n) is O(n). Why did we go through all this? Because now we know that f(n) is linear, since its Big Oh notation is of linear complexity, O(n). The nice thing is that now we can compare the time-complexity of f(n) to other algorithms! For example, we now know that f(n) is of comparable time-complexity to the functions h(n) = 123n + 72, i(n) = n, j(n) = .0002n + 1234, etc., because using the same simplification process outlined above they all have linear time-complexity of O(n).
Sweet!!!
The whole point of Big-Oh notation is specifically to ignore constants and present the most important part of the function describing the run-time of an algorithm.
Forget the formal definition for a moment. Which is the worse (faster growing) function, n^2 - 5000 or 5000n + 60000? For n less than around 5000, the linear function is greater (and thus worse). Beyond that (exact value 5013?), the quadratic function is bigger.
Since there are more (quite a few more) positive numbers greater than 5000 than less, we take the quadratic to be the ‘bigger’ (worse) function in general. Order notation (Big-Oh etc) enforces that (you can always eliminate an additive and a multiplicative constant using those definitions).
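If you want to pin down that crossover, a quick Python check of the two example functions above will do it:

linear = lambda n: 5000 * n + 60000
quadratic = lambda n: n * n - 5000

# First n at which the quadratic function overtakes the linear one.
print(next(n for n in range(1, 100_000) if quadratic(n) > linear(n)))   # 5013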
Of course, things aren't always simple. Sometimes you do want to know those constants. Which is better, insertion sort or bubble sort? Both are O(n^2), but one really is better than the other. With more elaborate analysis one can get constants like the ones you are wondering about. It's usually much easier to compute the Big-Oh function than a more exact one.
Big-Oh ignores these constants to simplify and make the most important comparisons easier. We like the notation because usually we don’t want to know about the (mostly irrelevant) constants.
If you have a performance function of 6n + 4, the relevant question is, “6 what?” As one comment asked: what does your constant represent? In physics terms, what are the units of your constant factor?
The reason why O() notation is so widely used to describe algorithm performance is that there is no portable way to answer that question. Different processors will take a different number of clock cycles and different amounts of time to perform the same elementary computation, or they may lump the relevant elementary computations differently. Different computer languages, or different formal and informal descriptions like pseudocode, will represent algorithms in ways that are difficult to compare directly. Even implementations in the same language can represent the same algorithm in different ways — trivial formatting details like the number of lines aside, you will generally have a wide variety of arbitrary structural choices to implement any given algorithm.
Look at it another way: we use “algorithm” not to describe a particular implementation, but to describe an entire class of potential implementations of the same general procedure. This abstraction ignores the details of implementation in favor of documenting something of general value, and the constant performance factor is one of these details.
That said, algorithm descriptions are often accompanied by folklore, notes, or even actual benchmarks that describe the performance of actual implementations on actual hardware. This gives you a rough idea what kind of constant factor to expect, but it should also be taken with a grain of salt because the actual performance depends on things like how much work went into optimization of a given implementation. Also, in the long term, the relative performance of comparable algorithms tends to drift as the architecture of the latest and greatest processors changes…