Landau Notation: A user's guide
Big O notation, and its cousin little o, probably occupy a fuzzy space in the minds of most engineers – a thing you’ve heard about and seen used; maybe you can even write down and interpret equations with them. Still, it’s not something that you get. There’s still some hesitancy in the imprecision which seems baked into the idea of “approximating” a function – like you’re cheating someone and hoping you don’t get caught. I think this has a lot to do with the ad-hoc way it’s taught. My first experience with Landau notation in engineering school was a professor writing something like this on the board:
\begin{equation}\label{first eq} y = e^{-x}\sin(x) + O\left(\frac{1}{x}\right) \end{equation}
Upon writing this, a confused silence would fill the classroom. Where did this $O$ thing come from? Was it a function defined earlier in the lecture (or earlier in the semester) that the students, half-asleep, had forgotten about? Eventually, some student would ask what was going on with the last term. Oh, that term, the professor would say, is the term we don’t care about. And, to further allay our fears: it is small. The professor would then continue with the lecture, focusing on the “important” term $e^{-x}\sin(x)$ and letting $O\left(\frac{1}{x}\right)$ be.
The overall effect is that students leave the lecture questioning what just happened. How can a term just be small (read: unimportant)? Aren’t all terms needed to make an equality, and so all equally important? Also, is $O$ a function or what? Things seem too wishy-washy for what actual math should be. In the example above, I used a real variable $x$, but computer scientists can probably relate from similar experiences analyzing algorithmic complexity. (“So the program takes $O(n^2)$ long to execute, but not exactly?”)
In this blog, then, I’d like to spend some time getting to know this notation. I’ll talk not only about the definition, but also about what it means, under a few different interpretations, so that we can build up the intuition to manipulate expressions with Landau notation (that’s the collective name for big O and little o) with confidence. Finally, I’ll collect equivalent definitions of big O and little o, and prove their equivalence.
Let’s start with a list of what Landau notation is, and a longer list of what Landau notation isn’t.
Landau notation is:
- A way to compare the limiting behavior of functions.
- A way to describe the growth or decay of a function.
Landau notation isn’t:
- A function.
- A measure of algorithmic complexity (exclusively).
- A measure of how functions grow or diminish as the input goes to infinity (exclusively).
- A way to put exact bounds on a function.
- An equality.
That last one might be shocking, but as I’ll show, it’s inarguably true. To do that though, we have to get into the nitty-gritty. Let’s jump in.
The (first) definition
Definition #1. [Big O] Let $f(x)$ be a function defined on some subset of the real (or complex) numbers, and $g(x)$ be a “comparison function” defined on the same domain. If there is a constant $C$ such that
\[|f(x)| \leq C|g(x)|, \text{ for all $x$},\]then we write
\[f(x) = O(g(x))\]read, “$f(x)$ is big O of $g(x)$.”
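Before working pen-and-paper examples, it may help to see this definition transcribed into code. Below is a minimal Python sketch of my own (the name `violates_big_O` is made up for illustration); note that checking finitely many sample points can only falsify the inequality, never prove it, so this is a sanity check rather than a proof.

```python
import math

def violates_big_O(f, g, C, xs):
    """Return the sample points where |f(x)| <= C*|g(x)| fails.

    An empty result is consistent with f = O(g); a nonempty result
    disproves the claim for this particular constant C.
    """
    return [x for x in xs if abs(f(x)) > C * abs(g(x))]

# Check the first example below: sin(x)*x^2 = O(x^2) with C = 1.
xs = [i / 10 for i in range(-100, 101)]
print(violates_big_O(lambda x: math.sin(x) * x**2,
                     lambda x: x**2,
                     C=1, xs=xs))  # [] -- no violations found
```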
Example. $\sin(x)x^2 = O(x^2)$ because we can choose $C=1$ and
\[|\sin(x)x^2| \leq |\sin(x)||x^2| \leq 1\cdot|x^2|.\]Example. Consider $n$ to be a natural number greater than or equal to 1. Then $n^2 = O(n^3)$ with $C=1$ since
\[|n^2| = 1\cdot |n^2| \leq |n||n^2| = |n^3|.\]Well, so far this seems pretty easy, but one ingredient is still missing. Big O is normally used to describe the limiting behavior of a function, typically as its argument goes to $0$ or $\infty$, but the definition shown above has nothing to do with limits! It only states that the absolute value of the function has an upper bound everywhere. To get the effect of a limit, we restrict the domain where we consider the inequality to hold. Just as $C$ is not something we care to specify, the exact limits of the bound aren’t that important; we only care that they exist. Now, let’s give the big O definition with a limiting value of $\infty$, then of $0$.
Definition #2. [Big O as $x\to\infty$] Let $f(x)$ be a function defined on a subset of the real (or complex) numbers, unbounded above, and $g(x)$ be a “comparison function” defined on the same domain. If there exist constants $x_0$ and $C$ such that
\[|f(x)| \leq C|g(x)| \text{ for all $x \geq x_0$},\]then we write
\[f(x) = O(g(x)) \text{ as } x\to\infty.\]Definition #3. [Big O as $x\to 0$] Let $f(x)$ be a function defined on a subset of the real (or complex) numbers, and $g(x)$ be a “comparison function” defined on the same domain. If there exist constants $x_0$ and $C$ such that
\[|f(x)| \leq C|g(x)| \text{ for all $|x| \leq x_0$},\]then we write
\[f(x) = O(g(x)) \text{ as } x\to 0.\]Here, “as $x\to\infty$” or “as $x\to 0$” are qualifier statements: they describe the domain where the inequality holds. You can also see that although we describe this as “limiting behavior”, no limits are actually used. That is to say, if $x_0 = 1$ works (in either definition), there’s no reason to keep moving closer to $0$ or $\infty$. What is true is that two such bounds can always be combined into another bound of the same form (take the larger $x_0$ and the larger $C$). That rule will be important when we discuss the algebra of Landau notation.
Example. $1 + x^2 = O(x^2)$ as $x\to \infty$. Take $x_0 = 1$, $C = 2$. Then,
\[|1 + x^2| = 1 + x^2 \leq x^2 + x^2 = 2x^2 = 2|x^2| \text{ for } x \geq 1.\]But $1 + x^2 \neq O(x^2)$ as $x\to 0$. To see this, assume for contradiction that such an $x_0$ and $C$ exist (with $C > 0$, without loss of generality). Then we take $x=\frac{x_0}{\sqrt{1+Cx_0^2}}$, which, $C$ being positive, is less than $x_0$. The contradiction forms as
\[1 + x^2 = 1 + \left(\frac{x_0}{\sqrt{1+Cx_0^2}}\right)^2 = 1 + \frac{x_0^2}{1+Cx_0^2} = \frac{1 + Cx_0^2}{1+Cx_0^2} + \frac{x_0^2}{1+Cx_0^2} = \frac{1 + (C+1)x_0^2}{1+Cx_0^2} > \frac{(C+1)x_0^2}{1+Cx_0^2} > C\frac{x_0^2}{1+Cx_0^2} = Cx^2\]Often, the qualifier is dropped when the interpretation is ‘implied’. This is no doubt the cause of some of the confusion regarding the notation. In general, when analyzing algorithm growth, “as $x\to\infty$” is meant, whereas when expanding Taylor series, “as $x\to 0$” is meant. If you’re still confused about which one you are dealing with in any situation, ask yourself which is true: $x=O(x^2)$ or $x^2 = O(x)$. The first, $x=O(x^2)$, is true as $x\to\infty$, while the second, $x^2 = O(x)$, is true as $x\to 0$.
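To make the two regimes concrete, here’s a small numerical illustration (my own, in Python) of the example above: the ratio $\frac{1+x^2}{x^2}$ stays bounded as $x$ grows, so a single constant $C$ works toward $\infty$, but it blows up as $x$ approaches $0$, so no constant can work there.

```python
# f(x) = 1 + x^2 versus g(x) = x^2: the ratio |f/g| is bounded
# as x -> infinity but unbounded as x -> 0.
ratio = lambda x: (1 + x**2) / x**2

for x in [1, 10, 100, 1000]:        # x -> infinity: ratio -> 1
    print(f"x = {x:>8}: ratio = {ratio(x):.6f}")
for x in [1, 0.1, 0.01, 0.001]:     # x -> 0: ratio -> infinity
    print(f"x = {x:>8}: ratio = {ratio(x):.2f}")
```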
Let’s move now to little o. Just like big O, it has different definitions as a limit to $\infty$ and as a limit to $0$. I’ll go in the opposite order from big O: first the definition for $\infty$, then $0$, then a general definition uniting the two.
Definition #4. [Little o as $x\to \infty$] Let $f(x)$ be a function defined on an unbounded subset of the real (or complex) numbers, and $g(x)$ be a “comparison function” defined on the same domain. If, for every $\epsilon > 0$, there is some $x_0$ so that
\[|f(x)| \leq \epsilon|g(x)| \text{ for all } x \geq x_0\]then we write
\[f(x) = o(g(x)) \text{ as } x\to\infty\]read, “$f(x)$ is little o of $g(x)$ as $x$ goes to $\infty$.”
Definition #5. [Little o as $x\to 0$] Let $f(x)$ be a function defined on a subset of the real (or complex) numbers, and $g(x)$ be a “comparison function” defined on the same domain. If, for every $\epsilon > 0$, there is some $\delta > 0$ such that
\[|f(x)| \leq \epsilon |g(x)| \text{ for all $x$ with $|x| < \delta$},\]then we write
\[f(x) = o(g(x)) \text{ as } x\to 0.\]What’s different here? We’ve replaced the constant $C$, for which the inequality need only hold for one value, with $\epsilon$, for which the inequality must hold for every positive value. The second value, $x_0$ or $\delta$ as the case may be, depends on $\epsilon$: as $\epsilon$ gets smaller, $x_0$ gets bigger in the first definition and $\delta$ gets smaller in the second. But what’s great is that unlike big O, these are actual limits! To the actual values of $\infty$ and $0$. So in general we can write the definition of little o as:
Definition #6. [Little o] For a chosen $a$, $f(x) = o(g(x))$ as $x\to a$, if
\[\lim_{x\to a}\left|\frac{f(x)}{g(x)}\right| = 0,\]where $a$ can be any real number or $\pm\infty$.
I’ll show how this is true in slow-motion in the final section.
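In the meantime, here’s a quick numerical illustration of defn. #6 (a sketch, not a proof): tabulating $\left|\frac{f(x)}{g(x)}\right|$ along a sequence approaching $a$, the ratio visibly tends to $0$ for $x^2 = o(x)$ as $x\to 0$ and for $x = o(x^2)$ as $x\to\infty$.

```python
# Little o via the limit definition: |f(x)/g(x)| -> 0 as x -> a.
def ratios(f, g, xs):
    return [abs(f(x) / g(x)) for x in xs]

# x^2 = o(x) as x -> 0: the ratio is |x|, which shrinks with x.
print(ratios(lambda x: x**2, lambda x: x, [0.1, 0.01, 0.001]))

# x = o(x^2) as x -> infinity: the ratio is 1/x.
print(ratios(lambda x: x, lambda x: x**2, [10, 100, 1000]))
```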
The interpretation here is that big O acts as “less than or equal to” and little o acts as “strictly less than”, in terms of order of growth (or decay). We’ll now include one more definition, which is more of a notation than a definition.
Definition #7. [Notation] If $f(x) - g(x) = O(h(x))$, then we can also write $f(x) = g(x) + O(h(x))$. Likewise, if $f(x) - g(x) = o(h(x))$, then we can also write $f(x) = g(x) + o(h(x))$.
We now have something that looks like eqn. \eqref{first eq}, and can answer why it isn’t, in fact, an equality. From the very beginning, the equality was only notational; in effect, defn. #1 doesn’t define “$O(\cdot)$”, but “$=O(\cdot)$”. Read aloud, we say “$f(x)$ is $O(g(x))$”, not “$f(x)$ equals $O(g(x))$”. [1] describes the sense of this “is” as in the sentence “A bird is an animal.” That does not mean “bird = animal”, but that “bird” is in the class of “animal.” (See also that textbook for a longer discussion about the history of the notation, and why we’re stuck with this abuse of notation.) Likewise, $f(x) = O(g(x))$ says that $f(x)$ is in the class of functions which behave asymptotically like (or less than) $g(x)$.
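As a concrete instance of this notation, take the Taylor expansion of sine. From the Taylor remainder theorem, $|\sin(x) - x| \leq \frac{|x|^3}{6}$, so $\sin(x) - x = O(x^3)$ as $x\to 0$ (with $C = \frac{1}{6}$), and defn. #7 lets us write
\[\sin(x) = x + O(x^3) \text{ as } x\to 0.\]The two sides aren’t equal as functions; the right-hand side names a class of functions, and the left-hand side belongs to it.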
Manipulating Landau notation
I wouldn’t blame you if you got a slight headache looking at the proof that $1 + x^2 \neq O(x^2)$ as $x\to 0$. It’s a mess, and isn’t the point of notation like this to make our lives easier, not harder? The real power of Landau notation is that it allows us to manipulate these asymptotic expressions much like algebraic terms, precisely because we can suppress those limiting values, $x_0$ and $C$.
From the definition, it can be seen that the following hold for big O:
\begin{equation}\label{big O rule 1} f(x) = g(x) \implies f(x) = O(g(x)) \quad \text{(or $f(x) = O(f(x))$)} \end{equation}
\begin{equation}\label{big O rule 2} f(x) = O(g(x)) \text{ and } g(x) = O(h(x)) \implies f(x) = O(h(x)) \end{equation}
and the following holds for little o:
\begin{equation}\label{little o rule 1} f(x) = o(g(x)) \text{ and } g(x) = o(h(x)) \implies f(x) = o(h(x)) \text{ as } x \to a. \end{equation}
This draws out the comparison of big O to $\leq$ and little o to $<$. Eqn. \eqref{big O rule 1} corresponds to the rule that $x \leq x$, and eqns. \eqref{big O rule 2} and \eqref{little o rule 1} correspond to the transitive property of inequalities:
\[x < y \text{ and } y < z \implies x < z\] \[x \leq y \text{ and } y \leq z \implies x \leq z\]Next, I’d like to introduce two more rules which follow from the definition and will be important when simplifying the order of a complicated function (here $c$ is any nonzero constant):
\begin{equation}\label{big O rule 3} f(x) = O(cg(x)) \implies f(x) = O(g(x)) \end{equation}
\begin{equation}\label{big O rule 4} f(x) = O(g(x)) \text{ and } h(x) = O(k(x)) \implies f(x)h(x) = O(g(x)k(x)) \end{equation}
And likewise for little o:
\begin{equation}\label{little o rule 2} f(x) = o(cg(x)) \implies f(x) = o(g(x)) \text{ as } x \to a \end{equation}
\begin{equation}\label{little o rule 3} f(x) = o(g(x)) \text{ and } h(x) = o(k(x)) \implies f(x)h(x) = o(g(x)k(x)) \text{ as } x \to a. \end{equation}
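For example, $\sin(x) = O(1)$ (take $C = 1$) and trivially $x = O(x)$, so eqn. \eqref{big O rule 4} immediately gives $x\sin(x) = O(x)$, with no need to hunt for explicit constants.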
Up to this point, I’ve been deliberately abstract in keeping our functions as the generic $f(x)$ and $g(x)$. In real examples, $g(x)$ is some standard function like $x^n$, $\log(x)$, or $e^x$. These standard functions form an “ordering,” where we can bound the growth of one of these functions below the growth of another:
\[\log\log x = o(\log x) \text{ as } x\to\infty\] \[\log x = o(x) \text{ as } x\to\infty\] \[x^n = o(x^{n+1}) \text{ as } x\to\infty\] \[x^n = o(e^x) \text{ as } x\to\infty.\]Typically the $x\to 0$ case only considers polynomials, so the essential rule is:
\[x^n = o(x^{n-1}) \text{ as } x\to 0.\]All of the above also hold for $O$, since it is a weaker condition. With just these rules, we can work examples to analyze the order of a function.
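Before the worked example below, here’s a quick numerical look at this ordering (a Python sketch of my own; each ratio tends to $0$, though some, like $\frac{\log\log x}{\log x}$, do so very slowly):

```python
import math

# Each pair f/g below satisfies f = o(g) as x -> infinity,
# so the ratio should shrink as x grows.
ratios = {
    "loglog(x)/log(x)": lambda x: math.log(math.log(x)) / math.log(x),
    "log(x)/x":         lambda x: math.log(x) / x,
    "x^2/x^3":          lambda x: x**2 / x**3,
    "x^3/e^x":          lambda x: x**3 / math.exp(x),
}
for name, r in ratios.items():
    print(name, [f"{r(x):.2e}" for x in (10.0, 100.0, 500.0)])
```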
Example. Let $f(x) = \frac{4x^3 + 3x}{x^2 + 1}$. We wish to determine its $O$-order in terms of a monomial as $x\to \infty$. First, notice that for $x>1$, $4x^3 + 3x < 7x^3$, and for $x > 0$, $\frac{1}{x^2 + 1} < \frac{1}{x^2}$. Since we are free to restrict the domain from below, using the bounds $1$ and $0$ here is allowed (take $x_0$ to be the larger of the two). By eqn. \eqref{big O rule 4}, we get
\[\frac{4x^3 + 3x}{x^2 + 1} = O\left(7x^3\frac{1}{x^2}\right) \text{ as } x\to\infty\] \[\frac{4x^3 + 3x}{x^2 + 1} = O\left(7x\right) \text{ as } x\to\infty\]And by eqn. \eqref{big O rule 3},
\[\frac{4x^3 + 3x}{x^2 + 1} = O\left(x\right) \text{ as } x\to\infty.\]Of course, it’s also true that $f(x) = O(x^2)$, but when analyzing function asymptotics, we want the strongest possible bound, at least if we can get it. After only a few examples, you’ll notice some shortcuts. I’ve listed some important ones in the lemmata below:
Lemma. If $f(x) = O(g(x))$ and $h(x) = O(k(x))$, and also $g(x) = O(k(x))$, then
\begin{equation} f(x) + h(x) = O(k(x)). \end{equation}
The next lemma specializes the previous lemma to polynomials.
Lemma. If $p(x) = \sum_{i=l}^m c_ix^i$ is a polynomial, then
\[p(x) = O(x^l) \text{ as } x \to 0\]and
\[p(x) = O(x^m) \text{ as } x \to \infty.\]Lemma. If $p(x) = \sum_{i=l}^m c_ix^i$ and $q(x) = \sum_{i=n}^s d_ix^i$ are polynomials with $d_n \neq 0$ and $d_s \neq 0$, then
\[\frac{p(x)}{q(x)} = O(x^{l-n}) \text{ as } x \to 0\]and
\[\frac{p(x)}{q(x)} = O(x^{m-s}) \text{ as } x \to \infty.\]Some other definitions and interpretations
The goal of this section is to provide some alternative definitions of Landau notation. These definitions are all equivalent, so, following mathematical convention, there’s no reason to choose which one is “the most true.”
First, I promised to explain how the definitions for little o as $x\to\infty$ and $x\to 0$ are both limits, and cases of defn. #6. For this, we will need to recall the definition of the limit of a function approaching a real number, and the definition of the limit of a function approaching infinity.
Definition #8. [Real limit] Let $f(x)$ be a function on the real numbers, and $a$ and $L$ be real numbers. If for every $\epsilon > 0$, there exists a $\delta > 0$ so that
\[|x - a| < \delta \implies |f(x) - L| < \epsilon,\]then we can write
\[\lim_{x\to a} f(x) = L\]read, “the limit of $f$ as $x$ approaches $a$ equals $L$.”
Definition #9. [Infinite limit] Let $f(x)$ be a function on the real numbers, and $L$ be a real number. If for every $\epsilon > 0$, there exists an $x_0 > 0$ so that
\[x > x_0 \implies |f(x) - L| < \epsilon\]then we can write
\[\lim_{x\to \infty} f(x) = L.\]Looking at the definition of little o, the two cases are $a = 0$ (a real number) and $a=\infty$. If $a=0$, then we look to defn. #5, and identify that $L=0$ and that $f(x)$ (in defn. #8) is substituted as $\frac{f(x)}{g(x)}$ (in defn. #6). The logical statement from defn. #8 becomes
\[|x| < \delta \implies \left|\frac{f(x)}{g(x)}\right| < \epsilon.\]This is the same as saying:
\[|x| < \delta \implies |f(x)| < \epsilon|g(x)|\] \[|f(x)| < \epsilon|g(x)| \text{ for all } |x| < \delta.\]This matches defn. #5, up to an ultimately inconsequential difference between $\leq$ and $<$.
For the case of $a=\infty$, we now apply the infinite limit. Again, $L=0$ and $f(x)$ (in defn. #9) is substituted as $\frac{f(x)}{g(x)}$ (in defn. #6). The infinite limit becomes:
\[x > x_0 \implies \left|\frac{f(x)}{g(x)}\right| < \epsilon\] \[x > x_0 \implies |f(x)| < \epsilon|g(x)|\] \[|f(x)| < \epsilon|g(x)| \text{ for all } x > x_0.\]This matches defn. #4, again up to the inconsequential difference between $\leq$ and $<$.
Here’s another definition of my own design. It reinterprets the relationship between $f(x)$ and $g(x)$ from multiplication into subtraction, where the difference between the two functions is accounted for by the scaled residual $r(x)$.
Definition. $f(x) = O(g(x))$ if and only if there are some $r(x)$ and $\widetilde{C} > 0$ so that
\begin{equation}\label{def eq} g(x) = f(x) + \widetilde{C}r(x), \end{equation}
and
\begin{equation}\label{def ineq} \left|r(x)\right| \leq |g(x)|. \end{equation}
Proof. $\left(\implies\right)$ Let $f(x) = O(g(x))$ with constant $C$. Let $\widetilde C = C + 1$. Define $r(x) = \frac{1}{\widetilde C}\left(g(x) - f(x)\right)$ so that eqn. \eqref{def eq} holds,
\[\left|r(x)\right| = \left|\frac{1}{\widetilde C}\left(g(x) - f(x)\right)\right| \leq \frac{1}{\widetilde C}\left|g(x)\right| + \frac{1}{\widetilde C}\left|f(x)\right| \leq \frac{1}{\widetilde C}\left|g(x)\right| + \frac{1}{\widetilde C}C |g(x)| = \frac{1+C}{1+C} |g(x)| = |g(x)|\]$\left(\impliedby\right)$ Let $\widetilde C$ and $r(x)$ satisfy eqns. \eqref{def eq} and \eqref{def ineq}. Then,
\[\left| f(x) \right| = \left|g(x) - \widetilde C r(x)\right| \leq \left|g(x)\right| + \widetilde C \left|r(x)\right| \leq \left(\widetilde C + 1\right) |g(x)|.\]By choosing $C=\widetilde C + 1$, we see that $f(x)=O(g(x))$.
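As a numerical sanity check of the forward direction (my own sketch in Python), take the earlier example $f(x) = \sin(x)x^2 = O(x^2)$ with $C = 1$, so $\widetilde{C} = 2$:

```python
import math

# Residual form of big O: g(x) = f(x) + C~*r(x) with |r| <= |g|,
# where r(x) = (g(x) - f(x)) / C~ and C~ = C + 1 = 2.
f = lambda x: math.sin(x) * x**2
g = lambda x: x**2
C_tilde = 2.0
r = lambda x: (g(x) - f(x)) / C_tilde

xs = [i / 10 for i in range(-50, 51)]
print(all(abs(r(x)) <= abs(g(x)) for x in xs))  # True
```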
Following the general definition of big O, the definition above doesn’t include a limiting value. Inclusion of a limiting value would only change the bounds where eqns. \eqref{def eq} and \eqref{def ineq} hold, with no change to the proof. This definition uses the scaled residual value $\widetilde C r(x)$ in eqn. \eqref{def eq}. An equivalent, unscaled definition is as follows:
Definition. $f(x) = O(g(x))$ if and only if, with $r(x)$ defined by
\begin{equation} g(x) = f(x) + r(x) \end{equation}
there is some $\widetilde C$ such that
\[|r(x)| \leq \widetilde C |g(x)|.\]Now we see that both of the above definitions rely on the fact that $f(x) = O(g(x))$ if and only if $f(x) - g(x) = O(g(x))$.
The corresponding ‘residual’ definition for little o is even easier.
Definition. $f(x) = o(g(x))$ as $x\to a$ if and only if for $r(x)$ defined from
\begin{equation} g(x) = f(x) + r(x) \end{equation}
it holds that
\[\lim_{x\to a} \frac{r(x)}{g(x)} = 1.\]Proof.
\[\lim_{x\to a} \frac{r(x)}{g(x)} = \lim_{x\to a} \frac{g(x) - f(x)}{g(x)} = \lim_{x\to a} \left(\frac{g(x)}{g(x)} - \frac{f(x)}{g(x)} \right)= \lim_{x\to a} \left(1 - \frac{f(x)}{g(x)} \right)\] \[1 - \lim_{x\to a} \frac{r(x)}{g(x)} = \lim_{x\to a} \frac{f(x)}{g(x)}\]So the two definitions correspond exactly. (The inclusion of the absolute value in defn. #6 is no big deal.)
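For instance, take $f(x) = x^2$ and $g(x) = x$ as $x\to 0$: then $r(x) = x - x^2$, and $\frac{r(x)}{g(x)} = 1 - x \to 1$, matching $x^2 = o(x)$ as $x\to 0$.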
References
[1] R. L. Graham, D. E. Knuth, and O. Patashnik, Concrete Mathematics, ch. 9.