Notes on time complexity

June 18, 2023

Imagine we’re given the task of sorting a list of names, actually a billion names, for a search engine. There are dozens of ways to sort. Each one of them solves the problem, but some finish in seconds while others take hours. Algorithm analysis helps us compare them.

We mainly care about running time. We also consider memory, but time is the star. Why? Even if you have enough memory, an algorithm that takes too long can make a program useless.

But it runs fast on my machine!

When we write any algorithm, we want it to be fast, reliable, and most importantly, ready for the real world. But how do we know if it’s fast enough compared to another algorithm doing the same task?

You could run both algorithms on your laptop. But it’s not universal. A very fast algorithm on a high-end computer or server might limp on a budget PC. And what about production, where the hardware could be anything from a cloud cluster to an old desktop?

Execution time depends on hardware, programming language, and even your coding style. Counting the number of statements or instructions doesn’t help either, because a “single step” in Python is not the same as in C++ or Java. To pick the best algorithm, we need a universal yardstick. It should be tied to the algorithm itself, not the machine it runs on or the programming language we are using.

Let’s think about input size: n. It’s the number of elements we’re working with: the length of an array, the nodes in a graph, the bits in a number. Bigger n means more work — more numbers to sort, more data to process. If you express running time as a function of n, say f(n), you get a machine-independent measure. Now you can compare algorithms head-to-head, no stopwatch needed.

Good! But what is in f(n)?

f(n) is a function of input n. It counts operations — the basic steps like adding numbers, comparing values, or reading memory. Suppose you’re summing a list of n numbers. You loop through each, adding one at a time. That’s roughly n operations, so f(n) = n.
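
A minimal sketch of that counting in Python (counting one step per addition and ignoring the fixed setup cost):

```python
def sum_list(numbers):
    """Sum a list while counting the basic operations performed."""
    total = 0          # fixed setup work
    steps = 0
    for x in numbers:  # the loop body runs n times
        total += x     # one addition per element
        steps += 1
    return total, steps

# For a list of n numbers, steps comes out to n, so f(n) = n.
print(sum_list([3, 1, 4, 1, 5]))  # (14, 5)
```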

Now imagine sorting that list with Bubble sort. You compare pairs of numbers, possibly in nested loops, checking each item against others. That could lead to n × n comparisons, plus some other setup steps, giving f(n) = n^2 + 4n + 100.

Those terms in f(n) — like n^2, 4n, or 100 — reflect the algorithm’s moves. The n^2 comes from nested loops, the heavy lifting. The 4n might be a single loop for swaps or checks. The 100 is fixed work, like setting up variables.
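
Here is a rough sketch of a textbook bubble sort with a comparison counter. The exact lower-order terms depend on the implementation; the point is that the nested loops are what produce the n^2:

```python
def bubble_sort(items):
    """Bubble sort that also counts comparisons, to make f(n) visible."""
    items = list(items)  # work on a copy: fixed setup cost
    n = len(items)
    comparisons = 0
    for i in range(n - 1):          # outer loop
        for j in range(n - 1 - i):  # inner loop nested inside it
            comparisons += 1
            if items[j] > items[j + 1]:
                items[j], items[j + 1] = items[j + 1], items[j]  # swap
    return items, comparisons

data = [5, 2, 9, 1, 7, 3]
print(bubble_sort(data))  # ([1, 2, 3, 5, 7, 9], 15): roughly n*n/2 comparisons for n = 6
```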

We count these operations on an idealized machine, where each step takes the same time. This way, we ignore real-world quirks like slow CPUs or cache delays. It’s a clean way to focus on the algorithm’s logic.

Inputs are not always the same

Not all inputs behave the same for a given algorithm. Some inputs take a long time, while others let the algorithm finish almost instantly.

For example, a sorted list might let a sorting algorithm finish early, while a reverse-sorted list forces it through every step and takes more time. That’s why we analyze each type of input:

  • Best Case: The luckiest input, where the algorithm runs fastest. For linear search, finding the item first takes 1 step—blazing fast.
  • Worst Case: The toughest input, where the algorithm limps. In linear search, the item is last or missing, requiring n steps.
  • Average Case: The typical scenario, assuming random inputs. For linear search, you check about half the list, so roughly n/2 steps.

Of these, the worst case is our safety net — it guarantees performance even under attack or the worst possible scenario. The average case predicts real-world use but needs assumptions about input patterns. The best case is nice but rare. The sketch below walks through all three cases for linear search.
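
A minimal Python sketch, counting comparisons over a small hypothetical list of ten numbers:

```python
def linear_search(items, target):
    """Return (index, steps): where target was found and how many comparisons it took."""
    steps = 0
    for i, x in enumerate(items):
        steps += 1
        if x == target:
            return i, steps
    return -1, steps  # not found: we looked at all n elements

data = list(range(1, 11))       # n = 10
print(linear_search(data, 1))   # best case:    (0, 1)    -> 1 step
print(linear_search(data, 10))  # worst case:   (9, 10)   -> n steps
print(linear_search(data, 99))  # also worst:   (-1, 10)  -> n steps, item missing
print(linear_search(data, 5))   # typical hit:  (4, 5)    -> about n/2 steps
```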

Each case gives its own f(n). For example, in the worst case, an algorithm’s f(n) might be n^2 + n, while for the best case, it might be n. You see, we need a better way to describe these cleanly.

Making f(n) simpler

Most of the time, for any algorithm, we want to know what happens when our input size is huge, like really huge! And under that assumption of huge input size, not all parts of f(n) matter equally in the running time. For huge n, the biggest term takes over.

In f(n) = n^2 + 4n + 100, when n = 1,000,000, the n^2 is a trillion, while 4n is just 4 million, and 100 barely registers. So the quadratic term n^2 dominates the others.
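
A quick back-of-the-envelope check of those numbers:

```python
n = 1_000_000
print(f"n^2 = {n**2:,}")   # 1,000,000,000,000 (a trillion)
print(f"4n  = {4 * n:,}")  # 4,000,000
print(f"100 = {100:,}")    # 100
# The n^2 term accounts for almost all of f(n):
print(f"{n**2 / (n**2 + 4 * n + 100):.6f}")  # 0.999996
```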

This thinking is about the “rate of growth”. It tells us how running time scales with input. This is what computer scientists refer to as the algorithm’s time complexity. Initially, we consider all terms. We include smaller contributions to understand the exact number of steps. Later, for large inputs, we focus on the dominant term. This tells us how the algorithm behaves as the input grows.

It’s like budgeting: if you’re buying a car and a coffee, the car’s cost overshadows the rest. We focus on n^2 because it determines the algorithm’s fate for larger inputs. So,

f(n) = n^2 + 4n + 100 \approx n^2 \quad (\text{approximately equal as } n \rightarrow \infty)

Asymptotic notation

Previously, we had an f(n) with a lot of minor terms alongside the dominant one. We’re going to simplify it here using asymptotic notation and the rate-of-growth thinking above. So, what does “asymptotic” mean here?

A function f(n) is asymptotic to another function g(n) if:

\lim_{n \rightarrow \infty} \frac{f(n)}{g(n)} = 1

This means:

  • For very large n, f(n) and g(n) grow at almost the same rate.
  • g(n) is like a “simpler approximation” of f(n) at infinity. Let’s say we have f(n) = √(n^2 + 300). Here, for a very large n, the 300 barely matters. So we can say √(n^2 + 300) ≈ n as n → ∞ (n grows without bound). A quick numerical check follows this list.
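
Here it is, with approximate values shown in the comment:

```python
import math

# Ratio f(n) / g(n) with f(n) = sqrt(n^2 + 300) and g(n) = n.
for n in [10, 100, 1_000, 10_000]:
    print(n, math.sqrt(n**2 + 300) / n)
# The ratio falls toward 1 as n grows: 2.0, 1.0149..., 1.00015..., 1.0000015...
```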

We’re going to do the same here: we’ll focus on the behavior of f(n) when n approaches infinity. We’ll ignore minor terms of f(n) and zoom in on the dominant-term trend.

So our task is to find some simpler g(n) for any f(n) to analyze the running time of an algorithm asymptotically, instead of playing with all terms.

Now, there is some flexibility here. We could pick a g(n) that grows slightly faster than f(n), or one that grows slightly slower. For example, if f(n) = n^2 + n, a natural choice is g(n) = n^2, which closely tracks the growth of f(n). But we could also pick g(n) = n^3, which will eventually be larger than f(n), giving an upper estimate, or g(n) = n, which underestimates it slightly. These notions of upper and lower estimates are formalized through Big-O, Big-Ω, and Big-Θ.

Big-O: analyzing the upper bound

Big-O sets a ceiling on growth. It describes an upper bound, meaning the function will never grow faster than this, for any input of size n. If f(n) = O(g(n)), then f(n) grows no faster than g(n), up to a constant. Formally, there are constants c > 0 and n_0 > 0 so that

f(n) \leq c \cdot g(n) \quad \text{for all } n \geq n_0.

Notice something important: there are infinitely many valid choices for g(n). For example:

  • If f(n) = n, then technically f(n) = O(n), but also f(n) = O(n^2), O(n^3), or even O(2^n).
  • All of these are valid upper bounds because they grow at least as fast as f(n).

So Big-O is really a set of functions that provide an upper estimate. In practice, we always choose the “smallest” or tightest one, which gives the clearest description of growth.

For example, if f(n) = n^2 + 4n + 100, then the tightest choice is f(n) = O(n^2), because n^2 dominates and we ignore the smaller terms. Big-O is about safety: “It won’t get worse than this.” That’s why linear search is O(n) and binary search is O(log n).
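
To connect this back to the formal definition, here is one concrete (and by no means unique) choice of constants for f(n) = n^2 + 4n + 100. For n ≥ 1 we have 4n ≤ 4n^2 and 100 ≤ 100n^2, so

n^2 + 4n + 100 \leq n^2 + 4n^2 + 100n^2 = 105n^2 \quad \text{for all } n \geq 1,

which means c = 105 and n_0 = 1 witness f(n) = O(n^2). Any larger c would work just as well; the definition only asks that some valid pair exists.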

Big-Ω: analyzing the lower bound

Big-Ω sets a floor. If f(n) = Ω(g(n)), then f(n) grows at least as fast as g(n). Formally, there are constants c > 0 and n_0 > 0 so that

c \cdot g(n) \leq f(n) \quad \text{for all } n \geq n_0.

Similar to Big-O, notice something important: there are infinitely many valid choices for g(n). For example:

  • If f(n) = n^2, then technically f(n) = Ω(n^2), but also f(n) = Ω(n), Ω(log n), or even Ω(1).
  • All of these are valid lower bounds because they grow no faster than f(n).

So Big-Ω is really a set of functions that provide a lower estimate. In practice, we usually choose the “largest” or tightest one, which gives the clearest description of growth. Hence, we seek the largest g(n) that stays below f(n).

For example, if f(n) = 100n^2 + 10n + 50, then f(n) = Ω(n^2), because for large n, n^2 gives a lower bound on the growth of f(n).
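
Again, one concrete choice of constants for the definition above. Since every term of f(n) is non-negative,

1 \cdot n^2 \leq 100n^2 \leq 100n^2 + 10n + 50 \quad \text{for all } n \geq 1,

so c = 1 and n_0 = 1 (or even c = 100) satisfy c · g(n) ≤ f(n) with g(n) = n^2.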

Big-Ω says, “It won’t be faster than this.” It’s less commonly discussed than Big-O, but it helps confirm an algorithm’s minimum cost.

Big-Θ: analyzing both bounds

So far:

  • Big-O gave us a ceiling → “it won’t grow faster than this.”
  • Big-Ω gave us a floor → “it won’t grow slower than this.”

But what if we want the exact growth rate, not just loose boundaries? For example, if f(n) = n, then technically f(n) = O(n^2) and f(n) = Ω(1). Those are true, but not very informative. What we’d really like is a description that pins down the growth rate from both sides at once.

That’s where Big-Θ (Theta) comes in. It is not too high, not too low, but just right. Formally, f(n) = Θ(g(n)) if there exist positive constants c_1, c_2, and n_0 such that for all n ≥ n_0:

c_1 \cdot g(n) \leq f(n) \leq c_2 \cdot g(n)

This means f(n) is squeezed between two multiples of g(n) when n gets large. So it doesn’t just grow at most like g(n) or at least like g(n) — it grows exactly like g(n) (up to constant factors). Example:

f(n) = 100n^2 + 10n + 50

  • Upper bound: O(n^2)
  • Lower bound: Ω(n^2)
  • Together: Θ(n^2)
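
The same style of reasoning gives both constants at once. For n ≥ 1, 10n ≤ 10n^2 and 50 ≤ 50n^2, so

100n^2 \leq 100n^2 + 10n + 50 \leq 160n^2 \quad \text{for all } n \geq 1,

which means c_1 = 100, c_2 = 160, and n_0 = 1 witness f(n) = Θ(n^2).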

Everything together

Now let’s see how best, worst, and average cases relate to Big-O, Big-Ω, and Big-Θ for linear search.

Worst Case (f(n) ≈ n):

  • Big-O: O(n) → the search will take at most linear time.
  • Big-Ω: Ω(n) → the search takes at least linear time in the worst scenario.
  • Big-Θ: Θ(n) → exactly linear, up to constants.

Best Case (f(n) ≈ 1):

  • Big-O: O(1) → the search will take at most constant time (the tightest bound). Technically also O(n), O(n^2) … but these are looser.
  • Big-Ω: Ω(1) → at least constant time.
  • Big-Θ: Θ(1) → exactly constant.

Average Case (f(n) ≈ n/2):

  • Big-O: O(n) → at most linear.
  • Big-Ω: Ω(n) → at least linear.
  • Big-Θ: Θ(n) → exactly linear on average.

Does O(n) mean “worst case time complexity”?

Not necessarily! Saying an algorithm’s time complexity is O(n) just means its running time grows no faster than linear, but it doesn’t automatically mean worst-case. We need to specify the case. Let’s take linear search as an example:

  • Worst Case: The item is last or missing. You check all n elements, so f_worst(n) = n.
    • Tight Big-O: f_worst(n) = O(n).
  • Best Case: The item is first. You check only one element, so f_best(n) = 1.
    • Tight Big-O: f_best(n) = O(1).
    • Technically, Big-O is flexible: f_best(n) = O(n) or even O(n^2), but those are looser, less informative bounds.

So next time you see “O(n)”, ask: best, worst, or average case? Big-O, Big-Ω, or Big-Θ alone don’t tell the whole story.

So…

  • Big-O provides an upper bound — “it won’t grow faster than this.”
  • Big-Ω provides a lower bound — “it won’t grow slower than this.”
  • Big-Θ gives a tight bound — the function grows exactly at this rate (up to constant factors).
  • In practice, we most often discuss worst-case performance using Big-O notation, because knowing the maximum time an algorithm could take is critical for guaranteeing performance.
  • Remember: case analysis (best/worst/average) and asymptotic notation (O/Ω/Θ) are independent concepts. You can describe any case with any notation, though by convention we typically focus on worst-case Big-O.

Have a great day!