notes to self

Big O notation · part 6 of 8

The data structures to know by heart

These are the data structures worth memorising. Know them and you can answer most complexity questions on sight. It's also why "just use a Set" or "just use a Map" is so often the right call.

Structure Find by key/index Insert Search for a value Good for
Array (by index) O(1) O(1) at the end O(n) Order matters, you index by position
Hash map / Set O(1) average O(1) average O(1) average "Have I seen this?", lookups by key
Balanced tree O(log n) O(log n) O(log n) Sorted order and fast lookup
Heap O(1) peek O(log n) O(n) Always grabbing the smallest or largest
Linked list O(n) O(1) at a known spot O(n) Lots of inserts and removals mid-list

Notice the hash map says "average", not "always". The worst case is O(n).

Amortised analysis is why we can write O(1) for those even though the odd operation isn't. Adding to a dynamic array (JavaScript's Array, most List types) is usually instant: it just drops the item into spare room it keeps on hand. When that room runs out, the next add has to grab a bigger chunk of memory and copy everything over, which is O(n). But each time it grows it doubles the room, so those slow copies happen rarely. Average the cost over lots of adds and it comes out to O(1) each, and that average is what amortised means. Hash maps are the same story: O(1) almost always, but a bad run of collisions (or a resize) can push one operation to O(n).

At your desk versus in the interview

The interview wants the big O answer: the shape (say, O(n log n)), the worst case (a hash map's O(n), not its usual O(1)), and both time and space (O(n) time, O(1) space).

At your desk it's a little different. The shape still matters most for the thing that actually grows, but constants and reality climb back through the window. An O(n²) loop over a list that is always about five items long is completely fine. Rewrite it as an O(n) version backed by a hash map, and if it runs a thousand times a second it can come out slower, not faster: building a fresh map each call costs more than the few comparisons it saves.

Knowing not to optimise the shape that isn't your bottleneck is harder to learn than the notation itself. Measure first, then put your effort where the input is large.

Is it worth learning?

Yes. The basic ideas are simple: nested loops are trouble, and a lookup beats a scan. Big O gives them a name and a notation, so we can talk about them and catch the problem before the data gets too big.

Learn the shapes by sight, learn the three-step recipe, and keep the data-structure table in your back pocket. With those you can work out the rest from first principles, which is a far better place to be than having memorised a list.

Now go and find the worst loop in your codebase.