Zoom-in: Database Index

Query is slow. AI suggests adding an index. 10× faster afterward. Commit, done.
graph LR
Q(["🔍 Query"]) -->|"WHERE email = 'x@y.com'"| T(["📋 Table"])
T -->|"result"| R(["✅ Row"])
style Q fill:#1e293b,stroke:#475569,color:#cbd5e1
style T fill:#1e293b,stroke:#475569,color:#cbd5e1
style R fill:#1e293b,stroke:#475569,color:#cbd5e1
Zoom in on the black box.
Layer 1 — No index: every query scans every row
Rows in a table are stored in insertion order — no logical ordering. Databases call this unordered storage the heap.
graph LR
Q["🔍 WHERE email\n= 'x@y.com'"] -->|"scan"| R1["Row 1\nuser_a"]
Q -->|"scan"| R2["Row 2\nuser_b"]
Q -->|"scan"| R3["Row 3\nuser_c"]
Q -->|"scan"| Rn["... Row N"]
style Q fill:#1e3a5f,stroke:#3b82f6,color:#93c5fd
style R1 fill:#1e293b,stroke:#334155,color:#94a3b8
style R2 fill:#1e293b,stroke:#334155,color:#94a3b8
style R3 fill:#1e293b,stroke:#334155,color:#94a3b8
style Rn fill:#1e293b,stroke:#334155,color:#94a3b8
To find rows matching a WHERE condition, the database reads every row from top to bottom. With 10,000 rows, that's barely noticeable. With 10 million rows, the database checks all 10 million rows for every single query.
Layer 2 — B-tree Index: O(log n) lookups
An index is a separate data structure maintained alongside the table. The most common is a B-tree.
graph TD
root["50"] --> L["25"]
root --> R["75"]
L --> LL["10"]
L --> LR["37"]
R --> RL["60"]
R --> RR["90"]
style root fill:#3b2a1a,stroke:#f59e0b,color:#fcd34d
style L fill:#1e293b,stroke:#475569,color:#cbd5e1
style R fill:#1e293b,stroke:#475569,color:#cbd5e1
style LL fill:#1e293b,stroke:#334155,color:#94a3b8
style LR fill:#1e293b,stroke:#334155,color:#94a3b8
style RL fill:#1e293b,stroke:#334155,color:#94a3b8
style RR fill:#1e293b,stroke:#334155,color:#94a3b8
Instead of scanning every row, the database navigates the tree: compare at each node, go left or right. With 10 million rows, a B-tree needs at most 23 steps to find any value (log₂ 10,000,000 ≈ 23).
Each leaf node stores the column value and a pointer back to the matching row in the heap. Find it in the B-tree → jump directly to that row.
Layer 3 — Write cost: every change has a price
Adding an index means every write must maintain two structures: the heap and the B-tree.
sequenceDiagram
participant A as Application
participant DB as Database
A->>DB: INSERT INTO users (email, ...)
DB->>DB: Write row to heap
DB->>DB: Update B-tree index
DB-->>A: OK
A->>DB: UPDATE users SET email = ...
DB->>DB: Update row in heap
DB->>DB: Remove old node + insert new node into B-tree
DB-->>A: OK
A table with 5 indexes means every INSERT updates 6 structures. An UPDATE on an indexed column must remove the old node and insert a new one in the right position — two operations instead of one.
Layer 4 — Covering Index: no heap lookup needed
Normally: find in B-tree → get pointer → read heap for remaining columns. A covering index includes every column the query needs upfront.
graph LR
Q["🔍 SELECT email, name\nWHERE email = 'x@y.com'"] --> I["🌳 Index\n(email, name)"]
I -->|"all data — no heap needed"| R["✅ Result"]
style Q fill:#1e3a5f,stroke:#3b82f6,color:#93c5fd
style I fill:#3b2a1a,stroke:#f59e0b,color:#fcd34d
style R fill:#1a3a2a,stroke:#22c55e,color:#86efac
CREATE INDEX idx_users_email_name ON users (email, name);
-- SELECT email, name WHERE email = ... never touches the heap
Databases call this an Index Only Scan — faster than a regular Index Scan because it eliminates the heap lookup entirely. This is why some queries become unexpectedly fast after adding a multi-column index.
Full picture
How a query executes depending on what indexes exist.
graph TD
Q["🔍 SELECT email, name\nWHERE email = 'x@y.com'"] --> P["Query Planner"]
P -->|"No index"| FS["Full Scan\nO(n) — reads entire heap"]
P -->|"Index on email"| IS["Index Scan\nO(log n) — B-tree\n→ pointer → heap"]
P -->|"Covering index\n(email, name)"| CS["Index Only Scan\nO(log n) — B-tree\n→ result, no heap"]
FS --> R["✅ Result"]
IS --> R
CS --> R
style Q fill:#1e3a5f,stroke:#3b82f6,color:#93c5fd
style P fill:#3b2a1a,stroke:#f59e0b,color:#fcd34d
style FS fill:#3b1a1a,stroke:#ef4444,color:#fca5a5
style IS fill:#1e293b,stroke:#475569,color:#cbd5e1
style CS fill:#1a3a2a,stroke:#22c55e,color:#86efac
style R fill:#1a3a2a,stroke:#22c55e,color:#86efac
Three common misconceptions
Indexes always make things faster
Not for small tables or queries returning most of the table. If a query fetches 80% of rows, a full scan can be faster because it avoids the B-tree-to-heap jump. The query planner knows this — it sometimes intentionally ignores an index.
Index every column to be safe
Each index is another structure to maintain. A write-heavy table with 10 indexes will show noticeably slower INSERTs. Only add an index when there's a specific query to optimize — and use EXPLAIN ANALYZE to measure before and after.
Column order in a composite index doesn't matter
Index (last_name, first_name) helps WHERE last_name = 'Nguyen' but does nothing for WHERE first_name = 'Nam' alone — the database still scans everything. Columns with higher cardinality should come first.
Takeaway
An index isn't a free speedup — it's a trade-off: faster reads, slower writes, extra storage. B-trees exist because the heap has no order; covering indexes exist because even the B-tree-to-heap jump has a cost.
When a query is slow, the right question isn't "did we add an index" but "what is the query planner doing" — EXPLAIN ANALYZE shows whether a full scan, index scan, or index only scan is happening.
Article assisted by Amy 🌸 - AI Assistant. Content reviewed by the author.
Related Posts
Zoom-in: TCP
Every HTTP request runs on TCP — but before the first byte of real data crosses the wire, three packets are exchanged carrying no data at all. TCP solves the problem the Internet doesn't.
Zoom-in: OAuth 2.0
'Sign in with Google' hides a delegation mechanism where your password never leaves Google. OAuth 2.0 solves the authorization problem without sacrificing security.
Zoom-in: Load Balancer
One domain, millions of requests per day. A load balancer doesn't just split traffic — it decides routing, health checking, and session management for the entire system.