Database Indexes Explained Simply
Database indexes confuse people because the explanations are often too abstract. “It’s like the index in a book” doesn’t help if you don’t understand what problem you’re solving.
Here’s the practical version: indexes make reading faster and writing slower. That’s the entire trade-off.
The Problem
Without an index, finding a specific row requires scanning the entire table. If you have a million users and want to find one by email, the database checks every single row until it finds a match.
This is called a full table scan. It’s fine for small tables. For anything beyond a few thousand rows, it becomes a problem.
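You can watch a full table scan happen with a query planner. A minimal sketch using Python's built-in sqlite3 module (the users table and emails here are invented for illustration):

```python
import sqlite3

# Illustrative table: a million-row scan behaves the same way, just slower.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany("INSERT INTO users (email) VALUES (?)",
                 [(f"user{i}@example.com",) for i in range(10_000)])

# With no index on email, the planner has no choice but to scan every row.
detail = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM users WHERE email = ?",
    ("user5000@example.com",),
).fetchone()[3]
print(detail)  # reports a SCAN of users, i.e. a full table scan
```

The exact wording of the plan varies by SQLite version, but the key word is SCAN: every row gets checked.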
How Indexes Work
An index creates a separate data structure that stores column values and pointers to their rows. It’s sorted, so the database can use binary search instead of scanning everything.
When you create an index on an email column, the database builds a sorted list of all email values with references to where those rows actually live. Now finding a specific email is fast: binary search takes log(n) time instead of linear.
The index itself takes up disk space and memory. It's redundant data, stored purely to speed up lookups.
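Continuing the sqlite3 sketch from above (table and index names are invented): once an index exists on email, the plan switches from a scan to an index search.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("CREATE INDEX idx_users_email ON users (email)")

# The planner now finds matching rows through the sorted index structure.
detail = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM users WHERE email = 'a@example.com'"
).fetchone()[3]
print(detail)  # a SEARCH using idx_users_email rather than a scan
```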
Primary Keys
Primary keys are automatically indexed. That's why looking up rows by ID is always fast. In many databases (MySQL's InnoDB, SQL Server by default), the primary key is also a clustered index, meaning the table itself is stored in primary-key order.
This is why incrementing integer IDs are common. They work well with clustered indexes because new rows get appended to the end rather than inserted in the middle.
UUIDs are less efficient as primary keys because they're random. Each insert lands in an arbitrary spot in the index's B-tree, causing page splits and extra disk writes.
When to Add Indexes
Add indexes for columns in WHERE clauses. If you frequently query WHERE email = ?, index email. If you never filter by a column, don’t index it.
Add indexes for JOIN conditions. If you join tables on foreign keys, those foreign keys should be indexed. Without indexes, joins become full table scans, which is disastrous.
Add indexes for ORDER BY clauses. Sorting a million rows is expensive. If the database can read data in sorted order directly from an index, it skips the sorting step.
Add indexes for columns used in GROUP BY. Same logic: reading pre-sorted data is faster than sorting during query execution.
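The ORDER BY case is easy to verify with a query plan. A hedged sqlite3 sketch (the orders table and index names are made up): when an index already provides the requested order, no sort step appears in the plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, created_at TEXT)"
)
conn.execute("CREATE INDEX idx_orders_user_id ON orders (user_id)")        # for JOINs
conn.execute("CREATE INDEX idx_orders_created_at ON orders (created_at)")  # for ORDER BY

# Because the index is already sorted, the database reads rows in order
# instead of adding a separate sort step ("USE TEMP B-TREE FOR ORDER BY").
steps = [
    row[3]
    for row in conn.execute(
        "EXPLAIN QUERY PLAN SELECT created_at FROM orders ORDER BY created_at"
    )
]
print(steps)
```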
When Not to Add Indexes
Don’t index columns with low cardinality. A boolean column has two possible values. An index doesn’t help much because roughly half the table matches any query. The database might scan the table anyway because it’s faster than reading the index.
Don’t index small tables. If your entire table fits in memory and has a few thousand rows, indexes add overhead without meaningful benefits. Full scans are fast enough.
Don’t index columns that are frequently updated. Every update requires updating the index. If a column changes often, the index becomes expensive to maintain.
Don’t add indexes speculatively. Indexes cost disk space and slow down writes. Only create them when you have evidence they’ll improve query performance.
Composite Indexes
You can index multiple columns together. This helps when queries filter on multiple columns.
If you often query WHERE country = ? AND city = ?, a composite index on (country, city) works better than separate indexes. The database can use both conditions in a single index lookup.
Order matters. An index on (country, city) helps queries filtering by country alone, or both country and city. It doesn’t help queries filtering only by city.
Think of it like a phone book: sorted by last name, then first name. You can find all Smiths quickly. You can find John Smith quickly. You can’t find all Johns quickly because they’re scattered throughout the book.
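The phone-book behavior shows up directly in query plans. A sketch with sqlite3 (table and column names are illustrative): filtering by the leading column uses the composite index, filtering only by the second column falls back to a scan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE addresses (id INTEGER PRIMARY KEY, country TEXT, city TEXT)")
conn.execute("CREATE INDEX idx_addr_country_city ON addresses (country, city)")

def plan(where_clause):
    rows = conn.execute(f"EXPLAIN QUERY PLAN SELECT * FROM addresses WHERE {where_clause}")
    return " ".join(r[3] for r in rows)

by_country = plan("country = 'NZ'")
by_both = plan("country = 'NZ' AND city = 'Wellington'")
by_city = plan("city = 'Wellington'")
print(by_country)  # SEARCH using the composite index (leading column)
print(by_both)     # SEARCH using both index columns
print(by_city)     # SCAN: without the leading column, the index is useless
```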
Covering Indexes
A covering index includes all columns needed by a query. The database can satisfy the entire query from the index without accessing the table.
If you frequently run SELECT name, email FROM users WHERE email = ?, an index on (email, name) lets the database read directly from the index. This is faster because it touches fewer disk pages.
Covering indexes are a space-time trade-off. They store more data in the index, using more disk space, but eliminate lookups to the main table.
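SQLite makes the covering behavior visible in the plan. A sketch (table and index names are invented) mirroring the SELECT name, email query above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.execute("CREATE INDEX idx_users_email_name ON users (email, name)")

# Every column the query needs lives in the index, so the table is never read.
detail = conn.execute(
    "EXPLAIN QUERY PLAN SELECT name, email FROM users WHERE email = 'a@example.com'"
).fetchone()[3]
print(detail)  # SQLite labels this a COVERING INDEX search
```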
Index Types
B-tree indexes are the default. They work well for equality and range queries. They’re what you get unless you specify otherwise.
Hash indexes work for equality but not ranges. They’re faster for exact matches but can’t handle WHERE age > 25. Most databases default to B-trees because they’re more versatile.
Full-text indexes handle text search. Regular indexes don’t help with WHERE description LIKE '%keyword%' because they can’t use index ordering. Full-text indexes tokenize text and build a searchable structure.
Partial indexes only include rows matching a condition. If you frequently query WHERE status = 'active' and most rows are inactive, a partial index on active rows is smaller and faster.
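SQLite supports partial indexes too, so the status example can be sketched directly (the jobs table and index name are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, status TEXT, payload TEXT)")
# Only rows matching the WHERE condition are stored in this index.
conn.execute("CREATE INDEX idx_jobs_active ON jobs (status) WHERE status = 'active'")

# The query's condition implies the index's condition, so the index is usable.
detail = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM jobs WHERE status = 'active'"
).fetchone()[3]
print(detail)
```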
The Write Cost
Indexes aren’t free. Every insert requires updating all indexes on that table. Every update to an indexed column requires updating the index. Every delete requires removing entries from indexes.
For read-heavy workloads, this cost is worth it. For write-heavy workloads, too many indexes slow everything down.
I’ve seen tables with 15+ indexes that took seconds to insert single rows. The application spent more time maintaining indexes than doing actual work. We dropped unnecessary indexes and inserts became 10x faster.
Query Planning
Modern databases use query planners that decide whether to use indexes. Sometimes the planner chooses a full scan even when indexes exist, because it estimates that’s faster given the data distribution.
You can view execution plans to see what the database actually does. In PostgreSQL it’s EXPLAIN ANALYZE. In MySQL it’s EXPLAIN. This shows whether indexes are being used and where time is spent.
Sometimes queries don’t use indexes because the column types don’t match. If you query WHERE id = '123' but id is an integer, the database might cast the column, preventing index use. Keep types consistent.
Practical Guidelines
Start without indexes except on primary keys and foreign keys. Measure actual query performance. Add indexes when you identify slow queries.
Use query logs to find slow queries. Don’t guess which columns need indexing. Let real usage patterns guide decisions.
Monitor index usage. PostgreSQL has pg_stat_user_indexes, which shows how often each index is actually used. Drop unused indexes; they just waste space and slow writes.
For analytical queries on large datasets, consider separate read replicas with different indexing strategies. Don’t optimize the primary database for reports at the cost of transaction performance.
Common Mistakes
Adding indexes without measuring impact. Indexes have costs. Verify they actually improve performance before keeping them.
Creating indexes on every column. More indexes don’t mean better performance. Each one costs maintenance overhead.
Ignoring index size. A multi-column index on high-cardinality columns can be larger than the table itself. That’s sometimes worth it, but you should know it’s happening.
Not updating statistics. Databases use statistics about data distribution to plan queries. Stale statistics lead to poor plans. Run ANALYZE regularly on tables that change frequently.
When Things Get Complex
This covers the basics. Real databases have more nuance: index-only scans, index skip scans, bitmap indexes, clustered vs non-clustered indexes, fill factors, and more.
For complex scenarios, specialists in this space can help design appropriate indexing strategies. But the fundamentals remain: indexes trade write performance for read performance, and you should only create them when measurements justify the trade-off.
Understanding indexes doesn’t require advanced theory. It requires knowing what problem you’re solving and measuring whether the solution works. Start simple, measure real workloads, and optimize based on evidence rather than speculation.