[DEE-101] Primary Keys and Surrogate Keys
INFO
Every table MUST have a primary key. Prefer surrogate keys (auto-increment or UUID v7) for most tables; use natural keys only when the domain value is immutable and has no privacy constraints.
Context
A primary key uniquely identifies each row in a table and serves as the default target for foreign key references. The choice between a surrogate key (an artificial identifier with no business meaning) and a natural key (derived from real-world data such as an email address or ISBN) affects write performance, storage size, join cost, and long-term schema flexibility.
The choice also interacts with the storage engine. In MySQL/InnoDB, the primary key is the clustered index -- row data is physically ordered by the PK. In PostgreSQL, the primary key creates a unique B-tree index but the heap is not clustered by default.
Principle
- You MUST define an explicit primary key on every table.
- You SHOULD use a surrogate key (
BIGINT GENERATED ALWAYS AS IDENTITYor UUID v7) as the default choice. - You MAY use a natural key when: (a) the value is guaranteed immutable, (b) it has no privacy/PII sensitivity, and (c) the domain enforces uniqueness (e.g., ISO country codes, IANA language tags).
- You MUST NOT use a mutable business attribute (e.g., email, username, phone number) as a primary key.
- You SHOULD prefer UUID v7 over UUID v4 when UUIDs are required, especially in InnoDB-based systems.
Surrogate vs Natural Keys
| Criterion | Surrogate Key | Natural Key |
|---|---|---|
| Stability | Always stable -- never changes | Only as stable as the business rule behind it |
| Size | Predictable (8 bytes for BIGINT, 16 bytes for UUID) | Varies (email can be 254 bytes) |
| Join cost | Small, fixed-width comparisons | Potentially wide string comparisons |
| Readability | Opaque; requires a lookup to understand the row | Meaningful to humans in queries |
| Distributed generation | UUID can be generated anywhere without coordination | Depends on external uniqueness guarantees |
| Index locality | Sequential (auto-inc) or time-ordered (UUID v7) | Often random distribution |
UUID v4 vs UUID v7
UUID v4 generates 122 bits of randomness. UUID v7 (defined in RFC 9562) embeds a 48-bit Unix timestamp in the most significant bits, followed by 74 bits of randomness. This design has significant implications for database index performance:
| Aspect | UUID v4 | UUID v7 |
|---|---|---|
| Ordering | Random | Time-ordered (monotonically increasing) |
| Index locality | Poor -- new inserts scatter across the B-tree | Good -- new inserts append near the end |
| Page splits (InnoDB) | Frequent -- up to 10x more than sequential IDs | ~90% fewer page splits than v4 |
| Write throughput | Degrades on large tables due to random I/O | Near-sequential insert performance |
| Timestamp leakage | None | Millisecond-precision creation time is extractable |
| Cache efficiency | Low -- working set spreads across many pages | High -- recent inserts share pages |
In MySQL/InnoDB, where the primary key IS the clustered index, random UUID v4 values force constant page reorganization. UUID v7 behaves like an auto-increment for insertion purposes while retaining the distributed-generation advantage of UUIDs.
Example
Auto-increment (PostgreSQL IDENTITY)
-- Preferred for single-database systems
CREATE TABLE customers (
customer_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
email TEXT NOT NULL UNIQUE,
name TEXT NOT NULL
);UUID v7 (PostgreSQL 18+ / application-generated)
-- Preferred when IDs must be generated outside the database
-- or across multiple services
CREATE TABLE orders (
order_id UUID PRIMARY KEY DEFAULT uuidv7(),
customer_id BIGINT NOT NULL REFERENCES customers(customer_id),
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);For PostgreSQL versions before 18, generate UUID v7 in the application layer using libraries such as uuid (Node.js), uuid7 (Python), or java.util.UUID (Java 17+ with libraries).
Natural Key (where appropriate)
-- ISO 3166-1 alpha-2 codes are immutable, short, and universally unique
CREATE TABLE countries (
code CHAR(2) PRIMARY KEY, -- e.g., 'US', 'TW', 'JP'
name TEXT NOT NULL
);MySQL AUTO_INCREMENT
-- MySQL equivalent of IDENTITY
CREATE TABLE products (
product_id BIGINT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(255) NOT NULL,
price DECIMAL(10,2) NOT NULL
) ENGINE=InnoDB;Common Mistakes
| Mistake | Why It Hurts | Fix |
|---|---|---|
| Mutable natural key as PK (e.g., email, username) | When the value changes, every FK reference must be updated -- cascade or breakage | Use a surrogate PK; keep the natural value as a UNIQUE constraint |
| UUID v4 as clustered PK in InnoDB | Random inserts cause excessive page splits and write amplification on large tables | Switch to UUID v7, or use BIGINT AUTO_INCREMENT with a separate UUID column for external exposure |
| Composite PK on a high-volume table | Wide keys increase index size and FK reference cost in child tables | Use a surrogate BIGINT PK; enforce the composite uniqueness with a UNIQUE constraint |
| No primary key at all | Some ORMs and replication systems require a PK; without one, deletes and updates cannot target a specific row efficiently | Always define an explicit PK |
| Exposing sequential IDs in public APIs | Allows enumeration attacks (scraping, IDOR) | Use UUID for external-facing identifiers; keep sequential IDs internal |
| Using SERIAL instead of IDENTITY in PostgreSQL | SERIAL creates an implicit sequence with looser ownership semantics; IDENTITY is SQL-standard and avoids accidental manual inserts | Use GENERATED ALWAYS AS IDENTITY for new schemas |
Related DEEs
- DEE-100 Normalization
- DEE-102 Foreign Keys and Referential Integrity
- DEE-103 Relationships (1:1, 1:N, M:N)
References
- RFC 9562 -- Universally Unique IDentifiers (UUIDs) -- defines UUID v7 with time-ordered layout
- PostgreSQL Documentation: CREATE TABLE (IDENTITY columns) -- GENERATED ALWAYS AS IDENTITY syntax
- MySQL Documentation: AUTO_INCREMENT -- auto-increment behavior in InnoDB
- UUID, Serial, or Identity for PostgreSQL Primary Keys -- Cybertec -- practical comparison with benchmarks
- UUID v7 in PostgreSQL 18 -- Better Stack -- native uuidv7() support
- Choosing a Postgres Primary Key -- Supabase -- decision framework for PK strategy