[DEE-101] Primary Keys and Surrogate Keys

INFO

Every table MUST have a primary key. Prefer surrogate keys (auto-increment or UUID v7) for most tables; use natural keys only when the domain value is immutable and has no privacy constraints.

Context

A primary key uniquely identifies each row in a table and serves as the default target for foreign key references. The choice between a surrogate key (an artificial identifier with no business meaning) and a natural key (derived from real-world data such as an email address or ISBN) affects write performance, storage size, join cost, and long-term schema flexibility.

The choice also interacts with the storage engine. In MySQL/InnoDB, the primary key is the clustered index -- row data is physically ordered by the PK. In PostgreSQL, the primary key creates a unique B-tree index but the heap is not clustered by default.

Principle

You MUST define an explicit primary key on every table.
You SHOULD use a surrogate key (BIGINT GENERATED ALWAYS AS IDENTITY or UUID v7) as the default choice.
You MAY use a natural key when: (a) the value is guaranteed immutable, (b) it has no privacy/PII sensitivity, and (c) the domain enforces uniqueness (e.g., ISO country codes, IANA language tags).
You MUST NOT use a mutable business attribute (e.g., email, username, phone number) as a primary key.
You SHOULD prefer UUID v7 over UUID v4 when UUIDs are required, especially in InnoDB-based systems.

Surrogate vs Natural Keys

Criterion	Surrogate Key	Natural Key
Stability	Always stable -- never changes	Only as stable as the business rule behind it
Size	Predictable (8 bytes for BIGINT, 16 bytes for UUID)	Varies (email can be 254 bytes)
Join cost	Small, fixed-width comparisons	Potentially wide string comparisons
Readability	Opaque; requires a lookup to understand the row	Meaningful to humans in queries
Distributed generation	UUID can be generated anywhere without coordination	Depends on external uniqueness guarantees
Index locality	Sequential (auto-inc) or time-ordered (UUID v7)	Often random distribution

UUID v4 vs UUID v7

UUID v4 generates 122 bits of randomness. UUID v7 (defined in RFC 9562) embeds a 48-bit Unix timestamp in the most significant bits, followed by 74 bits of randomness. This design has significant implications for database index performance:

Aspect	UUID v4	UUID v7
Ordering	Random	Time-ordered (monotonically increasing)
Index locality	Poor -- new inserts scatter across the B-tree	Good -- new inserts append near the end
Page splits (InnoDB)	Frequent -- up to 10x more than sequential IDs	~90% fewer page splits than v4
Write throughput	Degrades on large tables due to random I/O	Near-sequential insert performance
Timestamp leakage	None	Millisecond-precision creation time is extractable
Cache efficiency	Low -- working set spreads across many pages	High -- recent inserts share pages

In MySQL/InnoDB, where the primary key IS the clustered index, random UUID v4 values force constant page reorganization. UUID v7 behaves like an auto-increment for insertion purposes while retaining the distributed-generation advantage of UUIDs.

Example

Auto-increment (PostgreSQL IDENTITY)

sql

-- Preferred for single-database systems
CREATE TABLE customers (
    customer_id  BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    email        TEXT NOT NULL UNIQUE,
    name         TEXT NOT NULL
);

UUID v7 (PostgreSQL 18+ / application-generated)

sql

-- Preferred when IDs must be generated outside the database
-- or across multiple services
CREATE TABLE orders (
    order_id    UUID PRIMARY KEY DEFAULT uuidv7(),
    customer_id BIGINT NOT NULL REFERENCES customers(customer_id),
    created_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);

For PostgreSQL versions before 18, generate UUID v7 in the application layer using libraries such as uuid (Node.js), uuid7 (Python), or java.util.UUID (Java 17+ with libraries).

Natural Key (where appropriate)

sql

-- ISO 3166-1 alpha-2 codes are immutable, short, and universally unique
CREATE TABLE countries (
    code  CHAR(2) PRIMARY KEY,  -- e.g., 'US', 'TW', 'JP'
    name  TEXT NOT NULL
);

MySQL AUTO_INCREMENT

sql

-- MySQL equivalent of IDENTITY
CREATE TABLE products (
    product_id  BIGINT AUTO_INCREMENT PRIMARY KEY,
    name        VARCHAR(255) NOT NULL,
    price       DECIMAL(10,2) NOT NULL
) ENGINE=InnoDB;

Common Mistakes

Mistake	Why It Hurts	Fix
Mutable natural key as PK (e.g., email, username)	When the value changes, every FK reference must be updated -- cascade or breakage	Use a surrogate PK; keep the natural value as a UNIQUE constraint
UUID v4 as clustered PK in InnoDB	Random inserts cause excessive page splits and write amplification on large tables	Switch to UUID v7, or use BIGINT AUTO_INCREMENT with a separate UUID column for external exposure
Composite PK on a high-volume table	Wide keys increase index size and FK reference cost in child tables	Use a surrogate BIGINT PK; enforce the composite uniqueness with a UNIQUE constraint
No primary key at all	Some ORMs and replication systems require a PK; without one, deletes and updates cannot target a specific row efficiently	Always define an explicit PK
Exposing sequential IDs in public APIs	Allows enumeration attacks (scraping, IDOR)	Use UUID for external-facing identifiers; keep sequential IDs internal
Using SERIAL instead of IDENTITY in PostgreSQL	SERIAL creates an implicit sequence with looser ownership semantics; IDENTITY is SQL-standard and avoids accidental manual inserts	Use `GENERATED ALWAYS AS IDENTITY` for new schemas

DEE-100 Normalization
DEE-102 Foreign Keys and Referential Integrity
DEE-103 Relationships (1:1, 1:N, M:N)

References

RFC 9562 -- Universally Unique IDentifiers (UUIDs) -- defines UUID v7 with time-ordered layout
PostgreSQL Documentation: CREATE TABLE (IDENTITY columns) -- GENERATED ALWAYS AS IDENTITY syntax
MySQL Documentation: AUTO_INCREMENT -- auto-increment behavior in InnoDB
UUID, Serial, or Identity for PostgreSQL Primary Keys -- Cybertec -- practical comparison with benchmarks
UUID v7 in PostgreSQL 18 -- Better Stack -- native uuidv7() support
Choosing a Postgres Primary Key -- Supabase -- decision framework for PK strategy

[DEE-101] Primary Keys and Surrogate Keys ​

Context ​

Principle ​

Surrogate vs Natural Keys ​

UUID v4 vs UUID v7 ​

Example ​

Auto-increment (PostgreSQL IDENTITY) ​

UUID v7 (PostgreSQL 18+ / application-generated) ​

Natural Key (where appropriate) ​

MySQL AUTO_INCREMENT ​

Common Mistakes ​

Related DEEs ​

References ​