Skip to content
DEE
Database Engineering Essentials

[DEE-101] Primary Keys and Surrogate Keys

INFO

Every table MUST have a primary key. Prefer surrogate keys (auto-increment or UUID v7) for most tables; use natural keys only when the domain value is immutable and has no privacy constraints.

Context

A primary key uniquely identifies each row in a table and serves as the default target for foreign key references. The choice between a surrogate key (an artificial identifier with no business meaning) and a natural key (derived from real-world data such as an email address or ISBN) affects write performance, storage size, join cost, and long-term schema flexibility.

The choice also interacts with the storage engine. In MySQL/InnoDB, the primary key is the clustered index -- row data is physically ordered by the PK. In PostgreSQL, the primary key creates a unique B-tree index but the heap is not clustered by default.

Principle

  • You MUST define an explicit primary key on every table.
  • You SHOULD use a surrogate key (BIGINT GENERATED ALWAYS AS IDENTITY or UUID v7) as the default choice.
  • You MAY use a natural key when: (a) the value is guaranteed immutable, (b) it has no privacy/PII sensitivity, and (c) the domain enforces uniqueness (e.g., ISO country codes, IANA language tags).
  • You MUST NOT use a mutable business attribute (e.g., email, username, phone number) as a primary key.
  • You SHOULD prefer UUID v7 over UUID v4 when UUIDs are required, especially in InnoDB-based systems.

Surrogate vs Natural Keys

CriterionSurrogate KeyNatural Key
StabilityAlways stable -- never changesOnly as stable as the business rule behind it
SizePredictable (8 bytes for BIGINT, 16 bytes for UUID)Varies (email can be 254 bytes)
Join costSmall, fixed-width comparisonsPotentially wide string comparisons
ReadabilityOpaque; requires a lookup to understand the rowMeaningful to humans in queries
Distributed generationUUID can be generated anywhere without coordinationDepends on external uniqueness guarantees
Index localitySequential (auto-inc) or time-ordered (UUID v7)Often random distribution

UUID v4 vs UUID v7

UUID v4 generates 122 bits of randomness. UUID v7 (defined in RFC 9562) embeds a 48-bit Unix timestamp in the most significant bits, followed by 74 bits of randomness. This design has significant implications for database index performance:

AspectUUID v4UUID v7
OrderingRandomTime-ordered (monotonically increasing)
Index localityPoor -- new inserts scatter across the B-treeGood -- new inserts append near the end
Page splits (InnoDB)Frequent -- up to 10x more than sequential IDs~90% fewer page splits than v4
Write throughputDegrades on large tables due to random I/ONear-sequential insert performance
Timestamp leakageNoneMillisecond-precision creation time is extractable
Cache efficiencyLow -- working set spreads across many pagesHigh -- recent inserts share pages

In MySQL/InnoDB, where the primary key IS the clustered index, random UUID v4 values force constant page reorganization. UUID v7 behaves like an auto-increment for insertion purposes while retaining the distributed-generation advantage of UUIDs.

Example

Auto-increment (PostgreSQL IDENTITY)

sql
-- Preferred for single-database systems
CREATE TABLE customers (
    customer_id  BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    email        TEXT NOT NULL UNIQUE,
    name         TEXT NOT NULL
);

UUID v7 (PostgreSQL 18+ / application-generated)

sql
-- Preferred when IDs must be generated outside the database
-- or across multiple services
CREATE TABLE orders (
    order_id    UUID PRIMARY KEY DEFAULT uuidv7(),
    customer_id BIGINT NOT NULL REFERENCES customers(customer_id),
    created_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);

For PostgreSQL versions before 18, generate UUID v7 in the application layer using libraries such as uuid (Node.js), uuid7 (Python), or java.util.UUID (Java 17+ with libraries).

Natural Key (where appropriate)

sql
-- ISO 3166-1 alpha-2 codes are immutable, short, and universally unique
CREATE TABLE countries (
    code  CHAR(2) PRIMARY KEY,  -- e.g., 'US', 'TW', 'JP'
    name  TEXT NOT NULL
);

MySQL AUTO_INCREMENT

sql
-- MySQL equivalent of IDENTITY
CREATE TABLE products (
    product_id  BIGINT AUTO_INCREMENT PRIMARY KEY,
    name        VARCHAR(255) NOT NULL,
    price       DECIMAL(10,2) NOT NULL
) ENGINE=InnoDB;

Common Mistakes

MistakeWhy It HurtsFix
Mutable natural key as PK (e.g., email, username)When the value changes, every FK reference must be updated -- cascade or breakageUse a surrogate PK; keep the natural value as a UNIQUE constraint
UUID v4 as clustered PK in InnoDBRandom inserts cause excessive page splits and write amplification on large tablesSwitch to UUID v7, or use BIGINT AUTO_INCREMENT with a separate UUID column for external exposure
Composite PK on a high-volume tableWide keys increase index size and FK reference cost in child tablesUse a surrogate BIGINT PK; enforce the composite uniqueness with a UNIQUE constraint
No primary key at allSome ORMs and replication systems require a PK; without one, deletes and updates cannot target a specific row efficientlyAlways define an explicit PK
Exposing sequential IDs in public APIsAllows enumeration attacks (scraping, IDOR)Use UUID for external-facing identifiers; keep sequential IDs internal
Using SERIAL instead of IDENTITY in PostgreSQLSERIAL creates an implicit sequence with looser ownership semantics; IDENTITY is SQL-standard and avoids accidental manual insertsUse GENERATED ALWAYS AS IDENTITY for new schemas

References