# [DEE-302] Backward-Compatible Schema Changes

> **INFO:** Schema changes MUST be backward-compatible with the currently running application version. During a rolling deployment, old and new code coexist -- a breaking schema change will crash the old instances.
## Context
In continuously deployed systems, application code and database schema do not change at the same instant. During a rolling deployment, both the old version (v1) and the new version (v2) of the application are running simultaneously, both reading from and writing to the same database. If a schema migration removes a column that v1 still reads, v1 crashes. If a migration renames a column, v1 queries fail with "column not found."
This problem is fundamental to any system where the database is shared across application instances that are not all upgraded atomically. Kubernetes rolling deployments, blue-green deployments, and canary releases all create a window where multiple application versions coexist.
The expand-and-contract pattern (also called parallel change) solves this by breaking every breaking change into a sequence of backward-compatible steps. First, expand the schema by adding new structures alongside the old ones. Then, migrate application code to use the new structures while maintaining compatibility with both. Finally, contract the schema by removing the old structures once no running code references them.
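The expand phase can be sketched end to end in a few lines. This is an illustrative SQLite stand-in, not production tooling (SQLite trigger syntax differs from the plpgsql used in the Example section below): after the expand migration, old code reading `name` and new code reading `full_name` both work against the same database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Original schema: v1 application code reads and writes "name".
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('Ada Lovelace')")

# EXPAND: add the new column, backfill, and dual-write via a trigger.
conn.execute("ALTER TABLE users ADD COLUMN full_name TEXT")
conn.execute("UPDATE users SET full_name = name WHERE full_name IS NULL")
conn.execute("""
    CREATE TRIGGER trg_sync_user_name AFTER INSERT ON users
    WHEN NEW.full_name IS NULL
    BEGIN
        UPDATE users SET full_name = name WHERE id = NEW.id;
    END
""")

# v1 code (old version) still inserts and reads using "name"...
conn.execute("INSERT INTO users (name) VALUES ('Grace Hopper')")
old_read = conn.execute("SELECT name FROM users ORDER BY id").fetchall()

# ...while v2 code (new version) already reads "full_name".
new_read = conn.execute("SELECT full_name FROM users ORDER BY id").fetchall()

print(old_read)  # [('Ada Lovelace',), ('Grace Hopper',)]
print(new_read)  # [('Ada Lovelace',), ('Grace Hopper',)]
```

Neither version breaks, because the expanded schema serves both read paths at once.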
## Principle
- Schema changes MUST be backward-compatible with the currently running application version.
- Breaking changes MUST be decomposed into a sequence of non-breaking steps using the expand-and-contract pattern.
- A column or table MUST NOT be dropped until all application versions that reference it have been fully decommissioned.
- Developers SHOULD classify every DDL operation as safe or unsafe before writing the migration.
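One lightweight way to support the last point is a pre-review check that flags DDL by category. The `classify_ddl` helper below is hypothetical (not part of any DEE tooling); it simply pattern-matches a statement against the safe/unsafe table later in this document, and would need extending for real use.

```python
import re

# Hypothetical rules derived from the Safe vs. Unsafe Operations table.
# Order matters: more specific (unsafe) patterns are checked first.
RULES = [
    (r"\bDROP\s+COLUMN\b", "unsafe"),
    (r"\bRENAME\s+(COLUMN|TO)\b", "unsafe"),
    (r"\bADD\s+COLUMN\b.*\bNOT\s+NULL\b(?!.*\bDEFAULT\b)", "unsafe"),
    (r"\bALTER\s+COLUMN\b.*\bTYPE\b", "caution"),
    (r"\bADD\s+(CONSTRAINT|FOREIGN\s+KEY|CHECK)\b", "caution"),
    (r"\bADD\s+COLUMN\b", "safe"),
    (r"\b(CREATE|DROP)\s+INDEX\b", "safe"),
]

def classify_ddl(statement: str) -> str:
    """Return 'safe', 'caution', 'unsafe', or 'unknown' for a DDL statement."""
    normalized = " ".join(statement.upper().split())
    for pattern, verdict in RULES:
        if re.search(pattern, normalized):
            return verdict
    return "unknown"

print(classify_ddl("ALTER TABLE users ADD COLUMN phone VARCHAR(20)"))     # safe
print(classify_ddl("ALTER TABLE users RENAME COLUMN name TO full_name"))  # unsafe
print(classify_ddl("ALTER TABLE users DROP COLUMN middle_name"))          # unsafe
```

Such a check could run in CI against each migration file, forcing an explicit override comment for anything not classified as safe.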
## Visual

**Key insight:** Each phase is a separate deployment. At no point does old code break, because the schema always supports both the old and new application versions during the transition.
## Example

### Adding a Column (Safe)
Adding a nullable column with no default is safe -- old code simply ignores the new column:
```sql
-- Migration: safe, backward-compatible
ALTER TABLE users ADD COLUMN phone VARCHAR(20);
```

Old code does not SELECT `phone` and does not INSERT into it, so nothing breaks. New code can begin reading and writing the column immediately.
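A quick way to see why this holds, sketched with SQLite standing in for PostgreSQL: application code written before the migration keeps working, unchanged, after it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

def v1_insert(name):
    # Old application code: knows nothing about the phone column.
    conn.execute("INSERT INTO users (name) VALUES (?)", (name,))

v1_insert("Ada")  # works before the migration

conn.execute("ALTER TABLE users ADD COLUMN phone TEXT")  # the migration

v1_insert("Grace")  # still works after the migration
rows = conn.execute("SELECT name, phone FROM users ORDER BY id").fetchall()
print(rows)  # [('Ada', None), ('Grace', None)] -- phone is simply NULL
```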
### Renaming a Column (Unsafe -- Use Expand-and-Contract)
Renaming `users.name` to `users.full_name` in a single step breaks all code referencing `name`:

```sql
-- UNSAFE: breaks v1 code immediately
ALTER TABLE users RENAME COLUMN name TO full_name;
```

Expand-and-contract approach (3 deployments):
```sql
-- Deployment 1: EXPAND -- add new column, dual-write trigger
ALTER TABLE users ADD COLUMN full_name VARCHAR(255);

-- Backfill existing rows
UPDATE users SET full_name = name WHERE full_name IS NULL;

-- Create trigger to keep columns in sync
CREATE OR REPLACE FUNCTION sync_user_name() RETURNS trigger AS $$
BEGIN
    IF NEW.full_name IS NULL THEN
        NEW.full_name := NEW.name;
    END IF;
    IF NEW.name IS NULL THEN
        NEW.name := NEW.full_name;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_sync_user_name
BEFORE INSERT OR UPDATE ON users
FOR EACH ROW EXECUTE FUNCTION sync_user_name();
```

```sql
-- Deployment 2: MIGRATE -- new code reads from full_name
-- (application code change, no DDL needed)
-- Once all instances are on the new code version, proceed.
```

```sql
-- Deployment 3: CONTRACT -- remove old column and trigger
DROP TRIGGER trg_sync_user_name ON users;
DROP FUNCTION sync_user_name();
ALTER TABLE users DROP COLUMN name;
```

### Removing a Column (Multi-Step)
Dropping `users.middle_name`:

```sql
-- Step 1: Deploy code that no longer reads or writes middle_name
-- (application code change only -- no migration)

-- Step 2: After all old instances are decommissioned, drop the column
ALTER TABLE users DROP COLUMN middle_name;
```

If you drop the column before step 1, any old instance still running will crash with `ERROR: column "middle_name" does not exist`.
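The failure mode is easy to reproduce. The sketch below uses SQLite and creates the post-contract schema directly (since `DROP COLUMN` support varies by SQLite version): v1's old query against a schema that no longer has the column fails immediately.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Post-contract schema: middle_name has already been dropped.
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

error = ""
try:
    # v1 code, still running during the rollout, issues its old query.
    conn.execute("SELECT name, middle_name FROM users")
except sqlite3.OperationalError as exc:
    error = str(exc)

print(error)  # no such column: middle_name
```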
### Safe vs. Unsafe Operations

| Operation | Safe? | Notes |
|---|---|---|
| `ADD COLUMN` (nullable, no default) | Safe | Old code ignores the new column |
| `ADD COLUMN` (with default, PG 11+) | Safe | PostgreSQL 11+ does not rewrite the table |
| `ADD COLUMN ... NOT NULL` (no default) | Unsafe | Fails if existing rows have NULLs; blocks writes on older PostgreSQL |
| `DROP COLUMN` | Unsafe | Breaks any code still referencing the column |
| `RENAME COLUMN` | Unsafe | Breaks all queries using the old name |
| `RENAME TABLE` | Unsafe | Breaks all queries using the old table name |
| `ALTER COLUMN TYPE` (compatible cast) | Caution | `VARCHAR(50)` to `VARCHAR(100)` is safe; `VARCHAR` to `INTEGER` is not |
| `ALTER COLUMN TYPE` (incompatible cast) | Unsafe | Requires expand-and-contract |
| `ADD INDEX` | Safe | Use `CONCURRENTLY` to avoid locking (see DEE-303) |
| `DROP INDEX` | Safe | Queries may slow down but will not break |
| `ADD FOREIGN KEY` | Caution | Validates existing data; use `NOT VALID`, then `VALIDATE CONSTRAINT` separately |
| `ADD CHECK` constraint | Caution | Use `NOT VALID`, then `VALIDATE CONSTRAINT` in two steps |
| `ADD NOT NULL` (to existing column) | Unsafe | Use a `CHECK` constraint with `NOT VALID`, then validate |
## Common Mistakes

- **Dropping columns before code stops using them.** The most common cause of deployment failures. The old application version queries `SELECT name FROM users`, the migration drops `name`, and every old instance crashes. Always deploy the code change first, wait for full rollout, then drop the column in a subsequent release.
- **Adding NOT NULL without a default.** `ALTER TABLE users ADD COLUMN status VARCHAR(20) NOT NULL` fails immediately if the table has any existing rows (no value to fill). Even if the table is empty now, it may not be by the time the migration runs in production. Always provide a default or add the column as nullable first.
- **Renaming in one step.** `RENAME COLUMN` is never backward-compatible. Use the expand-and-contract pattern: add the new column, dual-write, migrate reads, then drop the old column.
- **Changing column types without checking compatibility.** Widening a `VARCHAR(50)` to `VARCHAR(100)` is safe. But changing `VARCHAR` to `INTEGER`, `TEXT` to `JSONB`, or narrowing `VARCHAR(100)` to `VARCHAR(50)` can fail on existing data or break application code that expects the old type.
- **Assuming migrations run before new code.** In some deployment strategies, new code starts before migrations complete. Design migrations to be safe regardless of execution order -- adding a nullable column is safe whether code deploys first or second.
- **Forgetting to clean up expanded schemas.** The expand phase adds temporary columns, triggers, or views. If the contract phase is never executed, the schema accumulates dead weight. Track expand-and-contract migrations and schedule the contract step.
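One defensive pattern against the "migrations run before new code" assumption, again sketched with SQLite: the new code probes the schema and falls back when the column is not there yet, so deploy order stops mattering. The schema probe via `PRAGMA table_info` is SQLite-specific; on PostgreSQL you would query `information_schema.columns` instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('Ada')")

def read_status(conn, user_id):
    # New application code, written to work whether or not the
    # "status" column exists yet (the migration may not have run).
    cols = {row[1] for row in conn.execute("PRAGMA table_info(users)")}
    if "status" not in cols:
        return "unknown"  # fallback until the migration lands
    row = conn.execute(
        "SELECT status FROM users WHERE id = ?", (user_id,)).fetchone()
    return row[0] if row and row[0] is not None else "unknown"

before = read_status(conn, 1)  # migration not applied yet

conn.execute("ALTER TABLE users ADD COLUMN status TEXT")  # the migration
conn.execute("UPDATE users SET status = 'active' WHERE id = 1")

after = read_status(conn, 1)
print(before, after)  # unknown active
```

The per-call schema probe adds overhead; in practice the result would be cached and invalidated on deploy, but the principle is the same: the code tolerates both schema states.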
## Related DEEs
- DEE-300 Schema Evolution Overview
- DEE-301 Migration Fundamentals -- versioned migration basics
- DEE-303 Zero-Downtime Migrations -- avoiding locks during schema changes
- DEE-304 Data Backfilling Strategies -- populating new columns after expansion
## References
- Martin Fowler: Parallel Change -- the expand-and-contract pattern described as "parallel change"
- Prisma Data Guide: Expand and Contract Pattern -- practical walkthrough of expand-and-contract
- PlanetScale: Backward Compatible Database Changes -- classification of safe and unsafe operations
- GoCardless: Zero-Downtime Postgres Migrations -- The Hard Parts -- real-world production experience with backward-compatible changes
- PostgreSQL Documentation: ALTER TABLE -- official reference for DDL operations and their behavior
- Xata: pgroll Expand and Contract -- automated expand-and-contract tooling for PostgreSQL