[BEE-12004] Bulkhead Pattern
Partition resources by dependency so that a failure or slowdown in one partition cannot exhaust resources needed by others.
Context
Most service calls share a single, bounded resource: a thread pool, a connection pool, or a semaphore. When a dependency becomes slow — not broken, just slow — threads pile up waiting for responses. If every request to your service touches that slow dependency, the shared pool drains. New requests queue. The queue grows. Eventually every inbound request is stuck waiting — including requests that have nothing to do with the slow dependency.
This is cascading failure driven by resource exhaustion, and it is subtle and common. A circuit breaker (BEE-12001) trips when a dependency fails; it does not protect you from a dependency that is merely slow. Timeouts (BEE-12002) bound how long a single call waits, but if 500 threads are all waiting up to 5 seconds simultaneously, you still exhaust the pool before any timeout fires.
The Bulkhead pattern, described by Michael Nygard in Release It! (Pragmatic Programmers, 2018) and documented in the Microsoft Azure Architecture Center (learn.microsoft.com/en-us/azure/architecture/patterns/bulkhead), addresses this by allocating separate resource partitions — bulkheads — to different dependencies or caller classes. A failure in one bulkhead cannot consume resources in another.
The name comes from ship design. A ship's hull is divided into watertight compartments by internal walls called bulkheads. If the hull is breached, only the flooded compartment fills with water. The ship remains afloat. Without bulkheads, any breach sinks the entire vessel.
Principle
Assign dedicated, bounded resource partitions to each downstream dependency. Size partitions according to expected load and criticality. A slow or failing dependency can exhaust only its own partition; all other partitions remain available.
Isolation Mechanisms
There are four primary ways to implement bulkhead isolation, ranging from coarse to fine.
Thread Pool Isolation
Each dependency gets its own fixed-size thread pool. Calls to Dependency A run on Pool A; calls to Dependency B run on Pool B. Pool A exhaustion does not affect Pool B.
This is the most powerful isolation mechanism. Even if Dependency A is completely hung and all threads in Pool A are blocked, Pool B is unaffected. The main service thread returns quickly because it delegates work to the dependency-specific pool and either awaits the future with a short timeout or uses reactive patterns.
Trade-off: Each thread pool has overhead (memory per thread, context switch cost). Fine-grained partitioning across dozens of dependencies becomes expensive.
Use when: The dependency is slow or blocking (database, external HTTP API, legacy service). This is the default recommendation for I/O-bound calls.
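As a minimal sketch of thread pool isolation using only the JDK (class, pool sizes, and the `charge` stand-in call are invented for illustration): each dependency gets its own bounded `ThreadPoolExecutor`, and the caller bounds its own wait on the future.

```java
import java.util.concurrent.*;

public class ThreadPoolBulkheads {
    // One bounded pool per dependency. Bounded queues mean saturation
    // produces an immediate RejectedExecutionException instead of
    // unbounded queueing.
    static final ExecutorService PAYMENTS_POOL = new ThreadPoolExecutor(
            10, 10, 0L, TimeUnit.MILLISECONDS, new ArrayBlockingQueue<>(20));
    static final ExecutorService ANALYTICS_POOL = new ThreadPoolExecutor(
            2, 2, 0L, TimeUnit.MILLISECONDS, new ArrayBlockingQueue<>(5));

    // The caller delegates to the dependency-specific pool and awaits
    // the future with a short timeout, so a hung payments backend can
    // block only PAYMENTS_POOL threads, never the caller indefinitely.
    static String charge(String order) {
        Future<String> f = PAYMENTS_POOL.submit(() -> "charged:" + order); // stand-in for the real call
        try {
            return f.get(200, TimeUnit.MILLISECONDS); // caller blocks at most 200 ms
        } catch (Exception e) {
            throw new RuntimeException("payments call failed or timed out", e);
        }
    }
}
```

Even if every `ANALYTICS_POOL` thread is blocked on a hung warehouse, `PAYMENTS_POOL` submissions proceed normally.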
Semaphore Isolation
A semaphore limits the number of concurrent calls to a dependency. Callers run on their own threads but must acquire a permit before calling the dependency. When all permits are held, new callers are rejected immediately (or after a brief wait).
Semaphore isolation is lightweight — no thread pool management — but it does not protect against the calling thread blocking. If the dependency is slow and the caller thread blocks waiting for a response, the caller's thread is still consumed. Semaphore isolation prevents unbounded concurrency; it does not prevent thread exhaustion on the caller's side.
Use when: The call is fast (sub-millisecond cache lookups, in-process calls) or when using a fully non-blocking async model where threads are not consumed while waiting.
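A semaphore bulkhead can be sketched with `java.util.concurrent.Semaphore` alone (the class and its names are illustrative, not a library API): acquire a permit before the call, fail fast when none is available, release in `finally`.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

public class SemaphoreBulkhead {
    private final Semaphore permits;

    SemaphoreBulkhead(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    // Runs the call only if a permit is free; otherwise rejects
    // immediately rather than queueing. (A timed tryAcquire would
    // give the "brief wait" variant instead.)
    <T> T call(Supplier<T> dependencyCall) {
        if (!permits.tryAcquire()) {
            throw new IllegalStateException("bulkhead full");
        }
        try {
            return dependencyCall.get();
        } finally {
            permits.release(); // always return the permit
        }
    }
}
```

Note the limitation described above: `dependencyCall.get()` still runs on the caller's thread, so a slow synchronous call still consumes that thread for its full duration.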
Connection Pool Isolation
Each dependency gets its own connection pool. The database for critical user authentication gets 20 connections; the analytics write path gets 5. When analytics is slow, it can hold at most 5 connections and no more.
Most connection pool libraries already support multiple independently sized pools; the work is to configure one deliberately per dependency rather than sharing a single pool.
Process-Level Isolation
In a microservices architecture, separating non-critical workloads into distinct services (separate processes, containers, or VMs) provides the strongest isolation boundary. A crashed recommendation service cannot affect a payment service because they share no in-process resources at all.
This is appropriate for workloads with substantially different scaling, availability, or failure-tolerance requirements — but it is a coarser and more expensive mechanism than in-process bulkheads.
The Problem Without Bulkheads
Suppose payments and analytics calls share a single thread pool. Analytics hangs. Threads accumulate waiting for it. The shared pool drains. Payments requests — which would complete in 5ms — find no threads available and fail. A non-critical workload has killed a critical one.
The Solution With Bulkheads
Analytics still hangs, but it can only exhaust its own 10-thread pool. New analytics calls are rejected with an immediate error — they do not wait. The payments pool is unaffected; payments requests succeed normally.
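The rejection behavior can be demonstrated with a tiny JDK-only pool (the method and pool sizes are invented for the demo): a 2-thread analytics pool with no queue capacity rejects the third call the instant both threads are held by the hung dependency.

```java
import java.util.concurrent.*;

public class BulkheadDemo {
    // Returns true if the call beyond the pool's capacity is rejected
    // immediately rather than queued behind the hung dependency.
    static boolean thirdCallRejected() {
        ExecutorService analyticsPool = new ThreadPoolExecutor(
                2, 2, 0L, TimeUnit.MILLISECONDS, new SynchronousQueue<>());
        CountDownLatch hang = new CountDownLatch(1);
        // Simulate the hung dependency: two calls block indefinitely,
        // occupying both threads in the analytics partition.
        for (int i = 0; i < 2; i++) {
            analyticsPool.submit(() -> { hang.await(); return null; });
        }
        boolean rejected;
        try {
            analyticsPool.submit(() -> null); // no free thread, no queue slot
            rejected = false;
        } catch (RejectedExecutionException e) {
            rejected = true; // fails fast; other partitions never notice
        }
        hang.countDown();
        analyticsPool.shutdown();
        return rejected;
    }
}
```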
Worked Example
Scenario: An API gateway calls three backend services per request.
| Service | Criticality | Expected behavior |
|---|---|---|
| Payments | Critical | Must succeed for the primary user action |
| Recommendations | Non-critical | Enhances the response; degradable |
| Analytics | Fire-and-forget | Logging only; safe to drop |
Without bulkheads: Analytics starts hanging — perhaps a downstream data warehouse is slow. Within seconds, analytics calls consume the shared 100-thread pool. Recommendations and payments calls find no threads. Payments — the most important call — returns 503 to users even though the payments backend is perfectly healthy.
With bulkheads:
| Pool | Size | Behavior under analytics failure |
|---|---|---|
| Payments pool | 30 threads | 3 threads busy. Unaffected. Payments succeed. |
| Recommendations pool | 20 threads | 5 threads busy. Unaffected. Recommendations succeed. |
| Analytics pool | 10 threads | 10/10 threads exhausted. New analytics calls rejected immediately with a non-blocking error. |
Payments and recommendations continue serving users. Analytics calls are dropped (acceptable for fire-and-forget). An alert fires on analytics pool exhaustion, prompting investigation of the data warehouse.
Bulkhead + Circuit Breaker
Bulkhead and circuit breaker are complementary, not alternatives.
| Pattern | Protects against | Mechanism |
|---|---|---|
| Bulkhead | Resource exhaustion from slow dependencies | Limits concurrent calls per dependency |
| Circuit Breaker | Cascading failures from failing dependencies | Stops calling a failing dependency |
Without a circuit breaker, a slow dependency still consumes its entire bulkhead partition one slot at a time. Calls drip in, each blocking for the timeout duration. The partition stays at or near capacity, increasing latency for queued callers.
With a circuit breaker added: once the failure rate threshold is reached, the breaker opens and new calls are rejected immediately without entering the bulkhead partition at all. The partition drains and returns to idle.
Recommended layering (inner to outer):
- Timeout (BEE-12002) — Bound how long any single call can block.
- Bulkhead — Bound how many concurrent calls can be in-flight to each dependency.
- Circuit Breaker (BEE-12001) — Stop calling dependencies that are failing or saturated.
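The three layers can be sketched in one JDK-only wrapper (everything here — the class name, threshold, and the naive failure counter standing in for a real circuit breaker — is illustrative): breaker check outermost, bulkhead permit next, timeout on the innermost call.

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class LayeredClient {
    private final Semaphore bulkhead = new Semaphore(10);      // bulkhead layer
    private final AtomicInteger recentFailures = new AtomicInteger();
    private static final int TRIP_THRESHOLD = 5;               // naive breaker stand-in

    <T> T call(Supplier<T> dependencyCall) {
        if (recentFailures.get() >= TRIP_THRESHOLD)            // breaker: reject before the bulkhead
            throw new IllegalStateException("circuit open");
        if (!bulkhead.tryAcquire())                            // bulkhead: bound concurrency
            throw new IllegalStateException("bulkhead full");
        try {
            // Timeout: bound how long this call may block. A real
            // implementation would use a dedicated executor rather
            // than the common pool.
            return CompletableFuture.supplyAsync(dependencyCall)
                    .orTimeout(500, TimeUnit.MILLISECONDS)
                    .get();
        } catch (InterruptedException | ExecutionException e) {
            recentFailures.incrementAndGet();                  // feed the breaker
            throw new RuntimeException("dependency call failed", e);
        } finally {
            bulkhead.release();
        }
    }
}
```

Note how an open breaker rejects calls before they take a bulkhead permit, which is exactly the draining behavior described above.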
Sizing Bulkhead Partitions
Partition sizing is the most important operational decision. There is no universal formula, but here is a practical starting approach.
For thread pool bulkheads:
pool_size = (throughput_rps × p99_latency_seconds) × safety_factor
- throughput_rps: expected requests per second routed to this dependency
- p99_latency_seconds: 99th-percentile response time of the dependency under normal conditions
- safety_factor: multiply by 1.5–2.0 to absorb bursts
Example: 100 RPS to payments, p99 latency 50ms (0.05s), safety factor 2.0: 100 × 0.05 × 2.0 = 10 threads
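The formula (a Little's Law concurrency estimate padded by a safety factor) is trivial to encode; the helper below is a sketch with invented names, rounding up so a fractional result never undersizes the pool.

```java
public class BulkheadSizing {
    // concurrency ≈ arrival_rate × latency (Little's Law), padded by a
    // safety factor and rounded up to a whole number of threads.
    static int poolSize(double throughputRps, double p99LatencySeconds, double safetyFactor) {
        return (int) Math.ceil(throughputRps * p99LatencySeconds * safetyFactor);
    }
}
```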
For semaphore bulkheads:
Start with the same formula. Semaphore permits can be smaller than thread pool sizes since permit acquisition is non-blocking when the call itself is asynchronous.
Guidance:
| Dependency | Suggested starting point |
|---|---|
| Critical, low-latency (payments, auth) | Larger partition — prioritize availability |
| High-volume, medium-latency (product catalog) | Medium partition sized to peak load |
| Non-critical, variable latency (recommendations) | Smaller partition; degraded service acceptable |
| Fire-and-forget (analytics, audit logging) | Smallest partition; drops acceptable |
Revisit partition sizes regularly using observed p99 latency and actual concurrency metrics.
Monitoring Partition Utilization
A bulkhead that is silently saturated provides no protection — calls queue, latency spikes, and the partition eventually behaves like no bulkhead at all.
Metrics to expose per partition:
| Metric | Description | Alert threshold |
|---|---|---|
| bulkhead.active | Current in-flight calls | — |
| bulkhead.queue_depth | Calls waiting for a permit/thread | Alert if sustained > 0 |
| bulkhead.rejected_total | Calls rejected due to full partition | Alert on any increase |
| bulkhead.utilization | active / max_concurrent as a percentage | Alert if > 70% sustained |
| bulkhead.latency | End-to-end call latency through the partition | Baseline + alert on spikes |
A sustained bulkhead.utilization above 70% is an early warning that the partition needs resizing or that the downstream dependency is degrading. Rejection events (rejected_total increasing) mean the partition is already full — upstream callers are receiving errors right now.
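As a sketch of where these metrics come from (the class is invented; a real service would export the gauges and counters to its metrics library), a semaphore bulkhead can derive `active`, `utilization`, and `rejected_total` directly from its own state:

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

public class InstrumentedBulkhead {
    private final int maxConcurrent;
    private final Semaphore permits;
    private final AtomicLong rejectedCount = new AtomicLong();

    InstrumentedBulkhead(int maxConcurrent) {
        this.maxConcurrent = maxConcurrent;
        this.permits = new Semaphore(maxConcurrent);
    }

    <T> T call(Supplier<T> dependencyCall) {
        if (!permits.tryAcquire()) {
            rejectedCount.incrementAndGet();   // bulkhead.rejected_total
            throw new IllegalStateException("bulkhead full");
        }
        try {
            return dependencyCall.get();
        } finally {
            permits.release();
        }
    }

    int active() { return maxConcurrent - permits.availablePermits(); }   // bulkhead.active
    double utilization() { return 100.0 * active() / maxConcurrent; }     // bulkhead.utilization
    long rejectedTotal() { return rejectedCount.get(); }                  // bulkhead.rejected_total
}
```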
Resilience4j Reference
Resilience4j (resilience4j.readme.io/docs/bulkhead) provides two bulkhead implementations for JVM services.
SemaphoreBulkhead:
BulkheadConfig config = BulkheadConfig.custom()
.maxConcurrentCalls(25) // max concurrent permits
.maxWaitDuration(Duration.ofMillis(50)) // max time to wait for a permit
.build();
Bulkhead bulkhead = Bulkhead.of("payments", config);
// Wrap the call
CheckedFunction0<String> decorated = Bulkhead
  .decorateCheckedSupplier(bulkhead, () -> paymentsClient.charge(amount));

ThreadPoolBulkhead:
ThreadPoolBulkheadConfig config = ThreadPoolBulkheadConfig.custom()
.coreThreadPoolSize(5)
.maxThreadPoolSize(10)
.queueCapacity(20)
.keepAliveDuration(Duration.ofMillis(20))
.build();
ThreadPoolBulkhead bulkhead = ThreadPoolBulkhead.of("analytics", config);
// Wrap the async call
Supplier<CompletionStage<String>> decorated = ThreadPoolBulkhead
  .decorateSupplier(bulkhead, () -> analyticsClient.record(event));

Choose SemaphoreBulkhead for reactive/non-blocking code. Choose ThreadPoolBulkhead for blocking I/O calls where you need true thread pool isolation.
For .NET, the Polly library provides BulkheadPolicy with equivalent semantics.
Common Mistakes
1. Single shared thread pool for all dependencies
This is the default configuration in many frameworks. Without explicit per-dependency pools, all outbound calls compete for the same threads. Configuring bulkheads requires deliberate effort; the default is no isolation.
2. Bulkhead partitions sized too large
A partition sized to 90% of the total thread pool provides almost no isolation. If Analytics gets 80 of 100 threads, it can still starve every other dependency. Size partitions conservatively; accept that non-critical dependencies will drop calls under load rather than borrow capacity from critical ones.
3. Not monitoring partition exhaustion
A full partition is a silent incident. Without alerting on rejected_total or utilization, engineers learn about partition exhaustion only when users complain. Wire all bulkhead metrics to your observability stack before deploying to production.
4. Bulkhead without timeouts or circuit breaker
A bulkhead limits how many calls can be in-flight simultaneously, but does nothing about how long they wait. Slow calls still hold partition slots until they complete or time out. Without timeouts, partition slots drain slowly and stay occupied. Without a circuit breaker, a failing dependency keeps receiving calls one slot at a time until the partition saturates. Always combine the three patterns.
5. Over-partitioning
Creating a separate thread pool for every individual endpoint or operation wastes resources and increases complexity. At the extreme, 50 pools of 2 threads each is less effective than 10 pools of 10 threads. Group dependencies by failure domain and criticality class, not by individual endpoint.
Related BEPs
- BEE-11002 (Worker Pools) — foundational concepts for sizing and managing thread pools
- BEE-12001 (Circuit Breaker Pattern) — stop calling failed dependencies; combine with bulkheads for full protection
- BEE-12002 (Timeouts and Deadlines) — bound call duration so partition slots are not held indefinitely
- BEE-12002 (Graceful Degradation) — define fallback behavior when a bulkhead partition is full
References
- Michael Nygard, Release It! Design and Deploy Production-Ready Software, 2nd ed., Pragmatic Programmers (2018) — Chapter 4: Stability Patterns
- Microsoft Azure Architecture Center, Bulkhead Pattern, learn.microsoft.com/en-us/azure/architecture/patterns/bulkhead
- Resilience4j documentation, Bulkhead, resilience4j.readme.io/docs/bulkhead
- Netflix Hystrix, How It Works — Thread Isolation, github.com/Netflix/Hystrix/wiki/How-it-Works