BEE
Backend Engineering Essentials

[BEE-6006] Connection Pooling and Query Optimization


Reduce database overhead through connection reuse and eliminate inefficient query patterns before they reach production.

Context

Every database connection has real cost. PostgreSQL spawns a new backend process per connection, typically 5–10 MB of memory each, plus authentication handshake time on every new TCP connection. At low traffic this is imperceptible; at scale it becomes a hard ceiling. A service receiving 500 requests per second that opens a fresh connection for each request will exhaust max_connections (default 100 in PostgreSQL) almost immediately, and new clients will be rejected with "sorry, too many clients already" errors under load.

Query efficiency compounds the problem. An application that issues 101 queries to fetch what could be done in 1 is burning CPU, network round-trips, and connection hold time on every request. Both problems — connection overhead and query inefficiency — are preventable with known patterns.

Principle

Reuse connections through a pool sized to your actual concurrency, and design queries that do exactly one job — no more, no less.

Connection Pooling

Why pooling matters

Establishing a database connection involves:

  1. TCP three-way handshake
  2. SSL/TLS negotiation (if enabled)
  3. PostgreSQL authentication (md5, scram-sha-256, etc.)
  4. Backend process fork on the server

This overhead is typically 5–20 ms. At 500 req/s, spending 10 ms per connection establishment burns 5 seconds of latency budget per second — before any query runs.

A connection pool maintains a set of already-established connections and hands them out to application threads on demand, returning them when the transaction (or session) completes.

Pool sizing: Little's Law

The correct pool size is not "as large as possible." Oversizing a pool overwhelms the database with concurrent backend processes competing for CPU and shared buffer locks, degrading throughput.

Little's Law applied to connection pools:

pool_size = queries_per_second * avg_query_time_in_seconds

Example: if your service handles 200 queries per second and each query takes 50 ms on average, the average number of in-flight queries (and therefore the connections needed) is:

pool_size = 200 * 0.05 = 10 connections

In practice, add headroom for spikes. A reasonable starting formula:

pool_size = (num_cpu_cores * 2) + effective_spindle_count

This is the starting-point guideline from HikariCP's pool-sizing documentation (which derives it from the PostgreSQL project's advice). For most web services, a pool of 10–20 connections per application instance is sufficient.
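To make the two sizing rules concrete, here is a small arithmetic sketch; the function names and example numbers are illustrative, not from any library:

```python
import math

def littles_law_pool_size(queries_per_second: float, avg_query_seconds: float) -> int:
    """Little's Law: average concurrency = arrival rate * time in system."""
    return max(1, math.ceil(queries_per_second * avg_query_seconds))

def cores_and_spindles(num_cpu_cores: int, effective_spindle_count: int = 1) -> int:
    """Starting-point heuristic: (cores * 2) + spindles."""
    return num_cpu_cores * 2 + effective_spindle_count

print(littles_law_pool_size(200, 0.05))  # 200 q/s at 50 ms each -> 10
print(cores_and_spindles(8))             # 8 cores, 1 spindle -> 17
```

Use Little's Law to estimate steady-state need, then cap it with the cores-and-spindles heuristic so the pool cannot overwhelm the database.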

Connection pool modes

PgBouncer (the most widely deployed external PostgreSQL pooler) supports three modes:

| Mode | Server connection held for | Use case |
|---|---|---|
| Session | Duration of client session | Applications that use session-level features (temp tables, advisory locks, SET statements) |
| Transaction | Duration of one transaction | Most web applications; highest connection reuse |
| Statement | Duration of one statement | Enforces autocommit; breaks multi-statement transactions; rarely used |

Transaction mode is the sweet spot for HTTP services. The server connection is returned to the pool the moment COMMIT or ROLLBACK completes, so a pool of 20 server connections can serve hundreds of concurrent application connections.

Caution: Transaction mode breaks session-level PostgreSQL features. Session-scoped SET statements, named prepared statements, advisory locks held across transactions, and LISTEN/NOTIFY all assume the client keeps the same server connection, which transaction mode does not guarantee. (SET LOCAL remains safe, since it is scoped to a single transaction.)
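A minimal pgbouncer.ini sketch for transaction pooling; host, database name, and sizes are placeholders to adapt:

```ini
[databases]
; clients connect to "appdb" via PgBouncer, which forwards to the real server
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_port = 6432
pool_mode = transaction      ; release server connection at COMMIT/ROLLBACK
default_pool_size = 20       ; server connections per database/user pair
max_client_conn = 500        ; client connections PgBouncer will accept
```

With these numbers, up to 500 application connections share at most 20 server connections per database/user pair.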

External poolers vs. application-level pools

| | Application-level pool (e.g., HikariCP, pgx pool) | External pooler (PgBouncer) |
|---|---|---|
| Scope | Per application process | Shared across all processes/pods |
| Overhead | Low (in-process) | Slight network hop |
| Connection count | instances * pool_size connections to DB | Fixed server-side connections regardless of instance count |
| Best for | Single-process services, microservices with low instance count | High-instance-count deployments, serverless, languages without good pool libraries |

In Kubernetes environments with many pod replicas, an external pooler is often essential. 50 pods each with a 10-connection pool = 500 database connections, which can exceed max_connections and destabilize the cluster.

Connection pool lifecycle
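The lifecycle reduces to acquire, use, return. A minimal sketch, assuming a generic factory callable in place of a real driver's connect function; production code should use a maintained pool (psycopg_pool, HikariCP, pgx) rather than this illustration:

```python
import queue
from contextlib import contextmanager

class ConnectionPool:
    """Tiny fixed-size pool: connections are created once and recycled."""

    def __init__(self, factory, size):
        self._idle = queue.Queue()
        for _ in range(size):                    # establish all connections up front
            self._idle.put(factory())

    @contextmanager
    def connection(self, timeout=5.0):
        conn = self._idle.get(timeout=timeout)   # block until one is free
        try:
            yield conn                           # caller runs queries here
        finally:
            self._idle.put(conn)                 # always return it, even on error

# Usage with a stand-in connection factory:
pool = ConnectionPool(factory=dict, size=2)
with pool.connection() as conn:
    pass  # execute queries on conn inside the block
```

The `finally` clause is the important part: a connection that is never returned is leaked, and a leaked pool eventually hangs every caller at `get()`.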

The N+1 Query Problem

Definition

The N+1 problem occurs when code executes 1 query to fetch a list of N records, then executes N additional queries to fetch related data for each record — totaling N+1 round-trips to the database.

Concrete example

Scenario: Display 100 orders with each customer's name.

Naive ORM code (pseudo-code):

```python
orders = db.query("SELECT id, customer_id, total FROM orders LIMIT 100")
# 1 query

for order in orders:
    customer = db.query("SELECT name FROM customers WHERE id = ?", order.customer_id)
    # 1 query * 100 orders = 100 queries
    print(f"{order.id}: {customer.name} — ${order.total}")
```

Query count: 1 + 100 = 101 queries

The fix — JOIN:

```sql
SELECT o.id, o.total, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id
LIMIT 100;
```

Query count: 1 query

Alternative fix — IN clause batch load:

```sql
-- Step 1: fetch orders (1 query)
SELECT id, customer_id, total FROM orders LIMIT 100;

-- Step 2: fetch all related customers in one query (1 query)
SELECT id, name FROM customers
WHERE id IN (42, 17, 88, ...);  -- all 100 customer IDs
```

Query count: 2 queries

The IN-clause pattern (sometimes called the DataLoader pattern, popularized by Facebook's GraphQL DataLoader) is useful when a JOIN is not practical — for example when the two entities come from different services or databases.
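A sketch of the batch-load pattern in application code; the `db.query(sql, params)` interface is a stand-in for a real driver:

```python
def load_orders_with_customers(db, limit=100):
    """Fetch orders plus their customers in exactly 2 queries."""
    orders = db.query(
        "SELECT id, customer_id, total FROM orders LIMIT %s", (limit,)
    )
    if not orders:
        return []
    ids = sorted({o["customer_id"] for o in orders})       # dedupe customer IDs
    placeholders = ", ".join(["%s"] * len(ids))
    customers = db.query(
        f"SELECT id, name FROM customers WHERE id IN ({placeholders})",
        tuple(ids),
    )
    by_id = {c["id"]: c for c in customers}                # index for O(1) lookup
    return [(o, by_id[o["customer_id"]]) for o in orders]  # join in memory
```

Deduplicating the IDs matters: 100 orders from 5 customers should produce an IN list of 5, not 100.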

Detection

N+1 queries are often invisible during development (small datasets, fast local DB) but catastrophic in production. Detection methods:

  • ORM query logging: Enable SQL logging and scan for repeated query patterns with incrementing IDs.
  • APM tools: Scout APM, Datadog APM, New Relic flag N+1 patterns automatically.
  • Slow query log: If individual queries are fast but count is high, check for repeated patterns.
  • Bullet gem (Rails): Notifies in development when N+1 or unused eager loads are detected.
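The query-log approach can be automated with a simple heuristic, sketched here under the assumption that you can capture the SQL strings issued during one request:

```python
import re
from collections import Counter

def normalize(sql: str) -> str:
    """Reduce a query to a template by masking literal values."""
    sql = re.sub(r"'[^']*'", "?", sql)   # string literals -> placeholder
    sql = re.sub(r"\b\d+\b", "?", sql)   # numeric literals -> placeholder
    return re.sub(r"\s+", " ", sql).strip().lower()

def suspected_n_plus_one(queries, threshold=10):
    """Flag query templates repeated many times within one request."""
    counts = Counter(normalize(q) for q in queries)
    return [tpl for tpl, n in counts.items() if n >= threshold]
```

Fed the 101-query log from the earlier example, this flags the repeated `SELECT name FROM customers WHERE id = ...` template, which is exactly the N+1 signature.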

Reading EXPLAIN Plans

Before optimizing a query, read its execution plan:

```sql
EXPLAIN ANALYZE
SELECT o.id, o.total, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.created_at > NOW() - INTERVAL '30 days';
```

Key nodes to understand:

| Node type | Meaning |
|---|---|
| Seq Scan | Full table scan; acceptable on small tables, a warning sign on large ones |
| Index Scan | Uses an index to find rows; generally efficient |
| Index Only Scan | All needed columns are in the index; fastest possible read |
| Hash Join / Nested Loop | Join algorithms; Nested Loop with an index on the inner table is often optimal |
| Sort | Requires sorting; can be eliminated by an index on the ORDER BY column |

Look at rows (estimated vs. actual) and cost values. A large discrepancy between estimated and actual rows indicates stale statistics — run ANALYZE on the table.

Deep Dive

For database-level query optimization and execution plan analysis, see DEE Query and Performance series.

Query Optimization Strategies

1. Avoid SELECT *

```sql
-- Bad: fetches all columns, prevents Index Only Scan, increases network transfer
SELECT * FROM users WHERE id = 42;

-- Good: fetch only what you need
SELECT id, email, display_name FROM users WHERE id = 42;
```

SELECT * prevents the optimizer from using covering indexes. It also transfers unnecessary data across the network and into application memory.

2. Push filtering to the database

```python
# Bad: fetch all rows, then filter in the application
orders = db.query("SELECT * FROM orders")
recent = [o for o in orders if o.created_at > thirty_days_ago]
```

```sql
-- Good: filter in SQL
SELECT id, total FROM orders
WHERE created_at > NOW() - INTERVAL '30 days';
```

The database can use indexes on the filter column. The application layer cannot.

3. Use indexes on filter and join columns

Columns that appear in WHERE, JOIN ON, and ORDER BY clauses are candidates for indexing. See BEE-6002 for indexing strategy.

4. Limit result sets

```sql
-- Always paginate large result sets
SELECT id, title FROM articles
ORDER BY published_at DESC
LIMIT 20 OFFSET 0;
```

Use keyset pagination (WHERE id < last_seen_id) over offset pagination for large datasets — OFFSET 10000 still requires the database to scan 10,000 rows before discarding them.
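A keyset-pagination sketch in application code; `db.query(sql, params)` stands in for a real driver call, and `id` is assumed to be the sort key:

```python
def fetch_page(db, last_seen_id=None, page_size=20):
    """Return one page plus a cursor to pass into the next call."""
    if last_seen_id is None:                          # first page
        sql = "SELECT id, title FROM articles ORDER BY id DESC LIMIT %s"
        params = (page_size,)
    else:                                             # subsequent pages
        sql = ("SELECT id, title FROM articles WHERE id < %s "
               "ORDER BY id DESC LIMIT %s")
        params = (last_seen_id, page_size)
    rows = db.query(sql, params)
    next_cursor = rows[-1]["id"] if rows else None    # feed into the next call
    return rows, next_cursor
```

Page 500 costs the same as page 1: the index seeks directly to `last_seen_id` instead of producing and discarding thousands of rows.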

5. Use query parameterization

```sql
-- Good: parameterized query (safe from injection; plan can be cached)
SELECT * FROM users WHERE email = $1;

-- Bad: interpolating user input into the SQL string
-- (injection risk; a new plan for every unique literal)
SELECT * FROM users WHERE email = 'user@example.com';
```

Parameterized queries keep untrusted input out of the SQL text, preventing SQL injection, and allow the database to cache execution plans, reducing planning overhead on repeated queries.
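A runnable illustration using the stdlib sqlite3 driver, whose placeholder is `?`; PostgreSQL drivers use `$1` or `%s`, but the idea is identical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, email TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'user@example.com')")

# The driver sends the value separately from the SQL text, so hostile
# input cannot change the statement's structure.
row = conn.execute(
    "SELECT id FROM users WHERE email = ?", ("user@example.com",)
).fetchone()
```

An input like `x' OR '1'='1` is matched as a literal email string here, not parsed as SQL.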

Common Mistakes

| Mistake | Consequence | Fix |
|---|---|---|
| New connection per request (no pooling) | Connection overhead dominates response time; DB runs out of connections | Add application-level pool or external pooler |
| Pool too large | DB overwhelmed with concurrent backends; lock contention increases | Size pool with Little's Law; start with (cores * 2) + spindles |
| Not returning connections to pool | Connection leak; pool exhausted; application hangs | Use try/finally or connection context managers; enable pool timeout alerts |
| N+1 queries in ORM code | 100x–1000x more queries than necessary | Eager load with JOIN or batch with IN clause |
| SELECT * when only 2 columns needed | Prevents covering indexes; excess data transfer | Select explicit columns |
Related

  • BEE-6002 — Database Indexing Strategies: which columns to index and how
  • BEE-13002 — Resource Management: connection lifecycle and cleanup patterns
  • BEE-13004 — Profiling and Observability: detecting slow queries and N+1 in production