[DEE-503] Repository Pattern
INFO
Use the repository pattern to decouple business logic from data access details. The repository provides a collection-like interface over the persistence layer, making domain logic testable and data access swappable.
Context
In many applications, business logic and database queries are intertwined. A service that calculates order totals also constructs SQL queries, handles pagination, and manages transactions. This coupling makes the business logic harder to test (tests need a database), harder to change (switching databases requires rewriting business logic), and harder to understand (domain rules are buried in query construction).
The repository pattern, cataloged in Martin Fowler's Patterns of Enterprise Application Architecture (where the chapter is credited to Edward Hieatt and Rob Mee) and central to Domain-Driven Design (DDD), addresses this by providing an abstraction that behaves like an in-memory collection of domain objects. The business logic asks the repository for objects and saves objects back to it, without knowing whether the underlying store is PostgreSQL, MongoDB, or an in-memory test double.
In DDD, repositories exist only for aggregate roots -- not for every table or entity. An OrderRepository manages Order aggregates; individual OrderItem rows are accessed through their parent Order, not through a separate repository. This preserves aggregate boundaries and transactional consistency.
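The aggregate-root rule can be sketched as follows. This is a minimal illustration under stated assumptions: the fields on Order and OrderItem, the add_item method, and the total property are hypothetical and exist only to show that child entities are reached through their parent.

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass(frozen=True)
class OrderItem:
    """Child entity: reachable only through its parent Order (hypothetical fields)."""
    sku: str
    quantity: int
    unit_price: Decimal


@dataclass
class Order:
    """Aggregate root: the only object a repository would load or save."""
    id: str
    customer_id: str
    _items: list = field(default_factory=list)

    def add_item(self, sku: str, quantity: int, unit_price: Decimal) -> None:
        # Invariants for the whole aggregate are enforced in one place.
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        self._items.append(OrderItem(sku, quantity, unit_price))

    @property
    def items(self) -> tuple:
        # Read-only view, so callers cannot bypass add_item().
        return tuple(self._items)

    @property
    def total(self) -> Decimal:
        return sum((i.quantity * i.unit_price for i in self._items), Decimal("0"))


order = Order(id="o-1", customer_id="c-1")
order.add_item("widget", 2, Decimal("9.50"))
order.add_item("gadget", 1, Decimal("5.00"))
```

Because items are only added through the Order, a single OrderRepository.save(order) call can persist the whole aggregate in one transaction, and no OrderItemRepository is needed.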
Principle
- Teams SHOULD use the repository pattern when business logic must be testable without a database, or when data access may change independently of domain logic.
- Repositories MUST expose domain-oriented interfaces (e.g., findActiveOrdersByCustomer), not data-oriented interfaces (e.g., query(sql) or findBy(column, value)).
- In DDD contexts, repositories SHOULD be created only for aggregate roots, not for every entity or table.
- Teams SHOULD NOT use the repository pattern for simple CRUD applications where the ORM already provides sufficient abstraction and testability.
- Repository interfaces MUST NOT expose implementation details such as query builders, ORM sessions, or SQL fragments.
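To make the interface rules above concrete, the sketch below contrasts a domain-oriented repository with a data-oriented one. All names here (find_active_orders_by_customer, LeakyOrderRepository, InMemoryOrders, the Order fields) are hypothetical, and typing.Protocol is only one possible way to declare such an interface in Python.

```python
from dataclasses import dataclass
from typing import Any, Protocol


@dataclass
class Order:
    id: str
    customer_id: str
    active: bool = True


class OrderRepository(Protocol):
    """Domain-oriented: method names state business intent, no storage details."""

    def find_active_orders_by_customer(self, customer_id: str) -> list: ...
    def save(self, order: Order) -> None: ...


class LeakyOrderRepository(Protocol):
    """Data-oriented (to avoid): callers must know column names and SQL."""

    def query(self, sql: str) -> list: ...
    def find_by(self, column: str, value: Any) -> list: ...


class InMemoryOrders:
    """Satisfies OrderRepository structurally; no inheritance required."""

    def __init__(self) -> None:
        self._orders: dict = {}

    def find_active_orders_by_customer(self, customer_id: str) -> list:
        return [o for o in self._orders.values()
                if o.customer_id == customer_id and o.active]

    def save(self, order: Order) -> None:
        self._orders[order.id] = order


def count_active(repo: OrderRepository, customer_id: str) -> int:
    # The caller depends only on the domain-level method.
    return len(repo.find_active_orders_by_customer(customer_id))


repo = InMemoryOrders()
repo.save(Order("o-1", "c-1"))
repo.save(Order("o-2", "c-1", active=False))
repo.save(Order("o-3", "c-2"))
```

Nothing in count_active reveals how "active" is stored, so the filtering rule can change inside the repository without touching callers.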
Visual
Key insight: The domain service depends on the repository interface (defined in the domain layer). The actual implementation (using an ORM or raw SQL) lives in the infrastructure layer. For testing, a simple in-memory implementation replaces the real one.
Example
Repository Interface (Language-Agnostic)

    interface OrderRepository:
        find(id: OrderId) -> Order | null
        findByCustomer(customerId: CustomerId) -> List<Order>
        findActiveByDateRange(start: Date, end: Date) -> List<Order>
        save(order: Order) -> void
        delete(order: Order) -> void

Implementation with SQL
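
    # Python / SQLAlchemy implementation
    from sqlalchemy import select
    from sqlalchemy.orm import Session

    class SqlOrderRepository(OrderRepository):
        # find_active_by_date_range is omitted here for brevity.

        def __init__(self, session: Session):
            self._session = session

        def find(self, order_id: OrderId) -> Order | None:
            model = self._session.get(OrderModel, order_id.value)
            # Map the ORM row back to the domain object. to_domain() is an
            # assumed counterpart to Order.to_model().
            return model.to_domain() if model else None

        def find_by_customer(self, customer_id: CustomerId) -> list[Order]:
            # Newest orders first.
            stmt = (
                select(OrderModel)
                .where(OrderModel.customer_id == customer_id.value)
                .order_by(OrderModel.created_at.desc())
            )
            return [m.to_domain() for m in self._session.scalars(stmt)]

        def save(self, order: Order) -> None:
            # merge() updates the row with the same primary key or adds a new one.
            self._session.merge(order.to_model())

        def delete(self, order: Order) -> None:
            model = self._session.get(OrderModel, order.id.value)
            if model:
                self._session.delete(model)

In-Memory Test Double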

    class InMemoryOrderRepository(OrderRepository):
        # find_active_by_date_range is omitted here for brevity.

        def __init__(self):
            # Keyed by OrderId, which must therefore be hashable.
            self._orders: dict[OrderId, Order] = {}

        def find(self, order_id: OrderId) -> Order | None:
            return self._orders.get(order_id)

        def find_by_customer(self, customer_id: CustomerId) -> list[Order]:
            return [
                o for o in self._orders.values()
                if o.customer_id == customer_id
            ]

        def save(self, order: Order) -> None:
            self._orders[order.id] = order

        def delete(self, order: Order) -> None:
            # Deleting an order that was never saved is a no-op.
            self._orders.pop(order.id, None)

Using the Repository in a Service

    class OrderService:
        def __init__(self, orders: OrderRepository):
            # Depends on the interface, not on a concrete implementation.
            self._orders = orders

        def cancel_order(self, order_id: OrderId) -> None:
            order = self._orders.find(order_id)
            if order is None:
                raise OrderNotFound(order_id)
            order.cancel()  # Domain logic on the aggregate
            self._orders.save(order)

    # Production
    service = OrderService(SqlOrderRepository(db_session))

    # Test
    service = OrderService(InMemoryOrderRepository())

When NOT to Use the Repository Pattern
| Scenario | Use Repository? | Why |
|---|---|---|
| Simple CRUD API (no complex domain logic) | No | The ORM already acts as the repository |
| Small project / prototype | No | Over-abstraction slows development |
| Complex domain with multiple aggregates | Yes | Testability and boundary enforcement |
| Need to swap data stores (SQL -> NoSQL) | Yes | Abstraction makes the switch possible |
| Multiple read models (CQRS) | Yes (for write side) | Separates command and query responsibilities |
| Microservice with one entity | Maybe | Depends on testing requirements |
Common Mistakes
Leaky abstractions. Exposing ORM query builders, IQueryable, or SQL fragments through the repository interface defeats the purpose. If callers construct queries, the repository is not abstracting anything. The interface should expose domain operations (findActiveOrders), not generic query capabilities (findWhere(predicate)).
Repository that is just a pass-through. If every repository method is a one-line delegation to the ORM (find(id) calls session.get(id)), the repository adds indirection without value. This is a sign that either the domain is simple enough to not need the pattern, or the repository methods are too generic. Add value through domain-specific query methods, encapsulated transaction boundaries, or aggregate reconstitution logic.
One repository per table. In DDD, repositories exist for aggregate roots only. Creating OrderRepository, OrderItemRepository, and OrderStatusHistoryRepository for a single aggregate breaks encapsulation. OrderItem should be accessed only through the Order aggregate and its repository.
Over-abstraction in simple applications. A CRUD API that maps HTTP endpoints directly to database tables does not benefit from the repository pattern. The ORM's built-in querying is sufficient. Adding a repository layer, a service layer, and an interface layer for a simple TODO app creates maintenance overhead without testability benefits.
Putting business logic in the repository. The repository's job is data access, not domain rules. Validation, state transitions, and business calculations belong in the domain model or service layer. A cancel_order method in the repository that checks cancellation rules is a misplaced responsibility.
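The point about keeping business logic out of the repository can be illustrated with a minimal sketch in which the cancellation rule lives on the aggregate and the repository only stores and retrieves. The status values, the OrderAlreadyShipped exception, and the rule that shipped orders cannot be cancelled are assumptions made for illustration; plain strings stand in for OrderId.

```python
from dataclasses import dataclass
from typing import Optional


class OrderNotFound(Exception):
    pass


class OrderAlreadyShipped(Exception):
    pass


@dataclass
class Order:
    id: str
    status: str = "placed"  # hypothetical states: placed, shipped, cancelled

    def cancel(self) -> None:
        # The business rule lives on the aggregate, not in the repository.
        if self.status == "shipped":
            raise OrderAlreadyShipped(self.id)
        self.status = "cancelled"


class InMemoryOrderRepository:
    """Pure storage: no validation, no state transitions."""

    def __init__(self) -> None:
        self._orders: dict = {}

    def find(self, order_id: str) -> Optional[Order]:
        return self._orders.get(order_id)

    def save(self, order: Order) -> None:
        self._orders[order.id] = order


class OrderService:
    def __init__(self, orders: InMemoryOrderRepository) -> None:
        self._orders = orders

    def cancel_order(self, order_id: str) -> None:
        order = self._orders.find(order_id)
        if order is None:
            raise OrderNotFound(order_id)
        order.cancel()
        self._orders.save(order)


repo = InMemoryOrderRepository()
repo.save(Order("o-1"))
repo.save(Order("o-2", status="shipped"))
service = OrderService(repo)
service.cancel_order("o-1")
```

Because the rule sits in Order.cancel(), it can be tested without any repository at all, and swapping the storage implementation cannot change when cancellation is allowed.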
Related DEEs
- DEE-500 Application Patterns Overview
- DEE-502 ORM Pitfalls and Best Practices -- the data access layer repositories wrap
- DEE-504 Multi-Tenancy Data Isolation -- repositories can encapsulate tenant filtering
References
- Martin Fowler: Repository Pattern -- original pattern definition from Patterns of Enterprise Application Architecture
- Microsoft: Designing the Infrastructure Persistence Layer -- repository pattern in DDD microservices
- DevIQ: Repository Pattern -- concise explanation with implementation guidance
- Eric Evans: Domain-Driven Design -- the foundational text on repositories in DDD context
- Vaughn Vernon: Implementing Domain-Driven Design -- repository vs DAO distinction