DEE
Database Engineering Essentials

[DEE-405] Choosing the Right NoSQL Type

INFO

Choose a NoSQL database type based on your data model and access patterns, not on popularity or hype. Each type -- document, key-value, column-family, and graph -- is optimized for a specific category of workload.

Context

The NoSQL landscape offers four primary database types, each designed around a different data model and access pattern:

  • Document stores (MongoDB, Couchbase) -- flexible schema, nested data, rich queries within documents.
  • Key-value stores (Redis, DynamoDB, Memcached) -- simple lookups by key, extreme throughput and low latency.
  • Column-family stores (Cassandra, ScyllaDB, HBase) -- massive write throughput, time-series data, horizontal scaling across data centers.
  • Graph databases (Neo4j, Neptune, TigerGraph) -- relationship-heavy data, variable-depth traversals, pathfinding.

Choosing the wrong type creates friction that application-layer workarounds can only partially hide. Forcing relationship traversals into a document store means implementing graph algorithms in application code, with extra database round-trips for every hop. Forcing flexible-schema requirements into a column-family store means fighting a rigid, query-first table design (for example, Cassandra's CQL, where efficient queries must supply the full partition key). The cost of choosing wrong is high because migrating between NoSQL types often requires a complete data model redesign.
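The following is a minimal sketch of what "building graph algorithms in application code" looks like in practice. It assumes a hypothetical user collection, modeled here as a Python dict, in which each document embeds a `friends` array of user IDs; `fetch_user` is a hypothetical stand-in for a single document read, which against a real document store would be a network round-trip. Every additional hop multiplies the reads the application must issue, whereas a graph database runs the same traversal server-side as a single query.

```python
from collections import deque

# Hypothetical stand-in for a document collection: each user document
# embeds an array of friend IDs (a common document-store modeling choice).
users = {
    "alice": {"name": "Alice", "friends": ["bob", "carol"]},
    "bob": {"name": "Bob", "friends": ["alice", "dave"]},
    "carol": {"name": "Carol", "friends": ["alice", "erin"]},
    "dave": {"name": "Dave", "friends": ["bob", "frank"]},
    "erin": {"name": "Erin", "friends": ["carol"]},
    "frank": {"name": "Frank", "friends": ["dave"]},
}


def fetch_user(user_id):
    """Simulates one document read; against a real store this is a round-trip."""
    return users[user_id]


def people_within(start, max_depth):
    """Breadth-first search over embedded friend lists, done in the application.

    Returns {user_id: hop_distance} for every user reachable within
    max_depth hops, excluding the start user itself.
    """
    distances = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if distances[current] == max_depth:
            continue  # do not expand beyond the requested depth
        for friend_id in fetch_user(current)["friends"]:
            if friend_id not in distances:
                distances[friend_id] = distances[current] + 1
                queue.append(friend_id)
    del distances[start]
    return distances


# Friends-of-friends for Alice: users exactly 2 hops away.
print(sorted(u for u, d in people_within("alice", 2).items() if d == 2))
# prints ['dave', 'erin']
```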

The concept of polyglot persistence -- using multiple database types within a single system, each for the workload it serves best -- is a well-established pattern. A modern application might use Redis for caching and sessions, MongoDB for the product catalog, Cassandra for event logging, and Neo4j for recommendation queries.

Principle

  • You MUST map each data access pattern to the database type that natively supports it before committing to a technology.
  • You SHOULD consider polyglot persistence when a single database type cannot efficiently serve all access patterns.
  • You MUST NOT choose a NoSQL database based solely on popularity, team familiarity, or vendor marketing without evaluating the data model fit.
  • You SHOULD evaluate operational complexity (backup, monitoring, scaling, team expertise) as a first-class selection criterion alongside technical fit.
  • You MUST NOT assume that any single NoSQL database is the right choice for all workloads. "One-size-fits-all" thinking is one of the most common sources of NoSQL project failures.
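One lightweight way to apply the first principle is to write each access pattern down explicitly and check it against the traits each NoSQL type is optimized for before naming a product. The sketch below is a hypothetical, deliberately simplified rule set: the `AccessPattern` fields and the rule order are assumptions for illustration, not a standard method. Its value lies in forcing every pattern to be stated, not in automating the decision.

```python
from dataclasses import dataclass


@dataclass
class AccessPattern:
    """Hypothetical description of one workload's dominant access pattern."""
    name: str
    lookup_by_key_only: bool = False        # every read is a get/put by a known key
    variable_depth_traversal: bool = False  # queries follow relationships N hops
    append_heavy_time_series: bool = False  # sustained writes, range reads by time
    nested_variable_schema: bool = False    # records differ in shape, queried by fields


def suggest_type(p: AccessPattern) -> str:
    """Map one access pattern to the NoSQL type that natively supports it.

    The rule order is an assumption: traversals are checked first because
    they are the hardest workload to emulate in the other data models.
    """
    if p.variable_depth_traversal:
        return "graph"
    if p.append_heavy_time_series:
        return "column-family"
    if p.lookup_by_key_only:
        return "key-value"
    if p.nested_variable_schema:
        return "document"
    # Nothing NoSQL-specific stands out: evaluate a relational database too.
    return "relational (evaluate first)"


patterns = [
    AccessPattern("sessions", lookup_by_key_only=True),
    AccessPattern("sensor readings", append_heavy_time_series=True),
    AccessPattern("people you may know", variable_depth_traversal=True),
    AccessPattern("product catalog", nested_variable_schema=True),
    AccessPattern("order reporting"),
]
for p in patterns:
    print(f"{p.name}: {suggest_type(p)}")
```

If the resulting list contains several different types, that is the point at which polyglot persistence should be weighed against the operational cost of running each additional store.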

Visual

Comparison Table

| Dimension | Document Store | Key-Value Store | Column-Family Store | Graph Database |
|---|---|---|---|---|
| Data model | Nested JSON/BSON documents | Opaque key-value pairs (some stores offer structured values) | Rows with partition key, clustering columns, and sparse columns | Nodes with properties, directed typed relationships |
| Query flexibility | Rich: filters, projections, aggregations within documents | Minimal: lookup by key (DynamoDB adds sort key + secondary indexes) | Moderate: partition key required, range on clustering columns | Rich: pattern matching, variable-depth traversal, pathfinding |
| Schema flexibility | High: each document can have a different structure | High: values are opaque to the store | Low: table schema is rigid once defined (query-first design) | Moderate: node/relationship types are flexible, but indexes must be created explicitly |
| Write throughput (indicative, per node) | Moderate to high (10K-100K ops/sec) | Very high (100K+ ops/sec) | Very high (100K+ sustained writes, linear scaling) | Moderate (writes involve maintaining adjacency lists) |
| Read pattern | Single-document reads, rich queries | Single-key lookups, O(1) | Single-partition reads, range scans within a partition | Traversals: variable-depth, shortest path, pattern matching |
| Horizontal scaling | Sharding (manual or automatic) | Partitioning (fully managed in DynamoDB; hash-slot sharding in Redis Cluster, with operator-driven resharding) | Linear scaling by adding nodes (native to the design) | Limited (most graph DBs scale reads via replicas, not writes via sharding) |
| Consistency model | Tunable (MongoDB: read/write concern) | Varies (Redis: asynchronous replication, so replicas are eventually consistent; DynamoDB: strongly or eventually consistent reads, chosen per request) | Tunable per query (Cassandra: consistency levels) | Typically ACID transactions (Neo4j) |
| Best for | Product catalogs, content management, user profiles, APIs with variable payloads | Caching, sessions, leaderboards, rate limiting, feature flags | Event logging, IoT time-series, audit trails, messaging systems | Social networks, recommendations, fraud detection, knowledge graphs, access control |
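The column-family constraint "partition key required, range on clustering columns" can be illustrated with a minimal in-memory model of an IoT workload. The sketch assumes a hypothetical `readings` table whose partition key is `(sensor_id, day)` and whose clustering column is the reading timestamp; the daily bucket size is an assumption. It models the data layout only and is not Cassandra's storage engine or driver API.

```python
import bisect
from collections import defaultdict
from datetime import datetime, timezone

# Roughly corresponds to this assumed CQL schema:
#   CREATE TABLE readings (
#     sensor_id text, day date, ts timestamp, value double,
#     PRIMARY KEY ((sensor_id, day), ts)
#   );
# (sensor_id, day) is the partition key; ts is the clustering column.

partitions = defaultdict(list)  # partition key -> list of (ts, value) sorted by ts


def partition_key(sensor_id, ts):
    """Bucket readings per sensor per UTC day so no partition grows unbounded."""
    return (sensor_id, ts.astimezone(timezone.utc).date())


def write(sensor_id, ts, value):
    rows = partitions[partition_key(sensor_id, ts)]
    bisect.insort(rows, (ts, value))  # keep rows ordered by the clustering column


def read_range(sensor_id, day, start, end):
    """Single-partition range scan: the full partition key must be supplied."""
    rows = partitions.get((sensor_id, day), [])
    lo = bisect.bisect_left(rows, (start,))
    hi = bisect.bisect_left(rows, (end,))
    return rows[lo:hi]  # readings with start <= ts < end


t0 = datetime(2024, 5, 1, 12, 0, 0, tzinfo=timezone.utc)
for i in range(5):
    write("sensor-42", t0.replace(second=i), 20.0 + i)

day = t0.date()
print(read_range("sensor-42", day, t0.replace(second=1), t0.replace(second=3)))
```

Because the day bucket is part of the partition key, a query spanning several days must read several partitions; this is the query-first trade-off the column-family column describes.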

Example Scenarios

| Scenario | Recommended Type | Why |
|---|---|---|
| E-commerce product catalog with variable attributes per category | Document (MongoDB) | Products in different categories have different fields (clothing has size/color; electronics has specs). Document schema flexibility handles this naturally. |
| User session management for a web application with 100K concurrent users | Key-Value (Redis) | Sessions are accessed by session ID, expire after inactivity, and require sub-millisecond latency. Pure key-value access pattern. |
| IoT sensor data: 50,000 sensors reporting every second | Column-Family (Cassandra) | 50K writes/sec sustained, time-series data, partition by sensor + time bucket, range queries on timestamp. Cassandra's write-optimized LSM-tree storage is built for this. |
| Social network "people you may know" feature | Graph (Neo4j) | Friends-of-friends at variable depth. Starting from one user, a 3-hop traversal in Neo4j typically completes in milliseconds because it touches only that user's neighborhood; the equivalent SQL query joins the friendship table to itself once per additional hop, over a table that may hold billions of rows. |
| Configuration key-value store for distributed services | Key-Value (etcd / Consul) | Simple key lookup with strong consistency and watch/subscribe for changes. No need for query flexibility. |
| Real-time fraud detection (unusual transaction patterns across accounts) | Graph (Neo4j / TigerGraph) | Detecting circular money flows, shell company networks, or unusual connection patterns requires traversing a graph of accounts, transactions, and entities. |
| Event sourcing / audit trail for a financial system | Column-Family (Cassandra) | Append-only writes, immutable records, time-ordered retrieval, multi-datacenter replication for compliance. |
| REST API backend with moderate traffic and evolving schema | Document (MongoDB) | Schema flexibility for rapid iteration, rich query support for API filtering/pagination, good developer experience with JSON-native storage. |
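The session scenario reduces to three key-value operations: write with an expiry, read by session ID, and push the expiry forward on activity. A minimal in-memory sketch of those semantics follows. In Redis the same pattern is commonly expressed with `SET key value EX seconds` and `EXPIRE` on each request, but the `SessionStore` class here is a hypothetical stand-in with an injectable clock for testing, not a Redis client.

```python
import time


class SessionStore:
    """In-memory key-value store with per-key expiry (sliding session timeout).

    Hypothetical stand-in for a key-value store; `clock` is injectable so
    expiry can be exercised without sleeping.
    """

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._data = {}  # session_id -> (value, expires_at)

    def put(self, session_id, value):
        self._data[session_id] = (value, self.clock() + self.ttl)

    def get(self, session_id):
        """O(1) lookup by key; expired entries are evicted lazily on read."""
        entry = self._data.get(session_id)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._data[session_id]
            return None
        # Sliding expiry: each access pushes the deadline forward by the TTL.
        self._data[session_id] = (value, self.clock() + self.ttl)
        return value


now = [0.0]  # fake clock for the example
store = SessionStore(ttl_seconds=1800, clock=lambda: now[0])
store.put("sess-abc", {"user_id": 7})
now[0] = 1000.0
print(store.get("sess-abc"))  # {'user_id': 7}; deadline moves to 2800
now[0] = 2900.0
print(store.get("sess-abc"))  # None: more than 1800 s since the last access
```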

Common Mistakes

| Mistake | Why It Hurts | Fix |
|---|---|---|
| One-size-fits-all thinking -- choosing one NoSQL database for every workload | Forces workloads into data models that don't fit. Results in application-layer workarounds that are slower, buggier, and harder to maintain. | Map each access pattern to the best-fit database type. Embrace polyglot persistence where justified. |
| Choosing based on popularity -- "MongoDB is the most popular NoSQL database, so we'll use it for everything" | Popularity does not mean fit. Using MongoDB for time-series ingest at 100K writes/sec, or for deep graph traversals, is likely to underperform dedicated solutions. | Evaluate based on data model and access pattern, not market share. |
| Ignoring operational complexity -- choosing a technology the team cannot operate | A database that requires specialized operations knowledge (Cassandra ring management, Neo4j memory tuning) will cause outages if the team lacks expertise. | Factor in team expertise, managed service availability (e.g., Amazon DynamoDB, Amazon Neptune, MongoDB Atlas), monitoring tools, and backup/restore complexity. |
| Premature polyglot persistence -- using five databases for a simple application | Each database adds operational overhead: monitoring, backups, failover, team training. For small teams and simple workloads, one well-chosen database is better. | Start with one database that covers most access patterns. Add specialized databases only when a specific workload clearly outgrows the primary store. |
| Ignoring the relational option -- assuming NoSQL is always better for modern applications | Many workloads (transactional, strongly consistent, heavy on ad-hoc queries) are best served by PostgreSQL or MySQL. NoSQL is not an upgrade from relational; it is a different tool for different problems. | Always include a relational database in the evaluation. If your data is structured, your queries are ad-hoc, and your scale is moderate, relational may be the best choice. |
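To illustrate the last row: with structured data in a relational database, a question nobody planned for at design time is simply another query. The sketch below uses Python's built-in `sqlite3` module as a lightweight stand-in for PostgreSQL or MySQL, with a hypothetical `orders` table. In a query-first column-family design, the same question would usually require a new table shaped for that query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, region TEXT, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, region, total) VALUES (?, ?, ?)",
    [
        ("acme", "EU", 120.0),
        ("acme", "US", 80.0),
        ("globex", "EU", 200.0),
        ("initech", "US", 50.0),
    ],
)

# An ad-hoc question asked after the schema was designed:
# revenue and order count per region, largest revenue first.
# No schema change or extra table is needed to answer it.
rows = conn.execute(
    "SELECT region, SUM(total) AS revenue, COUNT(*) AS order_count "
    "FROM orders GROUP BY region ORDER BY revenue DESC"
).fetchall()
print(rows)  # [('EU', 320.0, 2), ('US', 130.0, 2)]
```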

References