Multi-Tenant Isolation Patterns
Pooled vs siloed vs cells: picking an isolation model that matches your blast-radius budget, not the hype cycle.
The right isolation model depends on three numbers: how expensive an incident is, how big your largest tenant is, and how many small tenants you have.
The three models
- Pooled: one cluster, shared DB, tenant_id on every row. Cheapest, fastest to ship, largest blast radius.
- Siloed: one deployment per tenant. Best isolation, worst economics, operational nightmare at scale.
- Cells: a pool of small, identical deployments, each owning a subset of tenants. The pragmatic middle.
In my experience, most SaaS companies that outgrow a single pool end up with cells, even if they don't call them that.
When to pick cells
You want cells when:
- A single noisy tenant can hurt everyone else.
- Big customers want "dedicated infrastructure" and you'd rather not fork the product.
- Regional or residency requirements make one giant cluster untenable.
Cells contain blast radius, enable tenant moves, and let you run your biggest customer on a dedicated cell without forking anything.
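What makes this work is a single place that knows which cell owns which tenant. Here is a minimal sketch of such a directory, assuming an in-memory mapping; `CellDirectory`, `onboard`, `pin`, and the cell names are hypothetical, and a real system would persist the placements and put a cache in front of them.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class CellDirectory:
    """Tenant-to-cell lookup. Every tenant lives in exactly one cell."""

    shared_cells: list[str]
    placements: dict[str, str] = field(default_factory=dict)

    def onboard(self, tenant_id: str) -> str:
        """Place a new tenant on the shared cell that currently has the fewest tenants."""
        if tenant_id in self.placements:
            return self.placements[tenant_id]
        load = Counter(self.placements.values())
        cell = min(self.shared_cells, key=lambda c: load[c])
        self.placements[tenant_id] = cell
        return cell

    def pin(self, tenant_id: str, cell: str) -> None:
        """Pin a tenant to a specific cell, e.g. a dedicated or in-region one."""
        self.placements[tenant_id] = cell

    def cell_for(self, tenant_id: str) -> str:
        """Resolve the cell that should serve this tenant's requests."""
        return self.placements[tenant_id]
```

An explicit mapping is used here rather than hashing the tenant id, because a hash gives you no way to pin one customer to a dedicated cell or to move a tenant without reshuffling others.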
The tenant move
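Moving a tenant between cells is the single most valuable operational capability in this architecture. The sequence: turn on dual writes to both cells, backfill the history into the destination, then cut reads and routing over to the new cell. Practice it in staging; script it for production.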
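The ordering of a move is easier to see in code. This is a toy sketch, not production tooling: `TenantMove` is a hypothetical class, the two dicts stand in for one tenant's slice of each cell's database, and deletes are not handled at all.

```python
class TenantMove:
    """Moves one tenant's rows from a source cell to a destination cell.

    Assumption: `source` and `dest` are plain dicts standing in for this
    tenant's data in each cell. Deletes are out of scope for the sketch.
    """

    def __init__(self, source: dict, dest: dict):
        self.source = source
        self.dest = dest
        self.phase = "source_only"

    def write(self, key, value) -> None:
        # Until cutover the source keeps receiving every write;
        # once dual writes start, the destination receives them too.
        if self.phase != "cut_over":
            self.source[key] = value
        if self.phase != "source_only":
            self.dest[key] = value

    def read(self, key):
        store = self.dest if self.phase == "cut_over" else self.source
        return store[key]

    def start_dual_writes(self) -> None:
        self.phase = "dual_write"

    def backfill(self) -> None:
        if self.phase != "dual_write":
            raise RuntimeError("start dual writes before backfilling")
        # Copy only rows the destination lacks, so rows already
        # dual-written are never overwritten with older data.
        for key, value in list(self.source.items()):
            self.dest.setdefault(key, value)

    def cut_over(self) -> None:
        if self.phase != "dual_write" or self.source != self.dest:
            raise RuntimeError("destination is not in sync; refusing to cut over")
        self.phase = "cut_over"
```

The point of starting dual writes before the backfill is that nothing written during the copy can be missed; the sync check before cutover is the step that tends to get skipped under pressure.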
Don't overbuild isolation before you need it. Premature sharding is a well-known graveyard.
Inside a cell
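Within a cell, Postgres row-level security is usually enough. Test the policies in CI so a missing or wrong policy fails the build instead of leaking data. Tenant-aware observability is non-negotiable: every log, metric, and trace tagged with tenant_id. Anything less and on-call can't tell which tenant is hurting or which one is causing it.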
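For logs, tagging can be made automatic so nobody has to remember it. A minimal sketch using the standard library's `contextvars` and `logging`; the names `current_tenant`, `TenantFilter`, `build_logger`, and `handle_request` are mine, and I'm assuming the tenant is known at the start of each request.

```python
import contextvars
import logging

# Holds the tenant for the request currently being handled.
current_tenant = contextvars.ContextVar("current_tenant", default="unknown")


class TenantFilter(logging.Filter):
    """Stamps every log record with the tenant_id from the current context."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.tenant_id = current_tenant.get()
        return True  # never drop records, only annotate them


def build_logger(name: str = "app") -> logging.Logger:
    logger = logging.getLogger(name)
    handler = logging.StreamHandler()
    handler.setFormatter(
        logging.Formatter("%(levelname)s tenant_id=%(tenant_id)s %(message)s")
    )
    handler.addFilter(TenantFilter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger


def handle_request(tenant_id: str, logger: logging.Logger) -> None:
    """Set the tenant for the duration of one request, then restore it."""
    token = current_tenant.set(tenant_id)
    try:
        logger.info("request handled")
    finally:
        current_tenant.reset(token)
```

Calling `handle_request("acme", build_logger())` emits a line containing `tenant_id=acme`. The same context variable can feed metric labels and trace attributes, whatever libraries you use for those.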
What I pick
Start pooled with tenant_id + RLS. When you hit the first serious noisy-neighbor incident (you will), move to cells and build the tenant-move tooling before you need it the second time.
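The pooled starting point, roughly: a tenant_id column on every row and one code path that always filters by it. The sketch below uses SQLite from the standard library purely so it runs anywhere; SQLite has no row-level security, so this shows only the application-side scoping, with Postgres RLS as the database-side backstop in a real deployment. The `notes` table and the `TenantScopedNotes` class are hypothetical.

```python
import sqlite3


def open_pool() -> sqlite3.Connection:
    """One shared database for all tenants (in-memory for the sketch)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE notes ("
        "id INTEGER PRIMARY KEY, tenant_id TEXT NOT NULL, body TEXT NOT NULL)"
    )
    return conn


class TenantScopedNotes:
    """Every query goes through here, and every query carries tenant_id."""

    def __init__(self, conn: sqlite3.Connection, tenant_id: str):
        self.conn = conn
        self.tenant_id = tenant_id

    def add(self, body: str) -> None:
        self.conn.execute(
            "INSERT INTO notes (tenant_id, body) VALUES (?, ?)",
            (self.tenant_id, body),
        )

    def bodies(self) -> list[str]:
        rows = self.conn.execute(
            "SELECT body FROM notes WHERE tenant_id = ? ORDER BY id",
            (self.tenant_id,),
        )
        return [body for (body,) in rows]
```

Two `TenantScopedNotes` objects on the same connection see only their own rows. Keeping tenant_id on every row from day one is also what makes a later move to cells a data copy rather than a schema redesign.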