Case study
Multi-tenant Postgres at elizaOS Cloud — Row-Level Security at scale
Eliza Labs · 6 min read
Context
elizaOS is the leading open-source AI agent framework — 18k+ stars on GitHub. elizaOS Cloud is the managed offering: 10 000+ users managing their AI agents, LLM keys and integrations, all on a shared Postgres backend.
When I started architecting the multi-tenant layer, the platform was mid-pivot. Some entities were correctly scoped per user. Others leaked across tenants if you ran the right query. Application-level filtering was inconsistent: every developer had to add a WHERE owner_id = $current predicate to every query, and the day someone forgot, the bug was a data leak.
Decision
Push isolation into Postgres, not the app layer.
Use Row-Level Security with policies tied to a per-request session-local variable carrying the current entity’s ID. Every query the app executes is automatically constrained by the policy — no extra predicate to remember, no way to forget.
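A minimal sketch of that mechanism, assuming each tenant-scoped table has an owner_id uuid column and that the context lives in a custom setting named app.current_entity_id. Both names, the policy name, and the psycopg-style %s placeholders are illustrative assumptions, not taken from the elizaOS schema. The helpers only build SQL text; a real caller would run the context statement inside the same transaction as the request's queries.

```python
import re

# Hypothetical setting name; any custom "prefix.name" setting would work.
SETTING = "app.current_entity_id"

_IDENT = re.compile(r"^[a-z_][a-z0-9_]*$")


def rls_statements(table: str) -> list[str]:
    """Return DDL that enables RLS on `table` and limits rows to the current entity."""
    if not _IDENT.match(table):
        raise ValueError(f"unsafe table name: {table!r}")
    return [
        f"ALTER TABLE {table} ENABLE ROW LEVEL SECURITY",
        # FORCE makes the policy apply to the table owner as well.
        f"ALTER TABLE {table} FORCE ROW LEVEL SECURITY",
        f"CREATE POLICY {table}_entity_isolation ON {table} "
        # An unset or empty setting becomes NULL, so the comparison matches no rows.
        f"USING (owner_id = NULLIF(current_setting('{SETTING}', true), '')::uuid)",
    ]


def set_context_query(entity_id: str) -> tuple[str, tuple[str, str]]:
    """Parameterized statement to run at the start of each request's transaction."""
    # Third argument `true` scopes the value to the current transaction only.
    return "SELECT set_config(%s, %s, true)", (SETTING, entity_id)
```

Because the value is transaction-local, a pooled connection handed to the next request does not carry the previous tenant's ID with it.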
This is the boring, correct answer for multi-tenant Postgres. The interesting work was making it not become an operational nightmare.
What I built
Entity-level RLS, not just user-level
elizaOS has entities as a first-class concept (users, agents, worlds). Per-user RLS would have been too coarse: an agent acting on behalf of a user needs to read the user’s data, but only that data. We modeled policies at the entity level, allowing delegation when an agent’s owner is the current user.
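The visibility rule can be modeled in a few lines before it is written as a policy. This is one reading of the delegation rule, under the assumption that each agent records the user it acts for in an owner_id field; the Entity type and can_read function are hypothetical names used for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entity:
    id: str
    kind: str                 # "user", "agent" or "world"
    owner_id: Optional[str]   # for an agent: the user it acts for (assumed field)


def can_read(current: Entity, row_owner_id: str) -> bool:
    """In-memory model of the row-visibility rule a policy would encode."""
    # Direct ownership: an entity always reads its own rows.
    if row_owner_id == current.id:
        return True
    # Delegation: an agent reads rows owned by the user it acts for, and no one else's.
    return current.kind == "agent" and current.owner_id == row_owner_id
```

A model like this is mostly useful as a specification: the SQL policy and its tests can be checked against the same truth table.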
Encryption for character secrets
Characters (the personas an agent embodies) carry secrets — API keys, private notes, role-specific instructions. These needed encryption at rest, not just RLS protection. I added a transparent encryption layer that encrypted on write and decrypted on read inside the data access path, so the rest of the codebase didn’t have to think about it.
The encryption layer was wired through the same connection that set the RLS context, which makes it leak-proof by construction: neither the wrong tenant nor the wrong code path can decrypt what it shouldn’t.
Migration discipline for pre-1.6.5 data
A non-trivial portion of the user base was on a pre-1.6.5 schema, where some entities didn’t yet have the right ownership columns. The migration had to:
- Backfill owner_id with the best available signal per entity type.
- Apply RLS policies in a mode that allowed the backfilled rows through during transition.
- Switch to strict policies once the backfill was verified across the whole dataset.
This ran as a multi-step rollout, monitored with row-count checks at each step, with an explicit rollback path that didn’t require a downtime window.
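The gate before the final step can be sketched as a plain row-count comparison. This is an illustration under stated assumptions, not the checks that were actually run: the TableCount type, the function names, and the two count queries in the comments are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableCount:
    table: str
    total_rows: int        # e.g. SELECT count(*) FROM <table>
    rows_with_owner: int   # e.g. SELECT count(owner_id) FROM <table> (non-NULL only)


def blockers(counts: list[TableCount]) -> list[str]:
    """Return human-readable reasons not to switch to strict policies yet."""
    problems = []
    for c in counts:
        missing = c.total_rows - c.rows_with_owner
        if missing > 0:
            problems.append(f"{c.table}: {missing} rows still lack owner_id")
    return problems


def ready_for_strict(counts: list[TableCount]) -> bool:
    # A strict policy hides every row without an owner, so require zero blockers.
    return not blockers(counts)
```

Taking both counts in a single query per table keeps the comparison meaningful while live traffic continues to insert rows.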
Outcome
- Tenant isolation enforced in the database, not the application. No more “the dev forgot the WHERE clause” bug class.
- Character secrets encrypted at rest, transparent to the rest of the codebase.
- Pre-1.6.5 data migrated without forcing a flag day or a service window.
- The pattern was packaged so future entity types automatically inherit the right policies — RLS doesn’t have to be a per-feature ask.
What I’d do differently
I’d have invested earlier in policy testing as code. The first version had RLS policies that were correct but only verified manually. We later added a test suite that connects as different tenants and asserts what each one can and cannot see. That’s how you avoid silent regressions when a teammate edits a policy six months later.
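The core of such a suite is small. The sketch below assumes a hypothetical fetch_owners hook that runs a SELECT on a table with the RLS context set to a given tenant and returns the owner_id of each row that came back. It checks strict ownership only and ignores delegation, and it assumes the fixtures seed at least one row per tenant.

```python
from typing import Callable, Iterable

# Hypothetical hook: (table, tenant_id) -> owner_id of every row visible to that tenant.
FetchOwners = Callable[[str, str], Iterable[str]]


def leaked_rows(fetch_owners: FetchOwners, table: str, tenants: list[str]) -> list[tuple[str, str]]:
    """Return (viewer, owner) pairs where a tenant saw a row it does not own."""
    leaks = []
    for viewer in tenants:
        for owner in fetch_owners(table, viewer):
            if owner != viewer:
                leaks.append((viewer, owner))
    return leaks


def blind_tenants(fetch_owners: FetchOwners, table: str, tenants: list[str]) -> list[str]:
    """Return tenants that could not see any of their own rows."""
    return [t for t in tenants if t not in set(fetch_owners(table, t))]
```

A test would then assert that both functions return an empty list for every tenant-scoped table, so a policy edit that widens or narrows visibility fails the build.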