Multi-Tenant Data Isolation
Spotlight is a multi-tenant platform where each tenant's data must be completely isolated from all other tenants. This is achieved through DynamoDB partition key isolation, tenant-scoped queries at the application layer, and infrastructure-level controls.
Isolation Model
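With two documented exceptions (Platform.Memberships and Events.Outbox), every DynamoDB table in the system uses tenant_id as all or part of its partition key. Items that belong to different tenants therefore never share a partition key value, so a key-based read for one tenant cannot return another tenant's items. The separation is logical rather than physical: DynamoDB hashes partition key values and may store several of them on the same physical partition.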
+-----------------------------------+
| Spotlight.Tours.Definitions       |
|                                   |
| Partition: tenant_abc             |
|   tour_001, tour_002, tour_003    |
|                                   |
| Partition: tenant_xyz             |
|   tour_100, tour_101              |
|                                   |
| (separate partitions, no overlap) |
+-----------------------------------+

Partition Key Design
Direct Tenant Partitioning
Tables where tenant_id is the partition key:
| Table | PK | SK |
|---|---|---|
| Tenants.Config | tenant_id | -- |
| Tours.Definitions | tenant_id | tour_id |
| Content.Definitions | tenant_id | content_id |
| Content.Checklists | tenant_id | checklist_id |
| Audiences.Rules | tenant_id | audience_id |
| Themes.Definitions | tenant_id | theme_id |
| Surveys.Definitions | tenant_id | survey_id |
| Admin.Users | tenant_id | user_id |
| Audit.AdminActions | tenant_id | timestamp_event_id |
| Tenants.ApiKeys | tenant_id | api_key_prefix |
With tenant_id as the partition key, a DynamoDB Query operation on one tenant cannot return results from another tenant. This is enforced by DynamoDB at the storage layer.
Composite Key Partitioning
Tables that embed tenant_id in a composite partition key:
| Table | PK Format | SK |
|---|---|---|
| Tours.Versions | tenant_abc#tour_456 | version (N) |
| Progress.UserState | tenant_abc#user_123 | content_id |
| Events.Interactions | tenant_abc#tour_456 | timestamp#event_id |
| Events.Aggregates | tenant_abc#tour_456 | date#metric |
| Surveys.Responses | tenant_abc#survey_789 | user_timestamp |
| Activity.UserEvents | tenant_abc#user_123 | timestamp_event_id |
The composite key format {tenant_id}#{entity_id} ensures that queries are always scoped to a single tenant. Even if an attacker discovers another tenant's entity_id, they cannot access it because the partition key would resolve to {their_tenant_id}#{entity_id} -- a different partition entirely.
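The key construction described above can be sketched as a small helper. This is an illustration only: the name composite_pk and the delimiter check are assumptions, not part of the documented codebase.

```python
def composite_pk(tenant_id: str, entity_id: str) -> str:
    """Build a `{tenant_id}#{entity_id}` partition key value."""
    # Assumption: reject the delimiter in either part so the key is
    # unambiguous. Otherwise ("a", "b#c") and ("a#b", "c") would collide.
    for part in (tenant_id, entity_id):
        if not part or "#" in part:
            raise ValueError(f"invalid key component: {part!r}")
    return f"{tenant_id}#{entity_id}"


# The tenant half always comes from the authenticated context, so a guessed
# entity id still resolves inside the caller's own tenant.
attacker_key = composite_pk("tenant_abc", "tour_100")
victim_key = composite_pk("tenant_xyz", "tour_100")
```

Both keys end in the same tour_100, yet they are different partition key values, which is why knowing the entity id alone is not enough to reach the other tenant's item.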
API Keys
The Tenants.ApiKeys table is partitioned on tenant_id with api_key_prefix as the sort key. The full plaintext key format is sk_<env>_<32-hex-tenant-uuid>_<random>, so the validation path extracts the tenant UUID from the key itself and issues a direct GetItem against (tenant_id, api_key_prefix). No cross-tenant scan is ever required, and a bearer presenting a key can only ever touch their own partition.
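A rough sketch of that validation path follows. The source only fixes the overall key shape, so the character sets in the regular expression, the PREFIX_LEN rule for deriving api_key_prefix, and the names parse_api_key and get_item_key are all assumptions made for illustration.

```python
import re
from typing import NamedTuple

# Assumed character sets; only the shape
# sk_<env>_<32-hex-tenant-uuid>_<random> comes from the documentation.
_KEY_RE = re.compile(r"^sk_([a-z]+)_([0-9a-f]{32})_([A-Za-z0-9]+)$")

PREFIX_LEN = 8  # hypothetical: how much of <random> forms api_key_prefix


class ParsedKey(NamedTuple):
    env: str
    tenant_id: str
    api_key_prefix: str


def parse_api_key(plaintext: str) -> ParsedKey:
    """Split a plaintext key into its parts, rejecting malformed input."""
    match = _KEY_RE.match(plaintext)
    if match is None:
        raise ValueError("malformed API key")
    env, tenant_hex, random_part = match.groups()
    return ParsedKey(env, tenant_hex, random_part[:PREFIX_LEN])


def get_item_key(parsed: ParsedKey) -> dict:
    """Key argument for a GetItem call on Tenants.ApiKeys."""
    return {
        "tenant_id": {"S": parsed.tenant_id},
        "api_key_prefix": {"S": parsed.api_key_prefix},
    }
```

Because the tenant UUID is read from the key itself, the lookup never needs an index keyed on the secret, and a malformed key is rejected before any DynamoDB call is made.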
Cross-tenant exception: Platform.Memberships
Spotlight.Platform.Memberships is one of two tables whose partition key does not contain tenant_id (the other is Events.Outbox, described under Event Isolation). It maps a Clerk platform user (platform_user_id) to the tenants they can switch into, and it is read before a tenant is chosen, so partitioning by tenant_id would make the "which tenants am I on?" lookup impossible.
- PK: platform_user_id
- SK: tenant_id
- Attributes: role, created_at
The trust model is:
- Only /v1/platform/tenants reads this table, and only with a validated platform JWT. The caller's sub claim is pinned as the partition key, so a user can only ever list their own memberships.
- Once a tenant is picked, the regular AuthContext.tenant_id takes over and every downstream repo call is tenant-scoped as usual.
- Row creation happens only through super-admin tenant-create, invite-accept, or the seed script. There is no user-facing endpoint that can mint a membership outside those flows.
This table intentionally sits outside the tenant-isolation invariant because it's the pivot that makes multi-tenant admin possible in the first place. Treat any new code that reads it with the same care as authentication itself.
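As a sketch of what "pinned as the partition key" could look like, the function below builds the Query arguments from already-verified JWT claims. The name memberships_query and the table_prefix parameter are hypothetical, and signature verification is assumed to have happened before this point.

```python
def memberships_query(claims: dict, table_prefix: str = "Spotlight") -> dict:
    """Build Query kwargs listing the tenants for the JWT's own subject."""
    # The partition key value is taken only from the verified `sub` claim,
    # never from a path, query, or body parameter.
    sub = claims.get("sub")
    if not isinstance(sub, str) or not sub:
        raise PermissionError("platform JWT has no subject")
    return {
        "TableName": f"{table_prefix}.Platform.Memberships",
        "KeyConditionExpression": "platform_user_id = :uid",
        "ExpressionAttributeValues": {":uid": {"S": sub}},
    }
```

The function takes no user id argument at all, so there is nothing a caller could supply to list someone else's memberships.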
Application-Layer Enforcement
Tenant Context Injection
After authentication, the tenant ID is injected into every route handler via the AuthContext dependency:
@router.get("/{tour_id}")
async def get_tour(
    tour_id: str,
    auth: AuthContext = Depends(require_admin),
    tour_repo: TourRepository = Depends(get_tour_repo),
):
    # auth.tenant_id is set by the authentication middleware
    # The repository uses it to scope the query
    tour = await tour_repo.get(auth.tenant_id, tour_id)

The tenant_id comes from the authenticated API key -- never from user input. There is no X-Tenant-Id header or query parameter that allows callers to specify a different tenant.
Repository Pattern
All repository methods require tenant_id as the first parameter and construct DynamoDB keys using it:
class DynamoDBContentRepository:
    async def get(self, tenant_id: str, content_id: str) -> Content | None:
        resp = await asyncio.to_thread(
            self._client.get_item,
            TableName=self._table,
            Key={
                "tenant_id": {"S": tenant_id},
                "content_id": {"S": content_id},
            },
        )
        item = resp.get("Item")
        # ...

    async def list_by_tenant(self, tenant_id: str, limit: int = 20) -> list[Content]:
        resp = await asyncio.to_thread(
            self._client.query,
            TableName=self._table,
            KeyConditionExpression="tenant_id = :tid",
            ExpressionAttributeValues={
                ":tid": {"S": tenant_id},
            },
            Limit=limit,
        )
        # ...

This design makes cross-tenant data access structurally impossible -- there is no code path that queries DynamoDB without a tenant-scoped key.
No Cross-Tenant Queries
Outside the two documented exceptions (Platform.Memberships and Events.Outbox), the system never performs table scans or queries that span multiple tenants. Every DynamoDB operation is one of:
- GetItem with a key containing tenant_id.
- Query with tenant_id as the partition key expression.
- PutItem / UpdateItem / DeleteItem with a key containing tenant_id.
- TransactWriteItems where all items include tenant_id.
No table scans
DynamoDB Scan operations are prohibited in production code. A scan would read all items across all tenants, violating data isolation. The only exception is the local development seed script.
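One way to back this rule with code, rather than convention alone, is a thin proxy around the DynamoDB client. The class below is a hypothetical sketch, not something the codebase is known to contain; it assumes a boto3-style client whose operations are plain methods.

```python
class NoScanClient:
    """Proxy around a DynamoDB client that refuses Scan calls."""

    _BLOCKED = frozenset({"scan"})

    def __init__(self, client):
        self._client = client

    def __getattr__(self, name):
        # __getattr__ only runs for names not found on the proxy itself,
        # so every client method lookup passes through this check.
        if name in self._BLOCKED:
            raise RuntimeError(f"DynamoDB {name} is prohibited in production code")
        return getattr(self._client, name)
```

Repositories would receive the proxy instead of the raw client, while the local seed script keeps using the unwrapped one. This does not cover a scan reached through a paginator, so it complements code review and IAM rather than replacing them.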
Global Secondary Index Isolation
GSIs follow the same tenant-scoped pattern:
# Tours.Definitions: query by status within a tenant
global_secondary_index {
  name      = "gsi-tenant-status"
  hash_key  = "tenant_status" # "tenant_abc#published"
  range_key = "updated_at"
}

The GSI partition key is a composite of tenant_id and status, so querying tenant_abc#published cannot return results from tenant_xyz.
# Safe: tenant-scoped GSI query
resp = client.query(
    TableName="Spotlight.Tours.Definitions",
    IndexName="gsi-tenant-status",
    KeyConditionExpression="tenant_status = :ts",
    ExpressionAttributeValues={
        ":ts": {"S": f"{tenant_id}#published"},
    },
)

The same pattern applies to all GSIs:
| GSI | PK Format | Tenant Scope |
|---|---|---|
| gsi-tenant-status | {tenant_id}#{status} | Isolated |
| gsi-content-users | {tenant_id}#{content_id} | Isolated |
| gsi-content | {tenant_id}#{content_id} | Isolated |
| gsi-session | {tenant_id}#{session_id} | Isolated |
| gsi-actor | {tenant_id}#{actor_id} | Isolated |
| gsi-entity | {tenant_id}#{entity_type}#{entity_id} | Isolated |
Event Isolation
Domain events include tenant_id in the event payload. Event handlers use this to route data to the correct tenant partitions:
{
  "event_type": "TourCompleted",
  "tenant_id": "tenant_abc",
  "data": {
    "tour_id": "tour_456",
    "user_id": "user_123"
  }
}

When the analytics handler processes this event, it writes to Events.Aggregates with a partition key of tenant_abc#tour_456 -- scoped to the originating tenant.
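The routing step can be sketched as a pure function from an event to an Events.Aggregates key. The attribute names pk and sk, the ISO date format, and the metric name used in the usage note are assumptions; the source only specifies the {tenant_id}#{tour_id} partition key and the date#metric sort key shape.

```python
from datetime import date


def aggregate_key(event: dict, day: date, metric: str) -> dict:
    """Derive the Events.Aggregates key for a tour event."""
    # Both halves of the partition key come from the event itself, so the
    # write lands in the originating tenant's partition. A missing
    # tenant_id or tour_id raises KeyError rather than defaulting.
    tenant_id = event["tenant_id"]
    tour_id = event["data"]["tour_id"]
    return {
        "pk": {"S": f"{tenant_id}#{tour_id}"},  # hypothetical attribute name
        "sk": {"S": f"{day.isoformat()}#{metric}"},  # hypothetical attribute name
    }
```

For the TourCompleted event shown above, aggregate_key(event, date(2024, 1, 15), "completions") produces the partition key tenant_abc#tour_456 and the sort key 2024-01-15#completions.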
Outbox Table
Events.Outbox is the second deliberate exception to the tenant-partitioning invariant (the first being Platform.Memberships). Partitioning on tenant_id would force the delivery worker to know every active tenant's id ahead of time and round-robin queries; partitioning on event_id lets a single sparse gsi-status GSI (status HASH, timestamp RANGE) drain every tenant's pending events in chronological order.
- PK: event_id
- SK: timestamp
- GSI gsi-status: sparse. Only PENDING events are indexed, so the GSI stays small no matter how much history accrues.
Tenant isolation here is enforced at the application layer: every event payload carries its tenant_id, and downstream handlers route into tenant-scoped tables (which DO partition on tenant_id). The outbox itself sees the cross-tenant stream because it has to.
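A minimal sketch of the drain step, under stated assumptions: the function names are hypothetical, the page size is arbitrary, and tenant_id is assumed to be stored as a top-level string attribute on each outbox item.

```python
def pending_events_query(table_prefix: str = "Spotlight", limit: int = 25) -> dict:
    """Query kwargs that read PENDING outbox events, oldest first."""
    return {
        "TableName": f"{table_prefix}.Events.Outbox",
        "IndexName": "gsi-status",
        # "status" is a DynamoDB reserved word, so it needs a name placeholder.
        "KeyConditionExpression": "#s = :pending",
        "ExpressionAttributeNames": {"#s": "status"},
        "ExpressionAttributeValues": {":pending": {"S": "PENDING"}},
        "ScanIndexForward": True,  # ascending timestamp, i.e. oldest first
        "Limit": limit,
    }


def tenant_for(item: dict) -> str:
    """Return the tenant an outbox item must be routed to, failing closed."""
    tenant_id = item.get("tenant_id", {}).get("S")
    if not tenant_id:
        raise ValueError("outbox item has no tenant_id; refusing to route it")
    return tenant_id
```

Failing closed in tenant_for matters here because the outbox is the one place where items from every tenant are read together. An event without a tenant should stop the handler instead of being written under a default.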
Infrastructure-Level Controls
DynamoDB Encryption
All tables have server-side encryption enabled:
server_side_encryption {
  enabled = true # AWS-managed KMS key
}

This protects data at rest. For tenants requiring customer-managed keys, a dedicated KMS key can be specified per table.
Point-in-Time Recovery
PITR is enabled in production to protect against accidental data loss:
point_in_time_recovery {
  enabled = var.enable_pitr # true in production
}

IAM Least Privilege
Lambda functions are granted access only to the specific DynamoDB tables they need:
Action = [
  "dynamodb:GetItem", "dynamodb:PutItem",
  "dynamodb:Query", "dynamodb:BatchGetItem",
  "dynamodb:BatchWriteItem", "dynamodb:TransactWriteItems",
  "dynamodb:DeleteItem"
]
Resource = concat(
  values(var.table_arns),
  [for arn in values(var.table_arns) : "${arn}/index/*"]
)

No Lambda function has dynamodb:* permissions. Table-level access is explicitly enumerated.
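The "no wildcard" rule lends itself to an automated check over rendered policy documents. The helper below is a hypothetical sketch of such a check; it assumes statements in the standard IAM JSON shape and is not taken from the project's tooling.

```python
def wildcard_actions(statements: list[dict]) -> list[str]:
    """Return every allowed action that is `*` or a dynamodb wildcard."""
    found = []
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        # IAM permits Action to be a single string or a list of strings.
        if isinstance(actions, str):
            actions = [actions]
        for action in actions:
            is_dynamodb = action.lower().startswith("dynamodb:")
            if action == "*" or (is_dynamodb and "*" in action):
                found.append(action)
    return found
```

A CI step could fail the build whenever this returns a non-empty list for any Lambda role.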
Isolation Verification
What prevents cross-tenant access?
| Layer | Control | Mechanism |
|---|---|---|
| Authentication | API key resolves tenant_id | Key embeds its tenant's UUID and is looked up only in that tenant's partition |
| Application | AuthContext.tenant_id injected from API key | Cannot be overridden by user input |
| Repository | All queries use tenant_id in key | DynamoDB enforces partition boundary |
| DynamoDB | Partition key isolation | Key-based reads and writes address a single partition key value |
| IAM | Least-privilege Lambda roles | No wildcard permissions |
Threat scenarios
| Threat | Mitigation |
|---|---|
| Attacker sends another tenant's API key | Only the key holder's tenant is accessible |
| Attacker modifies tenant_id in request | Not possible -- tenant_id comes from API key validation, not user input |
| Attacker guesses another tenant's tour_id | The query uses {attacker_tenant_id}#{tour_id}, which is a different partition |
| Attacker performs a DynamoDB scan | The application never performs scans in production |
| Insider accesses DynamoDB directly | Audit trail in Audit.AdminActions, IAM logging, DynamoDB encryption |
| API key compromise | Revoke the key, issue a new one. The compromised key only accesses one tenant's data |
Table Naming Convention
All tables follow the naming pattern {prefix}.{feature}.{sub_table}:
def table_name(self, feature: str, sub_table: str) -> str:
    return f"{self.dynamodb_table_prefix}.{feature}.{sub_table}"

# Examples:
#   "Spotlight.Tours.Definitions"
#   "Spotlight.Events.Aggregates"
#   "Spotlight.Admin.Users"

The prefix is configured via DYNAMODB_TABLE_PREFIX and varies by environment:
- Local: Spotlight
- Dev: Spotlight.Dev
- Production: Spotlight.Prod
This prevents accidental cross-environment access while keeping the table structure consistent.