
Multi-Tenant Data Isolation

Spotlight is a multi-tenant platform where each tenant's data must be completely isolated from all other tenants. This is achieved through DynamoDB partition key isolation, tenant-scoped queries at the application layer, and infrastructure-level controls.

Isolation Model

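Apart from the two documented exceptions (Platform.Memberships and Events.Outbox), every DynamoDB table in the system uses tenant_id as part of its partition key. Items that share a partition key value form one item collection, and a Query must name exactly one partition key value, so each tenant's items are logically separated from every other tenant's. DynamoDB may still place several key values on the same physical partition, so the separation below is a logical view rather than a storage layout: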

+-----------------------------------+
|  Spotlight.Tours.Definitions      |
|                                   |
|  Partition: tenant_abc            |
|    tour_001, tour_002, tour_003   |
|                                   |
|  Partition: tenant_xyz            |
|    tour_100, tour_101             |
|                                   |
|  (separate key values, no overlap)|
+-----------------------------------+

Partition Key Design

Direct Tenant Partitioning

Tables where tenant_id is the partition key:

| Table | PK | SK |
| --- | --- | --- |
| Tenants.Config | tenant_id | -- |
| Tours.Definitions | tenant_id | tour_id |
| Content.Definitions | tenant_id | content_id |
| Content.Checklists | tenant_id | checklist_id |
| Audiences.Rules | tenant_id | audience_id |
| Themes.Definitions | tenant_id | theme_id |
| Surveys.Definitions | tenant_id | survey_id |
| Admin.Users | tenant_id | user_id |
| Audit.AdminActions | tenant_id | timestamp_event_id |
| Tenants.ApiKeys | tenant_id | api_key_prefix |

A DynamoDB Query must supply an equality condition on the partition key, so with tenant_id as the partition key a Query for one tenant cannot return another tenant's items. DynamoDB itself enforces this; it does not depend on application-side filtering.

Composite Key Partitioning

Tables that embed tenant_id in a composite partition key:

| Table | PK Format (example) | SK |
| --- | --- | --- |
| Tours.Versions | tenant_abc#tour_456 | version (N) |
| Progress.UserState | tenant_abc#user_123 | content_id |
| Events.Interactions | tenant_abc#tour_456 | timestamp#event_id |
| Events.Aggregates | tenant_abc#tour_456 | date#metric |
| Surveys.Responses | tenant_abc#survey_789 | user_timestamp |
| Activity.UserEvents | tenant_abc#user_123 | timestamp_event_id |

The composite key format {tenant_id}#{entity_id} ensures that queries are always scoped to a single tenant. Even if an attacker discovers another tenant's entity_id, they cannot access it because the partition key would resolve to {their_tenant_id}#{entity_id} -- a different partition entirely.
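As a rough illustration of the composite key format, the helper below builds a {tenant_id}#{entity_id} value. The function name composite_pk and the rule that rejects a '#' inside either component are assumptions added for this sketch; the document does not say how the real code handles the separator.

```python
SEPARATOR = "#"


def composite_pk(tenant_id: str, entity_id: str) -> str:
    """Build a '{tenant_id}#{entity_id}' partition key value."""
    for name, value in (("tenant_id", tenant_id), ("entity_id", entity_id)):
        if not value:
            raise ValueError(f"{name} must be non-empty")
        if SEPARATOR in value:
            # Assumed guard: a '#' inside a component could shift where the
            # tenant part of the key ends, so this sketch rejects it.
            raise ValueError(f"{name} must not contain {SEPARATOR!r}")
    return f"{tenant_id}{SEPARATOR}{entity_id}"


# The tenant part always comes from the authenticated context, so a guessed
# entity id from another tenant still resolves inside the caller's key space.
attacker_key = composite_pk("tenant_xyz", "tour_456")
victim_key = composite_pk("tenant_abc", "tour_456")
```

Here attacker_key and victim_key differ even though both use tour_456, which is the property the paragraph above relies on.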

API Keys

The Tenants.ApiKeys table is partitioned on tenant_id with api_key_prefix as the sort key. The full plaintext key format is sk_<env>_<32-hex-tenant-uuid>_<random>, so the validation path extracts the tenant UUID from the key itself and issues a direct GetItem against (tenant_id, api_key_prefix). No cross-tenant scan is ever required, and a bearer presenting a key can only ever touch their own partition.
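The validation path might look roughly like the sketch below. Only the overall sk_<env>_<32-hex-tenant-uuid>_<random> layout comes from this document; the character classes in the regular expression, the PREFIX_LEN constant, the way api_key_prefix is derived, and the idea that tenant_id is stored as the bare 32-hex string are all assumptions. A real validator would still have to check the presented key against the stored record, which is not shown.

```python
import re
from typing import NamedTuple

# Assumed character classes for <env> and <random>.
_KEY_RE = re.compile(r"^sk_([a-z]+)_([0-9a-f]{32})_([A-Za-z0-9]+)$")

# Hypothetical: number of leading characters of <random> used as the prefix.
PREFIX_LEN = 8


class ParsedKey(NamedTuple):
    env: str
    tenant_id: str
    api_key_prefix: str


def parse_api_key(raw: str) -> ParsedKey:
    """Split a plaintext key into its parts without touching the database."""
    match = _KEY_RE.match(raw)
    if match is None:
        raise ValueError("malformed API key")
    env, tenant_hex, random_part = match.groups()
    return ParsedKey(env, tenant_hex, random_part[:PREFIX_LEN])


def lookup_key(parsed: ParsedKey) -> dict:
    """Key argument for a single GetItem against Tenants.ApiKeys."""
    return {
        "tenant_id": {"S": parsed.tenant_id},
        "api_key_prefix": {"S": parsed.api_key_prefix},
    }
```

Because the tenant id is read from the key itself, the lookup is one GetItem in one partition and never needs to search other tenants.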

Cross-tenant exception: Platform.Memberships

Spotlight.Platform.Memberships is one of two tables whose partition key does not contain tenant_id (the other is Events.Outbox, described under Event Isolation). It maps a Clerk platform user (platform_user_id) to the tenants they can switch into and is read before a tenant is chosen, so partitioning by tenant_id would make the "which tenants am I on?" lookup impossible.

  • PK: platform_user_id
  • SK: tenant_id
  • Attributes: role, created_at

The trust model is:

  1. Only /v1/platform/tenants reads this table, and only with a validated platform JWT. The caller's sub claim is pinned as the partition key — a user can only ever list their own memberships.
  2. Once a tenant is picked, the regular AuthContext.tenant_id takes over and every downstream repo call is tenant-scoped as usual.
  3. Row creation happens only through super-admin tenant-create, invite-accept, or the seed script. There is no user-facing endpoint that can mint a membership outside those flows.

This table intentionally sits outside the tenant-isolation invariant because it's the pivot that makes multi-tenant admin possible in the first place. Treat any new code that reads it with the same care as authentication itself.
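Point 1 of the trust model can be sketched as a query builder that accepts only the validated JWT claims and offers no way to pass a user id separately. The function names membership_query and tenants_from_items are hypothetical, and the sketch assumes role is stored as a string attribute.

```python
def membership_query(
    claims: dict, table: str = "Spotlight.Platform.Memberships"
) -> dict:
    """Build Query arguments pinned to the caller's own sub claim."""
    sub = claims.get("sub")
    if not isinstance(sub, str) or not sub:
        raise PermissionError("validated platform JWT has no usable sub claim")
    return {
        "TableName": table,
        "KeyConditionExpression": "platform_user_id = :uid",
        # The partition key value is taken from the token, never the request.
        "ExpressionAttributeValues": {":uid": {"S": sub}},
    }


def tenants_from_items(items: list[dict]) -> list[dict]:
    """Reduce raw DynamoDB items to the tenant ids and roles they grant."""
    return [
        {"tenant_id": item["tenant_id"]["S"], "role": item["role"]["S"]}
        for item in items
    ]
```

The resulting dictionary can be passed as keyword arguments to a DynamoDB client's query call.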

Application-Layer Enforcement

Tenant Context Injection

After authentication, the tenant ID is injected into every route handler via the AuthContext dependency:

python
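from fastapi import Depends, HTTPException


@router.get("/{tour_id}")
async def get_tour(
    tour_id: str,
    auth: AuthContext = Depends(require_admin),
    tour_repo: TourRepository = Depends(get_tour_repo),
):
    # auth.tenant_id is set by the authentication middleware
    # The repository uses it to scope the query
    tour = await tour_repo.get(auth.tenant_id, tour_id)
    if tour is None:
        # Assumes get() returns None when nothing matches, which also
        # covers tour ids that exist only under another tenant
        raise HTTPException(status_code=404, detail="Tour not found")
    return tour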

The tenant_id comes from the authenticated API key and never from any other request data the caller controls. There is no X-Tenant-Id header or query parameter that lets a caller name a different tenant.
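One way to keep the injected tenant from changing after authentication is to make the context immutable. The class below is a stand-in for the real AuthContext, whose fields are not shown in this document; the role field and the build_auth_context helper are hypothetical.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class AuthContext:
    """Stand-in for the real AuthContext; frozen so handlers cannot edit it."""

    tenant_id: str
    role: str  # hypothetical field


def build_auth_context(key_record: dict) -> AuthContext:
    # key_record is the stored API-key row found during validation.
    # Request headers, path, query string, and body are not consulted.
    return AuthContext(tenant_id=key_record["tenant_id"], role=key_record["role"])


ctx = build_auth_context({"tenant_id": "tenant_abc", "role": "admin"})
try:
    ctx.tenant_id = "tenant_xyz"  # rejected by the frozen dataclass
except FrozenInstanceError:
    pass
```

A frozen dataclass only guards against accidental reassignment inside the process; the real protection is still that tenant_id is derived from the validated key.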

Repository Pattern

All repository methods require tenant_id as the first parameter and construct DynamoDB keys using it:

python
import asyncio


class DynamoDBContentRepository:
    async def get(self, tenant_id: str, content_id: str) -> Content | None:
        resp = await asyncio.to_thread(
            self._client.get_item,
            TableName=self._table,
            Key={
                "tenant_id": {"S": tenant_id},
                "content_id": {"S": content_id},
            },
        )
        item = resp.get("Item")
        # ...

    async def list_by_tenant(self, tenant_id: str, limit: int = 20) -> list[Content]:
        resp = await asyncio.to_thread(
            self._client.query,
            TableName=self._table,
            KeyConditionExpression="tenant_id = :tid",
            ExpressionAttributeValues={
                ":tid": {"S": tenant_id},
            },
            Limit=limit,
        )
        # ...

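For tenant-scoped tables, this design rules out cross-tenant access through the repository layer: every method builds its DynamoDB key from tenant_id, so there is no code path that queries those tables without a tenant-scoped key. The two documented exceptions, Platform.Memberships and Events.Outbox, are keyed differently and rely on the controls described in their own sections.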

No Cross-Tenant Queries

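The system never performs table scans, and no query against a tenant-scoped table spans multiple tenants. Apart from the two documented exceptions (Platform.Memberships and Events.Outbox), every DynamoDB operation is one of: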

  • GetItem with a key containing tenant_id.
  • Query with tenant_id as the partition key expression.
  • PutItem / UpdateItem / DeleteItem with a key containing tenant_id.
  • TransactWriteItems where all items include tenant_id.

No table scans

DynamoDB Scan operations are prohibited in production code. A scan would read all items across all tenants, violating data isolation. The only exception is the local development seed script.
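A prohibition like this can be backed by a guard in code. The wrapper below is a minimal sketch, not something this document says exists: NoScanClient is a hypothetical name, and it assumes the wrapped object is a boto3-style client exposing scan and get_paginator.

```python
class NoScanClient:
    """Wraps a DynamoDB client and refuses Scan, directly or via paginator."""

    def __init__(self, client):
        self._client = client

    def scan(self, *args, **kwargs):
        raise RuntimeError("Scan is prohibited: it reads across all tenants")

    def get_paginator(self, operation_name):
        # A paginator for "scan" would bypass the method above, so block it too.
        if operation_name == "scan":
            raise RuntimeError("Scan is prohibited: it reads across all tenants")
        return self._client.get_paginator(operation_name)

    def __getattr__(self, name):
        # Only reached for attributes not defined here (query, get_item, ...).
        return getattr(self._client, name)
```

Repositories would receive the wrapped client in production, while the local seed script could keep using the raw client.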

Global Secondary Index Isolation

GSIs follow the same tenant-scoped pattern:

hcl
# Tours.Definitions: query by status within a tenant
global_secondary_index {
  name     = "gsi-tenant-status"
  hash_key = "tenant_status"     # "tenant_abc#published"
  range_key = "updated_at"
}

The GSI partition key is a composite of tenant_id and status, so querying tenant_abc#published cannot return results from tenant_xyz.

python
# Safe: tenant-scoped GSI query
resp = client.query(
    TableName="Spotlight.Tours.Definitions",
    IndexName="gsi-tenant-status",
    KeyConditionExpression="tenant_status = :ts",
    ExpressionAttributeValues={
        ":ts": {"S": f"{tenant_id}#published"},
    },
)

The same pattern applies to all GSIs:

| GSI | PK Format | Tenant Scope |
| --- | --- | --- |
| gsi-tenant-status | {tenant_id}#{status} | Isolated |
| gsi-content-users | {tenant_id}#{content_id} | Isolated |
| gsi-content | {tenant_id}#{content_id} | Isolated |
| gsi-session | {tenant_id}#{session_id} | Isolated |
| gsi-actor | {tenant_id}#{actor_id} | Isolated |
| gsi-entity | {tenant_id}#{entity_type}#{entity_id} | Isolated |
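For a composite GSI key to stay tenant-scoped, the attribute has to be derived on every write. The sketch below shows one way that might be done for gsi-tenant-status; the helper name with_tenant_status and the assumption that the item carries a plain status attribute are not taken from this document.

```python
def with_tenant_status(item: dict) -> dict:
    """Return a copy of the item with the tenant_status GSI key filled in."""
    tenant_id = item["tenant_id"]["S"]
    status = item["status"]["S"]  # assumed attribute name
    # Derived from the item's own tenant_id, so the index entry can never
    # land under another tenant's key.
    return {**item, "tenant_status": {"S": f"{tenant_id}#{status}"}}


tour_item = with_tenant_status({
    "tenant_id": {"S": "tenant_abc"},
    "tour_id": {"S": "tour_001"},
    "status": {"S": "published"},
})
```

Calling this helper from the one place that writes items keeps the GSI key and the base-table key from drifting apart.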

Event Isolation

Domain events include tenant_id in the event payload. Event handlers use this to route data to the correct tenant partitions:

json
{
  "event_type": "TourCompleted",
  "tenant_id": "tenant_abc",
  "data": {
    "tour_id": "tour_456",
    "user_id": "user_123"
  }
}

When the analytics handler processes this event, it writes to Events.Aggregates with a partition key of tenant_abc#tour_456 -- scoped to the originating tenant.
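The routing step can be sketched as a pure function that derives the Events.Aggregates key from the event itself. The attribute names pk and sk and the metric name "completions" are placeholders chosen for this example; only the {tenant_id}#{tour_id} partition key and the date#metric sort key shape come from the tables above.

```python
from datetime import date


def aggregate_key(event: dict, day: date, metric: str) -> dict:
    """Derive the Events.Aggregates key for one domain event."""
    tenant_id = event["tenant_id"]  # taken from the event, not from config
    tour_id = event["data"]["tour_id"]
    return {
        "pk": {"S": f"{tenant_id}#{tour_id}"},
        "sk": {"S": f"{day.isoformat()}#{metric}"},
    }


event = {
    "event_type": "TourCompleted",
    "tenant_id": "tenant_abc",
    "data": {"tour_id": "tour_456", "user_id": "user_123"},
}
key = aggregate_key(event, date(2024, 1, 15), "completions")
```

With the sample event, the partition key comes out as tenant_abc#tour_456, matching the description above.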

Outbox Table

Events.Outbox is the second deliberate exception to the tenant-partitioning invariant (the first being Platform.Memberships). Partitioning on tenant_id would force the delivery worker to know every active tenant's id ahead of time and round-robin queries; partitioning on event_id lets a single sparse gsi-status GSI (status HASH, timestamp RANGE) drain every tenant's pending events in chronological order.

  • PK: event_id
  • SK: timestamp
  • GSI gsi-status: sparse — only PENDING events are indexed, so the GSI stays small no matter how much history accrues.

Tenant isolation here is enforced at the application layer: every event payload carries its tenant_id, and downstream handlers route into tenant-scoped tables (which DO partition on tenant_id). The outbox itself sees the cross-tenant stream because it has to.
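A delivery worker along these lines might first read pending events through gsi-status and then split them by tenant before any handler runs. This is a sketch under assumptions: the function names are hypothetical, the pending marker is assumed to be the string "PENDING", and tenant_id is assumed to be stored as a top-level string attribute on each outbox item.

```python
from collections import defaultdict


def pending_query(table: str, limit: int = 100) -> dict:
    """Query arguments that read pending events in timestamp order."""
    return {
        "TableName": table,
        "IndexName": "gsi-status",
        "KeyConditionExpression": "#s = :pending",
        # Placeholder name avoids clashing with DynamoDB reserved words.
        "ExpressionAttributeNames": {"#s": "status"},
        "ExpressionAttributeValues": {":pending": {"S": "PENDING"}},
        "ScanIndexForward": True,  # ascending by the timestamp range key
        "Limit": limit,
    }


def group_by_tenant(items: list[dict]) -> dict[str, list[dict]]:
    """Split a cross-tenant batch so each handler call sees one tenant."""
    grouped: dict[str, list[dict]] = defaultdict(list)
    for item in items:
        tenant_id = item.get("tenant_id", {}).get("S")
        if not tenant_id:
            # Refuse to route an event whose tenant cannot be determined.
            raise ValueError("outbox event without tenant_id")
        grouped[tenant_id].append(item)
    return dict(grouped)
```

Failing closed on a missing tenant_id matters here because the outbox is the one place where events from every tenant sit side by side.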

Infrastructure-Level Controls

DynamoDB Encryption

All tables have server-side encryption enabled:

hcl
server_side_encryption {
  enabled = true  # AWS-managed KMS key
}

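This protects data at rest. Where customer-managed keys are required, a customer-managed KMS key can be set per table (kms_key_arn in the same block). Because tables are shared across tenants, such a key covers every tenant in that table rather than a single tenant.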

Point-in-Time Recovery

PITR is enabled in production to protect against accidental data loss:

hcl
point_in_time_recovery {
  enabled = var.enable_pitr  # true in production
}

IAM Least Privilege

Lambda functions are granted access only to the specific DynamoDB tables they need:

hcl
Action = [
  "dynamodb:GetItem", "dynamodb:PutItem",
  "dynamodb:Query", "dynamodb:BatchGetItem",
  "dynamodb:BatchWriteItem", "dynamodb:TransactWriteItems",
  "dynamodb:DeleteItem"
]
Resource = concat(
  values(var.table_arns),
  [for arn in values(var.table_arns) : "${arn}/index/*"]
)

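No Lambda function has dynamodb:* permissions. Table-level access is explicitly enumerated, and the action list shown above does not include dynamodb:Scan, so for these roles the no-scan rule is backed by IAM as well as by code review.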

Isolation Verification

What prevents cross-tenant access?

| Layer | Control | Mechanism |
| --- | --- | --- |
| Authentication | API key resolves tenant_id | Key is cryptographically bound to one tenant |
| Application | AuthContext.tenant_id injected from API key | Cannot be overridden by user input |
| Repository | All queries use tenant_id in key | DynamoDB enforces partition boundary |
| DynamoDB | Partition key isolation | Logical separation by partition key value |
| IAM | Least-privilege Lambda roles | No wildcard permissions |

Threat scenarios

| Threat | Mitigation |
| --- | --- |
| Attacker sends another tenant's API key | Only the key holder's tenant is accessible |
| Attacker modifies tenant_id in request | Not possible -- tenant_id comes from API key validation, not user input |
| Attacker guesses another tenant's tour_id | The query uses {attacker_tenant_id}#{tour_id}, which is a different partition |
| Attacker performs a DynamoDB scan | The application never performs scans in production |
| Insider accesses DynamoDB directly | Audit trail in Audit.AdminActions, IAM logging, DynamoDB encryption |
| API key compromise | Revoke the key, issue a new one. The compromised key only accesses one tenant's data |

Table Naming Convention

All tables follow the naming pattern {prefix}.{feature}.{sub_table}:

python
def table_name(self, feature: str, sub_table: str) -> str:
    return f"{self.dynamodb_table_prefix}.{feature}.{sub_table}"

# Examples:
# "Spotlight.Tours.Definitions"
# "Spotlight.Events.Aggregates"
# "Spotlight.Admin.Users"

The prefix is configured via DYNAMODB_TABLE_PREFIX and varies by environment:

  • Local: Spotlight
  • Dev: Spotlight.Dev
  • Production: Spotlight.Prod

This prevents accidental cross-environment access while keeping the table structure consistent.
