Auth & Data Flows
This page traces the main request paths from incoming HTTP request to database and back: authentication, authorization, MCP tool calls, chat agent invocations, and app-to-platform communication. Use it to understand what the platform enforces, where your code sits in each flow, and which invariants you can rely on.
JWT Anatomy
Every authenticated request carries an HS256-signed JWT. Two signing keys exist:
| Key | Used for | Config var |
|---|---|---|
| Control plane key | Human user login tokens | BOOTSTRAP_ADMIN_SECRET |
| App token key | Service account OAuth tokens | APP_TOKEN_SECRET (falls back to BOOTSTRAP_ADMIN_SECRET) |
Standard JWT payload:
{
"sub": "user-uuid or app_id",
"tenant_id": "tenant-uuid",
"roles": ["user"],
"groups": ["group-uuid-1"],
"scope": "",
"iat": 1716300000,
"exp": 1716303600
}
Roles are one of: global_admin, tenant_admin, user, service, app_service_account.
Role hierarchy (what each role subsumes):
global_admin → everything
tenant_admin → tenant_admin, user
user → user
service → service (lateral — no user access)
app_service_account → app_service_account (lateral — no user access)
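The expansion that require_roles applies (Flow 3) can be pictured as a lookup table. This is a minimal sketch assuming a static mapping; ROLE_HIERARCHY, expand_roles, and has_required_roles are hypothetical names, not the backend's actual code.

```python
# Hypothetical sketch: a static role hierarchy mirroring the list above.
ALL_ROLES = frozenset({"global_admin", "tenant_admin", "user", "service", "app_service_account"})

ROLE_HIERARCHY: dict[str, frozenset[str]] = {
    "global_admin": ALL_ROLES,                                  # subsumes everything
    "tenant_admin": frozenset({"tenant_admin", "user"}),
    "user": frozenset({"user"}),
    "service": frozenset({"service"}),                          # lateral: no user access
    "app_service_account": frozenset({"app_service_account"}),  # lateral: no user access
}


def expand_roles(roles: list[str]) -> set[str]:
    """Return every role the caller effectively holds; unknown roles grant nothing."""
    effective: set[str] = set()
    for role in roles:
        effective |= ROLE_HIERARCHY.get(role, frozenset())
    return effective


def has_required_roles(caller_roles: list[str], required: list[str]) -> bool:
    """True only if the expanded caller roles include ALL required roles."""
    return set(required) <= expand_roles(caller_roles)


print(has_required_roles(["tenant_admin"], ["user"]))         # True
print(has_required_roles(["app_service_account"], ["user"]))  # False
```

Unknown role strings expand to nothing, so a malformed roles claim fails closed instead of granting access.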
Flow 1: Human User Login
POST /api/v1/auth/login
Body: { "email": "...", "password": "..." }
Client
│
▼
POST /api/v1/auth/login
│
├─ auth/handlers/login.py → cp_login(email, password)
│ │
│ ▼
│ POST {CONTROL_PLANE_URL}/auth/login ← argon2 password verify
│ │
│ ▼
│ returns { token, refresh_token, sub, tenant_id, roles, expires_at }
│ │
├─ Decode JWT (no sig verify) → extract groups
│ If groups absent → query core_group_members by sub
│ │
├─ Cache token → auth_tokens (MongoDB, TTL = exp)
├─ Cache roles → auth_roles (MongoDB, keyed by sub + email)
├─ Stamp last_login_at → core_scim_users
├─ audit_emit("user.login")
│
▼
Response: { token, refresh_token, sub, tenant_id, user_name, roles, expires_at }
Collections touched: auth_tokens, auth_roles, core_group_members, core_scim_users
Security controls:
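- Password hashing and verification (argon2) are owned by the control plane: the backend forwards the submitted credentials and never hashes or stores the password itself
- Subsequent requests are validated by in-process JWT decode (Flow 3), with no network call back to the control plane; the token is also cached in auth_tokens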
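The "decode without signature verification, then fall back to core_group_members" step in the diagram can be sketched with the standard library alone. This assumes "absent" means the groups claim is missing entirely (an empty list is kept as is); decode_payload_unverified, resolve_groups, and the lookup callable are hypothetical stand-ins. Skipping verification is only reasonable here because the token was just returned by the control plane.

```python
# Hypothetical sketch of the groups-extraction step; names are illustrative.
import base64
import json
from typing import Callable, List


def decode_payload_unverified(token: str) -> dict:
    """Return the JWT payload WITHOUT checking the signature.

    Acceptable only because the token was just returned by the control plane;
    never use this to authenticate an incoming request.
    """
    payload_b64 = token.split(".")[1]
    # JWT segments drop base64 '=' padding; restore it before decoding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def resolve_groups(token: str, lookup_group_ids: Callable[[str], List[str]]) -> List[str]:
    """Prefer the token's groups claim; fall back to a membership lookup by sub."""
    claims = decode_payload_unverified(token)
    if claims.get("groups") is not None:
        return list(claims["groups"])
    # lookup_group_ids is a hypothetical stand-in for querying core_group_members by sub.
    return lookup_group_ids(claims["sub"])
```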
Flow 2: App Service Account OAuth
POST /api/v1/auth/token
Content-Type: application/x-www-form-urlencoded
Body: grant_type=client_credentials&client_id=<appId>&client_secret=<secret>
Client
│
▼
POST /api/v1/auth/token
│
├─ Validate grant_type == "client_credentials" → 400 otherwise
│
├─ verify_client_credentials(client_id, client_secret)
│ │
│ ▼
│ Query app_service_accounts WHERE app_id = client_id
│ bcrypt.checkpw(client_secret, stored_hash) → 401 on mismatch
│ │
│ ▼
│ returns { app_id, tenant_id, ... }
│
├─ Build JWT payload:
│ { sub: client_id, tenant_id, roles: ["app_service_account"],
│ groups: [], scope, iat, exp: now+3600 }
│
├─ Sign with APP_TOKEN_SECRET (fallback: BOOTSTRAP_ADMIN_SECRET)
├─ Cache in auth_tokens
├─ audit_emit("auth.token_issued")
│
▼
Response: { access_token, token_type: "Bearer", expires_in: 3600, scope }
Token lifetime: 1 hour, no refresh token issued. Re-exchange client credentials to get a new token.
Secret storage: rotate_credentials generates secrets.token_urlsafe(32), bcrypt-hashes it, and stores only the hash. The plaintext is returned exactly once.
Production requirement: set APP_TOKEN_SECRET to a dedicated random secret (openssl rand -hex 32). Without it, the backend signs service account tokens with BOOTSTRAP_ADMIN_SECRET, so rotating that secret silently invalidates every outstanding service account token.
Flow 3: Token Validation (every authenticated request)
Every protected endpoint depends on get_current_user:
Incoming request: Authorization: Bearer <token>
│
├─ HTTPBearer extracts token → 401 if absent
│
├─ has_sdk()? → 401 "Sidecar not available" if false
│
├─ SidecarClient.authorize(token)
│ │
│ ▼
│ jwt.decode(token, BOOTSTRAP_ADMIN_SECRET or APP_TOKEN_SECRET, algorithms=["HS256"]) (PyJWT)
│ → 401 on signature failure or expiry
│ │
│ ▼
│ Claims { sub, tenant_id, roles, groups, exp }
│
▼
Handler receives Claims object
No database lookup on the hot path. Validation is pure in-process JWT decode.
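The checks behind that in-process decode can be made explicit with a standard-library version of the same HS256 verification; in the backend this work is done by PyJWT's jwt.decode. The diagram lists both secrets without saying how one is chosen, so trying each configured key in turn is an assumption of this sketch, as are the names verify_hs256 and InvalidToken and the choice to reject tokens with no exp claim.

```python
# Hypothetical sketch of HS256 verification; PyJWT performs the equivalent checks.
import base64
import hashlib
import hmac
import json
import time
from typing import Optional, Sequence


class InvalidToken(Exception):
    """Corresponds to the 401 responses in the diagram."""


def _b64url_decode(segment: str) -> bytes:
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def verify_hs256(token: str, keys: Sequence[str], now: Optional[int] = None) -> dict:
    """Return the claims if the signature matches one of `keys` and exp is in the future."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        signature = _b64url_decode(sig_b64)
        signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
    except ValueError as exc:  # wrong segment count, bad base64/JSON, non-ASCII input
        raise InvalidToken("malformed token") from exc
    # Pin the algorithm: never let the token's header choose "none" or another alg.
    if not isinstance(header, dict) or header.get("alg") != "HS256":
        raise InvalidToken("unexpected algorithm")
    expected = [hmac.new(k.encode(), signing_input, hashlib.sha256).digest() for k in keys if k]
    # compare_digest keeps each comparison constant-time.
    if not any(hmac.compare_digest(e, signature) for e in expected):
        raise InvalidToken("signature verification failed")
    claims = json.loads(_b64url_decode(payload_b64))
    current = int(time.time()) if now is None else now
    if claims.get("exp", 0) <= current:  # a missing exp is treated as expired
        raise InvalidToken("token expired")
    return claims
```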
Three additional guards (applied per-endpoint):
| Guard | What it checks |
|---|---|
| require_roles("tenant_admin") | Caller's roles (with hierarchy expansion) must include all specified roles |
| require_groups("group-id") | Caller's JWT groups claim must include the specified group |
| require_policy(action, resource) | gRPC sidecar evaluates Rego policy data.cyberpod.rbac.allow |
Fail mode: CORESDK_FAIL_MODE controls what happens when the gRPC sidecar is unreachable during require_policy evaluation. Default is "open" (allow). Set to "closed" in production if policy enforcement must be strict.
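The fail-mode switch can be sketched as a wrapper around the policy call. evaluate is a hypothetical stand-in for the gRPC request that evaluates data.cyberpod.rbac.allow, and treating ConnectionError as "sidecar unreachable" is an assumption (a real gRPC client raises its own error type). Treating any value other than "open" as closed is a deliberate choice of this sketch.

```python
# Hypothetical sketch of CORESDK_FAIL_MODE handling around a policy evaluation.
import os
from typing import Callable, Mapping


def policy_allows(
    evaluate: Callable[[str, str], bool],
    action: str,
    resource: str,
    env: Mapping[str, str] = os.environ,
) -> bool:
    """Return the policy decision, applying CORESDK_FAIL_MODE when the sidecar is down."""
    try:
        # A reachable sidecar's answer (allow or deny) is always respected.
        return evaluate(action, resource)
    except ConnectionError:
        # Default "open" allows the request; "closed" (or any typo) denies it.
        fail_mode = env.get("CORESDK_FAIL_MODE", "open").strip().lower()
        return fail_mode == "open"
```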
Flow 4: App Registration
POST /api/v1/apps/register
Authorization: Bearer <any-valid-token>
Body: { "name": "my-app", "description": "..." }
Client (any authenticated user or service account)
│
▼
POST /api/v1/apps/register
│
├─ get_current_user → Claims
│
├─ create_app({
│ name, description,
│ tenant_id: claims.tenant_id, ← always from token, never from body
│ registered_by: claims.sub
│ })
│ │
│ ▼
│ Generate app_id (UUID), status: "active"
│ Insert into apps collection (MongoDB)
│
├─ audit_emit("app.registered")
│
▼
Response: {
appId, name, description,
tenantId, registeredBy, status, createdAt
}
Tenant isolation: tenant_id is always taken from the Bearer token — the request body cannot specify it.
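The rule boils down to never reading identity fields from the request body. A minimal sketch, assuming snake_case storage field names taken from the create_app call in the diagram; Claims, build_app_document, and the created_at field are hypothetical.

```python
# Hypothetical sketch: field names follow the create_app call above.
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Claims:
    sub: str
    tenant_id: str


def build_app_document(body: dict, claims: Claims) -> dict:
    """Build the document inserted into apps; identity comes only from the token."""
    return {
        "app_id": str(uuid.uuid4()),
        "name": body["name"],
        "description": body.get("description", ""),
        # Any tenant_id or registered_by in the body is deliberately ignored.
        "tenant_id": claims.tenant_id,
        "registered_by": claims.sub,
        "status": "active",
        "created_at": datetime.now(timezone.utc),
    }
```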
Flow 5: Credentials Rotation
POST /api/v1/apps/{appId}/credentials/rotate
Authorization: Bearer <token where sub == appId OR role is tenant_admin/global_admin>
├─ _assert_app_access(app_id, claims)
│ → 403 unless claims.sub == app_id OR caller is admin
│
├─ Look up app in apps collection → resolve tenant_id
│
├─ Generate plaintext secret: secrets.token_urlsafe(32)
├─ bcrypt.hashpw(secret, bcrypt.gensalt())
├─ Upsert into app_service_accounts:
│ { app_id, tenant_id, client_secret_hash, updated_at }
│
├─ audit_emit("app.credentials_rotate")
│
▼
Response: { clientId: appId, clientSecret: "<plaintext — shown once only>" }
Store the returned clientSecret immediately in a secrets manager. It is never retrievable again.
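The access check and secret generation can be sketched as below. assert_app_access, Forbidden, and new_client_secret are hypothetical stand-ins for _assert_app_access and the rotation handler, and the bcrypt.hashpw step is left out so the sketch needs only the standard library.

```python
# Hypothetical sketch of the ownership check and secret generation in Flow 5.
import secrets
from typing import Iterable

ADMIN_ROLES = frozenset({"tenant_admin", "global_admin"})


class Forbidden(Exception):
    """Corresponds to HTTP 403."""


def assert_app_access(app_id: str, sub: str, roles: Iterable[str]) -> None:
    """Allow the app's own service account (sub == app_id) or an admin."""
    if sub != app_id and not ADMIN_ROLES.intersection(roles):
        raise Forbidden(f"{sub} may not manage app {app_id}")


def new_client_secret() -> str:
    # 32 random bytes, base64url-encoded without padding: 43 characters.
    return secrets.token_urlsafe(32)
```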
Flow 6: MCP Tool Registration
POST /api/v1/apps/{appId}/mcp/tools
Authorization: Bearer <token where sub == appId OR admin>
Body: { "tools": [{ name, description, endpoint, method, inputSchema, tags, maskResponse, secretHeaders }] }
├─ _assert_app_access(app_id, claims)
│
├─ Validate each tool: name, endpoint, method required
│
├─ For each tool, upsert into mcpresttools:
│ {
│ tenantId: claims.tenant_id,
│ appId: app_id,
│ name, description, endpoint, method,
│ inputSchema, tags,
│ mask_response, ← stored snake_case, returned as maskResponse
│ secretHeaders,
│ updatedAt
│ }
│ Unique index: (tenantId, appId, name)
│
▼
Response: [{ id, name, description, endpoint, method, inputSchema, ... }]
Upsert semantics: re-registering the same tool name updates it in place. Safe to call on every startup.
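The upsert can be pictured as a dict keyed by the unique index. This is a sketch under stated assumptions: the dict stands in for the mcpresttools collection, upsert_tools is a hypothetical name, each re-registration replaces the whole entry, and validating every tool before writing any (so one bad entry rejects the batch) is a choice made here, not documented behavior.

```python
# Hypothetical sketch of tool registration; `store` stands in for mcpresttools.
from datetime import datetime, timezone
from typing import Dict, List, Tuple

REQUIRED_FIELDS = ("name", "endpoint", "method")
Key = Tuple[str, str, str]  # (tenantId, appId, name), mirroring the unique index


def upsert_tools(store: Dict[Key, dict], tenant_id: str, app_id: str, tools: List[dict]) -> List[dict]:
    """Validate all tools, then insert or replace each one under its unique key."""
    for tool in tools:
        missing = [f for f in REQUIRED_FIELDS if not tool.get(f)]
        if missing:
            raise ValueError(f"tool missing required fields: {missing}")
    saved = []
    for tool in tools:
        doc = {
            "tenantId": tenant_id,  # from the token, never from the body
            "appId": app_id,
            "name": tool["name"],
            "description": tool.get("description", ""),
            "endpoint": tool["endpoint"],
            "method": tool["method"],
            "inputSchema": tool.get("inputSchema", {}),
            "tags": tool.get("tags", []),
            "mask_response": bool(tool.get("maskResponse", False)),  # stored snake_case
            "secretHeaders": tool.get("secretHeaders", {}),
            "updatedAt": datetime.now(timezone.utc),
        }
        store[(tenant_id, app_id, tool["name"])] = doc
        saved.append(doc)
    return saved
```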
Flow 7: MCP Proxy Call (external agent)
This is the path for external callers: AI agents, scripts, other apps.
POST /api/v1/mcp/call
Authorization: Bearer <token>
Body: { "tool": "get_person", "arguments": { "id": "per-001" }, "appId": "app_01..." }
External caller (Claude, LangChain, curl, another app)
│
▼
POST /api/v1/mcp/call
│
├─ get_current_user → Claims [1] JWT validate
│
├─ assert_rate_limit("mcp_call:{sub}", 120/60s) [2] rate limit per caller sub
│
├─ Extract raw Bearer token from Authorization header
│ (this token is forwarded to the upstream endpoint)
│
├─ find_tool(tenant_id, tool_name, app_id) [3] tenant-scoped lookup
│ → 404 if not found
│
├─ Path param substitution: [4] resolve {id} → "per-001"
│ endpoint: /api/v1/people/{id}
│ → path: /api/v1/people/per-001
│ remaining args → query params or body
│
├─ Assemble URL: SELF_BASE_URL + path [5] URL assembly
│
├─ is_safe_url(url) [6] SSRF guard (DNS resolve)
│ → 403 if resolves to RFC-1918 / loopback
│
├─ resolve_secrets_for_claims(claims, [7] secret header injection
│ tool.secretHeaders.values())
│ → gRPC sidecar → plaintext values
│ → injected as HTTP headers on upstream request
│
├─ httpx.AsyncClient.request(method, url, [8] upstream HTTP call
│ headers={
│ Authorization: Bearer <forwarded token>,
│ Content-Type: application/json,
│ ...secret headers
│ })
│ → 502 on upstream 4xx/5xx
│
├─ _maybe_mask(response_text, tool) [9] PII masking
│ if tool.mask_response → sidecar gRPC mask_string_rpc
│
├─ audit_emit("mcp.tool_call", outcome) [10] audit trail
│
▼
Response: { "content": [{ "type": "text", "text": "<json>" }], "tool": "get_person" }
The forwarded token: the caller's own JWT is passed as Authorization: Bearer to the upstream endpoint. The upstream app validates it against the same sidecar — so the tool call runs with the caller's identity and tenant scope.
Secret headers: secretHeaders: { "X-Api-Key": "my-bundle" } — the bundle name is resolved by the sidecar at call time. The plaintext value is injected as an HTTP header into the upstream call but never logged or returned to the caller.
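Steps [4] and [5] can be sketched as follows, with base_url playing the role of SELF_BASE_URL. Only the {id} → "per-001" substitution comes from the diagram; sending leftover arguments as query parameters for GET and DELETE and as a JSON body otherwise is an assumption, as is the name build_upstream_request. Percent-encoding substituted values keeps an argument such as "../admin" from adding path segments.

```python
# Hypothetical sketch of path-parameter substitution and URL assembly.
import re
from typing import Optional, Tuple
from urllib.parse import quote, urlencode

_PLACEHOLDER = re.compile(r"\{(\w+)\}")


def build_upstream_request(
    base_url: str, endpoint: str, method: str, arguments: dict
) -> Tuple[str, Optional[dict]]:
    """Return (url, json_body); filled placeholders are removed from the arguments."""
    remaining = dict(arguments)

    def fill(match: "re.Match[str]") -> str:
        key = match.group(1)
        if key not in remaining:
            raise KeyError(f"missing path parameter: {key}")
        # safe="" also encodes "/", so a value cannot add path segments.
        return quote(str(remaining.pop(key)), safe="")

    url = base_url.rstrip("/") + _PLACEHOLDER.sub(fill, endpoint)
    if method.upper() in ("GET", "DELETE"):
        return (url + "?" + urlencode(remaining) if remaining else url), None
    return url, remaining
```

The assembled URL would then go through the SSRF check in step [6] before any request is sent.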
Flow 8: Chat Agent → MCP Tool (internal path)
This is the path when a user's chat session triggers a tool call. It does not go through POST /mcp/call.
User sends chat message
│
▼
POST /api/v1/chat/completions
│
├─ get_current_user → Claims
├─ Rate limit: 60 chat turns/min per user
├─ get_or_create chat_session (scoped to tenant_id)
├─ PipelineContext built from claims (tenant_id, user_id, request_id)
│
├─ OrchestratorFactory.select(chat_type) → orchestrator
│
├─ orchestrator.run_streaming / run_non_streaming
│ │
│ ▼
│ ChatAgent.ReAct loop
│ │
│ ├─ resolve_tools(agent_config, tenant_id, user_id)
│ │ │
│ │ ├─ BUILTIN_TOOLS (memory_recall, generate_artifact, ...)
│ │ └─ find_callable_servers(tenant_id, user_id)
│ │ scope=platform → visible to all tenants
│ │ scope=tenant → this tenant only
│ │ scope=user → this user only
│ │
│ ├─ LLM issues tool call → dispatch to ToolSpec.fn
│ │
│ └─ McpInvoker.call(server_doc, tool_name, args, ctx)
│ │
│ ├─ _prepare_server_doc: resolve ${BUNDLE} in headers → sidecar
│ │
│ ├─ _merge_metadata: inject tenant_id, user_id, project_ids
│ │ into args — OVERWRITES any LLM-supplied identity values
│ │ (security boundary: LLM cannot override caller identity)
│ │
│ ├─ Transport selection:
│ │ mTLS → fresh transport + X-Tenant-ID, X-User-UUID headers
│ │ secrets → fresh transport
│ │ normal → cached client by (serverUrl, transport)
│ │
│ ├─ client.call_tool(tool_name, merged_args) ← FastMCP protocol
│ │
│ ├─ audit_emit("mcp.tool_call", source: "chat")
│ │
│ └─ _result_to_envelope → { content, isError, metadata }
│
▼
SSE stream: tool result → next LLM turn → ... → final message
Key difference from /mcp/call: the chat path calls the MCP server directly through a FastMCP client (MCP protocol, not HTTP REST). Identity is injected into the tool arguments by _merge_metadata instead of being forwarded as a Bearer token. Because _merge_metadata runs in McpInvoker after the LLM has produced its arguments and overwrites the identity fields, the LLM cannot override tenant_id, user_id, or project_ids.
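The overwrite rule comes down to applying the caller's identity after the model's arguments. merge_metadata is a hypothetical stand-in for _merge_metadata; the three field names come from the diagram.

```python
# Hypothetical sketch of identity injection; identity always wins over LLM input.
from typing import List


def merge_metadata(llm_args: dict, tenant_id: str, user_id: str, project_ids: List[str]) -> dict:
    """Return tool arguments with identity fields forced to the caller's context."""
    merged = dict(llm_args)  # copy: leave the model's original dict untouched
    # Applied last, so any model-supplied values under these keys are overwritten.
    merged.update({"tenant_id": tenant_id, "user_id": user_id, "project_ids": list(project_ids)})
    return merged
```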
Flow 9: HITL (Human-in-the-Loop) Approval
When a chat agent selects a tool marked requires_approval=True:
Agent selects tool requiring approval
│
├─ hitl_manager.create_pending({
│ approvalId (UUID), chatId, turnId, tenantId, userId,
│ toolName, toolArgs, expiresAt (now + HITL_TIMEOUT_SECONDS)
│ }) → written to hitl_pending collection
│
├─ HITLRequiredEvent → SSE queue → client UI shows approval dialog
│
├─ Agent enters polling loop (every 2s, max 2× timeout)
│
│ ┌──────────────────────────────┐
│ │ Human reviews in UI │
│ │ POST /api/v1/chat/hitl/ │
│ │ {approval_id}/resolve │
│ │ Body: { status: "approved" │
│ │ or "rejected" } │
│ │ │
│ │ Auth: get_current_user │
│ │ Scoped to claims.tenant_id │
│ │ Updates hitl_pending doc │
│ └──────────────────────────────┘
│
├─ Polling loop detects resolved status
│
├─ "approved" → continue ReAct loop, call tool
│ "rejected" → return rejection message to LLM
│ expired → TTL index removes doc, polling returns timeout error
│
▼
Agent continues
Tenant isolation: every hitl_pending query includes tenantId filter. A user from tenant A cannot resolve approvals belonging to tenant B.
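The agent's side of the wait can be sketched as a polling loop with an injectable clock and sleep so it can be tested. fetch_status is a hypothetical stand-in for a tenantId-filtered read of the hitl_pending document that returns None once the TTL index has removed it; the 2-second interval and the 2× timeout ceiling come from the diagram. Because MongoDB's TTL monitor removes expired documents in periodic passes rather than at the exact expiry time, this loop keeps its own deadline instead of relying only on the document disappearing.

```python
# Hypothetical sketch of the HITL polling loop.
import time
from typing import Callable, Optional

POLL_INTERVAL_S = 2.0  # from the diagram: poll every 2 s


def wait_for_approval(
    fetch_status: Callable[[], Optional[str]],
    timeout_s: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Return "approved", "rejected", or "timeout"."""
    deadline = clock() + 2 * timeout_s  # hard ceiling: 2x HITL_TIMEOUT_SECONDS
    while clock() < deadline:
        status = fetch_status()
        if status in ("approved", "rejected"):
            return status
        if status is None:  # document expired and was removed
            return "timeout"
        sleep(POLL_INTERVAL_S)
    return "timeout"
```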
Flow 10: App → cPod EDM (service account calling the platform)
A scaffolded app calling client.people.list():
Scaffolded app
│
├─ CpodClient.fromEnv()
│ reads CPOD_API_KEY (Bearer token — your service account JWT)
│ reads CPOD_API_URL (defaults to https://api.cyberpod.app)
│
├─ GET {CPOD_API_URL}/api/v1/people
│ Authorization: Bearer <service account JWT>
│
▼
cpod-backend /api/v1/people
│
├─ get_current_user → decode JWT → Claims
│ sub = your appId
│ tenant_id = your tenant (from token, not from request)
│ roles = ["app_service_account"]
│
├─ Query scoped to claims.tenant_id
│ → returns only your tenant's people
│
▼
Response: { items: [...], total: N }
Tenant isolation is automatic. The JWT carries tenant_id; every EDM query filters by it. Your app cannot access another tenant's data even if it guesses their IDs.
Service account limits: app_service_account role does not inherit user rights. Endpoints that require user or tenant_admin via require_roles will return 403.
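A Python equivalent of what CpodClient.fromEnv() reads can help when checking configuration from a script. people_request is hypothetical and not part of the cPod client; it only builds the request, and it sends no tenant identifier because the tenant comes from the token.

```python
# Hypothetical sketch mirroring the env vars CpodClient.fromEnv() reads.
import os
import urllib.request
from typing import Mapping

DEFAULT_API_URL = "https://api.cyberpod.app"


def people_request(env: Mapping[str, str] = os.environ) -> urllib.request.Request:
    """Build (but do not send) the GET /api/v1/people request a scaffolded app makes."""
    api_key = env.get("CPOD_API_KEY")
    if not api_key:
        raise RuntimeError("CPOD_API_KEY is not set")
    base_url = env.get("CPOD_API_URL", DEFAULT_API_URL).rstrip("/")
    return urllib.request.Request(
        f"{base_url}/api/v1/people",
        headers={"Authorization": f"Bearer {api_key}"},  # tenant is derived from this JWT
        method="GET",
    )
```

Service account tokens from Flow 2 expire after one hour and come without a refresh token, so a CPOD_API_KEY holding such a JWT stops working after that hour unless the client_credentials exchange is repeated; whether CpodClient handles this is not covered on this page.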
Authorization Matrix
| Endpoint | Minimum role required |
|---|---|
| POST /api/v1/auth/login | None (public) |
| POST /api/v1/auth/token | None (public, credentials in body) |
| POST /api/v1/auth/register | None (public) |
| POST /api/v1/auth/invite | tenant_admin |
| POST /api/v1/apps/register | Any valid token |
| POST /api/v1/apps/{id}/credentials/rotate | App owner (sub==appId) or admin |
| POST /api/v1/apps/{id}/mcp/tools | App owner or admin |
| GET /api/v1/apps/{id}/mcp/tools | App owner or admin |
| DELETE /api/v1/apps/{id}/mcp/tools/{name} | App owner or admin |
| GET /api/v1/apps/{id}/mcp/proxy | App owner or admin |
| POST /api/v1/mcp/call | Any valid token |
| POST /api/v1/chat/completions | Any valid token |
| POST /api/v1/chat/hitl/{id}/resolve | Any valid token (scoped to tenant) |
| GET /api/v1/people, etc. (EDM) | Any valid token (scoped to tenant) |
| Admin routes (/admin/*) | global_admin |
Tenant Isolation: Where It's Enforced
Tenant isolation is enforced at the repository layer, not in middleware. Every collection query that could return cross-tenant data filters on the tenant ID field (tenantId or tenant_id, depending on the collection):
| Collection | Isolation field | Where enforced |
|---|---|---|
| mcpresttools | tenantId | repos_rest.py (all queries) |
| mcptools | tenantId | repos.py (find_callable_servers) |
| chat_sessions | tenantId | session_manager |
| hitl_pending | tenantId | hitl_manager |
| apps | tenant_id | admin/repos.py |
| app_service_accounts | tenant_id | credentials.py |
What is NOT tenant-scoped: scope=platform MCP servers (tools registered as platform-wide, e.g., ARAI). These are intentionally visible to all tenants.
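The repository-layer rule can be sketched as a filter builder that always pins the tenant field named in the table. tenant_filter and callable_servers_filter are hypothetical; the mcptools filter in particular assumes field names (scope, userId) and a shape that this page does not document, and is only meant to show where the platform-scope exception fits.

```python
# Hypothetical sketch of tenant-scoped Mongo filters built from the table above.
from typing import Optional

# mcptools is handled separately because platform-scoped servers are visible to all tenants.
ISOLATION_FIELD = {
    "mcpresttools": "tenantId",
    "chat_sessions": "tenantId",
    "hitl_pending": "tenantId",
    "apps": "tenant_id",
    "app_service_accounts": "tenant_id",
}


def tenant_filter(collection: str, tenant_id: str, extra: Optional[dict] = None) -> dict:
    """Build a filter pinned to the caller's tenant.

    The tenant field is written last, so a value smuggled in through `extra`
    cannot widen the query. Unknown collections raise KeyError (fail closed).
    """
    field = ISOLATION_FIELD[collection]
    return {**(extra or {}), field: tenant_id}


def callable_servers_filter(tenant_id: str, user_id: str) -> dict:
    """Assumed mcptools filter: platform servers plus this tenant's and user's own."""
    return {
        "$or": [
            {"scope": "platform"},
            {"scope": "tenant", "tenantId": tenant_id},
            {"scope": "user", "tenantId": tenant_id, "userId": user_id},
        ]
    }
```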
Security Controls Summary
| Control | Mechanism | Where |
|---|---|---|
| Password hashing | argon2 (control plane) | Login flow |
| Secret hashing | bcrypt | credentials.py |
| JWT signing | HS256 | client_credentials.py |
| JWT verification | PyJWT jwt.decode, in-process | sdk.py |
| Rate limiting | gRPC sidecar (120 req/min MCP, 60 turns/min chat) | ratelimit.py |
| SSRF protection | DNS resolve + RFC-1918 blocklist | url_guard.py |
| PII masking | gRPC sidecar mask_string_rpc | rest_tools.py, mcp_invoker.py |
| Secret injection | gRPC sidecar ResolveSecret | secrets_inject.py |
| Identity injection | _merge_metadata overwrites LLM args | mcp_invoker.py |
| Tenant scoping | tenantId filter on every Mongo query | Repository layer |
| Policy evaluation | OPA/Rego via gRPC sidecar | require_policy guard |
| Audit trail | audit_emit on logins, token issuance, tool calls, and all mutating ops | shared/audit.py |
Known Operational Notes
CORESDK_FAIL_MODE=open (default): if the gRPC sidecar is unavailable, require_policy calls fail-open (allow). Change to closed for strict production enforcement.
APP_TOKEN_SECRET: must be set independently of BOOTSTRAP_ADMIN_SECRET in production. Generate it with openssl rand -hex 32. Without it, service account tokens are signed with BOOTSTRAP_ADMIN_SECRET, so rotating that secret invalidates every outstanding service account token.
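DNS rebinding: the SSRF guard resolves DNS at check time, and httpx resolves the name again at connect time. This leaves a TOCTOU window in which an attacker-controlled name can switch to an internal address after passing the check. Mitigate by running the backend in a network namespace (or behind an egress policy) that cannot reach internal addresses, so a rebound name has nothing to connect to.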
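Another way to close the window is to resolve once, vet every returned address, and connect to the vetted address instead of re-resolving. The sketch below assumes that approach; resolve_public_ip is hypothetical, and it swaps the RFC-1918/loopback blocklist for the standard library's ipaddress is_global check, which is stricter (it also rejects link-local, shared CGNAT space, and other non-public ranges).

```python
# Hypothetical sketch: resolve once, reject non-public addresses, return one to pin.
import ipaddress
import socket
from urllib.parse import urlsplit


def resolve_public_ip(url: str) -> str:
    """Return a vetted IP for the URL's host, or raise PermissionError.

    Every resolved address must be globally routable; if any is private,
    loopback, link-local, etc., the whole URL is refused.
    """
    parts = urlsplit(url)
    host = parts.hostname
    if host is None:
        raise ValueError(f"no host in URL: {url!r}")
    port = parts.port or (443 if parts.scheme == "https" else 80)
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    addresses = {info[4][0] for info in infos}
    for addr in addresses:
        ip = ipaddress.ip_address(addr.split("%")[0])  # drop any IPv6 zone id
        if not ip.is_global:
            raise PermissionError(f"{host} resolves to non-public address {ip}")
    # Connect to this address rather than re-resolving the hostname.
    return sorted(addresses)[0]
```

Connecting to a pinned IP over HTTPS still requires the original hostname for SNI and certificate verification, so the HTTP client has to be configured to send it; that wiring is not shown.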
SELF_BASE_URL: if misconfigured to an internal host, tool endpoints assembled from it bypass the SSRF guard (the guard runs on the assembled URL). Always point this at the public-facing hostname or loopback-only ingress.
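No rate limit on /auth/token: the client_credentials endpoint has no per-IP or per-client_id throttle. bcrypt is deliberately slow (roughly 100 ms per check, depending on hardware and cost factor; bcrypt.gensalt() defaults to 12 rounds), which slows guessing, but it is not a substitute for rate limiting against a distributed brute-force attack. Each check also burns server CPU, so the unthrottled endpoint doubles as a denial-of-service vector.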
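Until a throttle exists, the idea can be sketched as a fixed-window counter keyed per client_id and, separately, per source IP. This is an in-process sketch under stated assumptions: FixedWindowLimiter, may_attempt, the key format, and the 10-per-60-seconds limit are hypothetical; a multi-replica deployment would need shared state (for example, the sidecar-backed limiter that assert_rate_limit already uses), and a fixed window can admit up to twice the limit across a window boundary.

```python
# Hypothetical sketch of a fixed-window limiter for /auth/token.
import time
from typing import Callable, Dict, Tuple


class FixedWindowLimiter:
    """In-process limiter: at most `limit` hits per key per window."""

    def __init__(self, limit: int, window_s: float, clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self._windows: Dict[str, Tuple[float, int]] = {}  # key -> (window start, hits)

    def allow(self, key: str) -> bool:
        now = self.clock()
        start, count = self._windows.get(key, (now, 0))
        if now - start >= self.window_s:
            start, count = now, 0  # previous window elapsed: start a new one
        allowed = count < self.limit
        self._windows[key] = (start, count + 1 if allowed else count)
        return allowed


limiter = FixedWindowLimiter(limit=10, window_s=60)  # hypothetical limit


def may_attempt(client_id: str, source_ip: str) -> bool:
    """Check both keys before any bcrypt work; either one can reject."""
    by_client = limiter.allow(f"auth_token:client:{client_id}")
    by_ip = limiter.allow(f"auth_token:ip:{source_ip}")
    return by_client and by_ip
```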