architecture · doc v1

Browser extension · architecture & backend reference

How the Kirtonic browser extension enforces governance at the keystroke, what it talks to, and the design choices behind every step.

extension v0.2.0 · 15 sections · protocol diagrams · threat model · api spec

section 01

Overview

The Kirtonic browser extension is a Manifest V3 Chromium extension that intercepts every prompt a user is about to submit to a public AI chat site (chatgpt.com, chat.openai.com, claude.ai, gemini.google.com, copilot.microsoft.com), routes it through the same governance engine that the platform's server-side API uses, and either allows, warns, or blocks the submission based on the workspace policy.

The extension addresses the shadow-AI channel: traffic between workstation browsers and public AI chat domains. This traffic is encrypted in transit, terminates at third-party domains, and does not traverse corporate egress proxies as application-layer content. Conventional DLP and CASB controls cannot inspect prompt bodies; the extension performs the inspection client-side, before the host page issues its submit request.

Architecturally the extension is an additional API consumer of the existing governance backbone. It authenticates against POST /api/v1/extension/verdict with a Bearer token and writes through the shared governance_signals → governance_decisions → governance_audit pipeline used by the programmatic v1 API, the playground, and the ad reviewer. No extension-specific tables, queues, or audit surfaces are introduced.

section 02

Capabilities

The extension exposes the following system properties when deployed against a workspace:

Pre-egress classification. Each prompt is intercepted before the host page issues its network request to the AI provider. The classifier call completes against the workspace policy + reviewer correction context and returns a verdict (allow, warn, block) before the original submit is allowed to resume.
Per-tenant policy evaluation. Classification uses only the calling workspace's governance_policy_rules rows and its governance_audit rows with event_type = 'severity_overridden'. Two workspaces submitting the same prompt may receive different verdicts.
Audit-log persistence. Every verdict call inserts one row into governance_signals, one row into governance_decisions, and the corresponding signal_received + decision_created / auto_approved rows into governance_audit. Outbound webhooks fire asynchronously.
Block enforcement at the keystroke. When the verdict is block and the workspace is in block-mode, the content script's capture-phase listener calls stopImmediatePropagation() + preventDefault() and does not re-fire the submit. The provider request is never formed.
Override audit trail. User overrides on a block dialog write an audit entry against the original decision_id and add the prompt to a per-tab session whitelist so subsequent identical submissions short-circuit the classifier call.
Reviewer feedback loop. Severity reclassifications and approvals in the dashboard write severity_overridden audit rows. The next verdict call from the same workspace loads these via loadRecentReclassifications() and the deterministic correction override (Jaccard ≥ 0.6) forces matching prompts into the reviewer-corrected severity band without retraining the classifier.
Latency envelope. End-to-end verdict round-trip is dominated by the classifier call. Database and bookkeeping queries are comparatively small. Measured numbers will vary with the classifier configured for the workspace, the deployment topology, and the load on the AI provider.

section 03

Control gaps in the unmonitored channel

Public AI chat sites (chatgpt.com, claude.ai, gemini.google.com, copilot.microsoft.com) are accessed over HTTPS from a workstation browser to a third-party domain. None of the traffic crosses a corporate egress proxy as application-layer content; classical DLP, CASB, and email-gateway tooling cannot inspect prompt bodies. Without an in-browser intercept the following classes of control are absent by construction:

Sensitive-data egress detection. No system records the content of prompts submitted to AI provider domains. The set of PII, credentials, regulated advice, and confidential material sent to third-party models is unknown to the organisation.
Audit-log coverage for AI usage. Frameworks such as UK FCA SYSC 4, SOC 2 CC7, ISO 27001 A.8.16, and NIS2 require an evidence trail for controls over information processing. Absent a per-prompt audit log, the organisation cannot answer who submitted what, when, and against which policy for the AI channel.
Intellectual-property containment. Source code and contract language pasted into public models become part of the provider's request context. Per-user attribution of which artefact left the perimeter is unrecoverable without an interception layer.
Insider-risk detection in the chat surface. Outbound mail DLP and endpoint DLP do not see chat-tab content. Bulk exfiltration via paste-to-chat is invisible to existing data-loss controls.
Prompt-injection detection. Webpage content rendered in an AI chat session can carry injection payloads. Without a classifier evaluating the prompt body, malicious patterns are not detected before the user submits the action.
Policy refinement signal. Static written policy lacks the feedback loop required to identify and codify high-frequency violation patterns. Without intercepted traffic as input, policy authors operate on assumption rather than observation.

section 04

System architecture

The extension is one of three API consumers that ride the same governance backbone. The others are the programmatic /api/v1/* surface (used by customers' production code) and the in-platform playground / dashboard.

     ┌──────────────────────────────────────────────────────────┐
     │                       Browser tab                        │
     │  ┌────────────────┐  ┌──────────────────────────────────┐│
     │  │  AI chat site  │◄─┤  Content script (isolated world) ││
     │  │  (chatgpt.com, │  │   • intercept keydown/click      ││
     │  │   claude.ai,   │  │   • read prompt from DOM         ││
     │  │   gemini, …)   │  │   • paint verdict pill + dialog  ││
     │  └────────────────┘  └─────────────┬────────────────────┘│
     │                                    │ chrome.runtime      │
     │                            ┌───────▼────────┐            │
     │                            │ Service worker │            │
     │                            │ (background.js)│            │
     │                            │  • holds token │            │
     │                            │  • CORS fetch  │            │
     │                            └───────┬────────┘            │
     └────────────────────────────────────┼─────────────────────┘
                                          │  HTTPS, Bearer
                                          ▼
          ┌────────────────────────────────────────────────────┐
          │      POST /api/v1/extension/verdict (Next.js)      │
          │  ┌──────────────────────────────────────────────┐  │
          │  │ requireApiAuth: validates Bearer token,      │  │
          │  │ checks scopes, returns service-role context  │  │
          │  └──────────────────────────────────────────────┘  │
          │  ┌──────────────────────────────────────────────┐  │
          │  │ Kirtonic hosted classifier:                  │  │
          │  │  prompt + workspace policy rules +           │  │
          │  │  recent reviewer corrections                 │  │
          │  └──────────────────────────────────────────────┘  │
          │  ┌──────────────────────────────────────────────┐  │
          │  │ Deterministic correction override            │  │
          │  │  (Jaccard match against past reclasses)      │  │
          │  └──────────────────────────────────────────────┘  │
          │  ┌──────────────────────────────────────────────┐  │
          │  │ ingestSignal()                               │  │
          │  │  → governance_signals                        │  │
          │  │  → governance_decisions (auto severity)      │  │
          │  │  → governance_audit (signal_received + …)    │  │
          │  │  → deliverWebhooks() (async fire-and-forget) │  │
          │  └──────────────────────────────────────────────┘  │
          │                       │                            │
          │   JSON: { verdict, severity, reason,               │
          │           category, decision_id, audit_url }       │
          └────────────────────────────────────────────────────┘
                                  │
                                  ▼
          ┌────────────────────────────────────────────────────┐
          │   Supabase Postgres (RLS by workspace_member)      │
          │     workspaces · workspace_members                 │
          │     api_keys (extension:verdict scope)             │
          │     governance_signals · governance_decisions      │
          │     governance_audit · governance_policy_rules     │
          └────────────────────────────────────────────────────┘

The boundary that matters: everything below the HTTPS line is shared with the rest of the platform. The extension does not have a private database, a private classifier, or a private audit log. The dashboard's decision queue shows extension events alongside playground events and production v1 API events, ordered by time. This is intentional: there is only one record of what this workspace did with AI today, regardless of channel.

section 05

The full request lifecycle

From the moment a user presses Enter to the moment the prompt either reaches Claude or doesn't, here is every step:

Keystroke intercept. The content script (extension/content.js) attaches a capture-phase keydown listener on document at document_start, before any of the site's React handlers register. On Enter without Shift / IME composition, the listener calls e.stopImmediatePropagation() + e.preventDefault(), then reads the prompt text from the relevant editor element (#prompt-textarea for ChatGPT, div.ProseMirror[contenteditable="true"] for Claude, etc.).
Session allow-list check. If the user has previously clicked Send anyway on this exact prompt in the current tab (since its last reload), the verdict round-trip is skipped entirely. The pill flashes "Allowed (your override)" and the keystroke is re-fired to the page so it submits normally. This both saves classifier cost and honours the user's explicit decision without re-asking.
Verdict dispatch. The content script sends a KIRTONIC_VERDICT message to the background service worker via chrome.runtime.sendMessage(). Content scripts cannot make cross-origin requests reliably and cannot hold the Bearer token securely (the page can read its own globals), so all network I/O lives in the service worker.
Token + endpoint resolution. The service worker reads apiToken and apiBase from chrome.storage.sync (scoped to the extension and not readable by page scripts; note that Chrome does not encrypt this storage area and syncs it to the user's other signed-in Chrome profiles). It POSTs to {apiBase}/api/v1/extension/verdict with body { message, site, url, mode } and header Authorization: Bearer {token}.
CORS preflight. The verdict endpoint exports an OPTIONS handler that returns Access-Control-Allow-Origin: * plus the methods/headers list, so the browser's preflight succeeds. All POST responses (success and error) also carry the same CORS headers.
Bearer validation. requireApiAuth(req, ['extension:verdict']) SHA-256s the token, looks it up in api_keys.hash via the service role client, rejects on revoked/expired/scope-missing, then bumps last_used_at best-effort. The returned context has workspaceId, sentinel userId (00000000-0000-0000-0000-000000000000), and a service-role Supabase client. The service-role bypass is critical: the extension has no Supabase session cookie, so RLS policies that depend on auth.uid() would reject every insert. The scope+workspace check is the authorisation; RLS is the wrong layer for it.
Classifier round-trip. Two context loads run in parallel: enabled policy rules for the surface (loadEnabledPolicyRules) and recent reviewer reclassifications (loadRecentReclassifications). Together with the message itself, these are fed to classifyMessage(), which calls the Anthropic Messages API with a strict JSON-only system prompt. It returns { risk_score, confidence, reason, category }.
Deterministic correction override. Before trusting the classifier's numeric output, findCorrectionMatch() Jaccard-similarity-matches the prompt against past corrections from the same workspace (threshold 0.6 or exact normalised match). If hit, applyCorrectionOverride() forces the risk_score into the band corresponding to the reviewer's corrected severity. This guarantees the loop closes even if the LLM ignored the in-prompt instruction to honour past overrides.
Signal + decision + audit insert. The shared ingestSignal() helper writes one row to governance_signals (with source: "extension/{site}" and a metadata bag including the prompt preview), one row to governance_decisions (severity derived from evaluateSignal against the workspace rules), and two rows to governance_audit (signal_received and either decision_created or auto_approved). Webhooks fire async via deliverWebhooks() without blocking the response.
Verdict response. Severity → verdict mapping: high → block, awaiting_approval → warn, else allow. The JSON response carries verdict, severity, reason, category, the decision_id, and a deep-link audit_url.
Pill + dialog UX. Content script paints a bottom-right pill colour-coded by verdict. For block in block mode, a confirm dialog opens with the classifier reason and a two-line "Risk if you proceed" warning; the user can Cancel or Send Anyway (Override). For warn, same dialog with softer copy. For allow, pill auto-dismisses after 3.5s.
Re-fire (on approval). If the verdict allows or the user overrides, the content script calls markApproved() (sets an 800 ms approval window), then dispatches a synthetic KeyboardEvent('keydown', { key: 'Enter' }) at the original target. The keydown listener checks isApproved() first and returns without calling preventDefault, so the event flows to the site's React handler untouched, triggering its normal submit path.
Override audit (on block override). If the user clicked Send Anyway on a blocked prompt, the prompt text is added to sessionAllowed, where it stays until the tab is reloaded. The original audit row already records the block decision; the override itself is implicit in the subsequent identical-prompt passthrough.
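
The intercept and re-fire steps above hinge on one small decision: should this keydown be swallowed or let through? The following is a minimal sketch of that decision only, not the extension's actual content.js. The KeyLike shape and decideKeydown() are hypothetical names introduced for illustration; markApproved(), isApproved(), approvedUntil and the 800 ms window come from the text. The clock is passed in so the logic can be exercised outside a browser.

```typescript
// Minimal shape of the keyboard-event fields the decision needs.
// (A real content script would receive a DOM KeyboardEvent.)
interface KeyLike {
  key: string;
  shiftKey: boolean;
  isComposing: boolean; // true while an IME composition is in progress
}

// Hypothetical result type: what the capture-phase listener should do.
type KeyAction = "ignore" | "pass-through" | "intercept";

const APPROVAL_WINDOW_MS = 800; // approval window length given in the text

let approvedUntil = 0; // epoch ms until which a re-fired submit is let through

function markApproved(now: number): void {
  approvedUntil = now + APPROVAL_WINDOW_MS;
}

function isApproved(now: number): boolean {
  return now < approvedUntil;
}

function decideKeydown(e: KeyLike, now: number): KeyAction {
  // Only a plain Enter counts as a submit: Shift+Enter inserts a newline,
  // and Enter during IME composition confirms a candidate.
  if (e.key !== "Enter" || e.shiftKey || e.isComposing) return "ignore";
  // A synthetic re-fire inside the approval window must reach the page.
  if (isApproved(now)) return "pass-through";
  // Otherwise the listener would call stopImmediatePropagation() and
  // preventDefault(), then ask for a verdict.
  return "intercept";
}
```

In a content script this would sit inside a listener registered with document.addEventListener("keydown", handler, true), where the final true selects the capture phase, and now would be Date.now().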

section 06

API surface

The extension touches exactly three HTTP endpoints. Two are unique to the extension experience; one is the same key-management endpoint the dashboard uses.

POST /api/v1/extension/verdict

The only endpoint called per-prompt. Bearer-authenticated with the extension:verdict scope. CORS-enabled for any origin so the background service worker (whose origin is chrome-extension://[id]/) can call it.

POST /api/v1/extension/verdict HTTP/1.1
Host: app.kirtonic.io
Content-Type: application/json
Authorization: Bearer cw_live_AbCdEf123…

{
  "message": "the prompt text the user typed",
  "site":    "claude.ai",
  "url":     "https://claude.ai/chat/abc-123",
  "mode":    "user_input"   // or "model_output" for response-side scoring
}

200 OK
{
  "verdict":     "allow" | "warn" | "block",
  "severity":    "low" | "medium" | "high",
  "reason":      "short human-readable reason",
  "category":    "pii" | "regulated_advice" | "injection" | …,
  "decision_id": "1e9b…",
  "audit_url":   "https://app.kirtonic.io/dashboard/engine/decisions/1e9b…"
}
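
For client code, the request and response shapes above can be written down as types with a runtime check on the response. This is a sketch under assumptions: the type names, isVerdictResponse() and buildVerdictInit() are hypothetical helpers, not part of the extension source, and category is typed as a plain string because the spec lists it as open-ended.

```typescript
type Verdict = "allow" | "warn" | "block";
type Severity = "low" | "medium" | "high";

interface VerdictRequest {
  message: string;
  site: string;
  url: string;
  mode: "user_input" | "model_output";
}

interface VerdictResponse {
  verdict: Verdict;
  severity: Severity;
  reason: string;
  category: string; // open-ended in the spec, so not narrowed here
  decision_id: string;
  audit_url: string;
}

const VERDICTS: readonly string[] = ["allow", "warn", "block"];
const SEVERITIES: readonly string[] = ["low", "medium", "high"];

// Runtime guard: confirms that parsed JSON has the documented 200 OK shape.
function isVerdictResponse(value: unknown): value is VerdictResponse {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.verdict === "string" && VERDICTS.indexOf(v.verdict) !== -1 &&
    typeof v.severity === "string" && SEVERITIES.indexOf(v.severity) !== -1 &&
    typeof v.reason === "string" &&
    typeof v.category === "string" &&
    typeof v.decision_id === "string" &&
    typeof v.audit_url === "string"
  );
}

// Builds the fetch() options for the POST shown above.
function buildVerdictInit(token: string, body: VerdictRequest) {
  return {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: "Bearer " + token,
    },
    body: JSON.stringify(body),
  };
}
```

A caller would pass the result of buildVerdictInit() to fetch() against {apiBase}/api/v1/extension/verdict and run isVerdictResponse() on the parsed body before acting on it.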

GET /api/extension/download

Returns a freshly-built .zip of the live extension/ source tree. Unauthenticated on purpose (the extension is useless without a workspace API token, which is gated separately). Each request re-zips on the server using a minimal STORE-method writer (src/lib/zip.ts), so any edit to the source tree ships to the next downloader without a build step.

POST /api/workspaces/[id]/api-keys

Existing endpoint, reused. The Extension dashboard page posts to it with { name, scopes: ['extension:verdict'] } when the user clicks Mint extension token. Returns the plaintext token once; only the SHA-256 hash is persisted. The user pastes the plaintext into the extension's Settings page.

section 07

Authentication & authorisation

The extension authenticates as an API key. There is no concept of an extension-specific identity. The reasons for reusing api_keys:

The dashboard already has key-management UI (mint, revoke, last-used timestamp, scope chips). The Extension page filters that table to keys carrying the extension:verdict scope and displays them as "extension tokens."
The auth path in requireApiAuth() works for both Bearer tokens and cookie sessions. The verdict endpoint accepts both, so an admin can also poke it from a browser tab logged into the dashboard.
Scopes are stored as text[]. Adding extension:verdict to the API_SCOPES list in src/lib/api-scopes.ts was the only schema-relevant change required.

The service-role caveat. When the request authenticates via Bearer token, the context returned by requireApiAuth() uses the service-role Supabase client, not the cookie-session client. This is because Bearer requests have no Supabase session (the extension origin chrome-extension://[id]/ has no cookies for app.kirtonic.io), so RLS predicates that call auth.uid() or is_workspace_member() would fail every insert. The scope + workspace check at the top of requireApiAuth IS the authorisation; RLS is redundant for already-authenticated API calls. Every /api/v1/* route filters writes by ctx.workspaceId explicitly, which is what keeps cross-tenant data isolated under the service role.
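
The Bearer path described above (hash the token, look it up, reject on revoked/expired/scope-missing) can be sketched as a pure function. This is an illustration under assumptions, not the real requireApiAuth(): the ApiKeyRow shape, the expires_at and workspace_id column names, the injected lookup callback, and the 403 for a missing scope are all hypothetical. The 401 for a revoked token matches the threat-model section.

```typescript
import { createHash } from "node:crypto";

// Hypothetical row shape. The text names hash, scopes, revoked_at and
// last_used_at; expires_at and workspace_id are assumed column names.
interface ApiKeyRow {
  hash: string;
  workspace_id: string;
  scopes: string[];
  revoked_at: string | null;
  expires_at: string | null;
}

type AuthResult =
  | { ok: true; workspaceId: string }
  | { ok: false; status: 401 | 403; error: string };

// Only the SHA-256 digest of a token is ever stored or compared.
function hashToken(token: string): string {
  return createHash("sha256").update(token, "utf8").digest("hex");
}

function authorise(
  token: string,
  requiredScopes: string[],
  lookup: (hash: string) => ApiKeyRow | undefined, // stands in for the api_keys query
  now: Date
): AuthResult {
  const row = lookup(hashToken(token));
  if (!row) return { ok: false, status: 401, error: "unknown token" };
  if (row.revoked_at !== null) return { ok: false, status: 401, error: "token revoked" };
  if (row.expires_at !== null && new Date(row.expires_at).getTime() <= now.getTime()) {
    return { ok: false, status: 401, error: "token expired" };
  }
  const missing = requiredScopes.filter((s) => row.scopes.indexOf(s) === -1);
  if (missing.length > 0) {
    // Status code for a missing scope is an assumption of this sketch.
    return { ok: false, status: 403, error: "missing scope: " + missing.join(", ") };
  }
  // The caller must filter every subsequent query by this workspace id,
  // because the service-role client bypasses RLS.
  return { ok: true, workspaceId: row.workspace_id };
}
```

Because the service-role client bypasses RLS, the workspaceId returned here is the only tenant boundary for the queries that follow.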

section 08

The classifier pipeline

The verdict endpoint is essentially a thin shell around classifyMessage() + ingestSignal(). Both are shared with the playground and the production v1 API.

Inputs to the classifier

classifyMessage() takes the prompt text, the surface (shadow-ai-browser for extension calls), the role (user_input), and a workspace context bag containing two optional blocks:

policyRulesContext: the workspace's enabled policy rules formatted as a system-prompt section, grouped by category and annotated with severity and action.
pastCorrectionsContext: the most recent severity_overridden audit events from the same workspace, formatted as "PAST HUMAN CORRECTIONS" with the original prompt preview and the corrected severity. The system prompt tells the model these always win.

Output contract

The model must return JSON only, no markdown:

{
  "risk_score": 0.0 to 1.0,
  "confidence": 0.0 to 1.0,
  "reason":     "one short sentence, the concrete risk",
  "category":   "regulated_advice" | "pii" | "injection" |
                "safety" | "confidentiality" | "operational" |
                "ad_*" | "clean" | "other"
}

Defensive parse: strip optional code fences, JSON-parse with try/catch, clamp risk_score to [0, 1], default malformed output to { risk_score: 0.5, confidence: 0.3, reason: "classifier returned unparseable output", category: "other" } so the rest of the pipeline keeps working.
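
A minimal sketch of the defensive parse described above. The function and constant names are hypothetical. Treating parseable JSON without a numeric risk_score as malformed, and clamping confidence as well as risk_score, are assumptions of this sketch. The fence marker is built from escapes so the example contains no literal code fence.

```typescript
interface Classification {
  risk_score: number;
  confidence: number;
  reason: string;
  category: string;
}

// Fallback values quoted from the text.
const FALLBACK: Classification = {
  risk_score: 0.5,
  confidence: 0.3,
  reason: "classifier returned unparseable output",
  category: "other",
};

const FENCE = "\u0060\u0060\u0060"; // three backticks, written as escapes
const LEADING_FENCE = new RegExp("^" + FENCE + "[a-zA-Z]*\\s*");
const TRAILING_FENCE = new RegExp("\\s*" + FENCE + "$");

function clamp01(n: number): number {
  return Math.min(1, Math.max(0, n));
}

function parseClassifierOutput(raw: string): Classification {
  // Strip an optional opening fence (with language tag) and closing fence.
  const text = raw.trim().replace(LEADING_FENCE, "").replace(TRAILING_FENCE, "");
  try {
    const parsed: unknown = JSON.parse(text);
    if (typeof parsed !== "object" || parsed === null) return { ...FALLBACK };
    const p = parsed as Record<string, unknown>;
    // Assumption: no numeric risk_score means the whole output is malformed.
    if (typeof p.risk_score !== "number") return { ...FALLBACK };
    return {
      risk_score: clamp01(p.risk_score),
      confidence: typeof p.confidence === "number" ? clamp01(p.confidence) : FALLBACK.confidence,
      reason: typeof p.reason === "string" ? p.reason : FALLBACK.reason,
      category: typeof p.category === "string" ? p.category : FALLBACK.category,
    };
  } catch {
    // JSON.parse threw: fall back so the rest of the pipeline keeps working.
    return { ...FALLBACK };
  }
}
```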

Deterministic override

After the LLM responds, findCorrectionMatch() checks whether the prompt matches a past reviewer correction. The match heuristic is Jaccard similarity over tokenised, normalised words (lowercased, punctuation stripped) with a threshold of 0.6, plus an exact-string fallback. If matched, applyCorrectionOverride() forces the risk_score into the band of the corrected severity:

severity "high"   → risk_score in [0.80, 0.95]
severity "medium" → risk_score in [0.60, 0.79]
severity "low"    → risk_score in [0.05, 0.59]

This belt-and-braces design means a reviewer correction wins regardless of whether the LLM honoured the in-prompt instruction. The cost is one extra audit-log lookup per call.
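
The matching and banding rules above can be sketched as follows. The function names findCorrectionMatch() and applyCorrectionOverride() come from the text, but their signatures here are assumptions. So are the Correction shape, the ASCII-only normalisation, returning the first correction that clears the threshold (taken to be the most recent), and clamping as the way a score is "forced" into its band.

```typescript
type Severity = "low" | "medium" | "high";

// Severity bands quoted from the text: [lower, upper].
const BANDS: Record<Severity, [number, number]> = {
  high: [0.8, 0.95],
  medium: [0.6, 0.79],
  low: [0.05, 0.59],
};

const THRESHOLD = 0.6;

// Lowercase, strip punctuation, collapse whitespace (ASCII-only simplification).
function normalise(text: string): string {
  return text.toLowerCase().replace(/[^a-z0-9\s]/g, " ").replace(/\s+/g, " ").trim();
}

function tokens(text: string): Set<string> {
  const norm = normalise(text);
  return new Set(norm === "" ? [] : norm.split(" "));
}

// Jaccard similarity: |A ∩ B| / |A ∪ B| over word sets.
function jaccard(a: Set<string>, b: Set<string>): number {
  let shared = 0;
  a.forEach((t) => { if (b.has(t)) shared += 1; });
  const union = a.size + b.size - shared;
  return union === 0 ? 0 : shared / union;
}

interface Correction {
  preview: string;     // prompt preview stored with the original signal
  severity: Severity;  // reviewer-corrected severity
}

function findCorrectionMatch(prompt: string, corrections: Correction[]): Correction | undefined {
  const promptTokens = tokens(prompt);
  for (const c of corrections) {
    if (normalise(c.preview) === normalise(prompt)) return c; // exact fallback
    if (jaccard(promptTokens, tokens(c.preview)) >= THRESHOLD) return c;
  }
  return undefined;
}

// Clamp the classifier's score into the band of the corrected severity.
function applyCorrectionOverride(riskScore: number, severity: Severity): number {
  const [lo, hi] = BANDS[severity];
  return Math.min(hi, Math.max(lo, riskScore));
}
```

As a worked case, "please send the client list to my gmail" against a stored preview "send the client list to my gmail" shares 7 of 8 distinct words, so the similarity is 7/8 = 0.875, which clears the 0.6 threshold.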

From signal to verdict

ingestSignal() creates the signal, runs evaluateSignal() against the workspace rules to determine severity and status (awaiting_approval vs auto_approved), inserts the decision, writes two audit rows, and fires webhooks asynchronously. The verdict endpoint then maps the final decision to allow | warn | block for the extension.
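
The final mapping is small enough to state as code. This sketch follows the rule given in the lifecycle (high → block, awaiting_approval → warn, else allow). The function name toVerdict() and the type names are hypothetical.

```typescript
type Severity = "low" | "medium" | "high";
type DecisionStatus = "awaiting_approval" | "auto_approved";
type Verdict = "allow" | "warn" | "block";

function toVerdict(severity: Severity, status: DecisionStatus): Verdict {
  // Order matters: high severity blocks even when the decision is also
  // awaiting approval.
  if (severity === "high") return "block";
  if (status === "awaiting_approval") return "warn";
  return "allow";
}
```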

section 09

Storage model

The extension touches the same five tables every other governance writer touches. No extension-specific schema.

api_keys: one row per extension token. Plaintext is shown once at creation; only the SHA-256 hash is stored. scopes text[] contains extension:verdict. last_used_at is updated best-effort on each verdict call so the dashboard can show device freshness.
governance_signals: one row per verdict call. source = "extension/{site}" for filtering. The metadata jsonb bag carries site, url, role, category, reason, and a 200-char message_preview (used by the dashboard list, the hover tooltip on the Pulse page, and the correction-match preview).
governance_decisions: one row per signal. Severity + status. Approved by a reviewer? Status flips to approved and a second audit row (severity_overridden) is written so the classifier learns. This second row is the entire mechanism by which approval generalises.
governance_audit: every state transition. Extension calls write at minimum a signal_received + a decision_created or auto_approved. Reviewer overrides add approved, rejected, or severity_overridden. actor_type = 'api_key' with actor_id = null for extension events (the sentinel user is used).
governance_policy_rules: read only by the extension pathway. Edited from /dashboard/engine/rules. The classifier loads enabled rules for the surface and renders them as system-prompt text.

section 10

Extension internals

The extension itself is small: ~600 lines of JS + CSS + HTML in extension/. It uses Manifest V3 with no build step and no bundler; everything is plain JavaScript that Chrome and Edge load directly.

manifest.json: declares the MV3 schema, the content script's match patterns (the four AI chat sites), the background service worker, the popup, the options page, and exactly one permission: storage. host_permissions additionally include http://localhost/* and https://*.kirtonic.io/* so the service worker can reach the verdict endpoint without triggering an MV3 permissions warning for the user.
content.js: the in-page intercept. IIFE-scoped with a window.__kirtonic_loaded__ idempotence guard so SPA navigations don't double-bind. Module-level state (pillEl, approvedUntil, inflight, sessionAllowed) is declared at the top to avoid TDZ errors from hoisted function calls. Capture-phase listeners on document for keydown / keypress / keyup / click. For stopImmediatePropagation to bite, the site's React event delegation must lose the registration race, which is why the script runs at document_start: our listeners register before theirs.
background.js: the service worker. Holds nothing in memory across dispatches (MV3 service workers can be evicted at any time). On chrome.runtime.onMessage, reads token + base URL from chrome.storage.sync, POSTs to the verdict endpoint, returns the parsed JSON to the sender. Same shape used for the popup's Test connection ping (smallest valid body, always classifies as low risk).
popup.html/js/css: toolbar popup. Three controls: governance on/off (chrome.storage.sync.enabled), advisory/block toggle (.mode), and a Test connection button. The button pings the verdict endpoint and shows "Connected, workspace policy active" or the actual error text so users can self-diagnose.
options.html/js/css: Settings page. Two fields: API token and API base URL. The default base is http://localhost:3000 in development and points at production once deployed.
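
The idempotence guard and capture-phase binding described for content.js can be sketched as below. The __kirtonic_loaded__ flag and the four event types are from the text. bindOnce(), ListenerTarget and Flags are hypothetical stand-ins so the logic runs outside a browser; a real content script would use window and document directly.

```typescript
// Event types the text says are bound in the capture phase.
const EVENT_TYPES = ["keydown", "keypress", "keyup", "click"] as const;

// Structural stand-in for the part of `document` that is needed.
interface ListenerTarget {
  addEventListener(type: string, listener: (e: unknown) => void, useCapture: boolean): void;
}

// Structural stand-in for the flag stored on `window`.
type Flags = { __kirtonic_loaded__?: boolean };

function bindOnce(flags: Flags, target: ListenerTarget, handler: (e: unknown) => void): boolean {
  if (flags.__kirtonic_loaded__) return false; // already bound in this context
  flags.__kirtonic_loaded__ = true;
  for (const type of EVENT_TYPES) {
    target.addEventListener(type, handler, true); // true = capture phase
  }
  return true;
}
```

In the browser the call would look roughly like bindOnce(window as unknown as Flags, document, handler), executed from a script injected at document_start.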

section 11

Override flow and reviewer feedback loop

When a reviewer determines that a previously blocked prompt should pass future evaluations of the same pattern, the following sequence executes:

Reviewer opens the decision at /dashboard/engine/decisions/[id], clicks Approve.
POST /api/v1/decisions/[id]/approve runs. Decision status flips to approved, an approved audit row is written, and (the critical bit) a second severity_overridden audit row is written with { from: existing.severity, to: 'low', reason, via: 'approve' }. Without the second row, approval would be a single-instance waiver only; with it, the classifier learns.
On the next extension call, loadRecentReclassifications() reads the audit log, joins to the original signal's metadata to get the prompt preview, and formats it into the classifier's system prompt as a past correction.
findCorrectionMatch() Jaccard-matches the new prompt against the correction preview. On hit, the LLM's output is forced into the corrected severity band regardless of what the LLM said.
The verdict comes back as allow. The extension lets the prompt through. The user never sees a block dialog for that pattern again.

Session allow-list (faster path). The reviewer loop above takes a round-trip to the dashboard. For the same user who just hit a block and is sure they want to proceed, Send anyway (override) in the dialog adds the prompt to an in-memory Set<string> in the content script. Subsequent identical sends short-circuit the verdict round-trip entirely until the tab is reloaded. Cheap, immediate, and the original block decision still appears in the audit log so the override is on the record.
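
The session allow-list is an exact-string set that lives only as long as the page. A minimal sketch follows; the class and function names are hypothetical, and only the Set<string> and the exact-match behaviour are taken from the text.

```typescript
class SessionAllowList {
  // In-memory only: cleared whenever the tab reloads and the script re-runs.
  private readonly allowed = new Set<string>();

  allow(prompt: string): void {
    this.allowed.add(prompt);
  }

  has(prompt: string): boolean {
    return this.allowed.has(prompt);
  }
}

type NextStep = "refire" | "request-verdict";

// Decide whether a submit can skip the verdict round-trip.
function nextStep(list: SessionAllowList, prompt: string): NextStep {
  return list.has(prompt) ? "refire" : "request-verdict";
}
```

Because matching is exact, even a trailing space produces a different key and sends the prompt back through the classifier.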

section 12

Telemetry surfaces

Three dashboard pages consume extension data, all built on the same governance_* tables:

/dashboard/engine/extension: extension-specific page. Mint / revoke tokens, install instructions, and a live activity feed filtered to source LIKE 'extension/%'.
/dashboard/engine/decisions: the global decision queue. Extension events appear alongside playground events and production v1 API events. Same approve/reject/reclassify controls work on all of them.
/dashboard/engine/pulse: the generative-art live view. Each glow dot is a signal; size scales with risk, colour with severity. Reviewer corrections trigger lightning bolts arcing to the new-severity cluster. Hover a node for prompt preview; long-press to enter focus mode for that category. A 24-bucket arc around the bloom shows the diurnal usage rhythm.

section 13

Threat model

What an attacker / motivated user can and cannot do, given the current design.

Can

Disable the extension. Any user can flip the master switch in the popup or uninstall the extension entirely. Defence is MDM/group-policy enforcement of the extension being installed and enabled, out of scope for v1; relevant for enterprise distribution via Chrome / Edge Add-on stores or by side-loading through corporate MDM.
Use a different browser. Same defence. Without coverage on every browser the user has access to, this is a defence-in-depth control, not a perimeter. The desktop agent (network proxy) is the planned answer to this.
Override blocks. Send Anyway is a feature, not a bug, but every override is audited. The audit row carries the user's sentinel ID (api-key, since extension calls authenticate as the workspace key, not as a specific user). For per-user attribution, the roadmap item is OAuth-style per-user tokens.
Type the prompt into a non-supported site. The extension only intercepts the four declared sites. New AI sites that ship after the extension was installed get no coverage until the manifest's match list is updated.

Cannot

Read data from other tabs. host_permissions is limited to the four AI sites + the API origin. The extension cannot see your bank tab or your email tab.
Exfiltrate the workspace API token. The token lives in chrome.storage.sync, accessible only to the extension origin. The page's JavaScript cannot read it.
Use a revoked token. Revocation in the dashboard marks revoked_at; the next verdict call fails with 401 from requireApiAuth.
Cross-tenant read or write. Even with service-role bypass, every query filters by ctx.workspaceId explicitly, and ingestSignal() inserts only with the token's own workspace id.

section 14

Performance & cost

Each verdict call costs approximately:

One Kirtonic classifier call: ~500 input tokens (policy + corrections + prompt), ~80 output tokens (the JSON envelope). At list price, low single-digit tenths of a US cent per call.
Three Supabase round-trips: policy rules and recent corrections (both run in parallel), then the signal+decision+audit insert (single transaction). All indexed.
One classifier round-trip: to the model configured for the workspace (Anthropic Claude by default, or a customer-trained classifier). Latency depends on the provider and the prompt sizes involved.

The end-to-end round-trip, from the user pressing Enter to the verdict pill rendering, is dominated by the classifier call. The user experience target is for allowed prompts to feel transparent and for blocked prompts to surface a clear, actionable confirmation dialog.

Caching strategy. None on the hot path. Each prompt is treated as unique and the past-corrections context evolves with each reviewer action. Adding a memoised verdict cache keyed on (workspace_id, sha256(prompt)) with a short TTL is a clear future win for workspaces with high repeat-prompt rates.
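
One possible shape for such a cache is sketched below. It is not part of the current system; it only illustrates the (workspace_id, sha256(prompt)) key with a TTL. The class name, the in-memory Map, and the injected clock are assumptions of the sketch. A real version would also need to be invalidated when a reviewer action changes the corrections context.

```typescript
import { createHash } from "node:crypto";

type Verdict = "allow" | "warn" | "block";

interface CacheEntry {
  verdict: Verdict;
  expiresAt: number; // epoch ms
}

class VerdictCache {
  private readonly entries = new Map<string, CacheEntry>();
  private readonly ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Key is scoped by workspace so tenants never share cached verdicts.
  private key(workspaceId: string, prompt: string): string {
    const digest = createHash("sha256").update(prompt, "utf8").digest("hex");
    return workspaceId + ":" + digest;
  }

  get(workspaceId: string, prompt: string, now: number): Verdict | undefined {
    const k = this.key(workspaceId, prompt);
    const hit = this.entries.get(k);
    if (!hit) return undefined;
    if (now >= hit.expiresAt) {
      this.entries.delete(k); // expired: drop it and report a miss
      return undefined;
    }
    return hit.verdict;
  }

  set(workspaceId: string, prompt: string, verdict: Verdict, now: number): void {
    this.entries.set(this.key(workspaceId, prompt), {
      verdict: verdict,
      expiresAt: now + this.ttlMs,
    });
  }
}
```

Since the key hashes the exact prompt text, only byte-identical prompts hit the cache. Note also that a cache hit would skip the per-call audit insert unless the caller writes it separately.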

section 15

Future work

Network-layer interception. Page-world script injected at document_start that monkey-patches window.fetch and XMLHttpRequest. Blocks the actual API request to the AI provider if a high-severity verdict comes back. Defends against React internals short-circuiting the keystroke intercept on a future site update.
Per-user identity. Today the extension authenticates as the workspace. OAuth-style device-pair flow would issue a per-user token bound to the logged-in Kirtonic identity, so audit rows carry the actual user ID rather than a sentinel.
Desktop agent / system proxy. Covers Cursor, Raycast AI, VS Code Copilot, internal CLIs, and any other process that talks to api.openai.com / api.anthropic.com / generativelanguage.googleapis.com. Same verdict endpoint on the backend; new client.
Chrome Web Store + Edge Add-ons publication. Lets enterprise IT push the extension via MDM rather than unpacked-install instructions per device.
Verdict cache. (workspace_id, sha256(prompt)) → cached verdict with 60s TTL. Saves an estimated ~40% on classifier spend in workspaces where users re-send identical prompts (most of them); because the key is a hash of the prompt, near-identical prompts would still miss.