Browser extension · architecture & backend reference
How the Kirtonic browser extension enforces governance at the keystroke, what it talks to, and the design choices behind every step.
Overview
The Kirtonic browser extension is a Manifest V3 Chromium extension that intercepts every prompt a user is about to submit to a public AI chat site (chatgpt.com, chat.openai.com, claude.ai, gemini.google.com, copilot.microsoft.com), routes it through the same governance engine that the platform's server-side API uses, and either allows, warns, or blocks the submission based on the workspace policy.
The extension addresses the shadow-AI channel: traffic between workstation browsers and public AI chat domains. This traffic is encrypted in transit, terminates at third-party domains, and does not traverse corporate egress proxies as application-layer content. Conventional DLP and CASB controls cannot inspect prompt bodies; the extension performs the inspection client-side, before the host page issues its submit request.
Architecturally the extension is an additional API consumer of the existing governance backbone. It authenticates against POST /api/v1/extension/verdict with a Bearer token and writes through the shared governance_signals → governance_decisions → governance_audit pipeline used by the programmatic v1 API, the playground, and the ad reviewer. No extension-specific tables, queues, or audit surfaces are introduced.
Capabilities
The extension exposes the following system properties when deployed against a workspace:
- Pre-egress classification. Each prompt is intercepted before the host page issues its network request to the AI provider. The classifier call completes against the workspace policy + reviewer correction context and returns a verdict (allow, warn, block) before the original submit is allowed to resume.
- Per-tenant policy evaluation. Classification uses governance_policy_rules + governance_audit.event_type = 'severity_overridden' from the calling workspace only. Two workspaces submitting the same prompt may receive different verdicts.
- Audit-log persistence. Every verdict call inserts one row into governance_signals, one row into governance_decisions, and the corresponding signal_received + decision_created / auto_approved rows into governance_audit. Outbound webhooks fire asynchronously.
- Block enforcement at the keystroke. When the verdict is block and the workspace is in block-mode, the content script's capture-phase listener calls stopImmediatePropagation() + preventDefault() and does not re-fire the submit. The provider request is never formed.
- Override audit trail. User overrides on a block dialog write an audit entry against the original decision_id and add the prompt to a per-tab session whitelist so subsequent identical submissions short-circuit the classifier call.
- Reviewer feedback loop. Severity reclassifications and approvals in the dashboard write severity_overridden audit rows. The next verdict call from the same workspace loads these via loadRecentReclassifications() and the deterministic correction override (Jaccard ≥ 0.6) forces matching prompts into the reviewer-corrected severity band without retraining the classifier.
- Latency envelope. End-to-end verdict round-trip is dominated by the classifier call. Database and bookkeeping queries are comparatively small. Measured numbers will vary with the classifier configured for the workspace, the deployment topology, and the load on the AI provider.
Control gaps in the unmonitored channel
Public AI chat sites (chatgpt.com, claude.ai, gemini.google.com, copilot.microsoft.com) are accessed over HTTPS from a workstation browser to a third-party domain. None of the traffic crosses a corporate egress proxy as application-layer content; classical DLP, CASB, and email-gateway tooling cannot inspect prompt bodies. Without an in-browser intercept, the following classes of control are absent by construction:
- Sensitive-data egress detection. No system records the content of prompts submitted to AI provider domains. The set of PII, credentials, regulated advice, and confidential material sent to third-party models is unknown to the organisation.
- Audit-log coverage for AI usage. Frameworks such as UK FCA SYSC 4, SOC 2 CC7, ISO 27001 A.8.16, and NIS2 require an evidence trail for controls over information processing. Absent a per-prompt audit log, the organisation cannot answer who submitted what, when, and against which policy for the AI channel.
- Intellectual-property containment. Source code and contract language pasted into public models become part of the provider's request context. Per-user attribution of which artefact left the perimeter is unrecoverable without an interception layer.
- Insider-risk detection in the chat surface. Outbound mail DLP and endpoint DLP do not see chat-tab content. Bulk exfiltration via paste-to-chat is invisible to existing data-loss controls.
- Prompt-injection detection. Webpage content rendered in an AI chat session can carry injection payloads. Without a classifier evaluating the prompt body, malicious patterns are not detected before the user submits the action.
- Policy refinement signal. Static written policy lacks the feedback loop required to identify and codify high-frequency violation patterns. Without intercepted traffic as input, policy authors operate on assumption rather than observation.
System architecture
The extension is one of three API consumers that ride the same governance backbone. The others are the programmatic /api/v1/* surface (used by customers' production code) and the in-platform playground / dashboard.
┌──────────────────────────────────────────────────────────┐
│ Browser tab                                              │
│ ┌────────────────┐  ┌──────────────────────────────────┐ │
│ │ AI chat site   │◄─┤ Content script (isolated world)  │ │
│ │ (chatgpt.com,  │  │ • intercept keydown/click        │ │
│ │  claude.ai,    │  │ • read prompt from DOM           │ │
│ │  gemini, …)    │  │ • paint verdict pill + dialog    │ │
│ └────────────────┘  └─────────────┬────────────────────┘ │
│                                   │ chrome.runtime       │
│                           ┌───────▼────────┐             │
│                           │ Service worker │             │
│                           │ (background.js)│             │
│                           │ • holds token  │             │
│                           │ • CORS fetch   │             │
│                           └───────┬────────┘             │
└───────────────────────────────────┼──────────────────────┘
                                    │ HTTPS, Bearer
                                    ▼
┌────────────────────────────────────────────────────┐
│ POST /api/v1/extension/verdict (Next.js)           │
│  ┌──────────────────────────────────────────────┐  │
│  │ requireApiAuth: validates Bearer token,      │  │
│  │ checks scopes, returns service-role context  │  │
│  └──────────────────────────────────────────────┘  │
│  ┌──────────────────────────────────────────────┐  │
│  │ Kirtonic hosted classifier:                  │  │
│  │   prompt + workspace policy rules +          │  │
│  │   recent reviewer corrections                │  │
│  └──────────────────────────────────────────────┘  │
│  ┌──────────────────────────────────────────────┐  │
│  │ Deterministic correction override            │  │
│  │ (Jaccard match against past reclasses)       │  │
│  └──────────────────────────────────────────────┘  │
│  ┌──────────────────────────────────────────────┐  │
│  │ ingestSignal()                               │  │
│  │  → governance_signals                        │  │
│  │  → governance_decisions (auto severity)      │  │
│  │  → governance_audit (signal_received + …)    │  │
│  │  → deliverWebhooks() (async fire-and-forget)│  │
│  └──────────────────────────────────────────────┘  │
│                         │                          │
│ JSON: { verdict, severity, reason,                 │
│         category, decision_id, audit_url }         │
└────────────────────────────────────────────────────┘
                          │
                          ▼
┌────────────────────────────────────────────────────┐
│ Supabase Postgres (RLS by workspace_member)        │
│   workspaces · workspace_members                   │
│   api_keys (extension:verdict scope)               │
│   governance_signals · governance_decisions        │
│   governance_audit · governance_policy_rules       │
└────────────────────────────────────────────────────┘
The boundary that matters: everything below the HTTPS line is shared with the rest of the platform. The extension does not have a private database, a private classifier, or a private audit log. The dashboard's decision queue shows extension events alongside playground events and production v1 API events, ordered by time. This is intentional: there is only one record of what this workspace did with AI today, regardless of channel.
The full request lifecycle
From the moment a user presses Enter to the moment the prompt either reaches Claude or doesn't, here is every step:
- Keystroke intercept. The content script (extension/content.js) attaches a capture-phase keydown listener on document at document_start, before any of the site's React handlers register. On Enter without Shift or IME composition, the listener calls e.stopImmediatePropagation() + e.preventDefault(), then reads the prompt text from the relevant editor element (#prompt-textarea for ChatGPT, div.ProseMirror[contenteditable="true"] for Claude, etc.).
- Session allow-list check. If the user has previously clicked Send anyway on this exact prompt in this tab session, the verdict round-trip is skipped entirely. The pill flashes "Allowed (your override)" and the keystroke is re-fired to the page so it submits normally. This both saves classifier cost and honours the user's explicit decision without re-asking.
- Verdict dispatch. The content script sends a KIRTONIC_VERDICT message to the background service worker over chrome.runtime messaging. Content scripts cannot make cross-origin requests reliably and cannot hold the Bearer token securely (the page can read its own globals), so all network I/O lives in the service worker.
- Token + endpoint resolution. The service worker reads apiToken and apiBase from chrome.storage.sync (scoped to the extension's origin; note that Chrome does not encrypt this storage area on disk). It POSTs to {apiBase}/api/v1/extension/verdict with body { message, site, url, mode } and header Authorization: Bearer {token}.
- CORS preflight. The verdict endpoint exports an OPTIONS handler that returns Access-Control-Allow-Origin: * plus the methods/headers list, so the browser's preflight succeeds. All POST responses (success and error) also carry the same CORS headers.
- Bearer validation. requireApiAuth(req, ['extension:verdict']) SHA-256s the token, looks it up in api_keys.hash via the service-role client, rejects on revoked/expired/scope-missing, then bumps last_used_at best-effort. The returned context has workspaceId, a sentinel userId (00000000-0000-0000-0000-000000000000), and a service-role Supabase client. The service-role bypass is critical: the extension has no Supabase session cookie, so RLS policies that depend on auth.uid() would reject every insert. The scope + workspace check is the authorisation; RLS is the wrong layer for it.
- Classifier round-trip. Two context loads run in parallel: enabled policy rules for the surface (loadEnabledPolicyRules) and recent reviewer reclassifications (loadRecentReclassifications). These are fed, together with the message itself, to classifyMessage(), which calls the classifier configured for the workspace (the Anthropic Messages API by default) with a strict JSON-only system prompt. It returns { risk_score, confidence, reason, category }.
- Deterministic correction override. Before trusting the classifier's numeric output, findCorrectionMatch() Jaccard-similarity-matches the prompt against past corrections from the same workspace (threshold 0.6 or exact normalised match). On a hit, applyCorrectionOverride() forces the risk_score into the band corresponding to the reviewer's corrected severity. This guarantees the loop closes even if the LLM ignored the in-prompt instruction to honour past overrides.
- Signal + decision + audit insert. The shared ingestSignal() helper writes one row to governance_signals (with source: "extension/{site}" and a metadata bag including the prompt preview), one row to governance_decisions (severity derived from evaluateSignal against the workspace rules), and two rows to governance_audit (signal_received and either decision_created or auto_approved). Webhooks fire async via deliverWebhooks() without blocking the response.
- Verdict response. Severity → verdict mapping: high → block, awaiting_approval → warn, else allow. The JSON response carries verdict, severity, reason, category, the decision_id, and a deep-link audit_url.
- Pill + dialog UX. The content script paints a bottom-right pill colour-coded by verdict. For block in block mode, a confirm dialog opens with the classifier reason and a two-line "Risk if you proceed" warning; the user can Cancel or Send Anyway (Override). For warn, the same dialog opens with softer copy. For allow, the pill auto-dismisses after 3.5 s.
- Re-fire (on approval). If the verdict allows or the user overrides, the content script calls markApproved() (sets an 800 ms approval window), then dispatches a synthetic KeyboardEvent('keydown', { key: 'Enter' }) at the original target. The keydown listener checks isApproved() first and returns without calling preventDefault, so the event flows to the site's React handler untouched, triggering its normal submit path.
- Override audit (on block override). If the user clicked Send Anyway on a blocked prompt, the prompt text is added to sessionAllowed for the rest of the tab session (until the tab is reloaded). The original audit row already records the block decision; the override itself is implicit in the subsequent identical-prompt passthrough.
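The intercept, allow-list, and re-fire steps above depend on a small amount of state in the content script. The following is a minimal sketch of that decision logic, not the extension's actual code: markApproved, isApproved, and sessionAllowed are names from the text, while InterceptState, KeyLike, Action, and the injected clock are hypothetical, and all DOM wiring is omitted.

```typescript
// Fields of a keydown event that the intercept inspects (illustrative subset).
interface KeyLike {
  key: string;
  shiftKey: boolean;
  isComposing: boolean;
}

type Action = "ignore" | "pass-through" | "refire-allowed" | "request-verdict";

// Hypothetical container for the content script's module-level state.
class InterceptState {
  private approvedUntil = 0;
  private readonly sessionAllowed = new Set<string>();
  private readonly now: () => number;

  // The clock is injectable so the approval window can be tested without timers.
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  // Opens the approval window (800 ms in the text) before the synthetic re-fire.
  markApproved(windowMs: number = 800): void {
    this.approvedUntil = this.now() + windowMs;
  }

  isApproved(): boolean {
    return this.now() < this.approvedUntil;
  }

  // Records a "Send anyway" override for the lifetime of this tab's script.
  allowForSession(prompt: string): void {
    this.sessionAllowed.add(prompt);
  }

  // Decides what the capture-phase keydown listener should do.
  decide(e: KeyLike, prompt: string): Action {
    if (e.key !== "Enter" || e.shiftKey || e.isComposing) return "ignore";
    if (this.isApproved()) return "pass-through"; // our own re-fired event
    if (this.sessionAllowed.has(prompt)) return "refire-allowed"; // skip verdict
    return "request-verdict"; // stop the event, then ask the service worker
  }
}
```

In a real listener, "request-verdict" and "refire-allowed" would both call stopImmediatePropagation() + preventDefault() first; only "ignore" and "pass-through" leave the event untouched.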
API surface
The extension touches exactly three HTTP endpoints. Two are unique to the extension experience; one is the same key-management endpoint the dashboard uses.
POST /api/v1/extension/verdict
The only endpoint called per-prompt. Bearer-authenticated with the extension:verdict scope. CORS-enabled for any origin so the background service worker (whose origin is chrome-extension://[id]/) can call it.
POST /api/v1/extension/verdict HTTP/1.1
Host: app.kirtonic.io
Content-Type: application/json
Authorization: Bearer cw_live_AbCdEf123…

{
  "message": "the prompt text the user typed",
  "site": "claude.ai",
  "url": "https://claude.ai/chat/abc-123",
  "mode": "user_input" // or "model_output" for response-side scoring
}

200 OK

{
  "verdict": "allow" | "warn" | "block",
  "severity": "low" | "medium" | "high",
  "reason": "short human-readable reason",
  "category": "pii" | "regulated_advice" | "injection" | …,
  "decision_id": "1e9b…",
  "audit_url": "https://app.kirtonic.io/dashboard/engine/decisions/1e9b…"
}

GET /api/extension/download
Returns a freshly-built .zip of the live extension/ source tree. Unauthenticated on purpose (the extension is useless without a workspace API token, which is gated separately). Each request re-zips on the server using a minimal STORE-method writer (src/lib/zip.ts), so any edit to the source tree ships to the next downloader without a build step.
POST /api/workspaces/[id]/api-keys
Existing endpoint, reused. The Extension dashboard page posts to it with { name, scopes: ['extension:verdict'] } when the user clicks Mint extension token. Returns the plaintext token once; only the SHA-256 hash is persisted. The user pastes the plaintext into the extension's Settings page.
Authentication & authorisation
The extension authenticates as an API key. There is no concept of an extension-specific identity. The reasons for reusing api_keys:
- The dashboard already has key-management UI (mint, revoke, last-used timestamp, scope chips). The Extension page filters that table to keys carrying the extension:verdict scope and displays them as "extension tokens."
- The auth path in requireApiAuth() works for both Bearer tokens and cookie sessions. The verdict endpoint accepts both, so an admin can also poke it from a browser tab logged into the dashboard.
- Scopes are stored as text[]. Adding extension:verdict to the API_SCOPES list in src/lib/api-scopes.ts was the only schema-relevant change required.
The service-role caveat. When the request authenticates via Bearer token, the context returned by requireApiAuth() uses the service-role Supabase client, not the cookie-session client. This is because Bearer requests have no Supabase session (the extension origin chrome-extension://[id]/ has no cookies for app.kirtonic.io), so RLS predicates that call auth.uid() or is_workspace_member() would fail every insert. The scope + workspace check at the top of requireApiAuth IS the authorisation; RLS is redundant for already-authenticated API calls. Every /api/v1/* route filters writes by ctx.workspaceId explicitly, which is what keeps cross-tenant data isolated under the service role.
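The checks that requireApiAuth() is described as performing after the hash lookup can be sketched as a pure function. This is an illustration under stated assumptions, not the platform's implementation: ApiKeyRow, AuthResult, and authoriseKey are hypothetical names, the expiresAt column is assumed from the "expired" rejection mentioned earlier, and the 403 for a missing scope is a guess (the text only confirms 401 for a revoked token).

```typescript
// Hypothetical row shape; only columns implied by the text are modelled.
interface ApiKeyRow {
  workspaceId: string;
  scopes: string[];
  revokedAt: string | null;
  expiresAt: string | null; // assumed column name
}

type AuthResult =
  | { ok: true; workspaceId: string; userId: string }
  | { ok: false; status: 401 | 403; error: string };

// Sentinel user id quoted in the request lifecycle.
const SENTINEL_USER_ID = "00000000-0000-0000-0000-000000000000";

// row is the result of looking up the SHA-256 hash of the Bearer token.
function authoriseKey(row: ApiKeyRow | null, required: string[], now: Date): AuthResult {
  if (row === null) return { ok: false, status: 401, error: "unknown token" };
  if (row.revokedAt !== null) return { ok: false, status: 401, error: "token revoked" };
  if (row.expiresAt !== null && new Date(row.expiresAt).getTime() <= now.getTime()) {
    return { ok: false, status: 401, error: "token expired" };
  }
  const missing = required.filter((scope) => !row.scopes.includes(scope));
  if (missing.length > 0) {
    // Status code is an assumption; the source does not state it.
    return { ok: false, status: 403, error: "missing scope: " + missing.join(", ") };
  }
  // Every later write must be filtered by this workspaceId, since the
  // service-role client bypasses RLS.
  return { ok: true, workspaceId: row.workspaceId, userId: SENTINEL_USER_ID };
}
```

Keeping the decision in one function makes it clear that the workspace id in the returned context, not RLS, is what scopes every subsequent query.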
The classifier pipeline
The verdict endpoint is essentially a thin shell around classifyMessage() + ingestSignal(). Both are shared with the playground and the production v1 API.
Inputs to the classifier
classifyMessage() takes the prompt text, the surface (shadow-ai-browser for extension calls), the role (user_input), and a workspace context bag containing two optional blocks:
- policyRulesContext: the workspace's enabled policy rules formatted as a system-prompt section, grouped by category and annotated with severity and action.
- pastCorrectionsContext: the most recent severity_overridden audit events from the same workspace, formatted as "PAST HUMAN CORRECTIONS" with the original prompt preview and the corrected severity. The system prompt tells the model these always win.
Output contract
The model must return JSON only, no markdown:
{
  "risk_score": <number between 0.0 and 1.0>,
  "confidence": <number between 0.0 and 1.0>,
  "reason": "one short sentence, the concrete risk",
  "category": "regulated_advice" | "pii" | "injection" |
              "safety" | "confidentiality" | "operational" |
              "ad_*" | "clean" | "other"
}

Defensive parse: strip optional code fences, JSON-parse with try/catch, clamp risk_score to [0, 1], and default malformed output to { risk_score: 0.5, confidence: 0.3, reason: "classifier returned unparseable output", category: "other" } so the rest of the pipeline keeps working.
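The defensive parse can be sketched roughly as follows. The fallback values are the ones quoted above; parseClassifierOutput, ClassifierOutput, and the choice to also clamp confidence and to fall back field-by-field are assumptions made for illustration.

```typescript
interface ClassifierOutput {
  risk_score: number;
  confidence: number;
  reason: string;
  category: string;
}

// Fallback quoted in the text for output that cannot be parsed.
const FALLBACK: ClassifierOutput = {
  risk_score: 0.5,
  confidence: 0.3,
  reason: "classifier returned unparseable output",
  category: "other",
};

// Built at runtime so this sample contains no literal fence marker.
const FENCE = "`".repeat(3);

const clamp01 = (n: number): number => Math.min(1, Math.max(0, n));

function parseClassifierOutput(raw: string): ClassifierOutput {
  let text = raw.trim();
  if (text.startsWith(FENCE)) {
    // Drop the opening fence line (with any language tag) and the closing fence.
    const firstNewline = text.indexOf("\n");
    text = firstNewline === -1 ? "" : text.slice(firstNewline + 1);
    if (text.trimEnd().endsWith(FENCE)) text = text.trimEnd().slice(0, -FENCE.length);
  }
  try {
    const data: unknown = JSON.parse(text);
    if (typeof data !== "object" || data === null) return { ...FALLBACK };
    const o = data as Record<string, unknown>;
    const risk = o["risk_score"];
    const confidence = o["confidence"];
    const reason = o["reason"];
    const category = o["category"];
    if (typeof risk !== "number") return { ...FALLBACK };
    return {
      risk_score: clamp01(risk),
      // Per-field defaults below are an assumption, not stated in the source.
      confidence: typeof confidence === "number" ? clamp01(confidence) : FALLBACK.confidence,
      reason: typeof reason === "string" ? reason : FALLBACK.reason,
      category: typeof category === "string" ? category : FALLBACK.category,
    };
  } catch {
    return { ...FALLBACK };
  }
}
```

Returning a mid-range risk_score with low confidence on failure means a broken classifier response still produces a signal and an audit row instead of an error.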
Deterministic override
After the LLM responds, findCorrectionMatch() checks whether the prompt matches a past reviewer correction. The match heuristic is Jaccard similarity over tokenised, normalised words (lowercased, punctuation stripped) with a threshold of 0.6, plus an exact-string fallback. If matched, applyCorrectionOverride() forces the risk_score into the band of the corrected severity:
severity "high"   → risk_score in [0.80, 0.95]
severity "medium" → risk_score in [0.60, 0.79]
severity "low"    → risk_score in [0.05, 0.59]
This belt-and-braces design means a reviewer correction wins regardless of whether the LLM honoured the in-prompt instruction. The cost is one extra audit-log lookup per call.
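A minimal sketch of the matching and band-forcing logic, assuming ASCII-only normalisation, first-match-wins ordering, and clamping as the way a score is "forced" into a band; none of those three details is confirmed by the text. The function names findCorrectionMatch and applyCorrectionOverride come from the document; their signatures here are hypothetical.

```typescript
type Severity = "low" | "medium" | "high";

// Bands quoted in the text: corrected severity -> [min, max] risk_score.
const BANDS: Record<Severity, [number, number]> = {
  high: [0.8, 0.95],
  medium: [0.6, 0.79],
  low: [0.05, 0.59],
};

interface Correction {
  preview: string; // prompt preview stored with the original signal
  severity: Severity; // reviewer-corrected severity
}

// Lowercase, strip punctuation, split on whitespace.
// Assumption: ASCII-only; non-ASCII letters are dropped in this sketch.
function normalise(text: string): string[] {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, " ")
    .split(/\s+/)
    .filter((w) => w.length > 0);
}

// Jaccard similarity: |A ∩ B| / |A ∪ B| over distinct words.
function jaccard(a: Set<string>, b: Set<string>): number {
  let shared = 0;
  a.forEach((w) => {
    if (b.has(w)) shared += 1;
  });
  const union = a.size + b.size - shared;
  return union === 0 ? 0 : shared / union;
}

function findCorrectionMatch(
  prompt: string,
  corrections: Correction[],
  threshold: number = 0.6,
): Correction | null {
  const words = normalise(prompt);
  const wordSet = new Set(words);
  for (const c of corrections) {
    const other = normalise(c.preview);
    const exact = words.join(" ") === other.join(" "); // exact-string fallback
    if (exact || jaccard(wordSet, new Set(other)) >= threshold) return c;
  }
  return null;
}

// Clamps the classifier's score into the band of the corrected severity.
function applyCorrectionOverride(riskScore: number, severity: Severity): number {
  const [min, max] = BANDS[severity];
  return Math.min(max, Math.max(min, riskScore));
}
```

For example, a prompt that the classifier scores at 0.99 but that matches a correction marked "low" would come out at 0.59, the top of the low band.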
From signal to verdict
ingestSignal() creates the signal, runs evaluateSignal() against the workspace rules to determine severity and status (awaiting_approval vs auto_approved), inserts the decision, writes two audit rows, and fires webhooks asynchronously. The verdict endpoint then maps the final decision to allow | warn | block for the extension.
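The final mapping step is small enough to sketch directly. The rule (high → block, awaiting_approval → warn, otherwise allow) is taken from the request lifecycle; the function name toVerdict and the type names are hypothetical.

```typescript
type Severity = "low" | "medium" | "high";
type DecisionStatus = "awaiting_approval" | "auto_approved";
type Verdict = "allow" | "warn" | "block";

// Maps the stored decision to the verdict returned to the extension.
function toVerdict(severity: Severity, status: DecisionStatus): Verdict {
  if (severity === "high") return "block";
  if (status === "awaiting_approval") return "warn";
  return "allow";
}
```

Note that the rule reads two different fields: severity decides block, while status decides warn, so a medium-severity decision that was auto-approved maps to allow.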
Storage model
The extension touches the same five tables every other governance writer touches. No extension-specific schema.
- api_keys: one row per extension token. Plaintext is shown once at creation; only the SHA-256 hash is stored. scopes text[] contains extension:verdict. last_used_at is updated best-effort on each verdict call so the dashboard can show device freshness.
- governance_signals: one row per verdict call. source = "extension/{site}" for filtering. The metadata jsonb bag carries site, url, role, category, reason, and a 200-char message_preview (used by the dashboard list, the hover tooltip on the Pulse page, and the correction-match preview).
- governance_decisions: one row per signal. Severity + status. Approved by a reviewer? Status flips to approved and a second audit row (severity_overridden) is written so the classifier learns. This second row is the entire mechanism by which approval generalises.
- governance_audit: every state transition. Extension calls write at minimum a signal_received + a decision_created or auto_approved. Reviewer overrides add approved, rejected, or severity_overridden. actor_type = 'api_key' with actor_id = null for extension events (the sentinel user is used).
- governance_policy_rules: read only by the extension pathway. Edited from /dashboard/engine/rules. The classifier loads enabled rules for the surface and renders them as system-prompt text.
Extension internals
The extension itself is small: ~600 lines of JS + CSS + HTML in extension/. Manifest V3, no build step, no bundler, everything is plain ES modules that Chrome and Edge load directly.
- manifest.json: declares the MV3 schema, the content script's match patterns (the four AI chat sites), the background service worker, the popup, the options page, and a minimal permissions list: storage only. host_permissions additionally include http://localhost/* and https://*.kirtonic.io/* so the service worker can reach the verdict endpoint without triggering an MV3 permissions warning for the user.
- content.js: the in-page intercept. IIFE-scoped with a window.__kirtonic_loaded__ idempotence guard so SPA navigations don't double-bind. Module-level state (pillEl, approvedUntil, inflight, sessionAllowed) is declared at the top to avoid TDZ errors from hoisted function calls. Capture-phase listeners on document for keydown / keypress / keyup / click. Sites with React event delegation must lose the race for stopImmediatePropagation to bite; that is why the script runs at document_start, so its listeners register before theirs.
- background.js: the service worker. Holds nothing in memory across dispatches (MV3 service workers can be evicted at any time). On chrome.runtime.onMessage, it reads the token + base URL from chrome.storage.sync, POSTs to the verdict endpoint, and returns the parsed JSON to the sender. The same shape is used for the popup's Test connection ping (smallest valid body, always classifies as low risk).
- popup.html/js/css: toolbar popup. Three controls: governance on/off (chrome.storage.sync.enabled), advisory/block toggle (.mode), and a Test connection button. The button pings the verdict endpoint and shows "Connected, workspace policy active" or the actual error text so users can self-diagnose.
- options.html/js/css: Settings page. Two fields: API token and API base URL. Default base is http://localhost:3000 in development; points at production once deployed.
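As an illustration of the background.js role described above, the request the service worker sends can be assembled by a pure helper and then handed to fetch. This is a sketch under assumptions: buildVerdictRequest, StoredSettings, and PreparedRequest are hypothetical names, and only the endpoint path, body fields, header format, and development default base URL come from the text.

```typescript
interface VerdictRequestBody {
  message: string;
  site: string;
  url: string;
  mode: "user_input" | "model_output";
}

// Values the Settings page stores in chrome.storage.sync.
interface StoredSettings {
  apiToken?: string;
  apiBase?: string;
}

interface PreparedRequest {
  endpoint: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function buildVerdictRequest(settings: StoredSettings, body: VerdictRequestBody): PreparedRequest {
  if (!settings.apiToken) throw new Error("No API token configured");
  // Development default from the Settings page; trailing slashes are trimmed.
  const base = (settings.apiBase || "http://localhost:3000").replace(/\/+$/, "");
  return {
    endpoint: base + "/api/v1/extension/verdict",
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: "Bearer " + settings.apiToken,
      },
      body: JSON.stringify(body),
    },
  };
}

// In the service worker, the onMessage listener would read the settings from
// chrome.storage.sync on every dispatch (nothing is cached in memory), then
// call fetch(req.endpoint, req.init) and return the parsed JSON to the sender.
```

Reading settings on each message rather than caching them fits the note that MV3 service workers can be evicted at any time.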
Override flow and reviewer feedback loop
When a reviewer determines that a previously blocked prompt should pass future evaluations of the same pattern, the following sequence executes:
- Reviewer opens the decision at /dashboard/engine/decisions/[id], clicks Approve.
- POST /api/v1/decisions/[id]/approve runs. Decision status flips to approved, an approved audit row is written, and (the critical bit) a second severity_overridden audit row is written with { from: existing.severity, to: 'low', reason, via: 'approve' }. Without the second row, approval would be a single-instance waiver only; with it, the classifier learns.
- On the next extension call, loadRecentReclassifications() reads the audit log, joins to the original signal's metadata to get the prompt preview, and formats it into the classifier's system prompt as a past correction.
- findCorrectionMatch() Jaccard-matches the new prompt against the correction preview. On hit, the LLM's output is forced into the corrected severity band regardless of what the LLM said.
- The verdict comes back as allow. The extension lets the prompt through. The user never sees a block dialog for that pattern again.
Session allow-list (faster path). The reviewer loop above takes a round-trip to the dashboard. For the same user who just hit a block and is sure they want to proceed, Send anyway (override) in the dialog adds the prompt to an in-memory Set<string> in the content script. Subsequent identical sends short-circuit the verdict round-trip entirely until the tab is reloaded. Cheap, immediate, and the original block decision still appears in the audit log so the override is on the record.
Telemetry surfaces
Three dashboard pages consume extension data, all built on the same governance_* tables:
- /dashboard/engine/extension: extension-specific page. Mint / revoke tokens, install instructions, and a live activity feed filtered to source LIKE 'extension/%'.
- /dashboard/engine/decisions: the global decision queue. Extension events appear alongside playground events and production v1 API events. Same approve/reject/reclassify controls work on all of them.
- /dashboard/engine/pulse: the generative-art live view. Each glow dot is a signal; size scales with risk, colour with severity. Reviewer corrections trigger lightning bolts arcing to the new-severity cluster. Hover a node for prompt preview; long-press to enter focus mode for that category. A 24-bucket arc around the bloom shows the diurnal usage rhythm.
Threat model
What an attacker / motivated user can and cannot do, given the current design.
Can
- Disable the extension. Any user can flip the master switch in the popup or uninstall the extension entirely. Defence is MDM/group-policy enforcement of the extension being installed and enabled, out of scope for v1; relevant for enterprise distribution via Chrome / Edge Add-on stores or by side-loading through corporate MDM.
- Use a different browser. Same defence. Without coverage on every browser the user has access to, this is a defence-in-depth control, not a perimeter. The desktop agent (network proxy) is the planned answer to this.
- Override blocks. Send Anyway is a feature, not a bug, but every override is audited. The audit row carries the user's sentinel ID (api-key, since extension calls authenticate as the workspace key, not as a specific user). For per-user attribution, the roadmap item is OAuth-style per-user tokens.
- Type the prompt into a non-supported site. The extension only intercepts the four declared sites. New AI sites that ship after the extension was installed get no coverage until the manifest's match list is updated.
Cannot
- Read data from other tabs. host_permissions is limited to the four AI sites + the API origin. The extension cannot see your bank tab or your email tab.
- Exfiltrate the workspace API token. The token lives in chrome.storage.sync, accessible only to the extension origin. The page's JavaScript cannot read it.
- Use a revoked token. Revocation in the dashboard marks revoked_at; the next verdict call fails with 401 from requireApiAuth.
- Cross-tenant read or write. Even with service-role bypass, every query filters by ctx.workspaceId explicitly, and ingestSignal() inserts only with the token's own workspace id.
Performance & cost
Each verdict call costs approximately:
- One Kirtonic classifier call: ~500 input tokens (policy + corrections + prompt) and ~80 output tokens (the JSON envelope). At list price, this is a few tenths of a US cent per call.
- Three Supabase round-trips: policy rules and recent corrections (run in parallel), then the signal + decision + audit insert (single transaction). All indexed.
- One classifier round-trip: to the model configured for the workspace (Anthropic Claude by default, or a customer-trained classifier). Latency depends on the provider and the prompt sizes involved.
End-to-end round-trip from user presses Enter to verdict pill renders is dominated by the classifier call. The user experience target is for allowed prompts to feel transparent and for blocked prompts to surface a clear, actionable confirmation dialog.
Caching strategy. None on the hot path. Each prompt is treated as unique and the past-corrections context evolves with each reviewer action. Adding a memoised verdict cache keyed on (workspace_id, sha256(prompt)) with a short TTL is a clear future win for workspaces with high repeat-prompt rates.
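A memoised verdict cache of the kind described could look roughly like the sketch below. It is not part of the current system: VerdictCache and its methods are hypothetical, and the digest function is injected (in practice a SHA-256 hex digest of the prompt) so that the sketch stays free of platform-specific crypto calls.

```typescript
type Verdict = "allow" | "warn" | "block";

// Hypothetical in-memory TTL cache keyed on (workspace_id, digest(prompt)).
class VerdictCache {
  private readonly entries = new Map<string, { verdict: Verdict; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly digest: (prompt: string) => string;
  private readonly now: () => number;

  constructor(
    ttlMs: number,
    digest: (prompt: string) => string,
    now: () => number = () => Date.now(),
  ) {
    this.ttlMs = ttlMs;
    this.digest = digest;
    this.now = now;
  }

  private key(workspaceId: string, prompt: string): string {
    return workspaceId + ":" + this.digest(prompt);
  }

  // Returns a cached verdict, or undefined on a miss or an expired entry.
  get(workspaceId: string, prompt: string): Verdict | undefined {
    const k = this.key(workspaceId, prompt);
    const hit = this.entries.get(k);
    if (hit === undefined) return undefined;
    if (this.now() >= hit.expiresAt) {
      this.entries.delete(k);
      return undefined;
    }
    return hit.verdict;
  }

  set(workspaceId: string, prompt: string, verdict: Verdict): void {
    const k = this.key(workspaceId, prompt);
    this.entries.set(k, { verdict, expiresAt: this.now() + this.ttlMs });
  }
}
```

Because the key is an exact digest, only byte-identical prompts hit the cache, and the short TTL bounds how long a cached verdict can lag behind a new reviewer correction. A cache hit would also need its own decision about whether to still write an audit row, which this sketch does not address.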
Future work
- Network-layer interception. Page-world script injected at document_start that monkey-patches window.fetch and XMLHttpRequest. Blocks the actual API request to the AI provider if a high-severity verdict comes back. Defends against React internals short-circuiting the keystroke intercept on a future site update.
- Per-user identity. Today the extension authenticates as the workspace. An OAuth-style device-pair flow would issue a per-user token bound to the logged-in Kirtonic identity, so audit rows carry the actual user ID rather than a sentinel.
- Desktop agent / system proxy. Covers Cursor, Raycast AI, VS Code Copilot, internal CLIs, and any other process that talks to api.openai.com / api.anthropic.com / generativelanguage.googleapis.com. Same verdict endpoint on the backend; new client.
- Chrome Web Store + Edge Add-ons publication. Lets enterprise IT push the extension via MDM rather than unpacked-install instructions per device.
- Verdict cache. (workspace_id, sha256(prompt)) → cached verdict with 60s TTL. Estimated to save ~40% on classifier spend in workspaces where users re-send identical prompts (most of them).