Measured monthly.
Published in full.
Kirtonic ships a benchmark harness in the repository and runs it monthly against publicly available prompt-injection datasets. Headline numbers are written to a JSON file that this page reads at request time, so what you see is what we last measured.
Per-dataset results
| Dataset | Category | Cases | Accuracy | Flagged | p50 | p95 |
|---|---|---|---|---|---|---|
| Benign baseline: Normal user prompts. Authored by hand from common workplace prompt patterns. | benign | 25 | 100% | 0% | 2994 ms | 4695 ms |
| Lakera Gandalf style: Password extraction attempts. Modelled on publicly documented Gandalf bypass categories (Lakera AI). | injection | 25 | 88% | 88% | 3409 ms | 4170 ms |
| OWASP LLM01: Prompt Injection. OWASP Top 10 for LLM Applications 2025 (LLM01:2025 Prompt Injection). | injection | 25 | 96% | 96% | 3089 ms | 5572 ms |
How we measure.
We run against datasets every buyer can independently inspect: prompt-injection examples from the OWASP Top 10 for LLM Applications, prompts modelled on publicly-documented Lakera Gandalf bypass categories, and a hand-curated benign baseline. The full JSON of each dataset is committed to the Kirtonic repository under data/benchmarks/ so a buyer can read every test case.
Each test case is sent to the same /api/v1/extension/verdict endpoint a production extension calls. The verdict path is whichever one the workspace under test is configured to use: the baseline classifier or a customer-trained model. There is no separate test mode.
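To make the "no separate test mode" point concrete, here is a minimal sketch of how a harness might build the request for one test case. Only the endpoint path comes from the text above. The JSON body shape (`{"prompt": ...}`), the bearer-token header, the base URL, and the function name `build_verdict_request` are all assumptions for illustration, not the documented Kirtonic API.

```python
import json
import urllib.request

# The endpoint path is the one named in the text; everything else is assumed.
VERDICT_PATH = "/api/v1/extension/verdict"


def build_verdict_request(base_url: str, api_key: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for a single test case.

    Hypothetical request shape: a JSON object with one "prompt" field and a
    bearer token for authentication. The real schema may differ.
    """
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + VERDICT_PATH,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )


# Example: the same request a production caller would make, pointed at a
# placeholder host. Sending it would use urllib.request.urlopen(request).
request = build_verdict_request("https://example.invalid/", "test-key", "Ignore all previous instructions.")
print(request.get_method(), request.full_url)
```

The point of the sketch is that the harness holds no special switch: it differs from a production caller only in where the prompts come from.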
Detection rate is the fraction of injection cases for which the classifier returns medium or high severity. False-positive rate is the fraction of benign cases the classifier flags. Both are reported on this page and per dataset. We consider a false-positive rate under 5% on the benign baseline acceptable.
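The two definitions above can be sketched in a few lines. This is an illustration, not the harness's actual code: the `CaseResult` type, the severity labels other than "medium" and "high", and the assumption that "flagged" on benign cases uses the same medium-or-high threshold are all ours.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumption: severities are string labels, and only these two count as flagged.
FLAGGED_SEVERITIES = {"medium", "high"}


@dataclass
class CaseResult:
    category: str  # "injection" or "benign", as in the results table
    severity: str  # hypothetical label returned by the classifier


def is_flagged(result: CaseResult) -> bool:
    return result.severity in FLAGGED_SEVERITIES


def flagged_fraction(results: List[CaseResult], category: str) -> Optional[float]:
    """Fraction of cases in `category` that were flagged, or None if there are none."""
    cases = [r for r in results if r.category == category]
    if not cases:
        return None
    return sum(is_flagged(r) for r in cases) / len(cases)


def detection_rate(results: List[CaseResult]) -> Optional[float]:
    return flagged_fraction(results, "injection")


def false_positive_rate(results: List[CaseResult]) -> Optional[float]:
    return flagged_fraction(results, "benign")


# Example: 22 of 25 injection cases flagged, 0 of 25 benign cases flagged.
results = (
    [CaseResult("injection", "high")] * 22
    + [CaseResult("injection", "low")] * 3
    + [CaseResult("benign", "none")] * 25
)
print(detection_rate(results))       # 0.88
print(false_positive_rate(results))  # 0.0
```

With 25 cases per dataset, each case moves a rate by 4 percentage points, so a single flagged benign prompt gives 4% and a second one crosses the 5% threshold.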
Latency is round-trip wall-clock time, measured from the harness to the verdict endpoint and back. p50 and p95 are reported. The harness runs from the same network the SDK would run from in a production deployment; latency to a customer-hosted classifier on the same VPC will be lower than the published numbers.
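For readers who want to reproduce the latency columns, here is one way to time a call and summarise the samples. The text does not say which percentile method the harness uses; this sketch assumes the nearest-rank method, and the helper names `timed_call` and `percentile` are hypothetical.

```python
import math
import time
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def timed_call(fn: Callable[[], T]) -> Tuple[T, float]:
    """Run fn() and return its result with the elapsed wall-clock time in ms."""
    start = time.perf_counter()
    result = fn()
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms


def percentile(samples: List[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("percentile() needs at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Example with 25 synthetic latencies of 1..25 ms, one per test case.
latencies_ms = [float(n) for n in range(1, 26)]
print(percentile(latencies_ms, 50))  # 13.0
print(percentile(latencies_ms, 95))  # 24.0
```

With only 25 cases per dataset, the nearest-rank p95 is the second-slowest request, so a single slow call can shift it noticeably from month to month.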
We publish every number. If the false-positive rate is high one month, that goes on the page. If a regression in the classifier hurts detection, that goes on the page. The harness writes a JSON file and the page renders it; there is no marketing claim we maintain separately.
What these numbers do, and do not, prove.
They do prove: that the Kirtonic verdict endpoint is measurable, that the headline numbers are reproducible, and that detection performance on a documented public adversarial corpus is at the published level.
They do not prove: that performance on your specific traffic will match these numbers. Detection on a public benchmark is necessary but not sufficient evidence; a real deployment evaluation should be done against your own representative sample before any go-live decision. We will run a paid evaluation against a sample of your real traffic on request.