Performance & Cost

DeepintShield ships five optimizations that move the safety-check tax off your critical path. Three are on by default; two are one-flag opt-ins for teams that want the largest latency or cost reductions.

Guardrail latency: <5ms p50
LLM cost saved: up to 60%
Allow-path latency: max(guards, model)
p99 reduction: ~30–50%

Embedded guard runtime

Default: On. Guard evaluation runs in-process inside the gateway — no RPC hop. Saves ~20–300ms per request in single-binary deployments.

Speculative dispatch

Default: Off (opt-in). Fire the provider call in parallel with input guards. Allow-path latency becomes max(guards, model) instead of guards + model.
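
The idea can be sketched in Go as follows. This is a minimal illustration under stated assumptions, not DeepintShield's implementation: `GuardFunc`, `ModelFunc`, `SpeculativeDispatch`, and the demo helpers are hypothetical names, and the simulated latencies are arbitrary.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// ErrBlocked is returned when the input guard rejects the request.
var ErrBlocked = errors.New("blocked by input guard")

// GuardFunc and ModelFunc are hypothetical stand-ins for input-guard
// evaluation and the provider call; they are not DeepintShield APIs.
type GuardFunc func(ctx context.Context, prompt string) (allow bool, err error)
type ModelFunc func(ctx context.Context, prompt string) (string, error)

type modelResult struct {
	text string
	err  error
}

// SpeculativeDispatch starts the provider call and the input guard at the
// same time. On allow, total latency is roughly max(guards, model). On block
// or guard error, the deferred cancel aborts the in-flight provider call and
// its result is discarded.
func SpeculativeDispatch(ctx context.Context, prompt string, guard GuardFunc, model ModelFunc) (string, error) {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel()

	// Buffered so the goroutine can finish even if nobody reads the result.
	resCh := make(chan modelResult, 1)
	go func() {
		text, err := model(ctx, prompt)
		resCh <- modelResult{text, err}
	}()

	allow, err := guard(ctx, prompt)
	if err != nil {
		return "", err
	}
	if !allow {
		return "", ErrBlocked
	}

	select {
	case res := <-resCh:
		return res.text, res.err
	case <-ctx.Done():
		return "", ctx.Err()
	}
}

// demoGuard simulates a 30ms guard that blocks the literal prompt "bad".
func demoGuard(ctx context.Context, prompt string) (bool, error) {
	time.Sleep(30 * time.Millisecond)
	return prompt != "bad", nil
}

// demoModel simulates a 50ms provider call that honors cancellation.
func demoModel(ctx context.Context, prompt string) (string, error) {
	select {
	case <-time.After(50 * time.Millisecond):
		return "echo: " + prompt, nil
	case <-ctx.Done():
		return "", ctx.Err()
	}
}

func runDemo(prompt string) (string, error) {
	return SpeculativeDispatch(context.Background(), prompt, demoGuard, demoModel)
}

func main() {
	start := time.Now()
	out, err := runDemo("hello")
	fmt.Println(out, err, time.Since(start).Round(10*time.Millisecond))
}
```

With these simulated latencies the allow path takes about as long as the slower of the two calls rather than their sum. One consequence of the technique itself is that a prompt the guard later blocks has already been sent to the provider, so cancellation limits the cost but does not undo the dispatch.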

Async post-guards

Default: On (auto). When no output policy needs to block or redact, the post-LLM evaluation goes to a background goroutine — the response ships immediately.
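
A rough Go sketch of that decision is shown below. It assumes a hypothetical policy model with three actions (`Observe`, `Redact`, `Block`); the type names, the `PostGuard` function, and the toy `containsSecret` check are illustrative and not taken from DeepintShield.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// Action is what an output policy may do with a response. The names are
// illustrative, not DeepintShield configuration values.
type Action int

const (
	Observe Action = iota // record only; never changes the response
	Redact                // may rewrite the response
	Block                 // may reject the response
)

// OutputPolicy is a hypothetical output-side check.
type OutputPolicy struct {
	Name   string
	Action Action
	Check  func(response string) (violation bool)
}

// needsSync reports whether any policy can block or redact, in which case
// the response has to wait for evaluation.
func needsSync(policies []OutputPolicy) bool {
	for _, p := range policies {
		if p.Action != Observe {
			return true
		}
	}
	return false
}

// PostGuard returns the response to send and whether it may be sent. When
// every policy is observe-only, evaluation moves to a background goroutine
// and the response is returned immediately.
func PostGuard(resp string, policies []OutputPolicy, onViolation func(string), wg *sync.WaitGroup) (string, bool) {
	if !needsSync(policies) {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for _, p := range policies {
				if p.Check(resp) {
					onViolation(p.Name)
				}
			}
		}()
		return resp, true
	}
	for _, p := range policies {
		if !p.Check(resp) {
			continue
		}
		onViolation(p.Name)
		switch p.Action {
		case Block:
			return "", false
		case Redact:
			resp = "[redacted]"
		}
	}
	return resp, true
}

// containsSecret is a toy check used only for the demo.
func containsSecret(s string) bool { return strings.Contains(s, "secret") }

func runPostGuard(resp string, action Action) (string, bool, []string) {
	var wg sync.WaitGroup
	var hits []string
	record := func(name string) { hits = append(hits, name) }
	policies := []OutputPolicy{{Name: "secret-scan", Action: action, Check: containsSecret}}
	out, ok := PostGuard(resp, policies, record, &wg)
	wg.Wait() // demo only: wait so background findings are visible
	return out, ok, hits
}

func main() {
	fmt.Println(runPostGuard("the secret plan", Observe))
	fmt.Println(runPostGuard("the secret plan", Redact))
	fmt.Println(runPostGuard("the secret plan", Block))
}
```

The `sync.WaitGroup` exists so the demo can wait for background findings; a real gateway would more likely hand violations to a logging or alerting path and return without waiting.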

Per-category timeouts

Default: Off (opt-in). Tighten budgets per check class: PII <150ms, toxicity ~600ms, jailbreak ~1200ms. Slow classifiers no longer pull p99 up to a flat 1500ms ceiling.
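
One way to picture per-category budgets is a deadline derived from the check class, sketched here in Go with the standard `context` package. The category keys, the `Budgets` map, and `RunCheck` are assumptions made for illustration, and the durations treat the approximate figures above as exact upper bounds.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Budgets mirrors the per-class figures quoted above. The keys and the map
// itself are illustrative, not DeepintShield configuration.
var Budgets = map[string]time.Duration{
	"pii":       150 * time.Millisecond,
	"toxicity":  600 * time.Millisecond,
	"jailbreak": 1200 * time.Millisecond,
}

// FlatCeiling is the fallback for categories without their own budget.
const FlatCeiling = 1500 * time.Millisecond

// BudgetFor returns the timeout for a check category.
func BudgetFor(category string) time.Duration {
	if d, ok := Budgets[category]; ok {
		return d
	}
	return FlatCeiling
}

// Check is a hypothetical guard check that honors context cancellation.
type Check func(ctx context.Context, input string) (violation bool, err error)

// RunCheck runs one check under its category budget. timedOut is true when
// the budget expired before the check returned; what to do then (fail open
// or fail closed) is left to the caller.
func RunCheck(ctx context.Context, category, input string, check Check) (violation, timedOut bool, err error) {
	ctx, cancel := context.WithTimeout(ctx, BudgetFor(category))
	defer cancel()

	violation, err = check(ctx, input)
	if errors.Is(err, context.DeadlineExceeded) {
		return false, true, nil
	}
	return violation, false, err
}

// slowCheck simulates a classifier that takes the given time to answer.
func slowCheck(latency time.Duration) Check {
	return func(ctx context.Context, _ string) (bool, error) {
		select {
		case <-time.After(latency):
			return false, nil
		case <-ctx.Done():
			return false, ctx.Err()
		}
	}
}

func runDemo(category string, latencyMs int) bool {
	check := slowCheck(time.Duration(latencyMs) * time.Millisecond)
	_, timedOut, _ := RunCheck(context.Background(), category, "some input", check)
	return timedOut
}

func main() {
	fmt.Println("pii check taking 400ms timed out:", runDemo("pii", 400))
	fmt.Println("toxicity check taking 10ms timed out:", runDemo("toxicity", 10))
}
```

The sketch only works when each check respects its context. A check that ignores cancellation would still run to completion, so the budget would bound nothing.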

Semantic cache short-circuit

Default: On. Semantic cache runs before guard evaluation. A fuzzy hit short-circuits the whole pipeline — no guard call, no provider call. Up to 60% cost reduction on chatbot-style workloads.
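
The ordering can be illustrated with a toy in-memory cache in Go. Everything here is an assumption made for the sketch: the `SemanticCache` and `Pipeline` types, the cosine-similarity matching, the 0.95 threshold, and the hand-written vectors that stand in for real embeddings.

```go
package main

import (
	"fmt"
	"math"
)

type entry struct {
	vec  []float64
	resp string
}

// SemanticCache is a toy cache keyed by embedding similarity.
type SemanticCache struct {
	Threshold float64 // minimum cosine similarity that counts as a hit
	entries   []entry
}

// cosine returns the cosine similarity of two equal-length vectors,
// or 0 when the lengths differ or either vector is all zeros.
func cosine(a, b []float64) float64 {
	if len(a) != len(b) || len(a) == 0 {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// Lookup returns the most similar cached response at or above Threshold.
func (c *SemanticCache) Lookup(vec []float64) (string, bool) {
	best, bestSim, found := "", c.Threshold, false
	for _, e := range c.entries {
		if sim := cosine(vec, e.vec); sim >= bestSim {
			best, bestSim, found = e.resp, sim, true
		}
	}
	return best, found
}

// Store adds a response under its prompt embedding.
func (c *SemanticCache) Store(vec []float64, resp string) {
	c.entries = append(c.entries, entry{vec, resp})
}

// Pipeline counts downstream calls so the short-circuit is observable.
type Pipeline struct {
	Cache      *SemanticCache
	GuardCalls int
	ModelCalls int
}

// Handle consults the cache first. On a hit, neither the guards nor the
// provider are called.
func (p *Pipeline) Handle(vec []float64, prompt string) string {
	if resp, ok := p.Cache.Lookup(vec); ok {
		return resp
	}
	p.GuardCalls++ // placeholder for guard evaluation
	p.ModelCalls++ // placeholder for the provider call
	resp := "answer to: " + prompt
	p.Cache.Store(vec, resp)
	return resp
}

func main() {
	p := &Pipeline{Cache: &SemanticCache{Threshold: 0.95}}
	fmt.Println(p.Handle([]float64{1, 0, 0.1}, "reset my password"))
	fmt.Println(p.Handle([]float64{0.98, 0.02, 0.12}, "how do I reset my password"))
	fmt.Println("guard calls:", p.GuardCalls, "model calls:", p.ModelCalls)
}
```

In this sketch only responses that already passed through the guard placeholder are stored, so a hit replays a previously checked answer. The threshold trades hit rate against the risk of returning an answer to a prompt that is merely similar.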

| Metric | DeepintShield default | Why |
| --- | --- | --- |
| Guardrail latency (p50) | <5ms | Embedded runtime + decision cache + local-rule fast path |
| Allow-path total latency | max(guards, model) | Speculative dispatch (opt-in; non-streaming requests) |
| LLM cost saved | Up to 60% | Semantic cache short-circuit on templated traffic |
| Tail latency (p99) | ~30–50% lower | Per-category timeouts (opt-in) replace the flat 1500ms ceiling |