Performance & Cost

DeepintShield ships five optimizations that move the safety-check tax off your critical path. Three are on by default; two are one-flag opt-ins for teams that want the largest possible reduction in latency or cost.

<5ms p50 guardrail latency

Up to 60% LLM cost saved

max(g, m) allow-path latency (g = guards, m = model)

~30–50% p99 reduction

The five knobs

Embedded guard runtime

Default: On. Guard evaluation runs in-process inside the gateway — no RPC hop. Saves ~20–300ms per request in single-binary deployments.

Speculative dispatch

Default: Off (opt-in). Fire the provider call in parallel with input guards. Allow-path latency becomes max(guards, model) instead of guards + model.

Async post-guards

Default: On (auto). When no output policy needs to block or redact, the post-LLM evaluation goes to a background goroutine — the response ships immediately.

Per-category timeouts

Default: Off (opt-in). Tighten budgets per check class: PII <150ms, toxicity ~600ms, jailbreak ~1200ms. Slow classifiers no longer pull p99 up to a flat 1500ms ceiling.

Semantic cache short-circuit

Default: On. Semantic cache runs before guard evaluation. A fuzzy hit short-circuits the whole pipeline — no guard call, no provider call. Up to 60% cost reduction on chatbot-style workloads.
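
The ordering can be illustrated with a toy semantic cache that is consulted before anything else runs. This sketch assumes prompts have already been turned into embedding vectors and that a cosine-similarity threshold defines a "fuzzy hit"; `SemanticCache`, `Handle`, the 0.95 threshold, and the linear scan are illustrative assumptions, not DeepintShield's cache design.

```go
package main

import (
	"fmt"
	"math"
)

type entry struct {
	embedding []float64
	response  string
}

// SemanticCache is a toy in-memory cache keyed by embedding similarity.
type SemanticCache struct {
	Threshold float64 // minimum cosine similarity that counts as a hit
	entries   []entry
}

func cosine(a, b []float64) float64 {
	if len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// Lookup returns the most similar stored response at or above the threshold.
func (c *SemanticCache) Lookup(query []float64) (string, bool) {
	best, bestSim, found := "", c.Threshold, false
	for _, e := range c.entries {
		if sim := cosine(query, e.embedding); sim >= bestSim {
			best, bestSim, found = e.response, sim, true
		}
	}
	return best, found
}

func (c *SemanticCache) Store(embedding []float64, response string) {
	c.entries = append(c.entries, entry{embedding, response})
}

// Handle checks the cache first. On a hit, pipeline (guards plus provider
// call) never runs; on a miss it runs once and its result is stored.
func Handle(c *SemanticCache, query []float64, pipeline func() string) (string, bool) {
	if resp, ok := c.Lookup(query); ok {
		return resp, true
	}
	resp := pipeline()
	c.Store(query, resp)
	return resp, false
}

func main() {
	cache := &SemanticCache{Threshold: 0.95}
	calls := 0
	pipeline := func() string {
		calls++
		return "Our refund window is 30 days."
	}
	Handle(cache, []float64{1, 0}, pipeline) // miss: runs the pipeline
	_, hit := Handle(cache, []float64{0.99, 0.05}, pipeline)
	fmt.Println(hit, calls) // prints "true 1"
}
```

The second, near-duplicate query is answered from the cache, so the guard and provider work is paid for only once.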

What you can expect

Metric	With DeepintShield	Why
Guardrail latency (p50)	<5ms	Embedded runtime + decision cache + local-rule fast path
Allow-path total latency	max(guards, model)	Speculative dispatch (opt-in; non-streaming requests)
LLM cost saved	Up to 60%	Semantic cache short-circuit on templated traffic
Tail latency (p99)	~30–50% lower	Per-category timeouts (opt-in) replace the flat 1500ms ceiling
