DeepintShield ships five optimizations that move the safety-check tax off your critical path. Three are on by default; two are one-flag opt-ins for teams that want the largest latency or cost reductions.
Embedded guard runtime
Default: On. Guard evaluation runs in-process inside the gateway — no RPC hop. Saves ~20–300ms per request in single-binary deployments.
Speculative dispatch
Default: Off (opt-in). Fire the provider call in parallel with input guards. Allow-path latency becomes max(guards, model) instead of guards + model. Applies to non-streaming requests.
Async post-guards
Default: On (auto). When no output policy needs to block or redact, the post-LLM evaluation goes to a background goroutine — the response ships immediately.
Per-category timeouts
Default: Off (opt-in). Tighten budgets per check class: PII <150ms, toxicity ~600ms, jailbreak ~1200ms. Slow classifiers no longer pull p99 up to a flat 1500ms ceiling.
Semantic cache short-circuit
Default: On. Semantic cache runs before guard evaluation. A fuzzy hit short-circuits the whole pipeline — no guard call, no provider call. Up to 60% cost reduction on chatbot-style workloads.
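The short-circuit can be sketched as a similarity lookup that runs before anything else. The Go example below is a toy under stated assumptions: prompts are already embedded as vectors, a hit is any entry whose cosine similarity meets a threshold, and `SemanticCache`, `Lookup`, and `Handle` are hypothetical names. A production cache would likely use an approximate nearest-neighbor index instead of a linear scan.

```go
package main

import "math"

// entry pairs a prompt embedding with its cached response.
type entry struct {
	vec      []float64
	response string
}

// SemanticCache is a toy linear-scan cache keyed by embedding similarity.
type SemanticCache struct {
	Threshold float64 // minimum cosine similarity that counts as a hit
	entries   []entry
}

// cosine returns the cosine similarity of a and b, or 0 for mismatched,
// empty, or zero-length vectors.
func cosine(a, b []float64) float64 {
	if len(a) != len(b) || len(a) == 0 {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// Put stores a response under its prompt embedding.
func (c *SemanticCache) Put(vec []float64, response string) {
	c.entries = append(c.entries, entry{vec: vec, response: response})
}

// Lookup returns the most similar cached response at or above the threshold.
func (c *SemanticCache) Lookup(vec []float64) (string, bool) {
	best, bestSim, found := "", c.Threshold, false
	for _, e := range c.entries {
		if sim := cosine(vec, e.vec); sim >= bestSim {
			best, bestSim, found = e.response, sim, true
		}
	}
	return best, found
}

// Handle checks the cache first. On a hit, pipeline (guards plus provider
// call) is never invoked; on a miss, only successful results are cached.
func Handle(c *SemanticCache, vec []float64, pipeline func() (string, error)) (string, error) {
	if resp, ok := c.Lookup(vec); ok {
		return resp, nil
	}
	resp, err := pipeline()
	if err != nil {
		return "", err
	}
	c.Put(vec, resp)
	return resp, nil
}
```

Caching only responses that came back without error keeps blocked or failed requests out of the cache, which matters because a later hit skips guard evaluation entirely.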
| Metric | DeepintShield default | Why |
|---|---|---|
| Guardrail latency (p50) | <5ms | Embedded runtime + decision cache + local-rule fast path |
| Allow-path total latency | max(guards, model) | Speculative dispatch (opt-in; non-streaming requests) |
| LLM cost saved | Up to 60% | Semantic cache short-circuit on templated traffic |
| Tail latency (p99) | ~30–50% lower | Per-category timeouts (opt-in) replace the flat 1500ms ceiling |