SGLang
Overview
Section titled “Overview”SGL (SGLang) is an OpenAI-compatible local/remote inference engine used for serving models with high throughput. DeepIntShield delegates all operations to the OpenAI provider implementation. Key features:
- OpenAI API compatibility - Identical request/response format
- Full streaming support - Server-Sent Events with usage tracking
- Tool calling - Complete function definition and execution
- Text embeddings - Support for embedding models
- Parameter filtering - Removes unsupported fields for compatibility
Supported Operations
Section titled “Supported Operations”| Operation | Non-Streaming | Streaming | Endpoint |
|---|---|---|---|
| Chat Completions | ✅ | ✅ | /v1/chat/completions |
| Responses API | ✅ | ✅ | /v1/chat/completions |
| Text Completions | ✅ | ✅ | /v1/completions |
| Embeddings | ✅ | - | /v1/embeddings |
| List Models | ✅ | - | /v1/models |
| Image Generation | ❌ | ❌ | - |
| Speech (TTS) | ❌ | ❌ | - |
| Transcriptions (STT) | ❌ | ❌ | - |
| Files | ❌ | ❌ | - |
| Batch | ❌ | ❌ | - |
1. Chat Completions
Section titled “1. Chat Completions”Request Parameters
Section titled “Request Parameters”SGL supports all standard OpenAI chat completion parameters. For full parameter reference and behavior, see OpenAI Chat Completions.
Filtered Parameters
Section titled “Filtered Parameters”Removed for SGL compatibility:
prompt_cache_key- Not supportedverbosity- Anthropic-specificstore- Not supportedservice_tier- OpenAI-specific
SGL supports all standard OpenAI message types, tools, responses, and streaming formats. For details on message handling, tool conversion, responses, and streaming, refer to OpenAI Chat Completions.
2. Responses API
Section titled “2. Responses API”Fallback to Chat Completions with format conversion:
ResponsesRequest → ChatRequest → Response conversionSame parameter support as Chat Completions.
3. Text Completions
Section titled “3. Text Completions”SGL supports legacy text completion format:
| Parameter | Mapping |
|---|---|
prompt | Direct pass-through |
max_tokens | max_tokens |
temperature, top_p | Direct pass-through |
frequency_penalty, presence_penalty | Supported |
4. Embeddings
Section titled “4. Embeddings”SGL supports text embeddings for vector generation:
| Parameter | Notes |
|---|---|
input | Text or array of texts |
model | Embedding model name |
encoding_format | ”float” or “base64” |
dimensions | Model-specific dimension count |
Response returns embedding vectors with usage information.
5. List Models
Section titled “5. List Models”Lists available models from SGL server with capabilities.
Unsupported Features
Section titled “Unsupported Features”| Feature | Reason |
|---|---|
| Speech/TTS | Not offered by SGL API |
| Transcription/STT | Not offered by SGL API |
| Batch Operations | Not offered by SGL API |
| File Management | Not offered by SGL API |
Caveats
Section titled “Caveats”BaseURL Configuration Required
Severity: High Behavior: BaseURL must be explicitly configured Impact: Requests fail without proper configuration Code: Validated in NewSGLProvider
Cache Control Stripped
Severity: Medium Behavior: Cache control directives are removed from messages Impact: Prompt caching features don’t work Code: Stripped during JSON marshaling
Parameter Filtering
Severity: Low Behavior: OpenAI-specific fields filtered out Impact: prompt_cache_key, verbosity, store removed Code: filterOpenAISpecificParameters
User Field Size Limit
Severity: Low Behavior: User field > 64 characters silently dropped Impact: Longer user identifiers are lost Code: SanitizeUserField enforces 64-char max