v1.77.3-stable - Priority Based Rate Limiting
Deploy this version
- Docker
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:v1.77.3-stable
- Pip
pip install litellm==1.77.3
Key Highlights
- +550 RPS Performance Improvements - Optimizations in request handling and object initialization.
- Priority Quota Reservation - Proxy admins can now reserve TPM/RPM capacity for specific keys.
Priority Quota Reservation
This release adds support for priority quota reservation. Proxy Admins can now reserve a percentage of a model's TPM/RPM capacity for keys based on their metadata priority level, so critical production workloads keep guaranteed access regardless of development traffic volume.
This is useful when realtime use cases must always get priority responses while background development jobs can tolerate longer waits.
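To make the reservation idea concrete, here is a minimal sketch of the arithmetic only. The function name `reserved_capacity`, the priority labels, and the percentages are hypothetical, and this is not LiteLLM's implementation or configuration syntax.

```python
def reserved_capacity(model_limit: int, reservations: dict[str, float]) -> dict[str, int]:
    """Split a model's TPM or RPM limit across priority levels.

    `reservations` maps a priority label to the fraction of capacity
    reserved for keys carrying that label in their metadata.
    """
    total = sum(reservations.values())
    if total > 1.0 + 1e-9:
        raise ValueError(f"reservations sum to {total:.2f}, which exceeds 100%")
    # Floor each share so the reserved total never exceeds the model limit.
    return {level: int(model_limit * share) for level, share in reservations.items()}


# Hypothetical example: a model limited to 1000 RPM, 75% reserved for prod keys.
limits = reserved_capacity(1000, {"prod": 0.75, "dev": 0.25})
print(limits)
```

With these made-up numbers, keys tagged `prod` keep most of the model's capacity even when `dev` traffic spikes.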
Get started here
+550 RPS Performance Improvements
This release delivers significant RPS improvements through targeted optimizations.
We've achieved a +500 RPS boost by fixing cache type inconsistencies that were causing frequent cache misses, plus an additional +50 RPS by removing unnecessary coroutine checks from the hot path.
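The release notes only say that coroutine checks were removed from the hot path. As a general illustration of that kind of optimization, and not the actual change in LiteLLM, the sketch below classifies each callback once at registration so the per-request loop reads a stored boolean. `CallbackRunner` and the two callbacks are hypothetical names.

```python
import asyncio
import inspect
from typing import Any, Callable


class CallbackRunner:
    """Hypothetical helper: classify each callback once, at registration."""

    def __init__(self) -> None:
        self._callbacks: list[tuple[Callable[..., Any], bool]] = []

    def register(self, callback: Callable[..., Any]) -> None:
        # Pay for the introspection here, outside the per-request path.
        is_async = inspect.iscoroutinefunction(callback)
        self._callbacks.append((callback, is_async))

    async def run_all(self, payload: dict) -> list[Any]:
        results = []
        for callback, is_async in self._callbacks:
            # Hot path: a stored boolean replaces a per-call coroutine check.
            results.append(await callback(payload) if is_async else callback(payload))
        return results


def sync_cb(payload: dict) -> str:
    return f"sync:{payload['id']}"


async def async_cb(payload: dict) -> str:
    return f"async:{payload['id']}"


runner = CallbackRunner()
runner.register(sync_cb)
runner.register(async_cb)
results = asyncio.run(runner.run_all({"id": 7}))
print(results)
```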
New Models / Updated Models
New Model Support
Provider | Model | Context Window | Input ($/1M tokens) | Output ($/1M tokens) | Features |
---|---|---|---|---|---|
SambaNova | sambanova/deepseek-v3.1 | 128K | $0.90 | $0.90 | Chat completions |
SambaNova | sambanova/gpt-oss-120b | 128K | $0.72 | $0.72 | Chat completions |
OVHCloud | Various models | Varies | Contact provider | Contact provider | Chat completions |
CompactifAI | Various models | Varies | Contact provider | Contact provider | Chat completions |
TwelveLabs | twelvelabs/marengo-embed-2.7 | 32K | $0.12 | $0.00 | Embeddings |
Features
- OVHCloud AI Endpoints
- New provider support with comprehensive model catalog - PR #14494
- CompactifAI
- New provider integration - PR #14532
- SambaNova
- Added DeepSeek v3.1 and GPT-OSS-120B models - PR #14500
- Bedrock
- Cross-region inference profile cost calculation - PR #14566
- AWS external ID parameter support for authentication - PR #14582
- CountTokens API implementation - PR #14557
- Titan V2 encoding_format parameter support - PR #14687
- Nova Canvas image generation inference profiles - PR #14578
- Bedrock Batches API - batch processing support with file upload and request transformation - PR #14618
- Bedrock Twelve Labs embedding provider support - PR #14697
- Vertex AI
- Volcengine
- Fixed thinking parameters when disabled - PR #14569
- Cohere
- Handle Generate API deprecation, default to chat endpoints - PR #14676
- TwelveLabs
- Added Marengo Embed 2.7 embedding support - PR #14674
Bug Fixes
- Bedrock
- Empty arguments handling in tool call invocation - PR #14583
- Vertex AI
- Avoid deepcopy crash with non-pickleables in Gemini/Vertex - PR #14418
- XAI
- Fix unsupported stop parameter for grok-code models - PR #14565
- Gemini
New Provider Support
- OVHCloud AI Endpoints
- Complete provider integration with model catalog and authentication - PR #14494
- CompactifAI
- New provider support with documentation - PR #14532
LLM API Endpoints
Features
- /responses
- General
Bugs
- /chat/completions
- /responses
- Fixed cost calculation - PR #14675
- General
- Rate limiter AttributeError fix - PR #14609
Spend Tracking, Budgets and Rate Limiting
- Responses API Cost Calculation fix - PR #14675
- Anthropic Cache Token Pricing - Separate 1-hour vs 5-minute cache creation costs - PR #14620, PR #14652
- Indochina Time Timezone support for budget resets - PR #14666
- Soft Budget Alerts - Resolved cache issues affecting soft budget alerts - PR #14491
- Dynamic Rate Limiter v3 - Priority routing improvements - PR #14734
- Enhanced Rate Limit Errors - More detailed error messages - PR #14736
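The Anthropic cache token pricing item in the list above separates cache creation costs by time-to-live. A minimal sketch of that arithmetic follows, assuming made-up per-million-token prices; `CacheWritePricing` and `cache_creation_cost` are hypothetical names, not LiteLLM's cost-tracking API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheWritePricing:
    """Illustrative per-million-token prices in USD; not real cost-map entries."""
    write_5m_per_million: float
    write_1h_per_million: float


def cache_creation_cost(tokens_5m: int, tokens_1h: int, pricing: CacheWritePricing) -> float:
    # Price each TTL bucket at its own rate instead of one blended rate.
    cost_5m = tokens_5m * pricing.write_5m_per_million / 1_000_000
    cost_1h = tokens_1h * pricing.write_1h_per_million / 1_000_000
    return cost_5m + cost_1h


# Assumed prices for illustration only.
pricing = CacheWritePricing(write_5m_per_million=3.75, write_1h_per_million=6.00)
cost = cache_creation_cost(tokens_5m=200_000, tokens_1h=100_000, pricing=pricing)
print(f"${cost:.4f}")
```

Charging both buckets at a single rate would misstate spend whenever a request mixes 5-minute and 1-hour cache writes.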
Management Endpoints / UI
Features
- Team Member Service Account Keys - Allow team members to view keys they create - PR #14619
- Default Budget for JWT Teams - Auto-assign budgets to generated teams - PR #14514
- SSO Access Control Groups - Enhanced token info endpoint integration - PR #14738
- Health Test Connect Protection - Restrict access based on model creation permissions - PR #14650
- Amazon Bedrock Guardrail Info View - Enhanced logging visualization - PR #14696
Bug Fixes
- SCIM v2 - Fix group PUSH and PUT operations for non-existent members - PR #14581
- Guardrail View/Edit/Delete behavior fixes - PR #14622
- In-Memory Guardrail update failures - PR #14653
Logging / Guardrail Integrations
Features
- DataDog
- Langfuse
- Added logging support for Responses API - PR #14597
- Langsmith
- Langsmith Sampling Rate - Key/Team-level tracing configuration - PR #14740
- Prometheus
- Opik
- Fixed timezone issue - PR #14708
Bug Fixes
Guardrails
- Tool Permission Guardrail - Fine-grained tool access control - PR #14519
- Bedrock Guardrails - Selective guarding support with runtime endpoint configuration - PR #14575, PR #14650
- Default Last Message in guardrails - PR #14640
- AWS exceptions handling despite 200 response - PR #14658
New Integrationโ
MCP Gateway
- MCP Server Alias Parsing - Multi-part URL path support - PR #14558
- MCP Filter Recomputation - After server deletion - PR #14542
- MCP Gateway Tools List improvements - PR #14695
Performance / Loadbalancing / Reliability improvements
- +500 RPS Performance Boost when sending the user field - PR #14616
- +50 RPS by removing iscoroutine from hot path - PR #14649
- 7% reduction in init overhead - PR #14689
- Generic Object Pool implementation for better resource management - PR #14702
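For readers unfamiliar with the object pool pattern named in the list above, here is a minimal generic sketch. It illustrates the pattern only; the class name `ObjectPool` and its methods are assumptions and do not reflect the implementation in PR #14702.

```python
from collections import deque
from typing import Callable, Deque, Generic, TypeVar

T = TypeVar("T")


class ObjectPool(Generic[T]):
    """Hypothetical bounded pool that reuses objects instead of rebuilding them."""

    def __init__(self, factory: Callable[[], T], max_size: int = 32) -> None:
        self._factory = factory
        self._max_size = max_size
        self._free: Deque[T] = deque()

    def acquire(self) -> T:
        # Reuse an idle object when one exists; otherwise build a new one.
        return self._free.pop() if self._free else self._factory()

    def release(self, obj: T) -> None:
        # Keep at most max_size idle objects; extras are left to the garbage collector.
        if len(self._free) < self._max_size:
            self._free.append(obj)


pool: ObjectPool[dict] = ObjectPool(factory=dict, max_size=2)
first = pool.acquire()
pool.release(first)
second = pool.acquire()
print(first is second)
```

In this sketch the caller is responsible for resetting an object's state before releasing it, since the pool hands back the same instance unchanged.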
General Proxy Improvements
- Middle-Truncation for spend log payloads - PR #14637
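Middle truncation keeps the beginning and end of a long payload and drops the center. The sketch below shows the general technique; the function name, marker text, and size limit are hypothetical and may differ from what the proxy actually does.

```python
def truncate_middle(text: str, max_chars: int, marker: str = "...[truncated]...") -> str:
    """Keep the start and end of `text`, replacing the middle with `marker`."""
    if len(text) <= max_chars:
        return text
    if max_chars <= len(marker):
        # Not enough room for a marker plus context; fall back to a hard cut.
        return text[:max_chars]
    keep = max_chars - len(marker)
    head = (keep + 1) // 2  # Give the extra character to the head on odd splits.
    tail = keep // 2
    return text[:head] + marker + (text[-tail:] if tail else "")


payload = "A" * 50 + "B" * 50
short = truncate_middle(payload, max_chars=40)
print(short, len(short))
```

Keeping both ends is handy for logs, where the opening of a request and the close of a response are often the most useful parts when debugging.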
Security
- Security Update - Bump aiohttp==3.12.14, fix CVE-2025-53643 - PR #14638
New Contributors
- @luisfucros made their first contribution in PR #14500
- @hanakannzashi made their first contribution in PR #14548
- @eliasto made their first contribution in PR #14494
- @Rasmusafj made their first contribution in PR #14491
- @LingXuanYin made their first contribution in PR #14569
- @ronaldpereira made their first contribution in PR #14613
- @hula-la made their first contribution in PR #14534
- @carlos-marchal-ph made their first contribution in PR #14610
- @akraines made their first contribution in PR #14637
- @mrFranklin made their first contribution in PR #14708
- @tcx4c70 made their first contribution in PR #14675
- @michaeltansg made their first contribution in PR #14666
- @tosi29 made their first contribution in PR #14725
- @gmdfalk made their first contribution in PR #14735
- @FelipeRodriguesGare made their first contribution in PR #14733
- @mritunjaysharma394 made their first contribution in PR #14678