v1.77.3-stable - Priority Based Rate Limiting

September 21, 2025

Krrish Dholakia

CEO, LiteLLM

Ishaan Jaff

CTO, LiteLLM

Deploy this version

Docker

docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:v1.77.3-stable

Pip

pip install litellm==1.77.3

Key Highlights

+550 RPS Performance Improvements - Optimizations in request handling and object initialization.
Priority Quota Reservation - Proxy admins can now reserve TPM/RPM capacity for specific keys.

Priority Quota Reservation

This release adds support for priority quota reservation. Proxy admins can reserve a percentage of a model's TPM/RPM capacity for keys based on the priority level set in their metadata.

This is useful when realtime production workloads must always get capacity, while background or development jobs can tolerate waiting. Critical production traffic keeps guaranteed access regardless of development traffic volume.
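To make the reservation idea concrete, here is a minimal sketch of how a model's RPM or TPM limit could be split across priority levels. The function name `reserve_capacity`, the priority labels `prod` and `dev`, and the 90/10 split are illustrative assumptions; this is not LiteLLM's implementation or its configuration schema.

```python
from typing import Dict


def reserve_capacity(total_limit: int, reservations: Dict[str, float]) -> Dict[str, int]:
    """Split a model's TPM or RPM limit across priority levels.

    `reservations` maps a priority label to the fraction of capacity
    reserved for it (hypothetical labels; fractions must sum to <= 1.0).
    """
    if any(share < 0 for share in reservations.values()):
        raise ValueError("reservation shares must be non-negative")
    if sum(reservations.values()) > 1.0 + 1e-9:
        raise ValueError("reservation shares must not exceed 100% of capacity")
    # Floor each share so the reserved total never exceeds the model limit.
    return {priority: int(total_limit * share) for priority, share in reservations.items()}


# Example: a model limited to 1,000 RPM, with 90% held for production keys.
limits = reserve_capacity(1000, {"prod": 0.9, "dev": 0.1})
print(limits)
```

In this sketch a burst of `dev` traffic can consume at most its own share, so requests from `prod` keys still find their reserved capacity available.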

Get started here

+550 RPS Performance Improvements

This release delivers a combined +550 RPS improvement through two targeted optimizations.

Fixing cache type inconsistencies that were causing frequent cache misses accounts for +500 RPS, and removing unnecessary coroutine checks from the hot path adds another +50 RPS.
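These notes do not spell out the exact inconsistency, so the following is only an illustration of the general failure mode: when the same logical key is written as one type and read as another, every lookup misses. The `UserCache` class and its methods are hypothetical and are not taken from LiteLLM.

```python
from typing import Any, Dict, Optional


class UserCache:
    """Tiny in-memory cache that normalizes keys to strings (hypothetical)."""

    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _normalize(key: Any) -> str:
        # Without this step, 42 and "42" would be distinct dict keys, and a
        # value written under one would never be found under the other.
        return str(key)

    def set(self, key: Any, value: Any) -> None:
        self._store[self._normalize(key)] = value

    def get(self, key: Any) -> Optional[Any]:
        value = self._store.get(self._normalize(key))
        if value is None:
            self.misses += 1
        else:
            self.hits += 1
        return value


cache = UserCache()
cache.set(42, {"user": "alice"})   # written with an int key
print(cache.get("42"))             # read with a str key: still a hit
print(cache.hits, cache.misses)
```

Normalizing at the cache boundary keeps callers free to pass whichever type they hold, at the cost of one cheap conversion per lookup.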

New Models / Updated Models

New Model Support

Provider	Model	Context Window	Input ($/1M tokens)	Output ($/1M tokens)	Features
SambaNova	`sambanova/deepseek-v3.1`	128K	$0.90	$0.90	Chat completions
SambaNova	`sambanova/gpt-oss-120b`	128K	$0.72	$0.72	Chat completions
OVHCloud	Various models	Varies	Contact provider	Contact provider	Chat completions
CompactifAI	Various models	Varies	Contact provider	Contact provider	Chat completions
TwelveLabs	`twelvelabs/marengo-embed-2.7`	32K	$0.12	$0.00	Embeddings
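
As a quick sanity check on the per-token prices above, the sketch below estimates the cost of a single request. The helper `request_cost` and the token counts are made up for illustration; the prices are the table's values for `sambanova/deepseek-v3.1`.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Return the USD cost of one request given $/1M-token prices."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


# sambanova/deepseek-v3.1: $0.90 per 1M input tokens, $0.90 per 1M output tokens.
cost = request_cost(input_tokens=1_200, output_tokens=300,
                    input_price_per_m=0.90, output_price_per_m=0.90)
print(f"${cost:.6f}")
```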

Features

OVHCloud AI Endpoints
- New provider support with comprehensive model catalog - PR #14494
CompactifAI
- New provider integration - PR #14532
SambaNova
- Added DeepSeek v3.1 and GPT-OSS-120B models - PR #14500
Bedrock
- Cross-region inference profile cost calculation - PR #14566
- AWS external ID parameter support for authentication - PR #14582
- CountTokens API implementation - PR #14557
- Titan V2 encoding_format parameter support - PR #14687
- Nova Canvas image generation inference profiles - PR #14578
- Bedrock Batches API - batch processing support with file upload and request transformation - PR #14618
- Bedrock Twelve Labs embedding provider support - PR #14697
Vertex AI
- Gemini labels field provider-aware filtering - PR #14563
- Gemini Batch API support - PR #14733
Volcengine
- Fixed thinking parameters when disabled - PR #14569
Cohere
- Handle Generate API deprecation, default to chat endpoints - PR #14676
TwelveLabs
- Added Marengo Embed 2.7 embedding support - PR #14674

Bug Fixes

Bedrock
- Empty arguments handling in tool call invocation - PR #14583
Vertex AI
- Avoid deepcopy crash with non-pickleables in Gemini/Vertex - PR #14418
XAI
- Fix unsupported stop parameter for grok-code models - PR #14565
Gemini
- Updated error message for Gemini API - PR #14589
- Fixed 2.5 Flash Image Preview model routing - PR #14715
- API key passing for token counting endpoints - PR #14744

New Provider Support

OVHCloud AI Endpoints
- Complete provider integration with model catalog and authentication - PR #14494
CompactifAI
- New provider support with documentation - PR #14532

LLM API Endpoints

Features

/responses
- Added cancel endpoint support for non-admin users - PR #14594
- Improved response session handling and cold storage configuration with s3 - PR #14534
- Added OpenAI & Azure /responses/cancel endpoint support - PR #14561
General
- Enhanced rate limit error messages with details - PR #14736
- Middle-truncation for spend log payloads - PR #14637

Bugs

/chat/completions
- Fixed completion chat ID handling - PR #14548
- Prevent AttributeError for _get_tags_from_request_kwargs - PR #14735
/responses
- Fixed cost calculation - PR #14675
General
- Rate limiter AttributeError fix - PR #14609

Spend Tracking, Budgets and Rate Limiting

Responses API Cost Calculation fix - PR #14675
Anthropic Cache Token Pricing - Separate 1-hour vs 5-minute cache creation costs - PR #14620, PR #14652
Indochina Time Timezone support for budget resets - PR #14666
Soft Budget Alerts - Resolved cache issues affecting soft budget alerts - PR #14491
Dynamic Rate Limiter v3 - Priority routing improvements - PR #14734
Enhanced Rate Limit Errors - More detailed error messages - PR #14736

Management Endpoints / UI

Features

Team Member Service Account Keys - Allow team members to view keys they create - PR #14619
Default Budget for JWT Teams - Auto-assign budgets to generated teams - PR #14514
SSO Access Control Groups - Enhanced token info endpoint integration - PR #14738
Health Test Connect Protection - Restrict access based on model creation permissions - PR #14650
Amazon Bedrock Guardrail Info View - Enhanced logging visualization - PR #14696

Bug Fixes

SCIM v2 - Fix group PATCH and PUT operations for non-existent members - PR #14581
Guardrail View/Edit/Delete behavior fixes - PR #14622
In-Memory Guardrail update failures - PR #14653

Logging / Guardrail Integrations

Features

DataDog
- Enhanced spend tracking metrics - PR #14555
- Stream support with is_streamed_request parameter - PR #14673
- Fixed tool calls metadata passing - PR #14531
Langfuse
- Added logging support for Responses API - PR #14597
Langsmith
- Langsmith Sampling Rate - Key/Team-level tracing configuration - PR #14740
Prometheus
- Multi-worker support improvements - PR #14530
- User email labels in monitoring - PR #14520
Opik
- Fixed timezone issue - PR #14708

Bug Fixes

S3
- Fixed 404 error when using s3_endpoint_url - PR #14559

Guardrails

Tool Permission Guardrail - Fine-grained tool access control - PR #14519
Bedrock Guardrails - Selective guarding support with runtime endpoint configuration - PR #14575, PR #14650
Default Last Message in guardrails - PR #14640
AWS exceptions handling despite 200 response - PR #14658
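
To illustrate what fine-grained tool access control can look like, here is a minimal sketch of first-match allow/deny rules over tool names. The rule format, the function `is_tool_allowed`, and the tool names are assumptions for illustration only, not the guardrail's actual configuration or code.

```python
from fnmatch import fnmatchcase
from typing import List, Tuple

# Ordered (pattern, decision) rules; the first match wins. Hypothetical format.
Rule = Tuple[str, str]


def is_tool_allowed(tool_name: str, rules: List[Rule], default: str = "deny") -> bool:
    """Return True if the first matching rule (or the default) allows the tool."""
    for pattern, decision in rules:
        if fnmatchcase(tool_name, pattern):
            return decision == "allow"
    return default == "allow"


rules = [
    ("github_delete_*", "deny"),   # block destructive calls first
    ("github_*", "allow"),         # then allow the rest of the namespace
]
print(is_tool_allowed("github_list_repos", rules))
print(is_tool_allowed("github_delete_repo", rules))
print(is_tool_allowed("shell_exec", rules))
```

Ordering the rules from most to least specific and defaulting to deny means an unlisted tool is rejected rather than silently permitted.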

New Integration

PostHog - Complete observability integration for LiteLLM usage tracking and analytics - PR #14610

MCP Gateway

MCP Server Alias Parsing - Multi-part URL path support - PR #14558
MCP Filter Recomputation - After server deletion - PR #14542
MCP Gateway Tools List improvements - PR #14695

Performance / Loadbalancing / Reliability improvements

+500 RPS Performance Boost when sending the user field - PR #14616
+50 RPS by removing iscoroutine from hot path - PR #14649
7% reduction in init overhead - PR #14689
Generic Object Pool implementation for better resource management - PR #14702
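
For readers unfamiliar with the pattern, a generic object pool reuses already-constructed objects instead of allocating new ones per request. The sketch below shows the idea under simple assumptions (single-threaded use, caller resets object state); the `ObjectPool` class is illustrative and is not the implementation from PR #14702.

```python
from collections import deque
from typing import Callable, Deque, Generic, TypeVar

T = TypeVar("T")


class ObjectPool(Generic[T]):
    """Minimal bounded object pool (illustrative, not LiteLLM's implementation)."""

    def __init__(self, factory: Callable[[], T], max_size: int = 8) -> None:
        self._factory = factory
        self._max_size = max_size
        self._free: Deque[T] = deque()

    def acquire(self) -> T:
        # Reuse an idle object when one is available; otherwise build a new one.
        return self._free.pop() if self._free else self._factory()

    def release(self, obj: T) -> None:
        # Keep at most max_size idle objects; extras are left to the GC.
        if len(self._free) < self._max_size:
            self._free.append(obj)

    def idle_count(self) -> int:
        return len(self._free)


pool: ObjectPool[dict] = ObjectPool(dict, max_size=2)
first = pool.acquire()
pool.release(first)
second = pool.acquire()
print(first is second)
```

Bounding the pool keeps memory use predictable: once `max_size` idle objects are held, further releases simply drop the object.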

General Proxy Improvements

Middle-Truncation for spend log payloads - PR #14637
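
Middle-truncation keeps the start and end of a long payload and drops the middle, which tends to preserve the most useful context in logs. A minimal sketch follows; the function name `truncate_middle`, the marker text, and the length limit are assumptions, not the values used by LiteLLM.

```python
def truncate_middle(text: str, max_len: int, marker: str = "...[truncated]...") -> str:
    """Shorten `text` to at most `max_len` characters, keeping both ends."""
    if len(text) <= max_len:
        return text
    if max_len <= len(marker):
        # Not enough room for the marker plus context; fall back to a hard cut.
        return text[:max_len]
    keep = max_len - len(marker)
    head = (keep + 1) // 2   # the head gets the extra character when keep is odd
    tail = keep - head
    return text[:head] + marker + (text[-tail:] if tail else "")


payload = "A" * 50 + "B" * 50
short = truncate_middle(payload, max_len=40)
print(short, len(short))
```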

Security

Security Update - Bump aiohttp==3.12.14, fix CVE-2025-53643 - PR #14638

New Contributors

@luisfucros made their first contribution in PR #14500
@hanakannzashi made their first contribution in PR #14548
@eliasto made their first contribution in PR #14494
@Rasmusafj made their first contribution in PR #14491
@LingXuanYin made their first contribution in PR #14569
@ronaldpereira made their first contribution in PR #14613
@hula-la made their first contribution in PR #14534
@carlos-marchal-ph made their first contribution in PR #14610
@akraines made their first contribution in PR #14637
@mrFranklin made their first contribution in PR #14708
@tcx4c70 made their first contribution in PR #14675
@michaeltansg made their first contribution in PR #14666
@tosi29 made their first contribution in PR #14725
@gmdfalk made their first contribution in PR #14735
@FelipeRodriguesGare made their first contribution in PR #14733
@mritunjaysharma394 made their first contribution in PR #14678
