
Features — AI Gateway for AWS Bedrock

stdapi.ai is an AI gateway purpose-built for AWS. It brings full OpenAI and Anthropic API compatibility to AWS Bedrock and AWS AI services — so any tool, SDK, or application your team already uses connects instantly, without code changes.

  • One URL change, 80+ models — Drop in as an OpenAI or Anthropic replacement
  • Everything stays in your AWS account — No third-party routing, no data sharing
  • Enterprise compliance built in — ISO, SOC, HIPAA, GDPR, FedRAMP via AWS
  • Production in minutes — Terraform module on AWS Marketplace, 14-day free trial

How It Works

stdapi.ai sits between your applications and AWS services, translating OpenAI and Anthropic API calls into native AWS requests. Any tool or SDK that speaks either protocol connects instantly — no plugins, no custom integrations.

flowchart LR
  openwebui["Open WebUI"] --> stdapi["stdapi.ai"]
  n8n["n8n"] --> stdapi
  ide["IDE + AI Assistant"] --> stdapi
  openai_app["Any OpenAI App"] --> stdapi
  anthropic_app["Any Anthropic App"] --> stdapi
  stdapi --> bedrock["AWS Bedrock"]
  bedrock --> claude["Claude"]
  bedrock --> qwen["Qwen"]
  bedrock --> mistral["Mistral"]
  bedrock --> stability["Stability AI"]
  bedrock --> more["✨ and more..."]
  stdapi --> transcribe["AWS Transcribe"]
  stdapi --> polly["AWS Polly"]
  stdapi --> s3["Amazon S3"]

Latency overhead

The gateway adds negligible per-request processing overhead — typically a few milliseconds. End-to-end latency is dominated by Bedrock model inference time. Streaming responses are passed through immediately with no intermediate buffering.


Why stdapi.ai?

  • Complete API surface
    Most gateways cover chat completions only. stdapi.ai delivers the full OpenAI surface on AWS: chat completions, embeddings, image generation and editing, text-to-speech, speech-to-text, translation, and file storage — all through standard API calls, with no AWS-specific code in your application.

  • Your data, your account
    stdapi.ai runs entirely within your own VPC — no traffic leaves your account. AWS Bedrock never retains or trains on your prompts. The software supply chain is hardened end-to-end — distributed as a validated container image with no public package registry exposure.

  • Multiply your throughput
    Every AWS region has its own independent quota. Configure three regions and you get approximately three times your tokens-per-minute. Multi-region failover is fully automatic — clients never see a throttle error.

  • Every Bedrock capability, zero custom code
    Prompt caching, extended thinking, guardrails, service tiers, cross-region inference profiles, system tools (Nova web grounding, code interpreter), SSML for speech synthesis — every Bedrock-native feature exposed through standard OpenAI and Anthropic APIs.


API Compatibility

Your existing applications, SDKs, and tools work immediately — no plugins or client changes needed.
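
For example, with the official OpenAI Python SDK, switching to the gateway is just a base URL change. A minimal sketch, assuming a deployed gateway; the base URL, API key, and model alias are placeholders for your own deployment:

```python
from openai import OpenAI

# Point the official OpenAI SDK at the gateway instead of api.openai.com.
# Base URL, key, and model alias below are placeholders.
client = OpenAI(
    base_url="https://gateway.example.com/v1",
    api_key="YOUR_GATEWAY_API_KEY",
)

response = client.chat.completions.create(
    model="claude-sonnet",  # any model ID or alias exposed by the gateway
    messages=[{"role": "user", "content": "Hello from Bedrock!"}],
)
print(response.choices[0].message.content)
```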

Supported Endpoints

OpenAI-Compatible:

| Endpoint | Capability | AWS Backend |
| --- | --- | --- |
| /v1/chat/completions | Conversational AI, tool calling, multi-modal | AWS Bedrock Converse API |
| /v1/responses | Stateless conversational AI with built-in tools (coming soon) | AWS Bedrock Converse API |
| /v1/embeddings | Vector embeddings for search & RAG | AWS Bedrock Embedding Models |
| /v1/images/generations | Text-to-image generation | AWS Bedrock Image Models |
| /v1/images/edits | Image editing, inpainting & transformations | AWS Bedrock Image Models |
| /v1/images/variations | Image variations | AWS Bedrock Image Models |
| /v1/audio/speech | Text-to-speech with SSML support | Amazon Polly |
| /v1/audio/transcriptions | Speech-to-text with speaker diarization | Amazon Transcribe |
| /v1/audio/translations | Speech-to-English translation | Amazon Transcribe + Amazon Translate |
| /v1/models | Model discovery & listing | AWS Bedrock |
| /v1/files | File upload, listing, metadata, download, deletion | Amazon S3 |
| /v1/uploads | Multipart upload sessions for large files | Amazon S3 |
| /available_models | List models filtered by modality (text, image, audio, embedding) | Internal |

Anthropic-Compatible:

| Endpoint | Capability | AWS Backend |
| --- | --- | --- |
| /v1/messages | Conversational AI, tool calling, multi-modal | AWS Bedrock Converse API |
| /v1/messages/count_tokens | Count tokens without sending a message | AWS Bedrock CountTokens API |
| /v1/models | Model discovery & listing | AWS Bedrock |
| /v1/models/{model_id} | Model details | AWS Bedrock |
| /v1/files | File upload, listing, metadata, download, deletion | Amazon S3 |

Route prefix

Anthropic-compatible routes are prefixed with /anthropic by default (e.g., /anthropic/v1/messages). The prefix is configurable via ANTHROPIC_ROUTES_PREFIX.
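
The Anthropic SDK works the same way, assuming the default /anthropic prefix (URL, key, and model alias are placeholders):

```python
import anthropic

# The Anthropic SDK accepts a custom base_url; include the /anthropic
# prefix unless you changed ANTHROPIC_ROUTES_PREFIX.
client = anthropic.Anthropic(
    base_url="https://gateway.example.com/anthropic",
    api_key="YOUR_GATEWAY_API_KEY",
)

message = client.messages.create(
    model="claude-sonnet",  # placeholder alias
    max_tokens=256,
    messages=[{"role": "user", "content": "Hello"}],
)
print(message.content[0].text)
```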

Parameter Coverage

stdapi.ai maps as many parameters as possible to Bedrock equivalents — across all routes, not just chat:

  • Generation controls — temperature, max_tokens, top_p, top_k, stop, seed, frequency_penalty, presence_penalty, logit_bias, top_logprobs, streaming via SSE, token usage reporting
  • Reasoning — reasoning_effort (minimal/low/medium/high/xhigh), enable_thinking, thinking_budget
  • Tool / function calling — Full OpenAI and Anthropic schemas, parallel tool calls, tool choice modes
  • All content types — System, developer, user, assistant, and tool roles; text, image, audio, video, and document content
  • Response formats — JSON object, JSON schema, streaming chunks, reasoning_content, annotations
  • Model-specific extras — Any parameter beyond the standard API via extra_body or top-level request fields
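
As a sketch of that last escape hatch, the OpenAI SDK's extra_body forwards non-standard fields untouched. Whether a given model honors a field varies; check the API reference:

```python
# "client" is the OpenAI client configured in the first sketch.
# enable_thinking / thinking_budget come from the parameter list above;
# the model alias is a placeholder.
response = client.chat.completions.create(
    model="qwen3",
    messages=[{"role": "user", "content": "Prove there are infinitely many primes."}],
    extra_body={"enable_thinking": True, "thinking_budget": 2048},
)
```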

Bedrock & model differences

Not every parameter maps identically across all models. Check the API documentation for details.


80+ Models Across 10+ Providers

Access every model available on AWS Bedrock through a single, consistent API.

  • Anthropic Claude
    Claude 4.6, Claude Sonnet, Claude Haiku — including reasoning models. Use official Anthropic model names (e.g., claude-opus-4-6) — they resolve automatically.

  • Amazon Nova
    Nova Micro, Lite, Pro, Premier, Nova 2 with reasoning. Canvas for images. Multimodal embeddings. Built-in web grounding and code interpreter.

  • Meta Llama
    Llama 4 Scout, Maverick, and earlier Llama 3 variants.

  • Alibaba Qwen
    Qwen3 and Qwen Coder — including thinking mode.

  • DeepSeek
    Latest DeepSeek V3 models with automatic reasoning content surfacing.

  • Moonshot Kimi K2
    Kimi K2 with optional thinking mode.

  • Mistral AI
    Mistral, Mixtral, and Mistral Large variants.

  • Cohere
    Command models for chat; Embed v4 for multimodal embeddings.

  • Stability AI
    Stable Diffusion 3.5, SD3 Ultra, and specialty models (upscale, style, search).

  • MiniMax & more
    MiniMax M2.5, Writer Palmyra, AI21 Jamba, TwelveLabs Marengo video embeddings, and others.

Model Management

  • Automatic model discovery — Scans configured regions at startup; no manual model list to maintain
  • Model aliases — Map custom names to Bedrock model IDs; Claude and OpenAI names resolve automatically
  • Deprecated model failover — Requests to retired models transparently redirect to their replacements
  • Legacy model filtering — Optionally hide deprecated models from the models list

Multi-Modal Capabilities

Text & Conversational AI

  • All message roles: system, developer, user, assistant, tool
  • Multi-turn conversations with full history
  • Tool / function calling with parallel execution
  • Structured JSON output (JSON object and JSON schema modes; see the sketch below)
  • Streaming via Server-Sent Events with real-time token delivery
  • Reasoning content blocks (thinking, reasoning_content) for supported models
  • Web search results as context (search_result content blocks)
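
For instance, the JSON schema mode listed above constrains the model to a typed object. A minimal sketch; model alias and schema are illustrative:

```python
# "client" is the OpenAI client configured in the first sketch.
response = client.chat.completions.create(
    model="nova-pro",  # placeholder alias
    messages=[{"role": "user", "content": "Extract the city and country: 'I live in Lyon, France.'"}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "location",
            "schema": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                    "country": {"type": "string"},
                },
                "required": ["city", "country"],
                "additionalProperties": False,
            },
        },
    },
)
print(response.choices[0].message.content)  # e.g. {"city": "Lyon", "country": "France"}
```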

Images

Generation — Text-to-image with:

  • Multiple output formats: PNG, JPEG, WebP with adjustable quality and compression
  • Flexible sizes and aspect ratios
  • Streaming generation with partial image previews
  • Style presets (model-specific)

Editing — Powerful inpainting and transformation:

  • Mask-based inpainting (define edit regions precisely)
  • Image-to-image transformation (style, structure conditioning)
  • Background removal, object search & replace, object recolor
  • Creative and conservative upscaling

Variations — Create alternative versions of existing images

JSON body format — Reference images via Files API file_id or URL instead of re-uploading
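
A minimal generation call through the standard images endpoint (the model ID and size are placeholders; supported values vary by model):

```python
import base64

# "client" is the OpenAI client configured in the first sketch.
result = client.images.generate(
    model="stability.sd3-5-large-v1:0",  # placeholder Bedrock model ID
    prompt="A lighthouse at dawn, watercolor",
    size="1024x1024",
    response_format="b64_json",
)
with open("lighthouse.png", "wb") as f:
    f.write(base64.b64decode(result.data[0].b64_json))
```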

Audio

Text-to-Speech (Amazon Polly):

  • 60+ voices across 30+ languages
  • Multiple engine tiers: Standard, Neural, Long-Form, Generative
  • SSML support — control pronunciation, emphasis, pauses, prosody
  • Output formats: MP3, PCM, Opus, AAC, FLAC, OGG Vorbis
  • Speed control (0.25× to 4×)
  • Automatic language detection via Amazon Comprehend
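
A sketch of a TTS request through the standard audio endpoint, assuming SSML passes through as the input text (voice and model values are placeholders):

```python
# "client" is the OpenAI client configured in the first sketch.
speech = client.audio.speech.create(
    model="polly-neural",  # placeholder engine/model name
    voice="Joanna",        # an Amazon Polly voice
    input="<speak>Hello <break time='300ms'/> world.</speak>",  # SSML (assumed passthrough)
    response_format="mp3",
)
with open("hello.mp3", "wb") as f:
    f.write(speech.content)
```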

Speech-to-Text (Amazon Transcribe):

  • 100+ languages
  • Speaker diarization — automatic speaker separation and labeling
  • Word-level and segment-level timestamps
  • Subtitle export: SRT and VTT formats
  • Vocabulary customization and custom language models
  • Automatic language detection
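
And a transcription sketch with word and segment timestamps (the model name is a placeholder; the diarization switches are described in the API reference):

```python
# "client" is the OpenAI client configured in the first sketch.
with open("meeting.wav", "rb") as audio:
    transcript = client.audio.transcriptions.create(
        model="transcribe",  # placeholder model name
        file=audio,
        response_format="verbose_json",
        timestamp_granularities=["word", "segment"],
    )
print(transcript.text)
```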

Speech Translation — Transcribe audio and translate to English in a single request

Documents & Files

  • PDF input with optional citation support (precise source references in responses)
  • Plain text and structured content blocks as context
  • File storage via the Files API — upload once, reference by ID across multiple requests
  • Multipart uploads for large files via the Uploads API (S3 native multipart)
  • File expiry with configurable TTL (1 hour – 30 days)
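
A sketch of the upload-once pattern via the Files API (the purpose value follows the OpenAI Files API; the file name is illustrative):

```python
# "client" is the OpenAI client configured in the first sketch.
# Upload once, then reference the returned ID in later requests
# instead of re-sending the document.
with open("report.pdf", "rb") as f:
    uploaded = client.files.create(file=f, purpose="user_data")
print(uploaded.id)  # reference this file_id in subsequent requests
```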

Video

  • Video input in chat completions for supported models (e.g., Amazon Nova)
  • S3 URLs as direct video input for multimodal embeddings

Embeddings

  • Text embeddings — single and batch processing
  • Multimodal embeddings — images, audio, video, PDF documents
  • Dimension control (model-specific reduction)
  • Float or Base64 output encoding
  • S3 URL input for large files; oversized base64 payloads auto-uploaded to S3
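
For example, a batch embedding call with dimension reduction (the model alias is a placeholder; the dimensions parameter applies only to models that support reduction):

```python
# "client" is the OpenAI client configured in the first sketch.
emb = client.embeddings.create(
    model="titan-embed-text-v2",  # placeholder alias
    input=["first document", "second document"],
    dimensions=512,          # model-specific; not honored by every model
    encoding_format="float",
)
print(len(emb.data), len(emb.data[0].embedding))
```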

Purpose-Built for AWS

Multi-Region & Quota Multiplication

Configure multiple AWS regions to scale your throughput and maximize availability:

| Routing Strategy | Description | Prompt Caching |
| --- | --- | --- |
| ordered (default) | Try regions in order; skip blocked ones | ✓ Compatible |
| lowest_latency | Prefer fastest measured region | ✓ Compatible |
| round_robin | Distribute evenly across regions | |
| disabled | Single region per model | ✓ Compatible |

  • 3 regions ≈ 3× your tokens-per-minute — each region has its own independent quota
  • Automatic failover — transparent region switching on throttle, quota, or service errors
  • Exponential backoff — doubles per consecutive error, capped at 1 hour
  • Region health tracking — per-model health status with configurable recovery delays


Advanced Bedrock Features

| Feature | Description |
| --- | --- |
| Prompt Caching | Cache system prompts, messages, and tools; granular section control; configurable TTL; cache metrics in every response |
| Reasoning Modes | Extended thinking with effort levels (minimal → xhigh) for Claude and Nova 2; thinking_budget for token-level control |
| Bedrock Guardrails | Content filtering and safety policies with configurable trace levels |
| Service Tiers | Priority, default, and flex latency tiers per request |
| Application Inference Profiles | Custom profiles for workload isolation and cost attribution |
| Prompt Routers | Bedrock prompt routers for intelligent model selection |
| Cross-Region Inference | Geography-pinned (US, EU, APAC) and global profiles with data residency control |
| System Tools (Nova) | Web grounding with URL citations; code interpreter |
| Claude Server Tools | Bash, text editor, computer use (3.5+), memory (3.7+) |
| Extra Model Parameters | Any model-specific parameter forwarded via extra_body or top-level field |
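
As one concrete sketch from the table above, prompt caching in the Anthropic Messages shape marks a system block as cacheable (gateway URL, key, model alias, and prompt are placeholders; cache support varies by model):

```python
import anthropic

client = anthropic.Anthropic(
    base_url="https://gateway.example.com/anthropic",  # placeholder
    api_key="YOUR_GATEWAY_API_KEY",
)

message = client.messages.create(
    model="claude-sonnet",  # placeholder alias
    max_tokens=512,
    system=[{
        "type": "text",
        "text": "You are a support agent. <several thousand tokens of policy text>",
        "cache_control": {"type": "ephemeral"},  # cache this section
    }],
    messages=[{"role": "user", "content": "What is the refund window?"}],
)
print(message.usage)  # includes cache read/write token counts
```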

AWS AI Services Integration

| Service | Capability |
| --- | --- |
| Amazon Polly | 60+ voices, 30+ languages, SSML, multiple engines and audio formats |
| Amazon Transcribe | 100+ languages, speaker diarization, timestamps, SRT/VTT subtitles |
| Amazon Translate | Language translation for audio translation workflows |
| Amazon Comprehend | Automatic language detection for intelligent voice routing |

Amazon S3 Integration

S3 is woven into the entire API surface — not just file storage:

  • Files API — Full CRUD at /v1/files with no artificial size limit (up to S3's ~5 TB), optional expiry, S3 Lifecycle backstop; file IDs work across both OpenAI and Anthropic endpoints
  • Multipart uploads — /v1/uploads backed by S3 native multipart; stream large files without buffering
  • Direct s3:// image references — Use s3://bucket/key in chat completions and Anthropic Messages; the gateway reads from S3 via IAM role — no pre-signed URLs
  • Files API in image operations — Reference uploaded files by file_id in image edits and variations
  • Multimodal embeddings — Pass s3:// URLs directly; oversized base64 payloads auto-uploaded and invoked asynchronously
  • Regional buckets — One bucket per Bedrock region; S3 region routing is automatic
  • Transfer Acceleration — Faster downloads via generated HTTP links
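
For instance, a direct s3:// reference in a vision request (bucket, key, and model alias are placeholders):

```python
# "client" is the OpenAI client configured in the first sketch.
response = client.chat.completions.create(
    model="nova-pro",  # placeholder alias for a vision-capable model
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url", "image_url": {"url": "s3://my-bucket/photos/cat.png"}},
        ],
    }],
)
print(response.choices[0].message.content)
```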

Security & Compliance

Authentication

stdapi.ai supports multiple authentication strategies to fit your architecture:

| Method | How | Best For |
| --- | --- | --- |
| API Key | Authorization: Bearer or X-API-Key header; stored in SSM Parameter Store or Secrets Manager (never plain text) | Direct clients, SDKs |
| OIDC / Cognito | Delegate to AWS Application Load Balancer or API Gateway | Web apps, SSO |
| AWS IAM (SigV4) | Via API Gateway with IAM authorization | Internal AWS services |
| No authentication | Open access | Private VPC deployments |


Security Features

| Feature | Description |
| --- | --- |
| Industry-Standard API Key Hashing | API keys hashed with a cryptographic function + per-key salt; constant-time comparison prevents timing attacks; only the hash is retained in memory |
| SSRF Protection | Blocks loopback, link-local, private network addresses, and DNS rebinding attacks |
| Trusted Hosts | Restrict which hostnames the service responds to |
| CORS Controls | Configurable cross-origin resource sharing policies |
| CSRF Protection | Built-in cross-site request forgery protection |
| Input Validation | Configurable strict mode — rejects malformed or out-of-spec requests at the gateway edge |
| Proxy Header Handling | Secure forwarded header processing for ALB and CloudFront |
| TLS 1.2+ in transit | All AWS service calls encrypted; the Terraform module configures ALB with TLS 1.3 and post-quantum hybrid key exchange |

Commercial: Hardened Container Image (AWS Marketplace)

The commercial image is security-validated by AWS Marketplace and includes: read-only root filesystem, dropped Linux capabilities, minimal installed packages, and no shell. The Terraform module also configures a Customer Managed KMS key (auto-rotation enabled) for all data at rest.

Compliance & Data Sovereignty

Data never leaves your AWS account — every AWS service call is restricted to the regions you configure. All AWS services used by stdapi.ai (Bedrock, S3, Polly, Transcribe, and more) are in scope for GDPR, ISO 27001/27017/27018, SOC 1/2/3, HIPAA, FedRAMP, PCI-DSS, and CSA STAR Level 2. The commercial Terraform module adds VPC endpoints (no internet egress), Customer Managed KMS keys, and region-pinned cross-region profiles for strict data residency.



Works with Your Existing Tools

stdapi.ai is a drop-in replacement in hundreds of applications and frameworks. Change the API endpoint — nothing else.

  • Chat Interfaces
    Open WebUI, LobeHub, LibreChat, Chatbot UI — private ChatGPT-style experiences on AWS

  • AI Coding Assistants
    Claude Code, Continue.dev, Cline, Cursor, Windsurf, Aider — backed by Claude 4.6, Kimi K2, Qwen Coder

  • Workflow Automation
    n8n, Make, Zapier — connect AI to your business processes

  • Agent Frameworks
    OpenClaw, LangChain, LlamaIndex, CrewAI, LangGraph, AutoGPT — multi-agent systems on Bedrock

  • Team Chatbots
    Slack, Discord, Microsoft Teams — AI assistants in your team's communication tools

  • Knowledge Management
    Obsidian, Notion, Logseq — AI-powered writing assistance and search

See all use cases


Observability & Operations

Structured Logging

  • JSON logs to stdout — natively ingested by CloudWatch Logs
  • Every request logs: method, path, status, model ID, region(s) used, execution time
  • Optional: full request/response payloads, client IP (disabled by default)
  • Configurable log levels (info, warning, error, critical, disabled)

OpenTelemetry Integration

  • Export traces and metrics to AWS X-Ray, Datadog, Jaeger, or any OTLP-compatible backend
  • Configurable sampling rate
  • Root span per request with full correlation IDs

Token Usage Tracking

  • Input, output, reasoning, and cached token counts in every API response
  • Consistent reporting across all endpoints (chat, messages, embeddings, images, audio)

Developer Tools

  • Swagger UI at /docs — test endpoints directly in your browser
  • ReDoc at /redoc — clean, searchable API reference
  • OpenAPI schema at /openapi.json — import into Postman, generate client code

Quality of Life

  • Model aliases — Map custom names to Bedrock IDs; Claude model names resolve automatically
  • Model auto-detection — Discovers available models across all configured regions at startup
  • Model list caching — Fast model listing without repeated AWS API calls
  • Token estimation — Optional pre-flight token count via tiktoken (without calling Bedrock); useful for client-side budgeting and routing decisions
  • Safety identifier — safety_identifier field in requests as an alias to user for abuse tracking and audit trails
  • Zero-configuration startup — Automatic region and model detection; warnings on missing config
  • Deprecated model failover — Requests to retired models silently redirect to their replacements

Deployment

Community vs Commercial

| | Community | Commercial |
| --- | --- | --- |
| Price | Free | $0.10/container-hour, with 14-day free trial |
| License | AGPL-3.0 | AWS Marketplace SCMP |
| API compatibility | Full | Full |
| Container image | Community (GHCR) | Hardened, AWS Marketplace validated |
| Deployment | Docker / self-managed | Terraform module (ECS Fargate), AWS Marketplace container image |
| Production infrastructure | | Fully featured, AWS Well-Architected, Hardened |
| Commercial support | | 1 business day |

How stdapi.ai Compares

All four solutions below expose an OpenAI-compatible API in front of AWS Bedrock. The comparison focuses on the AWS deployment context — LiteLLM is evaluated with AWS services as the backend provider (Bedrock, Polly, Transcribe), not as a multi-cloud proxy. Bedrock Access Gateway is the official AWS-maintained open-source sample. Bedrock Mantle is AWS's own managed OpenAI-compatible endpoint, requiring no self-hosting.

| Capability | stdapi.ai | LiteLLM (on AWS) | Bedrock Access Gateway | Bedrock Mantle |
| --- | --- | --- | --- | --- |
| OpenAI Chat completions | | | | |
| OpenAI Embeddings | | | | |
| Anthropic Messages API | | | | |
| OpenAI Responses API | 12 | | | |
| OpenAI Image generation | | | | |
| OpenAI Image editing | | | | |
| OpenAI Image variations | | | | |
| OpenAI TTS (speech) | | 13 | | |
| OpenAI STT (transcription) | | | | |
| OpenAI Files & Uploads API | | | | |
| OpenAI Realtime API | | | | |
| Cohere Rerank API | | | | |
| Bedrock Full model catalog | | 1 | 10 | 2 |
| Multimodal inputs | text · image · audio · video · docs | text · image · docs | text · image | text · image |
| Multi-region quota multiplication | | 8 | | |
| Bedrock Cross-region inference profiles | | 14 | 14 | |
| Bedrock system tools | | | | |
| Bedrock Guardrails | | | | |
| Bedrock Service tiers | | | | |
| Bedrock Application inference profiles | | | | |
| Bedrock prompt routers | | | | |
| Bedrock Prompt caching & reasoning | 6 | | | 9 |
| Runs in your AWS account | | | | |
| Model auto-discovery | | 1 | 7 | |
| Deprecated model failover | | | | |
| Ready-to-use deployment | | | 3 | |
| Commercial support | | | | 4 |
| Self-hosted | | | | |
| AWS-native focus | | 5 | | |
| Multi-provider support | | 11 | | |
| Open-source community | Small | Large (~40k+ ★) | Medium (~1k ★) | |
| Source license | AGPL-3.0 (community) · commercial | MIT | MIT-0 | AWS service |
| Distribution & supply chain | Marketplace · GHCR | pip/PyPI | GitHub (MIT-0) | AWS-managed |

About the alternatives

  • LiteLLM — widely adopted multi-cloud proxy with a large open-source community. Ideal when you need a single entry point across OpenAI, Azure, AWS, and others. AWS deployment and security features (WAF, VPC endpoints) require manual setup. Also offers a commercial Enterprise tier.
  • Bedrock Access Gateway — official open-source AWS sample (MIT-0), actively maintained by AWS teams. Covers chat completions and embeddings only. No WAF, auto-scaling, monitoring, or commercial support included.
  • Bedrock Mantle — AWS's own native OpenAI-compatible endpoint backed by AWS's full compliance and SLA. No self-hosting required. Supports chat completions and the Responses API; limited to a subset of models (mostly newer open-weight models — Claude 3.x/4.x, Nova, Llama, Cohere, and Stability AI image models are not available). Routes through an AWS-managed endpoint, not your private VPC. See model availability.

Ready to Get Started?


  1. Full Bedrock catalog supported; each model must be declared in config (applies to auto-discovery) 

  2. Subset of Bedrock models — Claude 3.x/4.x, Nova, Llama, AI21, Cohere, and Stability AI (images) not available; supports mostly newer open-weight models (DeepSeek, Gemma, Qwen, Kimi K2, MiniMax, newer Mistral, etc.) — see AWS endpoint availability 

  3. CDK reference sample — no WAF, auto-scaling, monitoring, or commercial support 

  4. Covered through your existing AWS Support plan 

  5. Generalist multi-cloud proxy covering 100+ providers; AWS-specific Bedrock features and security integrations may lag behind dedicated solutions 

  6. Prompt caching and reasoning supported on standard routes; coverage varies by model — not all Bedrock models support prompt caching or extended thinking 

  7. Auto-discovery limited to the single deployed region — some models are only available in specific AWS regions 

  8. Achievable via the LiteLLM router, but requires manually declaring each model per region with explicit TPM/RPM limits — no automatic quota distribution 

  9. Claude 3.x/4.x (which support prompt caching) not available on Mantle; reasoning available via select open-weight models (Qwen3 thinking, etc.) 

  10. Single-region deployment — some models are only available in specific AWS regions; no cross-region catalog aggregation 

  11. 100+ providers: OpenAI, Azure OpenAI, GCP Vertex, Anthropic direct, and more — ideal when you need a single gateway across multiple clouds 

  12. Coming in the next release 

  13. Requires connecting Amazon Polly as the TTS backend — not included by default in a LiteLLM on AWS deployment 

  14. Supported by specifying the cross-region inference profile ARN as the model ID — no automatic profile selection