Features — AI Gateway for AWS Bedrock¶
stdapi.ai is an AI gateway purpose-built for AWS. It brings full OpenAI and Anthropic API compatibility to AWS Bedrock and AWS AI services — so any tool, SDK, or application your team already uses connects instantly, without code changes.
- One URL change, 80+ models — Drop in as an OpenAI or Anthropic replacement
- Everything stays in your AWS account — No third-party routing, no data sharing
- Enterprise compliance built in — ISO, SOC, HIPAA, GDPR, FedRAMP via AWS
- Production in minutes — Terraform module on AWS Marketplace, 14-day free trial
How It Works¶
stdapi.ai sits between your applications and AWS services, translating OpenAI and Anthropic API calls into native AWS requests. Any tool or SDK that speaks either protocol connects instantly — no plugins, no custom integrations.
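Concretely, "one URL change" means pointing an OpenAI-style client at the gateway instead of api.openai.com. Here is a minimal sketch that only assembles the request without sending it; the gateway hostname and model alias are hypothetical:

```python
import json
from urllib.parse import urljoin

# Hypothetical deployment URL -- substitute your gateway's endpoint.
BASE_URL = "https://gateway.example.com/v1/"

def build_chat_request(model: str, messages: list) -> tuple[str, dict, bytes]:
    """Assemble the URL, headers, and JSON body for a chat completions call.

    The shape is identical to a request against api.openai.com; only the
    base URL differs, which is why existing SDKs work unchanged.
    """
    url = urljoin(BASE_URL, "chat/completions")
    headers = {
        "Authorization": "Bearer YOUR_API_KEY",  # placeholder credential
        "Content-Type": "application/json",
    }
    body = json.dumps({"model": model, "messages": messages}).encode()
    return url, headers, body

url, headers, body = build_chat_request(
    "claude-sonnet-4-5",  # assumed alias; official Anthropic names resolve too
    [{"role": "user", "content": "Hello"}],
)
```

With the official OpenAI SDKs the same swap is a single constructor argument (`base_url`), so no application code changes.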
```mermaid
%%{init: {'flowchart': {'htmlLabels': true}} }%%
flowchart LR
openwebui["<img src='../styles/logo_openwebui.svg' style='height:48px;width:auto;vertical-align:middle;' /> Open WebUI"] --> stdapi["<img src='../styles/logo.svg' style='height:64px;width:auto;vertical-align:middle;' /> stdapi.ai"]
n8n["<img src='../styles/logo_n8n.svg' style='height:48px;width:auto;vertical-align:middle;' /> n8n"] --> stdapi
ide["<img src='../styles/logo_vscode.svg' style='height:48px;width:auto;vertical-align:middle;' /> IDE + AI Assistant"] --> stdapi
openai_app["<img src='../styles/logo_openai.svg' style='height:48px;width:auto;vertical-align:middle;' /> Any OpenAI App"] --> stdapi
anthropic_app["<img src='../styles/logo_anthropic.svg' style='height:48px;width:auto;vertical-align:middle;' /> Any Anthropic App"] --> stdapi
stdapi --> bedrock["<img src='../styles/logo_amazon_bedrock.svg' style='height:48px;width:auto;vertical-align:middle;' /> AWS Bedrock"]
bedrock --> claude["<img src='../styles/logo_anthropic_claude.svg' style='height:36px;width:auto;vertical-align:middle;' /> Claude"]
bedrock --> qwen["<img src='../styles/logo_qwen.svg' style='height:36px;width:auto;vertical-align:middle;' /> Qwen"]
bedrock --> mistral["<img src='../styles/logo_mistralai.svg' style='height:36px;width:auto;vertical-align:middle;' /> Mistral"]
bedrock --> stability["<img src='../styles/logo_stabilityai.svg' style='height:36px;width:auto;vertical-align:middle;' /> Stability AI"]
bedrock --> more["✨ and more..."]
stdapi --> transcribe["<img src='../styles/logo_amazon_transcribe.svg' style='height:48px;width:auto;vertical-align:middle;' /> Amazon Transcribe"]
stdapi --> polly["<img src='../styles/logo_amazon_polly.svg' style='height:48px;width:auto;vertical-align:middle;' /> Amazon Polly"]
stdapi --> s3["<img src='../styles/logo_amazon_s3.svg' style='height:48px;width:auto;vertical-align:middle;' /> Amazon S3"]
```
Latency overhead
The gateway adds negligible per-request processing overhead — typically a few milliseconds. End-to-end latency is dominated by Bedrock model inference time. Streaming responses are passed through immediately with no intermediate buffering.
Why stdapi.ai?¶
- Complete API surface — Most gateways cover chat completions only. stdapi.ai delivers the full OpenAI surface on AWS: chat completions, embeddings, image generation and editing, text-to-speech, speech-to-text, translation, and file storage — all through standard API calls, with no AWS-specific code in your application.
- Your data, your account — stdapi.ai runs entirely within your own VPC; no traffic leaves your account. AWS Bedrock never retains or trains on your prompts. The software supply chain is hardened end-to-end, distributed as a validated container image with no public package registry exposure.
- Multiply your throughput — Every AWS region has its own independent quota. Configure three regions and you get roughly three times your tokens-per-minute. Multi-region failover is fully automatic; clients never see a throttle error.
- Every Bedrock capability, zero custom code — Prompt caching, extended thinking, guardrails, service tiers, cross-region inference profiles, system tools (Nova web grounding, code interpreter), SSML for speech synthesis — every Bedrock-native feature exposed through standard OpenAI and Anthropic APIs.
API Compatibility¶
Your existing applications, SDKs, and tools work immediately — no plugins or client changes needed.
Supported Endpoints¶
OpenAI-Compatible:
| Endpoint | Capability | AWS Backend |
|---|---|---|
| `/v1/chat/completions` | Conversational AI, tool calling, multi-modal | AWS Bedrock Converse API |
| `/v1/responses` | Stateless conversational AI with built-in tools (coming soon) | AWS Bedrock Converse API |
| `/v1/embeddings` | Vector embeddings for search & RAG | AWS Bedrock Embedding Models |
| `/v1/images/generations` | Text-to-image generation | AWS Bedrock Image Models |
| `/v1/images/edits` | Image editing, inpainting & transformations | AWS Bedrock Image Models |
| `/v1/images/variations` | Image variations | AWS Bedrock Image Models |
| `/v1/audio/speech` | Text-to-speech with SSML support | Amazon Polly |
| `/v1/audio/transcriptions` | Speech-to-text with speaker diarization | Amazon Transcribe |
| `/v1/audio/translations` | Speech-to-English translation | Amazon Transcribe + Amazon Translate |
| `/v1/models` | Model discovery & listing | AWS Bedrock |
| `/v1/files` | File upload, listing, metadata, download, deletion | Amazon S3 |
| `/v1/uploads` | Multipart upload sessions for large files | Amazon S3 |
| `/available_models` | List models filtered by modality (text, image, audio, embedding) | Internal |
Anthropic-Compatible:
| Endpoint | Capability | AWS Backend |
|---|---|---|
| `/v1/messages` | Conversational AI, tool calling, multi-modal | AWS Bedrock Converse API |
| `/v1/messages/count_tokens` | Count tokens without sending a message | AWS Bedrock CountTokens API |
| `/v1/models` | Model discovery & listing | AWS Bedrock |
| `/v1/models/{model_id}` | Model details | AWS Bedrock |
| `/v1/files` | File upload, listing, metadata, download, deletion | Amazon S3 |
Route prefix
Anthropic-compatible routes are prefixed with `/anthropic` by default (e.g., `/anthropic/v1/messages`). The prefix is configurable via `ANTHROPIC_ROUTES_PREFIX`.
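As a sketch of how the prefix composes into request paths (the gateway hostname is hypothetical):

```python
ANTHROPIC_ROUTES_PREFIX = "/anthropic"  # documented default, configurable

def anthropic_messages_url(base: str, prefix: str = ANTHROPIC_ROUTES_PREFIX) -> str:
    # Anthropic-compatible routes live under the prefix,
    # e.g. /anthropic/v1/messages; OpenAI routes stay at /v1/...
    return f"{base.rstrip('/')}{prefix}/v1/messages"

print(anthropic_messages_url("https://gateway.example.com"))
# -> https://gateway.example.com/anthropic/v1/messages
```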
Parameter Coverage¶
stdapi.ai maps as many parameters as possible to Bedrock equivalents — across all routes, not just chat:
- Generation controls — `temperature`, `max_tokens`, `top_p`, `top_k`, `stop`, `seed`, `frequency_penalty`, `presence_penalty`, `logit_bias`, `top_logprobs`, streaming via SSE, token usage reporting
- Reasoning — `reasoning_effort` (minimal/low/medium/high/xhigh), `enable_thinking`, `thinking_budget`
- Tool / function calling — Full OpenAI and Anthropic schemas, parallel tool calls, tool choice modes
- All content types — System, developer, user, assistant, and tool roles; text, image, audio, video, and document content
- Response formats — JSON object, JSON schema, streaming chunks, `reasoning_content`, `annotations`
- Model-specific extras — Any parameter beyond the standard API via `extra_body` or top-level request fields
Bedrock & model differences
Not every parameter maps identically across all models. Check the API documentation for details.
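For illustration, a request mixing standard generation controls with the reasoning knobs above might be shaped like this. The model alias and values are assumptions, and, as the note above says, which knobs a given model honors varies:

```python
import json

payload = {
    "model": "qwen3-coder",  # hypothetical model alias
    "messages": [{"role": "user", "content": "Refactor this function."}],
    # Standard OpenAI generation controls:
    "temperature": 0.2,
    "max_tokens": 1024,
    "seed": 42,
    # Reasoning controls mapped to Bedrock equivalents:
    "reasoning_effort": "medium",
    # Model-specific extras ride along as additional top-level fields:
    "enable_thinking": True,
}
body = json.dumps(payload)
```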
80+ Models Across 10+ Providers¶
Access every model available on AWS Bedrock through a single, consistent API.
- Anthropic Claude — Claude 4.6, Claude Sonnet, Claude Haiku — including reasoning models. Use official Anthropic model names (e.g., `claude-opus-4-6`); they resolve automatically.
- Amazon Nova — Nova Micro, Lite, Pro, Premier, and Nova 2 with reasoning. Canvas for images. Multimodal embeddings. Built-in web grounding and code interpreter.
- Meta Llama — Llama 4 Scout, Maverick, and earlier Llama 3 variants.
- Alibaba Qwen — Qwen3 and Qwen Coder, including thinking mode.
- DeepSeek — Latest DeepSeek V3 models with automatic reasoning content surfacing.
- Moonshot Kimi K2 — Kimi K2 with optional thinking mode.
- Mistral AI — Mistral, Mixtral, and Mistral Large variants.
- Cohere — Command models for chat; Embed v4 for multimodal embeddings.
- Stability AI — Stable Diffusion 3.5, SD3 Ultra, and specialty models (upscale, style, search).
- MiniMax & more — MiniMax M2.5, Writer Palmyra, AI21 Jamba, TwelveLabs Marengo video embeddings, and others.
Model Management¶
- Automatic model discovery — Scans configured regions at startup; no manual model list to maintain
- Model aliases — Map custom names to Bedrock model IDs; Claude and OpenAI names resolve automatically
- Deprecated model failover — Requests to retired models transparently redirect to their replacements
- Legacy model filtering — Optionally hide deprecated models from the models list
Multi-Modal Capabilities¶
Text & Conversational AI¶
- All message roles: system, developer, user, assistant, tool
- Multi-turn conversations with full history
- Tool / function calling with parallel execution
- Structured JSON output (JSON object and JSON schema modes)
- Streaming via Server-Sent Events with real-time token delivery
- Reasoning content blocks (`thinking`, `reasoning_content`) for supported models
- Web search results as context (`search_result` content blocks)
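As an illustration of the tool-calling surface, here is a sketch of an OpenAI-style tool definition with parallel calls enabled; the function name, schema, and model alias are hypothetical. The gateway translates such definitions into the Bedrock Converse tool configuration:

```python
import json

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

payload = {
    "model": "nova-lite",  # hypothetical alias
    "messages": [{"role": "user", "content": "Weather in Paris?"}],
    "tools": tools,
    "tool_choice": "auto",        # let the model decide when to call
    "parallel_tool_calls": True,  # allow several calls in one turn
}
body = json.dumps(payload)
```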
Images¶
Generation — Text-to-image with:
- Multiple output formats: PNG, JPEG, WebP with adjustable quality and compression
- Flexible sizes and aspect ratios
- Streaming generation with partial image previews
- Style presets (model-specific)
Editing — Powerful inpainting and transformation:
- Mask-based inpainting (define edit regions precisely)
- Image-to-image transformation (style, structure conditioning)
- Background removal, object search & replace, object recolor
- Creative and conservative upscaling
Variations — Create alternative versions of existing images
JSON body format — Reference images via the Files API `file_id` or a URL instead of re-uploading
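A sketch of such a JSON-body edit request, reusing an uploaded file instead of re-sending bytes. The model alias, file ID, and exact field shapes are assumptions; see the API reference for the authoritative schema:

```python
import json

payload = {
    "model": "stability-sd3-5",  # hypothetical alias for an image model
    "image": "file-abc123",      # file_id returned by the Files API
    "prompt": "Replace the sky with a sunset.",
    "response_format": "b64_json",
}
body = json.dumps(payload)
```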
Audio¶
Text-to-Speech (Amazon Polly):
- 60+ voices across 30+ languages
- Multiple engine tiers: Standard, Neural, Long-Form, Generative
- SSML support — control pronunciation, emphasis, pauses, prosody
- Output formats: MP3, PCM, Opus, AAC, FLAC, OGG Vorbis
- Speed control (0.25× to 4×)
- Automatic language detection via Amazon Comprehend
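A sketch of a speech request combining SSML with the controls above; the voice is a real Amazon Polly voice name, but the model alias is an assumption:

```python
import json

ssml = "<speak>Hello<break time='300ms'/> and welcome.</speak>"

payload = {
    "model": "polly-neural",   # hypothetical alias for a Polly engine tier
    "voice": "Joanna",         # Amazon Polly voice name
    "input": ssml,             # SSML controls pauses, prosody, emphasis
    "response_format": "mp3",
    "speed": 1.25,             # within the documented 0.25x-4x range
}
body = json.dumps(payload)
```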
Speech-to-Text (Amazon Transcribe):
- 100+ languages
- Speaker diarization — automatic speaker separation and labeling
- Word-level and segment-level timestamps
- Subtitle export: SRT and VTT formats
- Vocabulary customization and custom language models
- Automatic language detection
Speech Translation — Transcribe audio and translate to English in a single request
Documents & Files¶
- PDF input with optional citation support (precise source references in responses)
- Plain text and structured content blocks as context
- File storage via the Files API — upload once, reference by ID across multiple requests
- Multipart uploads for large files via the Uploads API (S3 native multipart)
- File expiry with configurable TTL (1 hour – 30 days)
Video¶
- Video input in chat completions for supported models (e.g., Amazon Nova)
- S3 URLs as direct video input for multimodal embeddings
Embeddings¶
- Text embeddings — single and batch processing
- Multimodal embeddings — images, audio, video, PDF documents
- Dimension control (model-specific reduction)
- Float or Base64 output encoding
- S3 URL input for large files; oversized base64 payloads auto-uploaded to S3
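With base64 output encoding, the vector comes back as packed little-endian float32 values — the same convention the OpenAI API uses. A sketch of a request plus a decoder; the model alias is an assumption:

```python
import base64
import json
import struct

payload = {
    "model": "titan-embed-v2",  # hypothetical alias
    "input": ["first document", "second document"],  # batch input
    "dimensions": 256,          # model-specific dimension reduction
    "encoding_format": "base64",
}
body = json.dumps(payload)

def decode_embedding(b64: str) -> list[float]:
    """Unpack a base64-encoded embedding into a list of floats."""
    raw = base64.b64decode(b64)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))
```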
Purpose-Built for AWS¶
Multi-Region & Quota Multiplication¶
Configure multiple AWS regions to scale your throughput and maximize availability:
| Routing Strategy | Description | Prompt Caching |
|---|---|---|
| `ordered` (default) | Try regions in order; skip blocked ones | ✓ Compatible |
| `lowest_latency` | Prefer fastest measured region | ✓ Compatible |
| `round_robin` | Distribute evenly across regions | — |
| `disabled` | Single region per model | ✓ Compatible |
- 3 regions ≈ 3× your tokens-per-minute — each region has its own independent quota
- Automatic failover — transparent region switching on throttle, quota, or service errors
- Exponential backoff — doubles per consecutive error, capped at 1 hour
- Region health tracking — per-model health status with configurable recovery delays
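The backoff behavior above can be sketched as a small helper. The gateway's internals are not published, so the initial delay here is an assumption, but the doubling-with-cap shape follows the description:

```python
INITIAL_DELAY_S = 1.0   # assumed starting delay for illustration
MAX_DELAY_S = 3600.0    # documented one-hour cap

def backoff_delay(consecutive_errors: int) -> float:
    """Delay before retrying a region after N consecutive errors."""
    if consecutive_errors <= 0:
        return 0.0  # healthy region: no delay
    # Doubles per consecutive error, capped at one hour.
    return min(INITIAL_DELAY_S * 2 ** (consecutive_errors - 1), MAX_DELAY_S)
```

While one region backs off, requests flow to the next configured region, which is what makes the failover invisible to clients.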
Advanced Bedrock Features¶
| Feature | Description |
|---|---|
| Prompt Caching | Cache system prompts, messages, and tools; granular section control; configurable TTL; cache metrics in every response |
| Reasoning Modes | Extended thinking with effort levels (minimal → xhigh) for Claude and Nova 2; `thinking_budget` for token-level control |
| Bedrock Guardrails | Content filtering and safety policies with configurable trace levels |
| Service Tiers | Priority, default, and flex latency tiers per request |
| Application Inference Profiles | Custom profiles for workload isolation and cost attribution |
| Prompt Routers | Bedrock prompt routers for intelligent model selection |
| Cross-Region Inference | Geography-pinned (US, EU, APAC) and global profiles with data residency control |
| System Tools (Nova) | Web grounding with URL citations; code interpreter |
| Claude Server Tools | Bash, text editor, computer use (3.5+), memory (3.7+) |
| Extra Model Parameters | Any model-specific parameter forwarded via `extra_body` or a top-level field |
AWS AI Services Integration¶
| Service | Capability |
|---|---|
| Amazon Polly | 60+ voices, 30+ languages, SSML, multiple engines and audio formats |
| Amazon Transcribe | 100+ languages, speaker diarization, timestamps, SRT/VTT subtitles |
| Amazon Translate | Language translation for audio translation workflows |
| Amazon Comprehend | Automatic language detection for intelligent voice routing |
Amazon S3 Integration¶
S3 is woven into the entire API surface — not just file storage:
- Files API — Full CRUD at `/v1/files` with no artificial size limit (up to S3's ~5 TB), optional expiry, S3 Lifecycle backstop; file IDs work across both OpenAI and Anthropic endpoints
- Multipart uploads — `/v1/uploads` backed by S3 native multipart; stream large files without buffering
- Direct `s3://` image references — Use `s3://bucket/key` in chat completions and Anthropic Messages; the gateway reads from S3 via IAM role — no pre-signed URLs
- Files API in image operations — Reference uploaded files by `file_id` in image edits and variations
- Multimodal embeddings — Pass `s3://` URLs directly; oversized base64 payloads auto-uploaded and invoked asynchronously
- Regional buckets — One bucket per Bedrock region; S3 region routing is automatic
- Transfer Acceleration — Faster downloads via generated HTTP links
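A sketch of the direct `s3://` reference in a chat request; the bucket, key, and model alias are hypothetical:

```python
import json

payload = {
    "model": "nova-pro",  # hypothetical alias for a multimodal model
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this diagram."},
            # The gateway fetches the object through its IAM role --
            # no pre-signed URL or base64 upload needed.
            {"type": "image_url",
             "image_url": {"url": "s3://my-bucket/diagrams/arch.png"}},
        ],
    }],
}
body = json.dumps(payload)
```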
Security & Compliance¶
Authentication¶
stdapi.ai supports multiple authentication strategies to fit your architecture:
| Method | How | Best For |
|---|---|---|
| API Key | `Authorization: Bearer` or `X-API-Key` header; stored in SSM Parameter Store or Secrets Manager (never plain text) | Direct clients, SDKs |
| OIDC / Cognito | Delegate to AWS Application Load Balancer or API Gateway | Web apps, SSO |
| AWS IAM (SigV4) | Via API Gateway with IAM authorization | Internal AWS services |
| No authentication | Open access | Private VPC deployments |
Security Features¶
| Feature | Description |
|---|---|
| Industry-Standard API Key Hashing | API keys hashed with a cryptographic function + per-key salt; constant-time comparison prevents timing attacks; only the hash is retained in memory |
| SSRF Protection | Blocks loopback, link-local, private network addresses, and DNS rebinding attacks |
| Trusted Hosts | Restrict which hostnames the service responds to |
| CORS Controls | Configurable cross-origin resource sharing policies |
| CSRF Protection | Built-in cross-site request forgery protection |
| Input Validation | Configurable strict mode — rejects malformed or out-of-spec requests at the gateway edge |
| Proxy Header Handling | Secure forwarded header processing for ALB and CloudFront |
| TLS 1.2+ in transit | All AWS service calls encrypted; the Terraform module configures ALB with TLS 1.3 and post-quantum hybrid key exchange |
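The key-hashing row above can be illustrated with Python's standard library. The exact hash function the product uses is not documented here, so SHA-256 below is an assumption; the salted-hash and constant-time-compare pattern is what the table describes:

```python
import hashlib
import hmac
import os

def hash_key(api_key: str, salt: bytes) -> bytes:
    # Per-key salt prevents identical keys from producing identical hashes.
    return hashlib.sha256(salt + api_key.encode()).digest()

salt = os.urandom(16)
stored = hash_key("sk-example-key", salt)  # only the hash is kept in memory

def verify(presented: str) -> bool:
    # hmac.compare_digest runs in constant time, defeating timing attacks.
    return hmac.compare_digest(hash_key(presented, salt), stored)
```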
Commercial: Hardened container image (AWS Marketplace)
The commercial image is security-validated by AWS Marketplace and includes: read-only root filesystem, dropped Linux capabilities, minimal installed packages, and no shell. The Terraform module also configures a Customer Managed KMS key (auto-rotation enabled) for all data at rest.
Compliance & Data Sovereignty¶
Data never leaves your AWS account — every AWS service call is restricted to the regions you configure. All AWS services used by stdapi.ai (Bedrock, S3, Polly, Transcribe, and more) are in scope for GDPR, ISO 27001/27017/27018, SOC 1/2/3, HIPAA, FedRAMP, PCI-DSS, and CSA STAR Level 2. The commercial Terraform module adds VPC endpoints (no internet egress), Customer Managed KMS keys, and region-pinned cross-region profiles for strict data residency.
Works with Your Existing Tools¶
stdapi.ai is a drop-in replacement in hundreds of applications and frameworks. Change the API endpoint — nothing else.
- Chat Interfaces — Open WebUI, LobeHub, LibreChat, Chatbot UI — private ChatGPT-style experiences on AWS
- AI Coding Assistants — Claude Code, Continue.dev, Cline, Cursor, Windsurf, Aider — backed by Claude 4.6, Kimi K2, Qwen Coder
- Workflow Automation — n8n, Make, Zapier — connect AI to your business processes
- Agent Frameworks — OpenClaw, LangChain, LlamaIndex, CrewAI, LangGraph, AutoGPT — multi-agent systems on Bedrock
- Team Chatbots — Slack, Discord, Microsoft Teams — AI assistants in your team's communication tools
- Knowledge Management — Obsidian, Notion, Logseq — AI-powered writing assistance and search
Observability & Operations¶
Structured Logging¶
- JSON logs to stdout — natively ingested by CloudWatch Logs
- Every request logs: method, path, status, model ID, region(s) used, execution time
- Optional: full request/response payloads, client IP (disabled by default)
- Configurable log levels (info, warning, error, critical, disabled)
OpenTelemetry Integration¶
- Export traces and metrics to AWS X-Ray, Datadog, Jaeger, or any OTLP-compatible backend
- Configurable sampling rate
- Root span per request with full correlation IDs
Token Usage Tracking¶
- Input, output, reasoning, and cached token counts in every API response
- Consistent reporting across all endpoints (chat, messages, embeddings, images, audio)
Developer Tools¶
- Swagger UI at `/docs` — test endpoints directly in your browser
- ReDoc at `/redoc` — clean, searchable API reference
- OpenAPI schema at `/openapi.json` — import into Postman, generate client code
Quality of Life¶
- Model aliases — Map custom names to Bedrock IDs; Claude model names resolve automatically
- Model auto-detection — Discovers available models across all configured regions at startup
- Model list caching — Fast model listing without repeated AWS API calls
- Token estimation — Optional pre-flight token count via tiktoken (without calling Bedrock); useful for client-side budgeting and routing decisions
- Safety identifier — `safety_identifier` field in requests as an alias to `user` for abuse tracking and audit trails
- Zero-configuration startup — Automatic region and model detection; warnings on missing config
- Deprecated model failover — Requests to retired models silently redirect to their replacements
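Where exact counts are unnecessary, a dependency-free rule of thumb (~4 characters per English token — an approximation, not what the gateway's tiktoken-based estimator computes) is sometimes enough for client-side budgeting:

```python
def rough_token_estimate(text: str) -> int:
    """Crude pre-flight estimate: roughly 4 characters per English token."""
    return max(1, round(len(text) / 4))
```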
Deployment¶
Community vs Commercial¶
| Community | Commercial | |
|---|---|---|
| Price | Free | $0.10/container-hour, with a 14-day free trial |
| License | AGPL-3.0 | AWS Marketplace SCMP |
| API compatibility | Full | Full |
| Container image | Community (GHCR) | Hardened, AWS Marketplace validated |
| Deployment | Docker / self-managed | Terraform module (ECS Fargate), AWS Marketplace container image |
| Production infrastructure | — | Fully featured; AWS Well-Architected; hardened |
| Commercial support | — | 1 business day |
How stdapi.ai Compares¶
All four solutions below expose an OpenAI-compatible API in front of AWS Bedrock. The comparison focuses on the AWS deployment context — LiteLLM is evaluated with AWS services as the backend provider (Bedrock, Polly, Transcribe), not as a multi-cloud proxy. Bedrock Access Gateway is the official AWS-maintained open-source sample. Bedrock Mantle is AWS's own managed OpenAI-compatible endpoint, requiring no self-hosting.
| Capability | stdapi.ai | LiteLLM (on AWS) | Bedrock Access Gateway | Bedrock Mantle |
|---|---|---|---|---|
| OpenAI Chat completions | ✓ | ✓ | ✓ | ✓ |
| OpenAI Embeddings | ✓ | ✓ | ✓ | — |
| Anthropic Messages API | ✓ | ✓ | — | — |
| OpenAI Responses API | ✓ ¹² | — | — | ✓ |
| OpenAI Image generation | ✓ | ✓ | — | — |
| OpenAI Image editing | ✓ | — | — | — |
| OpenAI Image variations | ✓ | — | — | — |
| OpenAI TTS (speech) | ✓ | ✓ ¹³ | — | — |
| OpenAI STT (transcription) | ✓ | — | — | — |
| OpenAI Files & Uploads API | ✓ | — | — | — |
| OpenAI Realtime API | — | ✓ | — | — |
| Cohere Rerank API | — | ✓ | — | — |
| Bedrock Full model catalog | ✓ | ✓ ¹ | ✓ ¹⁰ | — ² |
| Multimodal inputs | text · image · audio · video · docs | text · image · docs | text · image | text · image |
| Multi-region quota multiplication | ✓ | ✓ ⁸ | — | — |
| Bedrock Cross-region inference profiles | ✓ | ✓ ¹⁴ | ✓ ¹⁴ | — |
| Bedrock system tools | ✓ | — | — | — |
| Bedrock Guardrails | ✓ | ✓ | ✓ | — |
| Bedrock Service tiers | ✓ | ✓ | — | ✓ |
| Bedrock Application inference profiles | ✓ | ✓ | ✓ | ✓ |
| Bedrock prompt routers | ✓ | ✓ | — | — |
| Bedrock Prompt caching & reasoning | ✓ ⁶ | ✓ | ✓ | ✓ ⁹ |
| Runs in your AWS account | ✓ | ✓ | ✓ | — |
| Model auto-discovery | ✓ | — ¹ | ✓ ⁷ | ✓ |
| Deprecated model failover | ✓ | — | — | — |
| Ready-to-use deployment | ✓ | — | ✓ ³ | ✓ |
| Commercial support | ✓ | ✓ | — | ✓ ⁴ |
| Self-hosted | ✓ | ✓ | ✓ | — |
| AWS-native focus | ✓ | — ⁵ | ✓ | ✓ |
| Multi-provider support | — | ✓ ¹¹ | — | — |
| Open-source community | Small | Large (~40k ★) | Medium (~1k ★) | — |
| Source license | AGPL-3.0 (community) · commercial | MIT | MIT-0 | AWS service |
| Distribution & supply chain | Marketplace · GHCR | pip/PyPI | GitHub (MIT-0) | AWS-managed |
About the alternatives
- LiteLLM — widely adopted multi-cloud proxy with a large open-source community. Ideal when you need a single entry point across OpenAI, Azure, AWS, and others. AWS deployment and security features (WAF, VPC endpoints) require manual setup. Also offers a commercial Enterprise tier.
- Bedrock Access Gateway — official open-source AWS sample (MIT-0), actively maintained by AWS teams. Covers chat completions and embeddings only. No WAF, auto-scaling, monitoring, or commercial support included.
- Bedrock Mantle — AWS's own native OpenAI-compatible endpoint backed by AWS's full compliance and SLA. No self-hosting required. Supports chat completions and the Responses API; limited to a subset of models (mostly newer open-weight models — Claude 3.x/4.x, Nova, Llama, Cohere, and Stability AI image models are not available). Routes through an AWS-managed endpoint, not your private VPC. See model availability.
Ready to Get Started?¶
- Start 14-Day Free Trial — Production-ready Terraform deployment on AWS Marketplace
- Getting Started Guide — Deploy to AWS with Terraform
- Run Locally — Free Docker image for development
- API Reference — Full endpoint documentation and examples
1. Full Bedrock catalog supported; each model must be declared in config (applies to auto-discovery)
2. Subset of Bedrock models — Claude 3.x/4.x, Nova, Llama, AI21, Cohere, and Stability AI (images) not available; supports mostly newer open-weight models (DeepSeek, Gemma, Qwen, Kimi K2, MiniMax, newer Mistral, etc.) — see AWS endpoint availability
3. CDK reference sample — no WAF, auto-scaling, monitoring, or commercial support
4. Covered through your existing AWS Support plan
5. Generalist multi-cloud proxy covering 100+ providers; AWS-specific Bedrock features and security integrations may lag behind dedicated solutions
6. Prompt caching and reasoning supported on standard routes; coverage varies by model — not all Bedrock models support prompt caching or extended thinking
7. Auto-discovery limited to the single deployed region — some models are only available in specific AWS regions
8. Achievable via the LiteLLM router, but requires manually declaring each model per region with explicit TPM/RPM limits — no automatic quota distribution
9. Claude 3.x/4.x (which support prompt caching) not available on Mantle; reasoning available via select open-weight models (Qwen3 thinking, etc.)
10. Single-region deployment — some models are only available in specific AWS regions; no cross-region catalog aggregation
11. 100+ providers: OpenAI, Azure OpenAI, GCP Vertex, Anthropic direct, and more — ideal when you need a single gateway across multiple clouds
12. Coming in the next release
13. Requires connecting Amazon Polly as the TTS backend — not included by default in a LiteLLM on AWS deployment
14. Supported by specifying the cross-region inference profile ARN as the model ID — no automatic profile selection