
Features — AI Gateway for AWS Bedrock

stdapi.ai is an AI gateway purpose-built for AWS. It brings full OpenAI and Anthropic API compatibility to AWS Bedrock and AWS AI services — so any tool, SDK, or application your team already uses connects instantly, without code changes.

  • One URL change, 80+ models — Drop in as an OpenAI or Anthropic replacement
  • Everything stays in your AWS account — No third-party routing, no data sharing
  • Enterprise compliance built in — ISO, SOC, HIPAA, GDPR, FedRAMP via AWS
  • Production in minutes — Terraform module on AWS Marketplace, 14-day free trial

How It Works

stdapi.ai sits between your applications and AWS services, translating OpenAI and Anthropic API calls into native AWS requests. Any tool or SDK that speaks either protocol connects instantly — no plugins, no custom integrations.

flowchart LR
  openwebui["Open WebUI"] --> stdapi["stdapi.ai"]
  n8n["n8n"] --> stdapi
  ide["IDE + AI Assistant"] --> stdapi
  openai_app["Any OpenAI App"] --> stdapi
  anthropic_app["Any Anthropic App"] --> stdapi
  stdapi --> bedrock["AWS Bedrock"]
  bedrock --> claude["Claude"]
  bedrock --> qwen["Qwen"]
  bedrock --> mistral["Mistral"]
  bedrock --> stability["Stability AI"]
  bedrock --> more["✨ and more..."]
  stdapi --> transcribe["AWS Transcribe"]
  stdapi --> polly["AWS Polly"]
  stdapi --> s3["Amazon S3"]

Latency overhead

The gateway adds negligible per-request processing overhead — typically a few milliseconds. End-to-end latency is dominated by Bedrock model inference time. Streaming responses are passed through immediately with no intermediate buffering.


Why stdapi.ai?

  • Complete API surface
    Most gateways cover chat completions only. stdapi.ai delivers the full OpenAI surface on AWS: chat completions, embeddings, image generation and editing, text-to-speech, speech-to-text, translation, and file storage — all through standard API calls, with no AWS-specific code in your application.

  • Your data, your account
    stdapi.ai runs entirely within your own VPC — no traffic leaves your account. AWS Bedrock never retains or trains on your prompts. The software supply chain is hardened end-to-end — distributed as a validated container image with no public package registry exposure.

  • Multiply your throughput
    Every AWS region has its own independent quota. Configure three regions and you get approximately three times your tokens-per-minute. Multi-region failover is fully automatic — clients never see a throttle error.

  • Every Bedrock capability, zero custom code
    Prompt caching, extended thinking, guardrails, service tiers, cross-region inference profiles, system tools (Nova web grounding, code interpreter), SSML for speech synthesis — every Bedrock-native feature exposed through standard OpenAI and Anthropic APIs.


API Compatibility

Your existing applications, SDKs, and tools work immediately — no plugins or client changes needed.
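
For example, with the official OpenAI Python SDK, switching to the gateway is just a base URL change. A minimal sketch, assuming a deployed gateway; the base URL, API key, and model alias are placeholders for your own deployment:

```python
from openai import OpenAI

# Point the official OpenAI SDK at the gateway instead of api.openai.com.
# Base URL, key, and model alias below are placeholders.
client = OpenAI(
    base_url="https://gateway.example.com/v1",
    api_key="YOUR_GATEWAY_API_KEY",
)

response = client.chat.completions.create(
    model="claude-sonnet",  # any model ID or alias exposed by the gateway
    messages=[{"role": "user", "content": "Hello from Bedrock!"}],
)
print(response.choices[0].message.content)
```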

Supported Endpoints

OpenAI-Compatible:

| Endpoint | Capability | AWS Backend |
| --- | --- | --- |
| /v1/chat/completions | Conversational AI, tool calling, multi-modal | AWS Bedrock Converse API |
| /v1/responses | Stateless conversational AI with built-in tools (coming soon) | AWS Bedrock Converse API |
| /v1/embeddings | Vector embeddings for search & RAG | AWS Bedrock Embedding Models |
| /v1/images/generations | Text-to-image generation | AWS Bedrock Image Models |
| /v1/images/edits | Image editing, inpainting & transformations | AWS Bedrock Image Models |
| /v1/images/variations | Image variations | AWS Bedrock Image Models |
| /v1/audio/speech | Text-to-speech with SSML support | Amazon Polly |
| /v1/audio/transcriptions | Speech-to-text with speaker diarization | Amazon Transcribe |
| /v1/audio/translations | Speech-to-English translation | Amazon Transcribe + Amazon Translate |
| /v1/models | Model discovery & listing | AWS Bedrock |
| /v1/files | File upload, listing, metadata, download, deletion | Amazon S3 |
| /v1/uploads | Multipart upload sessions for large files | Amazon S3 |
| /available_models | List models filtered by modality (text, image, audio, embedding) | Internal |

Anthropic-Compatible:

| Endpoint | Capability | AWS Backend |
| --- | --- | --- |
| /v1/messages | Conversational AI, tool calling, multi-modal | AWS Bedrock Converse API |
| /v1/messages/count_tokens | Count tokens without sending a message | AWS Bedrock CountTokens API |
| /v1/models | Model discovery & listing | AWS Bedrock |
| /v1/models/{model_id} | Model details | AWS Bedrock |
| /v1/files | File upload, listing, metadata, download, deletion | Amazon S3 |

Route prefix

Anthropic-compatible routes are prefixed with /anthropic by default (e.g., /anthropic/v1/messages). The prefix is configurable via ANTHROPIC_ROUTES_PREFIX.
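
The Anthropic SDK works the same way, assuming the default /anthropic prefix (URL, key, and model alias are placeholders):

```python
import anthropic

# The Anthropic SDK accepts a custom base_url; include the /anthropic
# prefix unless you changed ANTHROPIC_ROUTES_PREFIX.
client = anthropic.Anthropic(
    base_url="https://gateway.example.com/anthropic",
    api_key="YOUR_GATEWAY_API_KEY",
)

message = client.messages.create(
    model="claude-sonnet",  # placeholder alias
    max_tokens=256,
    messages=[{"role": "user", "content": "Hello"}],
)
print(message.content[0].text)
```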

Parameter Coverage

stdapi.ai maps as many parameters as possible to Bedrock equivalents — across all routes, not just chat:

  • Generation controls — temperature, max_tokens, top_p, top_k, stop, seed, frequency_penalty, presence_penalty, logit_bias, top_logprobs, streaming via SSE, token usage reporting
  • Reasoning — reasoning_effort (minimal/low/medium/high/xhigh), enable_thinking, thinking_budget
  • Tool / function calling — Full OpenAI and Anthropic schemas, parallel tool calls, tool choice modes
  • All content types — System, developer, user, assistant, and tool roles; text, image, audio, video, and document content
  • Response formats — JSON object, JSON schema, streaming chunks, reasoning_content, annotations
  • Model-specific extras — Any parameter beyond the standard API via extra_body or top-level request fields
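
As a sketch of that last escape hatch, the OpenAI SDK's extra_body forwards non-standard fields untouched. Whether a given model honors a field varies; check the API reference:

```python
# "client" is the OpenAI client configured in the first sketch.
# enable_thinking / thinking_budget come from the parameter list above;
# the model alias is a placeholder.
response = client.chat.completions.create(
    model="qwen3",
    messages=[{"role": "user", "content": "Prove there are infinitely many primes."}],
    extra_body={"enable_thinking": True, "thinking_budget": 2048},
)
```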

Bedrock & model differences

Not every parameter maps identically across all models. Check the API documentation for details.


80+ Models Across 10+ Providers

Access every model available on AWS Bedrock through a single, consistent API.

  • Anthropic Claude
    Claude 4.6, Claude Sonnet, Claude Haiku — including reasoning models. Use official Anthropic model names (e.g., claude-opus-4-6) — they resolve automatically.

  • Amazon Nova
    Nova Micro, Lite, Pro, Premier, Nova 2 with reasoning. Canvas for images. Multimodal embeddings. Built-in web grounding and code interpreter.

  • Meta Llama
    Llama 4 Scout, Maverick, and earlier Llama 3 variants.

  • Alibaba Qwen
    Qwen3 and Qwen Coder — including thinking mode.

  • DeepSeek
    Latest DeepSeek V3 models with automatic reasoning content surfacing.

  • Moonshot Kimi K2
    Kimi K2 with optional thinking mode.

  • Mistral AI
    Mistral, Mixtral, and Mistral Large variants.

  • Cohere
    Command models for chat; Embed v4 for multimodal embeddings.

  • Stability AI
    Stable Diffusion 3.5, SD3 Ultra, and specialty models (upscale, style, search).

  • MiniMax & more
    MiniMax M2.5, Writer Palmyra, AI21 Jamba, TwelveLabs Marengo video embeddings, and others.

Model Management

  • Automatic model discovery — Scans configured regions at startup; no manual model list to maintain
  • Model aliases — Map custom names to Bedrock model IDs; Claude and OpenAI names resolve automatically
  • Deprecated model failover — Requests to retired models transparently redirect to their replacements
  • Legacy model filtering — Optionally hide deprecated models from the models list

Multi-Modal Capabilities

Text & Conversational AI

  • All message roles: system, developer, user, assistant, tool
  • Multi-turn conversations with full history
  • Tool / function calling with parallel execution
  • Structured JSON output (JSON object and JSON schema modes; see the sketch below)
  • Streaming via Server-Sent Events with real-time token delivery
  • Reasoning content blocks (thinking, reasoning_content) for supported models
  • Web search results as context (search_result content blocks)
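
For instance, the JSON schema mode listed above constrains the model to a typed object. A minimal sketch; model alias and schema are illustrative:

```python
# "client" is the OpenAI client configured in the first sketch.
response = client.chat.completions.create(
    model="nova-pro",  # placeholder alias
    messages=[{"role": "user", "content": "Extract the city and country: 'I live in Lyon, France.'"}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "location",
            "schema": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                    "country": {"type": "string"},
                },
                "required": ["city", "country"],
                "additionalProperties": False,
            },
        },
    },
)
print(response.choices[0].message.content)  # e.g. {"city": "Lyon", "country": "France"}
```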

Images

Generation — Text-to-image with:

  • Multiple output formats: PNG, JPEG, WebP with adjustable quality and compression
  • Flexible sizes and aspect ratios
  • Streaming generation with partial image previews
  • Style presets (model-specific)

Editing — Powerful inpainting and transformation:

  • Mask-based inpainting (define edit regions precisely)
  • Image-to-image transformation (style, structure conditioning)
  • Background removal, object search & replace, object recolor
  • Creative and conservative upscaling

Variations — Create alternative versions of existing images

JSON body format — Reference images via Files API file_id or URL instead of re-uploading
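
A minimal generation call through the standard images endpoint (the model ID and size are placeholders; supported values vary by model):

```python
import base64

# "client" is the OpenAI client configured in the first sketch.
result = client.images.generate(
    model="stability.sd3-5-large-v1:0",  # placeholder Bedrock model ID
    prompt="A lighthouse at dawn, watercolor",
    size="1024x1024",
    response_format="b64_json",
)
with open("lighthouse.png", "wb") as f:
    f.write(base64.b64decode(result.data[0].b64_json))
```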

Audio

Text-to-Speech (Amazon Polly):

  • 60+ voices across 30+ languages
  • Multiple engine tiers: Standard, Neural, Long-Form, Generative
  • SSML support — control pronunciation, emphasis, pauses, prosody
  • Output formats: MP3, PCM, Opus, AAC, FLAC, OGG Vorbis
  • Speed control (0.25× to 4×)
  • Automatic language detection via Amazon Comprehend
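
A sketch of a TTS request through the standard audio endpoint, assuming SSML passes through as the input text (voice and model values are placeholders):

```python
# "client" is the OpenAI client configured in the first sketch.
speech = client.audio.speech.create(
    model="polly-neural",  # placeholder engine/model name
    voice="Joanna",        # an Amazon Polly voice
    input="<speak>Hello <break time='300ms'/> world.</speak>",  # SSML (assumed passthrough)
    response_format="mp3",
)
with open("hello.mp3", "wb") as f:
    f.write(speech.content)
```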

Speech-to-Text (Amazon Transcribe):

  • 100+ languages
  • Speaker diarization — automatic speaker separation and labeling
  • Word-level and segment-level timestamps
  • Subtitle export: SRT and VTT formats
  • Vocabulary customization and custom language models
  • Automatic language detection
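
And a transcription sketch with word and segment timestamps (the model name is a placeholder; the diarization switches are described in the API reference):

```python
# "client" is the OpenAI client configured in the first sketch.
with open("meeting.wav", "rb") as audio:
    transcript = client.audio.transcriptions.create(
        model="transcribe",  # placeholder model name
        file=audio,
        response_format="verbose_json",
        timestamp_granularities=["word", "segment"],
    )
print(transcript.text)
```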

Speech Translation — Transcribe audio and translate to English in a single request

Documents & Files

  • PDF input with optional citation support (precise source references in responses)
  • Plain text and structured content blocks as context
  • File storage via the Files API — upload once, reference by ID across multiple requests
  • Multipart uploads for large files via the Uploads API (S3 native multipart)
  • File expiry with configurable TTL (1 hour – 30 days)
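
A sketch of the upload-once pattern via the Files API (the purpose value follows the OpenAI Files API; the file name is illustrative):

```python
# "client" is the OpenAI client configured in the first sketch.
# Upload once, then reference the returned ID in later requests
# instead of re-sending the document.
with open("report.pdf", "rb") as f:
    uploaded = client.files.create(file=f, purpose="user_data")
print(uploaded.id)  # reference this file_id in subsequent requests
```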

Video

  • Video input in chat completions for supported models (e.g., Amazon Nova)
  • S3 URLs as direct video input for multimodal embeddings

Embeddings

  • Text embeddings — single and batch processing
  • Multimodal embeddings — images, audio, video, PDF documents
  • Dimension control (model-specific reduction)
  • Float or Base64 output encoding
  • S3 URL input for large files; oversized base64 payloads auto-uploaded to S3
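
For example, a batch embedding call with dimension reduction (the model alias is a placeholder; the dimensions parameter applies only to models that support reduction):

```python
# "client" is the OpenAI client configured in the first sketch.
emb = client.embeddings.create(
    model="titan-embed-text-v2",  # placeholder alias
    input=["first document", "second document"],
    dimensions=512,          # model-specific; not honored by every model
    encoding_format="float",
)
print(len(emb.data), len(emb.data[0].embedding))
```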

Purpose-Built for AWS

Multi-Region & Quota Multiplication

Configure multiple AWS regions to scale your throughput and maximize availability:

| Routing Strategy | Description | Prompt Caching |
| --- | --- | --- |
| ordered (default) | Try regions in order; skip blocked ones | ✓ Compatible |
| lowest_latency | Prefer fastest measured region | ✓ Compatible |
| round_robin | Distribute evenly across regions | |
| disabled | Single region per model | ✓ Compatible |

  • 3 regions ≈ 3× your tokens-per-minute — each region has its own independent quota
  • Automatic failover — transparent region switching on throttle, quota, or service errors
  • Exponential backoff — doubles per consecutive error, capped at 1 hour
  • Region health tracking — per-model health status with configurable recovery delays


Advanced Bedrock Features

| Feature | Description |
| --- | --- |
| Prompt Caching | Cache system prompts, messages, and tools; granular section control; configurable TTL; cache metrics in every response |
| Reasoning Modes | Extended thinking with effort levels (minimal → xhigh) for Claude and Nova 2; thinking_budget for token-level control |
| Bedrock Guardrails | Content filtering and safety policies with configurable trace levels |
| Service Tiers | Priority, default, and flex latency tiers per request |
| Application Inference Profiles | Custom profiles for workload isolation and cost attribution |
| Prompt Routers | Bedrock prompt routers for intelligent model selection |
| Cross-Region Inference | Geography-pinned (US, EU, APAC) and global profiles with data residency control |
| System Tools (Nova) | Web grounding with URL citations; code interpreter |
| Claude Server Tools | Bash, text editor, computer use (3.5+), memory (3.7+) |
| Extra Model Parameters | Any model-specific parameter forwarded via extra_body or top-level field |
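
As one concrete sketch from the table above, prompt caching in the Anthropic Messages shape marks a system block as cacheable (gateway URL, key, model alias, and prompt are placeholders; cache support varies by model):

```python
import anthropic

client = anthropic.Anthropic(
    base_url="https://gateway.example.com/anthropic",  # placeholder
    api_key="YOUR_GATEWAY_API_KEY",
)

message = client.messages.create(
    model="claude-sonnet",  # placeholder alias
    max_tokens=512,
    system=[{
        "type": "text",
        "text": "You are a support agent. <several thousand tokens of policy text>",
        "cache_control": {"type": "ephemeral"},  # cache this section
    }],
    messages=[{"role": "user", "content": "What is the refund window?"}],
)
print(message.usage)  # includes cache read/write token counts
```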

AWS AI Services Integration

| Service | Capability |
| --- | --- |
| Amazon Polly | 60+ voices, 30+ languages, SSML, multiple engines and audio formats |
| Amazon Transcribe | 100+ languages, speaker diarization, timestamps, SRT/VTT subtitles |
| Amazon Translate | Language translation for audio translation workflows |
| Amazon Comprehend | Automatic language detection for intelligent voice routing |

Amazon S3 Integration

S3 is woven into the entire API surface — not just file storage:

  • Files API — Full CRUD at /v1/files with no artificial size limit (up to S3's ~5 TB), optional expiry, S3 Lifecycle backstop; file IDs work across both OpenAI and Anthropic endpoints
  • Multipart uploads — /v1/uploads backed by S3 native multipart; stream large files without buffering
  • Direct s3:// image references — Use s3://bucket/key in chat completions and Anthropic Messages; the gateway reads from S3 via IAM role — no pre-signed URLs
  • Files API in image operations — Reference uploaded files by file_id in image edits and variations
  • Multimodal embeddings — Pass s3:// URLs directly; oversized base64 payloads auto-uploaded and invoked asynchronously
  • Regional buckets — One bucket per Bedrock region; S3 region routing is automatic
  • Transfer Acceleration — Faster downloads via generated HTTP links
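
For instance, a direct s3:// reference in a vision request (bucket, key, and model alias are placeholders):

```python
# "client" is the OpenAI client configured in the first sketch.
response = client.chat.completions.create(
    model="nova-pro",  # placeholder alias for a vision-capable model
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url", "image_url": {"url": "s3://my-bucket/photos/cat.png"}},
        ],
    }],
)
print(response.choices[0].message.content)
```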

Security & Compliance

Authentication

stdapi.ai supports multiple authentication strategies to fit your architecture:

| Method | How | Best For |
| --- | --- | --- |
| API Key | Authorization: Bearer or X-API-Key header; stored in SSM Parameter Store or Secrets Manager (never plain text) | Direct clients, SDKs |
| OIDC / Cognito | Delegate to AWS Application Load Balancer or API Gateway | Web apps, SSO |
| AWS IAM (SigV4) | Via API Gateway with IAM authorization | Internal AWS services |
| No authentication | Open access | Private VPC deployments |


Security Features

| Feature | Description |
| --- | --- |
| Industry-Standard API Key Hashing | API keys hashed with a cryptographic function + per-key salt; constant-time comparison prevents timing attacks; only the hash is retained in memory |
| SSRF Protection | Blocks loopback, link-local, private network addresses, and DNS rebinding attacks |
| Trusted Hosts | Restrict which hostnames the service responds to |
| CORS Controls | Configurable cross-origin resource sharing policies |
| CSRF Protection | Built-in cross-site request forgery protection |
| Input Validation | Configurable strict mode — rejects malformed or out-of-spec requests at the gateway edge |
| Proxy Header Handling | Secure forwarded header processing for ALB and CloudFront |
| TLS 1.2+ in transit | All AWS service calls encrypted; the Terraform module configures ALB with TLS 1.3 and post-quantum hybrid key exchange |

Commercial: Hardened Container Image (AWS Marketplace)

The commercial image is security-validated by AWS Marketplace and includes: read-only root filesystem, dropped Linux capabilities, minimal installed packages, and no shell. The Terraform module also configures a Customer Managed KMS key (auto-rotation enabled) for all data at rest.

Compliance & Data Sovereignty

Data never leaves your AWS account — every AWS service call is restricted to the regions you configure. All AWS services used by stdapi.ai (Bedrock, S3, Polly, Transcribe, and more) are in scope for GDPR, ISO 27001/27017/27018, SOC 1/2/3, HIPAA, FedRAMP, PCI-DSS, and CSA STAR Level 2. The commercial Terraform module adds VPC endpoints (no internet egress), Customer Managed KMS keys, and region-pinned cross-region profiles for strict data residency.



Works with Your Existing Tools

stdapi.ai is a drop-in replacement in hundreds of applications and frameworks. Change the API endpoint — nothing else.

  • Chat Interfaces
    Open WebUI, LobeHub, LibreChat, Chatbot UI — private ChatGPT-style experiences on AWS

  • AI Coding Assistants
    Claude Code, Continue.dev, Cline, Cursor, Windsurf, Aider — backed by Claude 4.6, Kimi K2, Qwen Coder

  • Workflow Automation
    n8n, Make, Zapier — connect AI to your business processes

  • Agent Frameworks
    OpenClaw, LangChain, LlamaIndex, CrewAI, LangGraph, AutoGPT — multi-agent systems on Bedrock

  • Team Chatbots
    Slack, Discord, Microsoft Teams — AI assistants in your team's communication tools

  • Knowledge Management
    Obsidian, Notion, Logseq — AI-powered writing assistance and search

See all use cases


Observability & Operations

Structured Logging

  • JSON logs to stdout — natively ingested by CloudWatch Logs
  • Every request logs: method, path, status, model ID, region(s) used, execution time
  • Optional: full request/response payloads, client IP (disabled by default)
  • Configurable log levels (info, warning, error, critical, disabled)

OpenTelemetry Integration

  • Export traces and metrics to AWS X-Ray, Datadog, Jaeger, or any OTLP-compatible backend
  • Configurable sampling rate
  • Root span per request with full correlation IDs

Token Usage Tracking

  • Input, output, reasoning, and cached token counts in every API response
  • Consistent reporting across all endpoints (chat, messages, embeddings, images, audio)

Developer Tools

  • Swagger UI at /docs — test endpoints directly in your browser
  • ReDoc at /redoc — clean, searchable API reference
  • OpenAPI schema at /openapi.json — import into Postman, generate client code

Quality of Life

  • Model aliases — Map custom names to Bedrock IDs; Claude model names resolve automatically
  • Model auto-detection — Discovers available models across all configured regions at startup
  • Model list caching — Fast model listing without repeated AWS API calls
  • Token estimation — Optional pre-flight token count via tiktoken (without calling Bedrock); useful for client-side budgeting and routing decisions
  • Safety identifier — safety_identifier field in requests as an alias to user for abuse tracking and audit trails
  • Zero-configuration startup — Automatic region and model detection; warnings on missing config
  • Deprecated model failover — Requests to retired models silently redirect to their replacements

Deployment

Community vs Commercial

| | Community | Commercial |
| --- | --- | --- |
| Price | Free | $0.10/container-hour, with 14-day free trial |
| License | AGPL-3.0 | AWS Marketplace SCMP |
| API compatibility | Full | Full |
| Container image | Community (GHCR) | Hardened, AWS Marketplace validated |
| Deployment | Docker / self-managed | Terraform module (ECS Fargate), AWS Marketplace container image |
| Production infrastructure | | Fully featured, AWS Well-Architected, Hardened |
| Commercial support | | 1 business day |

How stdapi.ai Compares

All four solutions below expose an OpenAI-compatible API in front of AWS Bedrock. The comparison focuses on the AWS deployment context — LiteLLM is evaluated with AWS services as the backend provider (Bedrock, Polly, Transcribe), not as a multi-cloud proxy. Bedrock Access Gateway is the official AWS-maintained open-source sample. Bedrock Mantle is AWS's own managed OpenAI-compatible endpoint, requiring no self-hosting.

| Capability | stdapi.ai | LiteLLM (on AWS) | Bedrock Access Gateway | Bedrock Mantle |
| --- | --- | --- | --- | --- |
| OpenAI Chat completions | | | | |
| OpenAI Embeddings | | | | |
| Anthropic Messages API | | | | |
| OpenAI Responses API | 12 | | | |
| OpenAI Image generation | | | | |
| OpenAI Image editing | | | | |
| OpenAI Image variations | | | | |
| OpenAI TTS (speech) | | 13 | | |
| OpenAI STT (transcription) | | | | |
| OpenAI Files & Uploads API | | | | |
| OpenAI Realtime API | | | | |
| Cohere Rerank API | | | | |
| Bedrock Full model catalog | | 1 | 10 | 2 |
| Multimodal inputs | text · image · audio · video · docs | text · image · docs | text · image | text · image |
| Multi-region quota multiplication | | 8 | | |
| Bedrock Cross-region inference profiles | | 14 | 14 | |
| Bedrock system tools | | | | |
| Bedrock Guardrails | | | | |
| Bedrock Service tiers | | | | |
| Bedrock Application inference profiles | | | | |
| Bedrock prompt routers | | | | |
| Bedrock Prompt caching & reasoning | 6 | | | 9 |
| Runs in your AWS account | | | | |
| Model auto-discovery | | 1 | 7 | |
| Deprecated model failover | | | | |
| Ready-to-use deployment | | | 3 | |
| Commercial support | | | | 4 |
| Self-hosted | | | | |
| AWS-native focus | | 5 | | |
| Multi-provider support | | 11 | | |
| Open-source community | Small | Large (~40k+ ★) | Medium (~1k ★) | |
| Source license | AGPL-3.0 (community) · commercial | MIT | MIT-0 | AWS service |
| Distribution & supply chain | Marketplace · GHCR | pip/PyPI | GitHub (MIT-0) | AWS-managed |

About the alternatives

  • LiteLLM — widely adopted multi-cloud proxy with a large open-source community. Ideal when you need a single entry point across OpenAI, Azure, AWS, and others. AWS deployment and security features (WAF, VPC endpoints) require manual setup. Also offers a commercial Enterprise tier.
  • Bedrock Access Gateway — official open-source AWS sample (MIT-0), actively maintained by AWS teams. Covers chat completions and embeddings only. No WAF, auto-scaling, monitoring, or commercial support included.
  • Bedrock Mantle — AWS's own native OpenAI-compatible endpoint backed by AWS's full compliance and SLA. No self-hosting required. Supports chat completions and the Responses API; limited to a subset of models (mostly newer open-weight models — Claude 3.x/4.x, Nova, Llama, Cohere, and Stability AI image models are not available). Routes through an AWS-managed endpoint, not your private VPC. See model availability.

Ready to Get Started?


  1. Full Bedrock catalog supported; each model must be declared in config (applies to auto-discovery) 

  2. Subset of Bedrock models — Claude 3.x/4.x, Nova, Llama, AI21, Cohere, and Stability AI (images) not available; supports mostly newer open-weight models (DeepSeek, Gemma, Qwen, Kimi K2, MiniMax, newer Mistral, etc.) — see AWS endpoint availability 

  3. CDK reference sample — no WAF, auto-scaling, monitoring, or commercial support 

  4. Covered through your existing AWS Support plan 

  5. Generalist multi-cloud proxy covering 100+ providers; AWS-specific Bedrock features and security integrations may lag behind dedicated solutions 

  6. Prompt caching and reasoning supported on standard routes; coverage varies by model — not all Bedrock models support prompt caching or extended thinking 

  7. Auto-discovery limited to the single deployed region — some models are only available in specific AWS regions 

  8. Achievable via the LiteLLM router, but requires manually declaring each model per region with explicit TPM/RPM limits — no automatic quota distribution 

  9. Claude 3.x/4.x (which support prompt caching) not available on Mantle; reasoning available via select open-weight models (Qwen3 thinking, etc.) 

  10. Single-region deployment — some models are only available in specific AWS regions; no cross-region catalog aggregation 

  11. 100+ providers: OpenAI, Azure OpenAI, GCP Vertex, Anthropic direct, and more — ideal when you need a single gateway across multiple clouds 

  12. Coming in the next release 

  13. Requires connecting Amazon Polly as the TTS backend — not included by default in a LiteLLM on AWS deployment 

  14. Supported by specifying the cross-region inference profile ARN as the model ID — no automatic profile selection