Chat Completions API¶
This OpenAI-compatible endpoint provides access to AWS Bedrock foundation models—including Claude, Nova, and more—through a familiar interface.
Why Choose Chat Completions?¶
- **Multiple Models**: Access models from Anthropic, Amazon, Meta, and more through one API. Choose the best model for your task without vendor lock-in.
- **Multi-Modal**: Process text, images, videos, and documents together. Support for URLs, data URIs, and direct S3 references.
- **Built-In Safety**: AWS Bedrock Guardrails provide content filtering and safety policies.
- **AWS Scale & Reliability**: Run on AWS infrastructure with service tiers for optimized latency. Multi-region model access for availability and performance.
Quick Start: Available Endpoint¶
| Endpoint | Method | What It Does | Powered By |
|---|---|---|---|
| `/v1/chat/completions` | POST | Conversational AI with multi-modal support | AWS Bedrock Converse API |
Feature Compatibility¶
| Feature | Status | Notes |
|---|---|---|
| **Messages & Roles** | | |
| Text messages | Supported | Full support for all text content |
| Image input (`image_url`) | Supported | HTTP URLs and data URIs |
| Image input from S3 | Extra Feature | `s3://` URLs |
| Video input | Available on Select Models | Supported by select models |
| Audio input | Unsupported | |
| Document input (`file`) | Available on Select Models | PDF and document support varies by model |
| System messages | Supported | Includes the `developer` role |
| **Tool Calling** | | |
| Function calling (`tools`) | Supported | Full OpenAI-compatible schema |
| Legacy `function_call` | Supported | Backward compatibility maintained |
| Parallel tool calls | Supported | Multiple tools in one turn |
| Disabling parallel tool calls | Unsupported | Parallel tool calls are always on |
| Non-function tool types | Unsupported | Only function tools supported |
| **Generation Control** | | |
| `max_tokens` / `max_completion_tokens` | Supported | Output length limits |
| `temperature` | Supported | Mapped to Bedrock inference params |
| `top_p` | Supported | Nucleus sampling control |
| `stop` sequences | Supported | Custom stop strings |
| `frequency_penalty` / `presence_penalty` | | Repetition control |
| `seed` | | Deterministic generation |
| `logit_bias` | Available on Select Models | Not all models support biasing |
| `top_logprobs` | | Token probability output |
| `top_k` (from Qwen API) | Extra Feature | Candidate token set size for sampling |
| `reasoning_effort` | Supported | Reasoning control (`minimal`/`low`/`medium`/`high`) |
| `enable_thinking` (from Qwen API) | Extra Feature | Enable thinking mode |
| `thinking_budget` (from Qwen API) | Extra Feature | Thinking token budget |
| `n` (multiple choices) | Partial | Generates multiple responses; not supported with streaming |
| `logprobs` | | Log probabilities |
| `prediction` | | Static predicted output content |
| `response_format` | | Response format specification |
| `verbosity` | | Model verbosity |
| `web_search_options` | | Web search tool |
| Prompt caching | | Prompt caching for similar requests |
| Extra model-specific params | Extra Feature | Model-specific parameters beyond the OpenAI API |
| **Streaming & Output** | | |
| Text | Supported | Text messages |
| Streaming (`stream: true`) | Supported | Server-Sent Events (SSE) |
| Streaming obfuscation | Unsupported | |
| Audio output | | Synthesis from text output |
| `response_format` (JSON mode) | Available on Select Models | Model-specific JSON support |
| `reasoning_content` (from DeepSeek API) | Extra Feature | Text reasoning messages |
| **Usage Tracking** | | |
| Input text tokens | Supported | Billing unit |
| Output tokens | Supported | Billing unit |
| Reasoning tokens | Partial | Estimated |
| **Other** | | |
| Service tiers | Supported | Mapped to Bedrock latency options |
| `store` / `metadata` | | OpenAI-specific features |
| `safety_identifier` / `user` | | Logged |
| Bedrock Guardrails | Extra Feature | Content safety policies |
Legend:
- Supported — Fully compatible with OpenAI API
- Available on Select Models — Check your model's capabilities
- Partial — Supported with limitations
- Unsupported — Not available in this implementation
- Extra Feature — Enhanced capability beyond OpenAI API
Advanced Features¶
S3 Image Support¶
Access images directly from your S3 buckets without generating pre-signed URLs or downloading files locally.
Supported Formats:
- Images: JPEG, PNG, GIF, WebP
How to Use:
Simply reference your S3 images using the `s3://` URI scheme in `image_url` fields:
```json
{
  "model": "anthropic.claude-sonnet-4-5-20250929-v1:0",
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "Describe this image"},
        {
          "type": "image_url",
          "image_url": {"url": "s3://my-bucket/images/photo.jpg"}
        }
      ]
    }
  ]
}
```
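For reference, the same request via the OpenAI Python SDK (a minimal sketch; the API key and base URL are placeholders for your deployment):

```python
from openai import OpenAI

# Placeholders: point the SDK at your deployment of this API.
client = OpenAI(api_key="your-api-key", base_url="https://your-endpoint/v1")

response = client.chat.completions.create(
    model="anthropic.claude-sonnet-4-5-20250929-v1:0",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image"},
            # s3:// URLs are resolved server-side using the service's IAM role
            {"type": "image_url", "image_url": {"url": "s3://my-bucket/images/photo.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)
```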
Requirements:
- Your API service must have IAM permissions to read from the specified S3 buckets
- S3 objects must be in the same AWS region as the invoked model or accessible via your IAM role
- Standard S3 data transfer and request costs apply
Benefits:
- **No pre-signed URLs**: Direct S3 access without generating temporary URLs
- **Security**: Images stay in your AWS account with IAM-controlled access
- **Performance**: Optimized data transfer within AWS infrastructure
- **Large images**: No size limitations from data URIs or base64 encoding
AWS Bedrock Guardrails¶
Protect your applications with content filtering and safety policies using AWS Bedrock Guardrails. This implementation supports the same guardrails integration as AWS Bedrock's native OpenAI-compatible endpoint.
Documentation: AWS Bedrock OpenAI Chat Completions API - Include a guardrail in a chat completion
How to Use:
Add guardrail headers to your chat completion requests to apply your configured safety policies:
```bash
curl -X POST "$BASE/v1/chat/completions" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -H "X-Amzn-Bedrock-GuardrailIdentifier: your-guardrail-id" \
  -H "X-Amzn-Bedrock-GuardrailVersion: 1" \
  -H "X-Amzn-Bedrock-Trace: ENABLED" \
  -d '{
    "model": "anthropic.claude-sonnet-4-5-20250929-v1:0",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```
Headers:
- `X-Amzn-Bedrock-GuardrailIdentifier` (required): The ID of your configured guardrail
- `X-Amzn-Bedrock-GuardrailVersion` (required): The version number of your guardrail
- `X-Amzn-Bedrock-Trace` (optional): Set to `ENABLED` to enable trace logging for debugging
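With the OpenAI Python SDK, the same headers can be attached per request through `extra_headers` (a sketch; the guardrail ID, API key, and endpoint are placeholders):

```python
from openai import OpenAI

client = OpenAI(api_key="your-api-key", base_url="https://your-endpoint/v1")

response = client.chat.completions.create(
    model="anthropic.claude-sonnet-4-5-20250929-v1:0",
    messages=[{"role": "user", "content": "Hello!"}],
    # Forwarded to Bedrock; the identifier and version are placeholders.
    extra_headers={
        "X-Amzn-Bedrock-GuardrailIdentifier": "your-guardrail-id",
        "X-Amzn-Bedrock-GuardrailVersion": "1",
        "X-Amzn-Bedrock-Trace": "ENABLED",
    },
)
```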
What Happens:
- Requests are validated against your guardrail policies before reaching the model
- Responses are filtered according to your content safety rules
- Violations are blocked and return appropriate error responses
Note: The `tagSuffix` parameter is not supported in this implementation.
Provider-Specific Parameters¶
Unlock advanced model capabilities by passing provider-specific parameters directly in your requests. These parameters are forwarded to AWS Bedrock and allow you to access features unique to each foundation model provider.
Documentation: Bedrock Model Parameters
How It Works:
Add provider-specific fields at the top level of your request body alongside standard OpenAI parameters. The API automatically forwards these to the appropriate model provider via AWS Bedrock.
Examples:
Top K Sampling:
```json
{
  "model": "anthropic.claude-sonnet-4-5-20250929-v1:0",
  "messages": [{"role": "user", "content": "Write a poem"}],
  "top_k": 50,
  "temperature": 0.7
}
```
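Since the official SDK rejects unknown keyword arguments, pass provider-specific fields through `extra_body` so they land at the top level of the request (a sketch; `top_k` availability depends on the model):

```python
from openai import OpenAI

client = OpenAI(api_key="your-api-key", base_url="https://your-endpoint/v1")

response = client.chat.completions.create(
    model="anthropic.claude-sonnet-4-5-20250929-v1:0",
    messages=[{"role": "user", "content": "Write a poem"}],
    temperature=0.7,
    # extra_body fields are serialized at the top level of the JSON body,
    # where the API expects provider-specific parameters.
    extra_body={"top_k": 50},
)
```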
Configuration Options:
Option 1: Per-Request
Add provider-specific parameters directly in your request body (as shown in examples above).
Option 2: Server-Wide Defaults
Configure default parameters for specific models via the `DEFAULT_MODEL_PARAMS` environment variable:
```bash
export DEFAULT_MODEL_PARAMS='{
  "anthropic.claude-sonnet-4-5-20250929-v1:0": {
    "anthropic_beta": ["extended-thinking-2024-12-12"]
  }
}'
```
Note: Per-request parameters override server-wide defaults.
Behavior:
- ✅ Compatible parameters: Forwarded to the model and applied
- ⚠️ Unsupported parameters: Return HTTP 400 with an error message
Anthropic Claude Features¶
Enable cutting-edge Claude capabilities including extended thinking and reasoning.
Beta Feature Flags¶
Enable experimental Claude features like extended thinking by adding the `anthropic_beta` array to your request:
```bash
curl -X POST "$BASE/v1/chat/completions" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic.claude-sonnet-4-5-20250929-v1:0",
    "messages": [{"role": "user", "content": "Summarize the news headline."}],
    "anthropic_beta": ["interleaved-thinking-2025-05-14"]
  }'
```
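The equivalent with the Python SDK, again passing the non-standard field through `extra_body` (a sketch; flag availability varies by model):

```python
from openai import OpenAI

client = OpenAI(api_key="your-api-key", base_url="https://your-endpoint/v1")

response = client.chat.completions.create(
    model="anthropic.claude-sonnet-4-5-20250929-v1:0",
    messages=[{"role": "user", "content": "Summarize the news headline."}],
    # Unsupported flags that would change output return HTTP 400.
    extra_body={"anthropic_beta": ["interleaved-thinking-2025-05-14"]},
)
```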
Note: You can also configure beta flags server-wide using the `DEFAULT_MODEL_PARAMS` environment variable (see Provider-Specific Parameters). Unsupported flags that would change output return HTTP 400 errors.
Reasoning Control¶
This API supports two approaches to controlling AWS Bedrock reasoning behavior. Reasoning enables foundation models to break complex tasks into smaller steps ("chain of thought"), improving accuracy on multi-step analysis, math problems, and other complex reasoning tasks. Both approaches work with any AWS Bedrock model that supports reasoning.
Option 1: OpenAI-Style Reasoning (`reasoning_effort`)
Use the `reasoning_effort` parameter with predefined effort levels. This approach works with all AWS Bedrock models that support reasoning, providing a simple way to control reasoning depth.
Available Levels:
- `minimal`: Quick responses with minimal reasoning (25% of max tokens)
- `low`: Light reasoning for straightforward tasks (50% of max tokens)
- `medium`: Balanced reasoning for most use cases (75% of max tokens)
- `high`: Deep reasoning for complex problems (100% of max tokens)
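The percentages above translate into a thinking-token budget roughly as follows (an illustrative sketch of the documented mapping, not the service's actual implementation):

```python
# Fraction of max output tokens allotted to reasoning per effort level,
# per the list above. Illustrative only.
EFFORT_FRACTIONS = {"minimal": 0.25, "low": 0.50, "medium": 0.75, "high": 1.00}

def thinking_budget(reasoning_effort: str, max_tokens: int) -> int:
    """Derive an approximate reasoning-token budget from an effort level."""
    return int(max_tokens * EFFORT_FRACTIONS[reasoning_effort])

print(thinking_budget("medium", 4096))  # 3072
```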
Example:
```bash
curl -X POST "$BASE/v1/chat/completions" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic.claude-sonnet-4-5-20250929-v1:0",
    "reasoning_effort": "high",
    "messages": [{"role": "user", "content": "Solve this complex problem..."}]
  }'
```
Option 2: Qwen-Style Reasoning (`enable_thinking` + `thinking_budget`)
Use explicit parameters for fine-grained control over thinking mode. This approach works with all AWS Bedrock models that support reasoning, offering precise control over reasoning behavior and token budgets.
Parameters:
- `enable_thinking` (boolean): Enable or disable thinking mode
  - Default: model-specific (usually `false`)
  - Some models have reasoning always enabled
- `thinking_budget` (integer): Maximum thinking process length in tokens
  - Only effective when `enable_thinking` is `true`
  - Passed to the model as `budget_tokens`
  - Default: the model's maximum chain-of-thought length
Example:
```bash
curl -X POST "$BASE/v1/chat/completions" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic.claude-sonnet-4-5-20250929-v1:0",
    "enable_thinking": true,
    "thinking_budget": 2000,
    "messages": [{"role": "user", "content": "Solve this complex problem..."}]
  }'
```
Using Python SDK:
```python
from openai import OpenAI

client = OpenAI(
    api_key="your-api-key",
    base_url="https://your-endpoint/v1"
)

# OpenAI-style reasoning (predefined effort levels)
response = client.chat.completions.create(
    model="anthropic.claude-sonnet-4-5-20250929-v1:0",
    reasoning_effort="high",
    messages=[{"role": "user", "content": "Complex problem..."}]
)

# Qwen-style reasoning (fine-grained control)
response = client.chat.completions.create(
    model="anthropic.claude-sonnet-4-5-20250929-v1:0",
    messages=[{"role": "user", "content": "Complex problem..."}],
    extra_body={
        "enable_thinking": True,
        "thinking_budget": 2000
    }
)
```
Note: Models that support reasoning will include their thinking process in `reasoning_content` fields in the response.
DeepSeek Reasoning Support¶
DeepSeek models with reasoning capabilities are automatically handled—their chain-of-thought reasoning appears in `reasoning_content` fields without any special configuration, just like DeepSeek's native chat completions endpoint.
Documentation: DeepSeek API - Chat Completions
What You Get:
- Automatic reasoning: DeepSeek reasoning models automatically include their thinking process
- `reasoning_content` field: Receive visible reasoning text in assistant messages
- Streaming support: Get `choices[].delta.reasoning_content` chunks in real time as the model thinks
- Compatible format: Uses the same DeepSeek-compatible response format
How It Works:
- When using DeepSeek reasoning models, the API automatically surfaces their chain-of-thought
- Non-reasoning models simply omit the `reasoning_content` field
- No special parameters needed—just use the model and reasoning appears automatically
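A streaming sketch using the Python SDK. `reasoning_content` is not part of the standard OpenAI types, so it is read defensively with `getattr`; the model ID is a placeholder for a DeepSeek reasoning model available in your deployment:

```python
from openai import OpenAI

client = OpenAI(api_key="your-api-key", base_url="https://your-endpoint/v1")

stream = client.chat.completions.create(
    model="us.deepseek.r1-v1:0",  # placeholder; list models via /v1/models
    messages=[{"role": "user", "content": "Solve 12*13"}],
    stream=True,
)

for chunk in stream:
    if not chunk.choices:
        continue  # e.g. a final usage-only chunk
    delta = chunk.choices[0].delta
    # Non-reasoning models simply omit reasoning_content.
    thinking = getattr(delta, "reasoning_content", None)
    if thinking:
        print(thinking, end="", flush=True)
    if delta.content:
        print(delta.content, end="", flush=True)
```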
Try It Now¶
Basic chat completion:
```bash
curl -X POST "$BASE/v1/chat/completions" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "amazon.nova-micro-v1:0",
    "messages": [{"role": "user", "content": "Say hello world"}]
  }'
```
Streaming response:
```bash
curl -N -X POST "$BASE/v1/chat/completions" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "amazon.nova-micro-v1:0",
    "stream": true,
    "messages": [{"role": "user", "content": "Write a haiku about the sea."}]
  }'
```
Multi-modal with image:
```json
{
  "model": "amazon.nova-lite-v1:0",
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "Describe this image"},
        {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
      ]
    }
  ]
}
```
With reasoning:
```bash
curl -X POST "$BASE/v1/chat/completions" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic.claude-sonnet-4-5-20250929-v1:0",
    "reasoning_effort": "low",
    "messages": [{"role": "user", "content": "Solve 12*13"}]
  }'
```
Response with reasoning:
```json
{
  "choices": [{
    "message": {
      "role": "assistant",
      "reasoning_content": "12 × 10 = 120, plus 12 × 3 = 36 → 156",
      "content": "156"
    }
  }]
}
```
Ready to build with AI? Check out the Models API to see all available foundation models!