Chat Completions API¶
This OpenAI-compatible endpoint provides access to AWS Bedrock foundation models—including Claude, Nova, and more—through a familiar interface.
Why Choose Chat Completions?¶
- **Multiple Models**: Access models from Anthropic, Amazon, Meta, and more through one API. Choose the best model for your task without vendor lock-in.
- **Multi-Modal**: Process text, images, videos, and documents together. Support for URLs, data URIs, and direct S3 references.
- **Built-In Safety**: AWS Bedrock Guardrails provide content filtering and safety policies.
- **AWS Scale & Reliability**: Run on AWS infrastructure with service tiers for optimized latency. Multi-region model access for availability and performance.
Quick Start: Available Endpoint¶
| Endpoint | Method | What It Does | Powered By |
|---|---|---|---|
| `/v1/chat/completions` | POST | Conversational AI with multi-modal support | AWS Bedrock Converse API |
Feature Compatibility¶
| Feature | Status | Notes |
|---|---|---|
| **Messages & Roles** | | |
| Text messages | Supported | Full support for all text content |
| Image input (`image_url`) | Supported | HTTP URLs, data URIs |
| Image input from S3 | Extra Feature | `s3://` URLs |
| Video input | Select Models | Supported by select models |
| Audio input | Unsupported | |
| Document input (`file`) | Select Models | PDF and document support varies by model |
| System messages | Supported | Includes `developer` role |
| **Tool Calling** | | |
| Function calling (`tools`) | Supported | Full OpenAI-compatible schema |
| Legacy `function_call` | Supported | Backward compatibility maintained |
| Disabling parallel tool calls | Unsupported | Parallel tool calls are always on |
| Parallel tool calls | Supported | Multiple tools in one turn |
| Non-function tool types | Unsupported | Only function tools supported |
| System tools (`systemTool_*`) | Extra Feature | AWS Bedrock system tools (e.g., web grounding with citations) |
| **Generation Control** | | |
| `max_tokens` / `max_completion_tokens` | Supported | Output length limits |
| `temperature` | Supported | Mapped to Bedrock inference params |
| `top_p` | Supported | Nucleus sampling control |
| `stop` sequences | Supported | Custom stop strings |
| `frequency_penalty` / `presence_penalty` | Supported | Repetition control |
| `seed` | Supported | Deterministic generation |
| `logit_bias` | Select Models | Not all models support biasing |
| `top_logprobs` | Supported | Token probability output |
| `top_k` (from Qwen API) | Extra Feature | Candidate token set size for sampling |
| `reasoning_effort` | Supported | Reasoning control (`minimal`/`low`/`medium`/`high`) |
| `enable_thinking` (from Qwen API) | Extra Feature | Enables thinking mode |
| `thinking_budget` (from Qwen API) | Extra Feature | Thinking token budget |
| `n` (multiple choices) | Partial | Generates multiple responses; not supported with streaming |
| `logprobs` | Supported | Log probabilities |
| `prediction` | Supported | Static predicted output content |
| `response_format` | Supported | Response format specification |
| `verbosity` | Supported | Model verbosity |
| `web_search_options` | Supported | Web search tool |
| `prompt_cache_key` | Partial | Caches prompts to reduce costs and latency; selects which sections to cache rather than acting as a cache key |
| Extra model-specific params | Extra Feature | Model-specific parameters outside the OpenAI API |
| **Streaming & Output** | | |
| Text output | Supported | Text messages |
| Streaming (`stream: true`) | Supported | Server-Sent Events (SSE) |
| Streaming obfuscation | Unsupported | |
| Audio output | Select Models | Synthesis from text output |
| `response_format` (JSON mode) | Select Models | Model-specific JSON support |
| `reasoning_content` (from DeepSeek API) | Extra Feature | Text reasoning messages |
| `annotations` (URL citations) | Partial | URL citations from system tools (non-streaming only) |
| **Usage Tracking** | | |
| Input text tokens | Supported | Billing unit |
| Output tokens | Supported | Billing unit |
| Reasoning tokens | Partial | Estimated |
| **Other** | | |
| Service tiers | Supported | Mapped to Bedrock service tiers and latency options |
| `store` / `metadata` | Unsupported | OpenAI-specific features |
| `safety_identifier` / `user` | Supported | Logged |
| Bedrock Guardrails | Extra Feature | Content safety policies |
Legend:

- Supported — Fully compatible with the OpenAI API
- Select Models — Available on select models; check your model's capabilities
- Partial — Supported with limitations
- Unsupported — Not available in this implementation
- Extra Feature — Enhanced capability beyond the OpenAI API
Advanced Features¶
Prompt Caching¶
Reduce costs and improve response times by caching frequently-used prompt components across multiple requests. This feature is particularly effective for applications with consistent system prompts, tool definitions, or conversation contexts.
Supported Models:
- Anthropic Claude: Full support for system, messages, and tools caching
- Amazon Nova: Support for system and messages caching
Documentation
See AWS Bedrock Prompt Caching - Supported Models for the complete list of models supporting prompt caching.
Cache Creation Costs
Cache creation incurs a higher cost than regular token processing. Only use prompt caching when you expect a high cache hit ratio across multiple requests with similar prompts.
How to Use:
Set the prompt_cache_key parameter to enable caching:
```bash
curl -X POST "$BASE/v1/chat/completions" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic.claude-sonnet-4-5-20250929-v1:0",
    "prompt_cache_key": "default",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant with extensive knowledge..."
      },
      {"role": "user", "content": "What is 2 + 2?"}
    ]
  }'
```
Granular Cache Control:
Enable caching for specific prompt sections using dot-separated values:
"system"- Cache system messages only"messages"- Cache conversation history"tools"- Cache tool/function definitions (Anthropic Claude only)"system.messages"- Cache both system and messages"system.tools"- Cache system and tools"messages.tools"- Cache messages and tools"system.messages.tools"- Cache all components- Any other non-empty value - Cache all components
Custom Cache Keys Not Supported
Custom cache hash keys are not supported. The parameter is used only to control which sections are cached, not as a cache identifier.
```json
{
  "model": "anthropic.claude-sonnet-4-5-20250929-v1:0",
  "prompt_cache_key": "system.tools",
  "messages": [...],
  "tools": [...]
}
```
Benefits:
- Cost Reduction: Cached tokens are billed at a lower rate than regular input tokens
- Lower Latency: Cached prompts eliminate reprocessing time
- Automatic Management: The API handles cache invalidation and updates
Usage Tracking:
Cached token usage is reported in the response:
```json
{
  "usage": {
    "prompt_tokens": 1500,
    "completion_tokens": 100,
    "total_tokens": 1600,
    "prompt_tokens_details": {
      "cached_tokens": 1200
    }
  }
}
```
In this example, 1,200 tokens were retrieved from cache, with only 300 tokens requiring processing.
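The same flow with the OpenAI Python SDK might look like the sketch below; the endpoint URL and API key are placeholders, and `prompt_cache_key` is passed via `extra_body` in case your SDK version does not accept it as a named argument.

```python
from openai import OpenAI

client = OpenAI(api_key="your-api-key", base_url="https://your-endpoint/v1")

response = client.chat.completions.create(
    model="anthropic.claude-sonnet-4-5-20250929-v1:0",
    messages=[
        {"role": "system", "content": "You are a helpful assistant with extensive knowledge..."},
        {"role": "user", "content": "What is 2 + 2?"},
    ],
    # Passed via extra_body so this also works on SDK versions that do not
    # expose prompt_cache_key as a named parameter.
    extra_body={"prompt_cache_key": "default"},
)

# Report how much of the prompt was served from cache.
details = response.usage.prompt_tokens_details
cached = (details.cached_tokens or 0) if details else 0
print(f"prompt={response.usage.prompt_tokens} cached={cached}")
```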
S3 Image Support¶
Access images directly from your S3 buckets without generating pre-signed URLs or downloading files locally.
Supported Formats:
- Images: JPEG, PNG, GIF, WebP
How to Use:
Simply reference your S3 images using the s3:// URI scheme in image_url fields:
```json
{
  "model": "anthropic.claude-sonnet-4-5-20250929-v1:0",
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "Describe this image"},
        {
          "type": "image_url",
          "image_url": {"url": "s3://my-bucket/images/photo.jpg"}
        }
      ]
    }
  ]
}
```
IAM Permissions Required
Your API service must have IAM permissions to read from the specified S3 buckets. S3 objects must be in the same AWS Region as the model being invoked, or otherwise accessible via your IAM role. Standard S3 data transfer and request costs apply.
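For reference, here is a minimal IAM policy sketch that would grant the read access needed for the example above; the bucket name and path are illustrative, so scope the resource to your own buckets.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowImageReadsForBedrockProxy",
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::my-bucket/images/*"
    }
  ]
}
```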
Benefits:
- No pre-signed URLs - Direct S3 access without generating temporary URLs
- Security - Images stay in your AWS account with IAM-controlled access
- Performance - Optimized data transfer within AWS infrastructure
- Large images - No size limitations of data URIs or base64 encoding
Available Request Headers¶
This endpoint supports standard Bedrock headers for enhanced control over your requests. All headers are optional and can be combined as needed.
Content Safety (Guardrails)¶
| Header | Purpose | Valid Values |
|---|---|---|
| `X-Amzn-Bedrock-GuardrailIdentifier` | Guardrail ID for content filtering | Your guardrail identifier |
| `X-Amzn-Bedrock-GuardrailVersion` | Guardrail version | Version number (e.g., `1`) |
| `X-Amzn-Bedrock-Trace` | Guardrail trace level | `disabled`, `enabled`, `enabled_full` |
Performance Optimization¶
| Header | Purpose | Valid Values |
|---|---|---|
| `X-Amzn-Bedrock-Service-Tier` | Service tier selection | `priority`, `default`, `flex` |
| `X-Amzn-Bedrock-PerformanceConfig-Latency` | Latency optimization | `standard`, `optimized` |
Example with all headers:
```bash
curl -X POST "$BASE/v1/chat/completions" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -H "X-Amzn-Bedrock-GuardrailIdentifier: your-guardrail-id" \
  -H "X-Amzn-Bedrock-GuardrailVersion: 1" \
  -H "X-Amzn-Bedrock-Trace: enabled" \
  -H "X-Amzn-Bedrock-Service-Tier: priority" \
  -H "X-Amzn-Bedrock-PerformanceConfig-Latency: optimized" \
  -d '{
    "model": "anthropic.claude-sonnet-4-5-20250929-v1:0",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```
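The same headers can be attached through the OpenAI Python SDK's `extra_headers` option; this is a sketch, and the endpoint, key, and guardrail ID are placeholders.

```python
from openai import OpenAI

client = OpenAI(api_key="your-api-key", base_url="https://your-endpoint/v1")

response = client.chat.completions.create(
    model="anthropic.claude-sonnet-4-5-20250929-v1:0",
    messages=[{"role": "user", "content": "Hello!"}],
    # Optional Bedrock headers; all values here are placeholders.
    extra_headers={
        "X-Amzn-Bedrock-GuardrailIdentifier": "your-guardrail-id",
        "X-Amzn-Bedrock-GuardrailVersion": "1",
        "X-Amzn-Bedrock-Service-Tier": "priority",
    },
)
print(response.choices[0].message.content)
```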
Detailed Documentation
For complete information about these headers, configuration options, and use cases, see the AWS Bedrock documentation.
AWS Bedrock System Tools¶
AWS Bedrock system tools are built-in capabilities that foundation models can use directly without requiring you to implement backend integrations. Access any AWS Bedrock system tool by adding the systemTool_ prefix to its name—this works for current tools and any future system tools AWS releases.
How to Use:
Add system tools to your tools array using the systemTool_ prefix followed by the tool name. System tools don't require parameter definitions—just specify the tool name and the model will handle the rest.
As AWS releases new system tools, simply use the same systemTool_ prefix pattern to access them.
Amazon Nova Web Grounding¶
Amazon Nova Web Grounding enables models to search the web for current information, helping answer questions requiring real-time data like news, weather, product availability, or recent events. The model automatically determines when to use web grounding based on the user's query.
Usage:
```bash
curl -X POST "$BASE/v1/chat/completions" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "amazon.nova-premier-v1:0",
    "messages": [
      {
        "role": "user",
        "content": "What are the current AWS Regions and their locations?"
      }
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "systemTool_nova_grounding"
        }
      }
    ]
  }'
```
Response Format:
When using web grounding, the API response includes annotations with URL citations in non-streaming mode:
```json
{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "The AWS Regions include...",
      "annotations": [
        {
          "type": "url_citation",
          "url_citation": {
            "url": "https://aws.amazon.com/about-aws/global-infrastructure/",
            "title": "AWS Global Infrastructure"
          }
        }
      ]
    }
  }]
}
```
Streaming Mode
Citations are only available in non-streaming responses. The OpenAI API does not support annotations in streaming mode.
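Reading the citations back with the Python SDK might look like the sketch below; `annotations` is read defensively because some SDK versions surface it only as an extra, untyped field.

```python
from openai import OpenAI

client = OpenAI(api_key="your-api-key", base_url="https://your-endpoint/v1")

response = client.chat.completions.create(
    model="amazon.nova-premier-v1:0",
    messages=[{"role": "user", "content": "What are the current AWS Regions?"}],
    tools=[{"type": "function", "function": {"name": "systemTool_nova_grounding"}}],
)

message = response.choices[0].message
print(message.content)

# Print any URL citations attached to the answer.
for annotation in getattr(message, "annotations", None) or []:
    if annotation.type == "url_citation":
        citation = annotation.url_citation
        print(f"- {citation.title}: {citation.url}")
```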
Use Cases:
- Current Events: Get up-to-date information about news, weather, stock prices, or sports scores
- Dynamic Data: Query information that changes frequently like AWS service availability or product prices
- Verification: Cross-reference facts with current web sources for improved accuracy
- Knowledge Extension: Supplement model training data with real-time information
Benefits:
- Zero Integration: No need to implement or maintain web search APIs
- Automatic Invocation: Models intelligently decide when to use web grounding
- Enhanced Accuracy: Reduce hallucinations with real-time information retrieval
- OpenAI-Compatible: Works seamlessly with standard tool calling patterns
Model and Region Compatibility
Model: Only Amazon Nova Premier (amazon.nova-premier-v1:0) supports the systemTool_nova_grounding tool.
Region: Web Grounding is only available in US regions.
Provider-Specific Parameters¶
Unlock advanced model capabilities by passing provider-specific parameters directly in your requests. These parameters are forwarded to AWS Bedrock and allow you to access features unique to each foundation model provider.
Documentation
See Bedrock Model Parameters for the complete list of available parameters per model.
How It Works:
Add provider-specific fields at the top level of your request body alongside standard OpenAI parameters. The API automatically forwards these to the appropriate model provider via AWS Bedrock.
Examples:
Top K Sampling:
```json
{
  "model": "anthropic.claude-sonnet-4-5-20250929-v1:0",
  "messages": [{"role": "user", "content": "Write a poem"}],
  "top_k": 50,
  "temperature": 0.7
}
```
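With the Python SDK, non-standard fields like `top_k` go through `extra_body`; a sketch with placeholder endpoint and key:

```python
from openai import OpenAI

client = OpenAI(api_key="your-api-key", base_url="https://your-endpoint/v1")

response = client.chat.completions.create(
    model="anthropic.claude-sonnet-4-5-20250929-v1:0",
    messages=[{"role": "user", "content": "Write a poem"}],
    temperature=0.7,
    # top_k is not a standard OpenAI parameter, so it is sent via extra_body;
    # the API forwards it to the model provider through Bedrock.
    extra_body={"top_k": 50},
)
```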
Configuration Options:
Option 1: Per-Request
Add provider-specific parameters directly in your request body (as shown in examples above).
Option 2: Server-Wide Defaults
Configure default parameters for specific models via the DEFAULT_MODEL_PARAMS environment variable:
```bash
export DEFAULT_MODEL_PARAMS='{
  "anthropic.claude-sonnet-4-5-20250929-v1:0": {
    "anthropic_beta": ["extended-thinking-2024-12-12"]
  }
}'
```
Parameter Priority
Per-request parameters override server-wide defaults.
Behavior:
- ✅ Compatible parameters: Forwarded to the model and applied
- ⚠️ Unsupported parameters: Return HTTP 400 with an error message
Anthropic Claude Features¶
Enable cutting-edge Claude capabilities including extended thinking and reasoning.
Beta Feature Flags¶
Enable experimental Claude features like extended thinking by adding the anthropic_beta array to your request:
```bash
curl -X POST "$BASE/v1/chat/completions" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic.claude-sonnet-4-5-20250929-v1:0",
    "messages": [{"role": "user", "content": "Summarize the news headline."}],
    "anthropic_beta": ["interleaved-thinking-2025-05-14"]
  }'
```
Server-Wide Configuration
You can also configure beta flags server-wide using the DEFAULT_MODEL_PARAMS environment variable (see Provider-Specific Parameters).
Unsupported Beta Flags
Unsupported flags that would change output return HTTP 400 errors.
Documentation
See Using Claude on AWS Bedrock for more details on Claude-specific parameters.
Reasoning Control¶
This API supports two different approaches to control AWS Bedrock reasoning behavior. Reasoning enables foundation models to break down complex tasks into smaller steps ("chain of thought"), improving accuracy for multi-step analysis, math problems, and complex reasoning tasks. Both approaches work with all AWS Bedrock models that support reasoning capabilities.
Option 1: OpenAI-Style Reasoning (`reasoning_effort`)
Use the reasoning_effort parameter with predefined effort levels. This approach works with all AWS Bedrock models that support reasoning, providing a simple way to control reasoning depth.
Available Levels:
- `minimal` - Quick responses with minimal reasoning (25% of max tokens)
- `low` - Light reasoning for straightforward tasks (50% of max tokens)
- `medium` - Balanced reasoning for most use cases (75% of max tokens)
- `high` - Deep reasoning for complex problems (100% of max tokens)
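Conceptually, each level scales the reasoning budget by the percentage listed above. The sketch below only illustrates that mapping; the 4,096-token maximum is an arbitrary stand-in, not a documented default.

```python
# Illustrative only: derive a thinking budget from an effort level by
# applying the documented percentages to a max token limit.
EFFORT_FRACTIONS = {"minimal": 0.25, "low": 0.50, "medium": 0.75, "high": 1.00}

def reasoning_budget(effort: str, max_tokens: int = 4096) -> int:
    """Token budget implied by a reasoning_effort level."""
    return int(EFFORT_FRACTIONS[effort] * max_tokens)

print(reasoning_budget("medium"))  # 3072
```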
Example:
```bash
curl -X POST "$BASE/v1/chat/completions" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic.claude-sonnet-4-5-20250929-v1:0",
    "reasoning_effort": "high",
    "messages": [{"role": "user", "content": "Solve this complex problem..."}]
  }'
```
Option 2: Qwen-Style Reasoning (`enable_thinking` + `thinking_budget`)
Use explicit parameters for fine-grained control over thinking mode. This approach works with all AWS Bedrock models that support reasoning, offering precise control over reasoning behavior and token budgets.
Parameters:
- `enable_thinking` (boolean): Enable or disable thinking mode
    - Default: Model-specific (usually `false`)
    - Some models have reasoning always enabled
- `thinking_budget` (integer): Maximum thinking process length in tokens
    - Only effective when `enable_thinking` is `true`
    - Passed to the model as `budget_tokens`
    - Default: Model's maximum chain-of-thought length
Example:
```bash
curl -X POST "$BASE/v1/chat/completions" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic.claude-sonnet-4-5-20250929-v1:0",
    "enable_thinking": true,
    "thinking_budget": 2000,
    "messages": [{"role": "user", "content": "Solve this complex problem..."}]
  }'
```
Using Python SDK:
```python
from openai import OpenAI

client = OpenAI(
    api_key="your-api-key",
    base_url="https://your-endpoint/v1"
)

# OpenAI-style reasoning (predefined effort levels)
response = client.chat.completions.create(
    model="anthropic.claude-sonnet-4-5-20250929-v1:0",
    reasoning_effort="high",
    messages=[{"role": "user", "content": "Complex problem..."}]
)

# Qwen-style reasoning (fine-grained control)
response = client.chat.completions.create(
    model="anthropic.claude-sonnet-4-5-20250929-v1:0",
    messages=[{"role": "user", "content": "Complex problem..."}],
    extra_body={
        "enable_thinking": True,
        "thinking_budget": 2000
    }
)
```
Reasoning Output
Models that support reasoning will include their thinking process in reasoning_content fields in the response.
DeepSeek Reasoning Support¶
DeepSeek models with reasoning capabilities are automatically handled—their chain-of-thought reasoning appears in reasoning_content fields without any special configuration, just like DeepSeek's native chat completions endpoint.
Documentation
See DeepSeek API - Chat Completions for more information about DeepSeek's reasoning capabilities.
What You Get:
- Automatic reasoning: DeepSeek reasoning models automatically include their thinking process
- `reasoning_content` field: Receive visible reasoning text in assistant messages
- Streaming support: Get `choices[].delta.reasoning_content` chunks in real time as the model thinks
- Compatible format: Uses the same DeepSeek-compatible response format
How It Works:
- When using DeepSeek reasoning models, the API automatically surfaces their chain-of-thought
- Non-reasoning models simply omit the `reasoning_content` field
- No special parameters needed: just use the model and reasoning appears automatically, as in the sketch below
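A streaming sketch that separates the reasoning stream from the final answer; the endpoint, key, and model ID are placeholders, and `reasoning_content` is read defensively because it is an extension field.

```python
from openai import OpenAI

client = OpenAI(api_key="your-api-key", base_url="https://your-endpoint/v1")

stream = client.chat.completions.create(
    model="us.deepseek.r1-v1:0",  # placeholder DeepSeek model ID
    messages=[{"role": "user", "content": "Solve 12*13"}],
    stream=True,
)

for chunk in stream:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta
    # reasoning_content is a DeepSeek-style extension, so read it defensively.
    reasoning = getattr(delta, "reasoning_content", None)
    if reasoning:
        print(reasoning, end="", flush=True)      # chain-of-thought
    if delta.content:
        print(delta.content, end="", flush=True)  # final answer
```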
Try It Now¶
Basic chat completion:
```bash
curl -X POST "$BASE/v1/chat/completions" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "amazon.nova-micro-v1:0",
    "messages": [{"role": "user", "content": "Say hello world"}]
  }'
```
Streaming response:
```bash
curl -N -X POST "$BASE/v1/chat/completions" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "amazon.nova-micro-v1:0",
    "stream": true,
    "messages": [{"role": "user", "content": "Write a haiku about the sea."}]
  }'
```
Multi-modal with image:
```json
{
  "model": "amazon.nova-lite-v1:0",
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "Describe this image"},
        {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
      ]
    }
  ]
}
```
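Data URIs work in the same `image_url` field, so a local file can be inlined without hosting it anywhere; a sketch assuming a local `photo.jpg`:

```python
import base64

from openai import OpenAI

client = OpenAI(api_key="your-api-key", base_url="https://your-endpoint/v1")

# Inline a local image as a data URI.
with open("photo.jpg", "rb") as f:
    data_uri = "data:image/jpeg;base64," + base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="amazon.nova-lite-v1:0",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image"},
            {"type": "image_url", "image_url": {"url": data_uri}},
        ],
    }],
)
print(response.choices[0].message.content)
```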
With reasoning:
```bash
curl -X POST "$BASE/v1/chat/completions" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic.claude-sonnet-4-5-20250929-v1:0",
    "reasoning_effort": "low",
    "messages": [{"role": "user", "content": "Solve 12*13"}]
  }'
```
Response with reasoning:
```json
{
  "choices": [{
    "message": {
      "role": "assistant",
      "reasoning_content": "12 × 10 = 120, plus 12 × 3 = 36 → 156",
      "content": "156"
    }
  }]
}
```
Ready to build with AI? Check out the Models API to see all available foundation models!