Chat Completions API

This OpenAI-compatible endpoint provides access to AWS Bedrock foundation models—including Claude, Nova, and more—through a familiar interface.

Why Choose Chat Completions?

  • Multiple Models
    Access models from Anthropic, Amazon, Meta, and more through one API. Choose the best model for your task without vendor lock-in.

  • Multi-Modal
    Process text, images, videos, and documents together. Support for URLs, data URIs, and direct S3 references.

  • Built-In Safety
    AWS Bedrock Guardrails provide content filtering and safety policies.

  • AWS Scale & Reliability
    Run on AWS infrastructure with service tiers for optimized latency. Multi-region model access for availability and performance.

Quick Start: Available Endpoint

Endpoint               Method   What It Does                                 Powered By
/v1/chat/completions   POST     Conversational AI with multi-modal support   AWS Bedrock Converse API

Feature Compatibility

  Feature                                  Notes

Messages & Roles
  Text messages                            Full support for all text content
  Image input (image_url)                  HTTP URLs and data URIs
  Image input from S3                      S3 URLs
  Video input                              Supported by select models
  Audio input                              Unsupported
  Document input (file)                    PDF and document support varies by model
  System messages                          Includes developer role

Tool Calling
  Function calling (tools)                 Full OpenAI-compatible schema
  Legacy function_call                     Backward compatibility maintained
  Parallel tool calls                      Multiple tools in one turn
  Disabling parallel tool calls            Unsupported; parallel tool calls are always on
  Non-function tool types                  Only function tools supported
  System tools (systemTool_*)              AWS Bedrock system tools (e.g., web grounding with citations)

Generation Control
  max_tokens / max_completion_tokens       Output length limits
  temperature                              Mapped to Bedrock inference params
  top_p                                    Nucleus sampling control
  stop sequences                           Custom stop strings
  frequency_penalty / presence_penalty     Repetition control
  seed                                     Deterministic generation
  logit_bias                               Not all models support biasing
  top_logprobs                             Token probability output
  top_k (from Qwen API)                    Candidate token set size for sampling
  reasoning_effort                         Reasoning control (minimal/low/medium/high)
  enable_thinking (from Qwen API)          Enable thinking mode
  thinking_budget (from Qwen API)          Thinking token budget
  n (multiple choices)                     Generate multiple responses; not supported with streaming
  logprobs                                 Log probabilities
  prediction                               Static predicted output content
  response_format                          Response format specification
  verbosity                                Model verbosity
  web_search_options                       Web search tool
  prompt_cache_key                         Cache prompts to reduce costs and latency
  Extra model-specific params              Model-specific parameters beyond the OpenAI API

Streaming & Output
  Text                                     Text messages
  Streaming (stream: true)                 Server-Sent Events (SSE)
  Streaming obfuscation                    Unsupported
  Audio                                    Synthesis from text output
  response_format (JSON mode)              Model-specific JSON support
  reasoning_content (from DeepSeek API)    Text reasoning messages
  annotations (URL citations)              URL citations from system tools (non-streaming only)

Usage Tracking
  Input text tokens                        Billing unit
  Output tokens                            Billing unit
  Reasoning tokens                         Estimated

Other
  Service tiers                            Mapped to Bedrock service tiers and latency options
  store / metadata                         OpenAI-specific features
  safety_identifier / user                 Logged
  Bedrock Guardrails                       Content safety policies

Legend:

  • Supported — Fully compatible with the OpenAI API
  • Available on Select Models — Check your model's capabilities
  • Partial — Supported with limitations
  • Unsupported — Not available in this implementation
  • Extra Feature — Enhanced capability beyond the OpenAI API
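
As a quick orientation, here is a minimal sketch combining several of the generation-control parameters from the table, via the Python SDK; the model ID and parameter values are illustrative, and per-model support varies as noted above:

from openai import OpenAI

client = OpenAI(
    api_key="your-api-key",
    base_url="https://your-endpoint/v1"
)

# Several generation-control parameters combined in one request.
response = client.chat.completions.create(
    model="anthropic.claude-sonnet-4-5-20250929-v1:0",
    messages=[{"role": "user", "content": "Name three oceans."}],
    max_tokens=200,    # output length limit
    temperature=0.7,   # mapped to Bedrock inference params
    top_p=0.9,         # nucleus sampling
    stop=["\n\n"],     # custom stop sequence
)
print(response.choices[0].message.content)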

Advanced Features

Prompt Caching

Reduce costs and improve response times by caching frequently used prompt components across multiple requests. This feature is particularly effective for applications with consistent system prompts, tool definitions, or conversation contexts.

Supported Models:

  • Anthropic Claude: Full support for system, messages, and tools caching
  • Amazon Nova: Support for system and messages caching

Documentation

See AWS Bedrock Prompt Caching - Supported Models for the complete list of models supporting prompt caching.

Cache Creation Costs

Cache creation incurs a higher cost than regular token processing. Only use prompt caching when you expect a high cache hit ratio across multiple requests with similar prompts.

How to Use:

Set the prompt_cache_key parameter to enable caching:

curl -X POST "$BASE/v1/chat/completions" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic.claude-sonnet-4-5-20250929-v1:0",
    "prompt_cache_key": "default",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant with extensive knowledge..."
      },
      {"role": "user", "content": "What is 2 + 2?"}
    ]
  }'

Granular Cache Control:

Enable caching for specific prompt sections using dot-separated values:

  • "system" - Cache system messages only
  • "messages" - Cache conversation history
  • "tools" - Cache tool/function definitions (Anthropic Claude only)
  • "system.messages" - Cache both system and messages
  • "system.tools" - Cache system and tools
  • "messages.tools" - Cache messages and tools
  • "system.messages.tools" - Cache all components
  • Any other non-empty value - Cache all components

Custom Cache Keys Not Supported

Custom cache hash keys are not supported. The parameter is used only to control which sections are cached, not as a cache identifier.

{
  "model": "anthropic.claude-sonnet-4-5-20250929-v1:0",
  "prompt_cache_key": "system.tools",
  "messages": [...],
  "tools": [...]
}
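
The same pattern from the Python SDK, as a sketch: recent versions of the openai package accept prompt_cache_key as a named argument, and passing it through extra_body works regardless of SDK version.

from openai import OpenAI

client = OpenAI(
    api_key="your-api-key",
    base_url="https://your-endpoint/v1"
)

# Cache the system prompt across requests. "system" is a section selector
# (see the list above), not a custom cache identifier.
response = client.chat.completions.create(
    model="anthropic.claude-sonnet-4-5-20250929-v1:0",
    messages=[
        {"role": "system", "content": "You are a helpful assistant with extensive knowledge..."},
        {"role": "user", "content": "What is 2 + 2?"},
    ],
    extra_body={"prompt_cache_key": "system"},
)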

Benefits:

  • Cost Reduction: Cached tokens are billed at a lower rate than regular input tokens
  • Lower Latency: Cached prompts eliminate reprocessing time
  • Automatic Management: The API handles cache invalidation and updates

Usage Tracking:

Cached token usage is reported in the response:

{
  "usage": {
    "prompt_tokens": 1500,
    "completion_tokens": 100,
    "total_tokens": 1600,
    "prompt_tokens_details": {
      "cached_tokens": 1200
    }
  }
}

In this example, 1,200 tokens were retrieved from cache, with only 300 tokens requiring processing.
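
A sketch for reading these fields from the Python SDK; prompt_tokens_details can be absent when nothing was cached, hence the defensive check:

# `response` is the result of a client.chat.completions.create(...) call,
# as in the caching example above.
usage = response.usage
details = usage.prompt_tokens_details
cached = details.cached_tokens if details and details.cached_tokens else 0

print(f"prompt tokens:     {usage.prompt_tokens}")
print(f"served from cache: {cached}")
print(f"freshly processed: {usage.prompt_tokens - cached}")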

S3 Image Support

Access images directly from your S3 buckets without generating pre-signed URLs or downloading files locally.

Supported Formats:

  • Images: JPEG, PNG, GIF, WebP

How to Use:

Simply reference your S3 images using the s3:// URI scheme in image_url fields:

{
  "model": "anthropic.claude-sonnet-4-5-20250929-v1:0",
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "Describe this image"},
        {
          "type": "image_url",
          "image_url": {"url": "s3://my-bucket/images/photo.jpg"}
        }
      ]
    }
  ]
}
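
The same request from the Python SDK, as a sketch; the bucket and object key are placeholders:

from openai import OpenAI

client = OpenAI(
    api_key="your-api-key",
    base_url="https://your-endpoint/v1"
)

response = client.chat.completions.create(
    model="anthropic.claude-sonnet-4-5-20250929-v1:0",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image"},
                # s3:// URIs pass straight through; no pre-signed URL needed
                {"type": "image_url", "image_url": {"url": "s3://my-bucket/images/photo.jpg"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)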

IAM Permissions Required

Your API service must have IAM permissions to read from the specified S3 buckets. S3 objects must be in the same AWS Region as the invoked model, or accessible via your IAM role. Standard S3 data transfer and request costs apply.

Benefits:

  • No pre-signed URLs - Direct S3 access without generating temporary URLs
  • Security - Images stay in your AWS account with IAM-controlled access
  • Performance - Optimized data transfer within AWS infrastructure
  • Large images - Avoids the size limits of data URIs and base64 encoding

Available Request Headers

This endpoint supports standard Bedrock headers for enhanced control over your requests. All headers are optional and can be combined as needed.

Content Safety (Guardrails)

Header                               Purpose                              Valid Values
X-Amzn-Bedrock-GuardrailIdentifier   Guardrail ID for content filtering   Your guardrail identifier
X-Amzn-Bedrock-GuardrailVersion      Guardrail version                    Version number (e.g., 1)
X-Amzn-Bedrock-Trace                 Guardrail trace level                disabled, enabled, enabled_full

Performance Optimization

Header                                     Purpose                  Valid Values
X-Amzn-Bedrock-Service-Tier                Service tier selection   priority, default, flex
X-Amzn-Bedrock-PerformanceConfig-Latency   Latency optimization     standard, optimized

Example with all headers:

curl -X POST "$BASE/v1/chat/completions" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -H "X-Amzn-Bedrock-GuardrailIdentifier: your-guardrail-id" \
  -H "X-Amzn-Bedrock-GuardrailVersion: 1" \
  -H "X-Amzn-Bedrock-Trace: enabled" \
  -H "X-Amzn-Bedrock-Service-Tier: priority" \
  -H "X-Amzn-Bedrock-PerformanceConfig-Latency: optimized" \
  -d '{
    "model": "anthropic.claude-sonnet-4-5-20250929-v1:0",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
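
From the Python SDK, the same headers can be attached per request with extra_headers (a sketch; the guardrail identifier is a placeholder):

from openai import OpenAI

client = OpenAI(
    api_key="your-api-key",
    base_url="https://your-endpoint/v1"
)

response = client.chat.completions.create(
    model="anthropic.claude-sonnet-4-5-20250929-v1:0",
    messages=[{"role": "user", "content": "Hello!"}],
    extra_headers={
        "X-Amzn-Bedrock-GuardrailIdentifier": "your-guardrail-id",
        "X-Amzn-Bedrock-GuardrailVersion": "1",
        "X-Amzn-Bedrock-Trace": "enabled",
        "X-Amzn-Bedrock-Service-Tier": "priority",
        "X-Amzn-Bedrock-PerformanceConfig-Latency": "optimized",
    },
)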

Detailed Documentation

For complete information about these headers, configuration options, and use cases, see the AWS Bedrock documentation for Guardrails, service tiers, and latency optimization.

AWS Bedrock System Tools

AWS Bedrock system tools are built-in capabilities that foundation models can use directly without requiring you to implement backend integrations. Access any AWS Bedrock system tool by adding the systemTool_ prefix to its name—this works for current tools and any future system tools AWS releases.

How to Use:

Add system tools to your tools array using the systemTool_ prefix followed by the tool name. System tools don't require parameter definitions—just specify the tool name and the model will handle the rest.

As AWS releases new system tools, simply use the same systemTool_ prefix pattern to access them.

Amazon Nova Web Grounding

Amazon Nova Web Grounding enables models to search the web for current information, helping answer questions requiring real-time data like news, weather, product availability, or recent events. The model automatically determines when to use web grounding based on the user's query.

Usage:

curl -X POST "$BASE/v1/chat/completions" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "amazon.nova-premier-v1:0",
    "messages": [
      {
        "role": "user",
        "content": "What are the current AWS Regions and their locations?"
      }
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "systemTool_nova_grounding"
        }
      }
    ]
  }'

Response Format:

When using web grounding, the API response includes annotations with URL citations in non-streaming mode:

{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "The AWS Regions include...",
      "annotations": [
        {
          "type": "url_citation",
          "url_citation": {
            "url": "https://aws.amazon.com/about-aws/global-infrastructure/",
            "title": "AWS Global Infrastructure"
          }
        }
      ]
    }
  }]
}
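
A sketch for pulling citations out of such a response with the Python SDK; annotations may be missing entirely on models or modes that do not produce them, hence the defensive getattr:

# `response` is the result of a web-grounding request like the one above.
message = response.choices[0].message
for annotation in getattr(message, "annotations", None) or []:
    if annotation.type == "url_citation":
        citation = annotation.url_citation
        print(f"{citation.title}: {citation.url}")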

Streaming Mode

Citations are only available in non-streaming responses. The OpenAI API does not support annotations in streaming mode.

Use Cases:

  • Current Events: Get up-to-date information about news, weather, stock prices, or sports scores
  • Dynamic Data: Query information that changes frequently like AWS service availability or product prices
  • Verification: Cross-reference facts with current web sources for improved accuracy
  • Knowledge Extension: Supplement model training data with real-time information

Benefits:

  • Zero Integration: No need to implement or maintain web search APIs
  • Automatic Invocation: Models intelligently decide when to use web grounding
  • Enhanced Accuracy: Reduce hallucinations with real-time information retrieval
  • OpenAI-Compatible: Works seamlessly with standard tool calling patterns

Model and Region Compatibility

Model: Only Amazon Nova Premier (amazon.nova-premier-v1:0) supports the systemTool_nova_grounding tool.

Region: Web Grounding is only available in US regions.

Provider-Specific Parameters

Unlock advanced model capabilities by passing provider-specific parameters directly in your requests. These parameters are forwarded to AWS Bedrock and allow you to access features unique to each foundation model provider.

Documentation

See Bedrock Model Parameters for the complete list of available parameters per model.

How It Works:

Add provider-specific fields at the top level of your request body alongside standard OpenAI parameters. The API automatically forwards these to the appropriate model provider via AWS Bedrock.

Examples:

Top K Sampling:

{
  "model": "anthropic.claude-sonnet-4-5-20250929-v1:0"",
  "messages": [{"role": "user", "content": "Write a poem"}],
  "top_k": 50,
  "temperature": 0.7
}
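
When calling through the Python SDK, provider-specific fields like top_k travel in extra_body so the client does not reject them (a sketch):

from openai import OpenAI

client = OpenAI(
    api_key="your-api-key",
    base_url="https://your-endpoint/v1"
)

response = client.chat.completions.create(
    model="anthropic.claude-sonnet-4-5-20250929-v1:0",
    messages=[{"role": "user", "content": "Write a poem"}],
    temperature=0.7,
    extra_body={"top_k": 50},  # provider-specific; forwarded to Bedrock
)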

Configuration Options:

Option 1: Per-Request

Add provider-specific parameters directly in your request body (as shown in examples above).

Option 2: Server-Wide Defaults

Configure default parameters for specific models via the DEFAULT_MODEL_PARAMS environment variable:

export DEFAULT_MODEL_PARAMS='{
  "anthropic.claude-sonnet-4-5-20250929-v1:0": {
    "anthropic_beta": ["extended-thinking-2024-12-12"]
  }
}'

Parameter Priority

Per-request parameters override server-wide defaults.

Behavior:

  • Compatible parameters: Forwarded to the model and applied
  • Unsupported parameters: Return HTTP 400 with an error message

Anthropic Claude Features

Enable cutting-edge Claude capabilities including extended thinking and reasoning.

Beta Feature Flags

Enable experimental Claude features like extended thinking by adding the anthropic_beta array to your request:

curl -X POST "$BASE/v1/chat/completions" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic.claude-sonnet-4-5-20250929-v1:0",
    "messages": [{"role":"user","content":"Summarize the news headline."}],
    "anthropic_beta": ["Interleaved-thinking-2025-05-14"]
  }'
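
The equivalent from the Python SDK, with the beta flag passed through extra_body (a sketch):

from openai import OpenAI

client = OpenAI(
    api_key="your-api-key",
    base_url="https://your-endpoint/v1"
)

response = client.chat.completions.create(
    model="anthropic.claude-sonnet-4-5-20250929-v1:0",
    messages=[{"role": "user", "content": "Summarize the news headline."}],
    # Beta flags are provider-specific, so they go in extra_body.
    extra_body={"anthropic_beta": ["interleaved-thinking-2025-05-14"]},
)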

Server-Wide Configuration

You can also configure beta flags server-wide using the DEFAULT_MODEL_PARAMS environment variable (see Provider-Specific Parameters).

Unsupported Beta Flags

Unsupported flags that would change output return HTTP 400 errors.

Documentation

See Using Claude on AWS Bedrock for more details on Claude-specific parameters.

Reasoning Control

This API supports two different approaches to control AWS Bedrock reasoning behavior. Reasoning enables foundation models to break down complex tasks into smaller steps ("chain of thought"), improving accuracy for multi-step analysis, math problems, and complex reasoning tasks. Both approaches work with all AWS Bedrock models that support reasoning capabilities.

Option 1: OpenAI-Style Reasoning (reasoning_effort)

Use the reasoning_effort parameter with predefined effort levels. This approach works with all AWS Bedrock models that support reasoning, providing a simple way to control reasoning depth.

Available Levels:

  • minimal - Quick responses with minimal reasoning (25% of max tokens)
  • low - Light reasoning for straightforward tasks (50% of max tokens)
  • medium - Balanced reasoning for most use cases (75% of max tokens)
  • high - Deep reasoning for complex problems (100% of max tokens)

Example:

curl -X POST "$BASE/v1/chat/completions" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic.claude-sonnet-4-5-20250929-v1:0",
    "reasoning_effort": "high",
    "messages": [{"role": "user", "content": "Solve this complex problem..."}]
  }'

Option 2: Qwen-Style Reasoning (enable_thinking + thinking_budget)

Use explicit parameters for fine-grained control over thinking mode. This approach works with all AWS Bedrock models that support reasoning, offering precise control over reasoning behavior and token budgets.

Parameters:

  • enable_thinking (boolean): Enable or disable thinking mode
    • Default: Model-specific (usually false)
    • Some models have reasoning always enabled
  • thinking_budget (integer): Maximum thinking process length in tokens
    • Only effective when enable_thinking is true
    • Passed to the model as budget_tokens
    • Default: Model's maximum chain-of-thought length

Example:

curl -X POST "$BASE/v1/chat/completions" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic.claude-sonnet-4-5-20250929-v1:0",
    "enable_thinking": true,
    "thinking_budget": 2000,
    "messages": [{"role": "user", "content": "Solve this complex problem..."}]
  }'

Using the Python SDK:

from openai import OpenAI

client = OpenAI(
    api_key="your-api-key",
    base_url="https://your-endpoint/v1"
)

# OpenAI-style reasoning (predefined effort levels)
response = client.chat.completions.create(
    model="anthropic.claude-sonnet-4-5-20250929-v1:0",
    reasoning_effort="high",
    messages=[{"role": "user", "content": "Complex problem..."}]
)

# Qwen-style reasoning (fine-grained control)
response = client.chat.completions.create(
    model="anthropic.claude-sonnet-4-5-20250929-v1:0",
    messages=[{"role": "user", "content": "Complex problem..."}],
    extra_body={
        "enable_thinking": True,
        "thinking_budget": 2000
    }
)

Reasoning Output

Models that support reasoning will include their thinking process in reasoning_content fields in the response.

DeepSeek Reasoning Support

DeepSeek models with reasoning capabilities are automatically handled—their chain-of-thought reasoning appears in reasoning_content fields without any special configuration, just like DeepSeek's native chat completions endpoint.

Documentation

See DeepSeek API - Chat Completions for more information about DeepSeek's reasoning capabilities.

What You Get:

  • Automatic reasoning: DeepSeek reasoning models automatically include their thinking process
  • reasoning_content field: Receive visible reasoning text in assistant messages
  • Streaming support: Get choices[].delta.reasoning_content chunks in real-time as the model thinks (see the sketch below)
  • Compatible format: Uses the same DeepSeek-compatible response format

How It Works:

  • When using DeepSeek reasoning models, the API automatically surfaces their chain-of-thought
  • Non-reasoning models simply omit the reasoning_content field
  • No special parameters needed—just use the model and reasoning appears automatically
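
A sketch of reading reasoning deltas while streaming; the model ID is a placeholder for whichever DeepSeek reasoning model is enabled in your account, and reasoning_content is a non-standard field, hence the defensive getattr:

from openai import OpenAI

client = OpenAI(
    api_key="your-api-key",
    base_url="https://your-endpoint/v1"
)

stream = client.chat.completions.create(
    model="deepseek.r1-v1:0",  # placeholder: use a DeepSeek reasoning model available to you
    messages=[{"role": "user", "content": "Solve 12*13"}],
    stream=True,
)
for chunk in stream:
    if not chunk.choices:
        continue  # e.g., a trailing usage-only chunk
    delta = chunk.choices[0].delta
    reasoning = getattr(delta, "reasoning_content", None)
    if reasoning:
        print(reasoning, end="", flush=True)      # chain-of-thought text
    if delta.content:
        print(delta.content, end="", flush=True)  # final answer text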

Try It Now

Basic chat completion:

curl -X POST "$BASE/v1/chat/completions" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "amazon.nova-micro-v1:0",
    "messages": [{"role": "user", "content": "Say hello world"}]
  }'

Streaming response:

curl -N -X POST "$BASE/v1/chat/completions" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "amazon.nova-micro-v1:0",
    "stream": true,
    "messages": [{"role": "user", "content": "Write a haiku about the sea."}]
  }'
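
The same streaming request from the Python SDK, as a sketch:

from openai import OpenAI

client = OpenAI(
    api_key="your-api-key",
    base_url="https://your-endpoint/v1"
)

stream = client.chat.completions.create(
    model="amazon.nova-micro-v1:0",
    messages=[{"role": "user", "content": "Write a haiku about the sea."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()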

Multi-modal with image:

{
  "model": "amazon.nova-micro-v1:0",
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "Describe this image"},
        {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
      ]
    }
  ]
}

With reasoning:

curl -X POST "$BASE/v1/chat/completions" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic.claude-sonnet-4-5-20250929-v1:0",
    "reasoning_effort": "low",
    "messages": [{"role": "user", "content": "Solve 12*13"}]
  }'

Response with reasoning:

{
  "choices": [{
    "message": {
      "role": "assistant",
      "reasoning_content": "12 × 10 = 120, plus 12 × 3 = 36 → 156",
      "content": "156"
    }
  }]
}


Ready to build with AI? Check out the Models API to see all available foundation models!