
Responses API

Generate model responses with AWS Bedrock foundation models through an OpenAI Responses API-compatible interface. Supports text, images, tool calling, and streaming.

Why Choose the Responses API?

  • Tool Calling
    Define function tools and get structured tool calls back. Full round-trip support with function_call_output.

  • Structured Output
    Request JSON object or JSON schema output via text.format to get machine-readable responses.

  • Streaming
    Real-time token streaming with granular events for text deltas, tool calls, and lifecycle milestones.

  • Extended Reasoning
    Enable chain-of-thought reasoning on supported models via reasoning.effort.

Quick Start: Available Endpoint

| Endpoint | Method | What It Does | Powered By |
|---|---|---|---|
| /v1/responses | POST | Create a model response | AWS Bedrock Converse API |

Feature Compatibility

Input

| Feature | Status | Notes |
|---|---|---|
| Plain text (input as string) | Supported | Simple string shorthand for a single user message |
| Structured message array | Supported | Array of EasyInputMessage / InputMessage items |
| instructions (system prompt) | Supported | Injected as a Bedrock system block |
| system / developer role | Supported | Treated as a system instruction |
| Image input (input_image) | Supported | HTTP URLs and base64 data URIs supported |
| File input (input_file) | Supported | File URLs and base64 data supported |
| function_call_output | Supported | Submit tool results as input for round-trip tool calling |

Tool Calling

| Feature | Status | Notes |
|---|---|---|
| Function tools (type: "function") | Supported | Full schema mapping to Bedrock toolSpec |
| tool_choice: "auto" | Supported | Model selects among available tools |
| tool_choice: "required" | Supported | Model must call at least one tool |
| tool_choice: "none" | Supported | Prevents tool calls |
| Named tool_choice (force) | Supported | Forces a specific function to be called |
| parallel_tool_calls | Supported | Echoed in response; not transmitted to Bedrock |
| Built-in tools (code_interpreter, web_search, image_generation) | Available on Select Models | See OpenAI Integrated Tools |
| file_search tool | Unsupported | Returns 400; no Bedrock equivalent |
| computer / computer_use_preview tools | Unsupported | Returns 400; see Computer Use Not Supported |
| mcp tool | Unsupported | Returns 400; MCP not supported |
| local_shell / shell tools | Unsupported | Returns 400; local shell not supported |
| custom / namespace / tool_search / apply_patch tools | Unsupported | Returns 400; not supported |

Generation Control

| Feature | Status | Notes |
|---|---|---|
| max_output_tokens | Supported | Maps to Bedrock maxTokens |
| temperature | Supported | 0–2 range; mapped to Bedrock inference config |
| top_p | Supported | 0–1 range; nucleus sampling |
| top_logprobs | Supported | 0–20 range; token log-probability output |
| reasoning (effort) | Available on Select Models | Configures reasoning on models that support it |
| metadata | Supported | Forwarded to Bedrock requestMetadata |
| prompt_cache_key | Supported | Caches prompts to reduce costs and latency |
| prompt_cache_retention | Available on Select Models | Cache TTL: in-memory or 24h |
| service_tier | Supported | Maps to Bedrock service tier header |
| truncation | Unsupported | Returns 400; Bedrock manages context automatically |
| max_tool_calls | Unsupported | Returns 400; not supported |
| background | Unsupported | Returns 400; async background mode not supported |
| store | Unsupported | Returns 400; all responses are stateless |
| stream_options | Unsupported | Returns 400; not supported |
| conversation | Unsupported | Returns 400; pass the conversation history in input instead |
| prompt (template reference) | Unsupported | Returns 400; not supported |
| safety_identifier | Unsupported | Returns 400; not supported |

Output Format

| Feature | Status | Notes |
|---|---|---|
| text.format: "text" | Supported | Plain text output |
| text.format: "json_object" | Supported | JSON object output via Bedrock outputConfig |
| text.format: "json_schema" | Supported | Structured JSON output with schema validation |

Multi-Turn

| Feature | Status | Notes |
|---|---|---|
| previous_response_id | Unsupported | Pass the full conversation history in input instead |

Streaming

| Feature | Status | Notes |
|---|---|---|
| stream: true | Supported | SSE stream with full lifecycle events |
| response.created | Supported | Emitted at stream start |
| response.in_progress | Supported | Emitted after created |
| response.output_text.delta | Supported | Text token deltas |
| response.output_text.done | Supported | Final text for each content part |
| response.function_call_arguments.delta | Supported | Tool call argument deltas |
| response.function_call_arguments.done | Supported | Finalized tool call arguments |
| response.completed | Supported | Final complete response at stream end |

Legend:

  • Supported — Fully compatible with OpenAI API
  • Available on Select Models — Check your model's capabilities
  • Unsupported — Not available in this implementation
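The supported generation-control parameters can be combined freely in a single request. A sketch (model ID and values are illustrative, not recommendations):

```shell
curl -X POST "$BASE/v1/responses" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "amazon.nova-micro-v1:0",
    "input": "Name three capitals of European countries.",
    "max_output_tokens": 256,
    "temperature": 0.7,
    "top_p": 0.9,
    "metadata": {"request_source": "docs-example"}
  }'
```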

Advanced Features

System Prompt (instructions)

Use instructions to define the assistant's behavior — it is injected as a Bedrock system block.

curl -X POST "$BASE/v1/responses" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "amazon.nova-micro-v1:0",
    "instructions": "You are a helpful assistant that answers in French.",
    "input": "Say hello."
  }'

Function Tool Calling

Define function tools and submit results in a round-trip conversation.

Multi-Turn Conversations

All responses are stateless. Response IDs are generated for compatibility but previous_response_id is not supported. For multi-turn conversations, pass the full message history in the input array.
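Because no state is kept server-side, a follow-up turn must repeat the earlier messages. A sketch (the assistant message is whatever text the first call returned):

```shell
curl -X POST "$BASE/v1/responses" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "amazon.nova-micro-v1:0",
    "input": [
      {"role": "user", "content": "My name is Ada."},
      {"role": "assistant", "content": "Nice to meet you, Ada!"},
      {"role": "user", "content": "What is my name?"}
    ]
  }'
```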

Unsupported Built-In Tools

file_search, computer, computer_use_preview, mcp, local_shell, shell, custom, namespace, tool_search, and apply_patch tools are not supported. Requests that include any of these tools will receive a 400 error.

Step 1 — Request a tool call:

curl -X POST "$BASE/v1/responses" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "amazon.nova-micro-v1:0",
    "input": "What'\''s the weather in Paris?",
    "tool_choice": "required",
    "tools": [
      {
        "type": "function",
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    ]
  }'

Step 2 — Submit the tool result:

curl -X POST "$BASE/v1/responses" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "amazon.nova-micro-v1:0",
    "input": [
      {
        "type": "function_call_output",
        "call_id": "<call_id from step 1>",
        "output": "{\"temperature\": \"18°C\", \"condition\": \"cloudy\"}"
      }
    ],
    "tools": [
      {
        "type": "function",
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    ]
  }'
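To force one particular function rather than leaving the choice to the model (the named tool_choice row in the compatibility table), pass an object naming it. A sketch reusing the get_weather tool from Step 1; the object shape follows OpenAI's named tool_choice convention:

```shell
curl -X POST "$BASE/v1/responses" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "amazon.nova-micro-v1:0",
    "input": "What'\''s the weather in Tokyo?",
    "tool_choice": {"type": "function", "name": "get_weather"},
    "tools": [
      {
        "type": "function",
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    ]
  }'
```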

Streaming

Real-time token streaming with granular SSE lifecycle events.

curl -N -X POST "$BASE/v1/responses" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "amazon.nova-micro-v1:0",
    "input": "Tell me a short story.",
    "stream": true
  }'

The stream emits events in order: response.created → response.in_progress → response.output_text.delta (repeated) → response.output_text.done → response.completed.
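On the wire these arrive as standard server-sent events. An abridged illustration (payloads truncated; field shapes follow the OpenAI streaming event format):

```
event: response.created
data: {"type":"response.created","response":{...}}

event: response.output_text.delta
data: {"type":"response.output_text.delta","delta":"Once"}

event: response.output_text.done
data: {"type":"response.output_text.done","text":"Once upon a time..."}

event: response.completed
data: {"type":"response.completed","response":{...}}
```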

Structured JSON Output

Request machine-readable output using text.format.

JSON object:

curl -X POST "$BASE/v1/responses" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "amazon.nova-micro-v1:0",
    "input": "Return the current date and day of week as JSON.",
    "text": {"format": {"type": "json_object"}}
  }'

JSON schema:

curl -X POST "$BASE/v1/responses" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "amazon.nova-micro-v1:0",
    "input": "What is 2 + 2? Reply with answer and confidence.",
    "text": {
      "format": {
        "type": "json_schema",
        "name": "MathResult",
        "schema": {
          "type": "object",
          "properties": {
            "answer": {"type": "number"},
            "confidence": {"type": "number"}
          },
          "required": ["answer", "confidence"]
        }
      }
    }
  }'

Extended Reasoning

Enable chain-of-thought reasoning on supported models (e.g. Amazon Nova 2, Anthropic Claude 3.7+) via reasoning.effort.

curl -X POST "$BASE/v1/responses" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic.claude-3-7-sonnet-20250219-v1:0",
    "input": "Solve: if a train travels 120 km in 90 minutes, what is its speed?",
    "reasoning": {"effort": "low"},
    "max_output_tokens": 4096
  }'

Prompt Caching

Cache Creation Costs

Cache creation incurs a higher cost than regular token processing. Only use prompt caching when you expect a high cache hit ratio across multiple requests with similar prompts.

Prompt caching reduces latency and costs by caching repetitive prompt components. Set the prompt_cache_key parameter to enable:

curl -X POST "$BASE/v1/responses" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic.claude-3-7-sonnet-20250219-v1:0",
    "instructions": "You are a helpful assistant.",
    "input": "What is Python?",
    "prompt_cache_key": "default"
  }'

Granular Cache Control:

Use dot-separated values to cache specific components:

  • "system" — Cache system messages only
  • "messages" — Cache conversation history
  • "tools" — Cache tool/function definitions (Anthropic Claude only)
  • "system.messages" — Cache both system and messages
  • "system.tools" — Cache system and tools
  • "messages.tools" — Cache messages and tools
  • "system.messages.tools" — Cache all components
  • Any other non-empty value — Cache all components

Custom Cache Keys Not Supported

Custom cache hash keys are not supported. The parameter is used only to control which sections are cached, not as a cache identifier.

Example — Cache system and tools:

curl -X POST "$BASE/v1/responses" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic.claude-3-7-sonnet-20250219-v1:0",
    "instructions": "You are a data analysis assistant.",
    "input": "Analyze this dataset: ...",
    "tools": [{"type": "function", "name": "run_sql", ...}],
    "prompt_cache_key": "system.tools"
  }'

Benefits:

  • Cost Reduction: Cached tokens are billed at a lower rate than regular input tokens
  • Lower Latency: Cached prompts eliminate reprocessing time
  • Automatic Management: The API handles cache invalidation and updates

Cache Retention (TTL):

Control how long cached prompts persist using the prompt_cache_retention parameter:

curl -X POST "$BASE/v1/responses" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic.claude-3-7-sonnet-20250219-v1:0",
    "input": "Hello",
    "prompt_cache_key": "default",
    "prompt_cache_retention": "24h"
  }'

Valid values: in-memory (default) or 24h.

Model Support

Cache retention configuration is only available on select models. See AWS Bedrock Prompt Caching - Supported Models for details on which models support configurable TTL.

Cached token usage is reported in the response:

{
  "usage": {
    "input_tokens": 1500,
    "input_tokens_details": {
      "cached_tokens": 1200
    },
    "output_tokens": 300,
    "total_tokens": 1800
  }
}

In this example, 1,200 of the 1,500 input tokens were served from the cache; only the remaining 300 input tokens required fresh processing.

OpenAI Integrated Tools

The Responses API supports OpenAI's built-in tool types, automatically mapped to the target model's native tools.

Amazon Nova Tools

Nova models support web search and code execution as integrated tools.

Web Search (web_search, web_search_preview):

curl -X POST "$BASE/v1/responses" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "amazon.nova-premier-v1:0",
    "input": "What is the current version of Python?",
    "tools": [{"type": "web_search"}]
  }'

Code Interpreter (code_interpreter):

curl -X POST "$BASE/v1/responses" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "amazon.nova-2-lite-v1:0",
    "input": "Calculate the first 10 Fibonacci numbers",
    "tools": [{"type": "code_interpreter"}]
  }'

Streaming sources

action.sources (citation URLs) is only populated in non-streaming responses. In streaming mode the field is null, though all lifecycle events (web_search_call.in_progress, web_search_call.completed) are still emitted.

Region Compatibility

web_search is only available on Nova Premier in US regions. Not available on EU inference profiles.

Image Generation

The image_generation integrated tool works with all text models — Claude, Nova, and any future model. The gateway intercepts the tool, lets the LLM compose the image prompt and parameters via a synthetic function call, then generates the image against a configured Bedrock image model and returns an image_generation_call output item to the client. Intermediate function_call items are suppressed.

Configuration Required

Set the IMAGE_GENERATION_MODEL environment variable to a Bedrock image model ID (e.g. amazon.nova-canvas-v1:0). The tool definition may also specify a model field to override the default per request.
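A minimal configuration sketch for the gateway process, using the image model ID from the example above:

```shell
# Default Bedrock image model used by the image_generation tool
export IMAGE_GENERATION_MODEL="amazon.nova-canvas-v1:0"
```

A per-request override, where needed, goes in the tool definition itself, e.g. {"type": "image_generation", "model": "<bedrock-image-model-id>"}.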

Example — Generate an image:

curl -X POST "$BASE/v1/responses" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "amazon.nova-micro-v1:0",
    "input": "Generate a photorealistic image of a red panda sitting on a tree branch.",
    "tools": [{"type": "image_generation"}],
    "tool_choice": "required"
  }'

The response contains an image_generation_call output item:

{
  "output": [
    {
      "type": "image_generation_call",
      "id": "img_abc123",
      "status": "completed",
      "result": "<base64-encoded PNG>"
    }
  ]
}

You can also specify image parameters in the tool definition:

{
  "type": "image_generation",
  "size": "1024x1024",
  "quality": "high",
  "output_format": "png"
}

Computer Use Not Supported

The computer and computer_use_preview integrated tools are not supported. Requests that include these tools will receive a 400 error.

Available Request Headers

This endpoint supports standard Bedrock headers for enhanced control over your requests. All headers are optional and can be combined as needed.

Content Safety (Guardrails)

| Header | Purpose | Valid Values |
|---|---|---|
| X-Amzn-Bedrock-GuardrailIdentifier | Guardrail ID for content filtering | Your guardrail identifier |
| X-Amzn-Bedrock-GuardrailVersion | Guardrail version | Version number (e.g., 1) |
| X-Amzn-Bedrock-Trace | Guardrail trace level | disabled, enabled, enabled_full |

Performance Optimization

| Header | Purpose | Valid Values |
|---|---|---|
| X-Amzn-Bedrock-Service-Tier | Service tier selection | priority, default, flex |
| X-Amzn-Bedrock-PerformanceConfig-Latency | Latency optimization | standard, optimized |

Example with all headers:

curl -X POST "$BASE/v1/responses" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -H "X-Amzn-Bedrock-GuardrailIdentifier: your-guardrail-id" \
  -H "X-Amzn-Bedrock-GuardrailVersion: 1" \
  -H "X-Amzn-Bedrock-Trace: enabled" \
  -H "X-Amzn-Bedrock-Service-Tier: priority" \
  -H "X-Amzn-Bedrock-PerformanceConfig-Latency: optimized" \
  -d '{
    "model": "amazon.nova-micro-v1:0",
    "input": "Hello!"
  }'

Detailed Documentation

For complete information about these headers, configuration options, and use cases, see the dedicated request-header documentation.

Try It Now

Basic response:

curl -X POST "$BASE/v1/responses" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "amazon.nova-micro-v1:0",
    "input": "Say hello world"
  }'

Streaming response:

curl -N -X POST "$BASE/v1/responses" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "amazon.nova-micro-v1:0",
    "input": "Write a haiku about the sea.",
    "stream": true
  }'

Multi-modal with image:

{
  "model": "amazon.nova-micro-v1:0",
  "input": [
    {
      "role": "user",
      "content": [
        {"type": "input_text", "text": "Describe this image"},
        {"type": "input_image", "image_url": "https://example.com/photo.jpg"}
      ]
    }
  ]
}
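Base64 image input works the same way via a data URI in image_url (the base64 payload is truncated here for brevity):

```
{
  "model": "amazon.nova-micro-v1:0",
  "input": [
    {
      "role": "user",
      "content": [
        {"type": "input_text", "text": "What text appears in this image?"},
        {"type": "input_image", "image_url": "data:image/png;base64,iVBORw0KGgo..."}
      ]
    }
  ]
}
```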

With reasoning:

curl -X POST "$BASE/v1/responses" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic.claude-3-7-sonnet-20250219-v1:0",
    "input": "Solve 12 × 13",
    "reasoning": {"effort": "low"},
    "max_output_tokens": 4096
  }'

Ready to build with AI? Check out the Models API to see all available foundation models!