# Responses API
Generate model responses with AWS Bedrock foundation models through an OpenAI Responses API-compatible interface. Supports text, images, tool calling, and streaming.
## Why Choose the Responses API?

- **Tool Calling**: Define function tools and get structured tool calls back. Full round-trip support with `function_call_output`.
- **Structured Output**: Request JSON object or JSON schema output via `text.format` to get machine-readable responses.
- **Streaming**: Real-time token streaming with granular events for text deltas, tool calls, and lifecycle milestones.
- **Extended Reasoning**: Enable chain-of-thought reasoning on supported models via `reasoning.effort`.
## Quick Start: Available Endpoint

| Endpoint | Method | What It Does | Powered By |
|---|---|---|---|
| `/v1/responses` | POST | Create a model response | AWS Bedrock Converse API |
## Feature Compatibility

| Feature | Status | Notes |
|---|---|---|
| **Input** | | |
| Plain text (`input` as string) | Supported | Simple string shorthand for a single user message |
| Structured message array | Supported | Array of `EasyInputMessage` / `InputMessage` items |
| `instructions` (system prompt) | Supported | Injected as a Bedrock `system` block |
| `system` / `developer` role | Supported | Treated as a system instruction |
| Image input (`input_image`) | Supported | HTTP URLs and base64 data URIs supported |
| File input (`input_file`) | Supported | File URLs and base64 data supported |
| `function_call_output` | Supported | Submit tool results as input for round-trip tool calling |
| **Tool Calling** | | |
| Function tools (`type: "function"`) | Supported | Full schema mapping to Bedrock `toolSpec` |
| `tool_choice: "auto"` | Supported | Model selects among available tools |
| `tool_choice: "required"` | Supported | Model must call at least one tool |
| `tool_choice: "none"` | Supported | Prevents tool calls |
| Named `tool_choice` (force) | Supported | Forces a specific function to be called |
| `parallel_tool_calls` | Supported | Echoed in response; not transmitted to Bedrock |
| Built-in tools (`code_interpreter`, `web_search`, `image_generation`) | Select models | See OpenAI Integrated Tools |
| `file_search` tool | Unsupported | Returns 400; no Bedrock equivalent |
| `computer` / `computer_use_preview` tools | Unsupported | Returns 400; see Computer Use Not Supported |
| `mcp` tool | Unsupported | Returns 400; MCP not supported |
| `local_shell` / `shell` tools | Unsupported | Returns 400; local shell not supported |
| `custom` / `namespace` / `tool_search` / `apply_patch` tools | Unsupported | Returns 400; not supported |
| **Generation Control** | | |
| `max_output_tokens` | Supported | Maps to Bedrock `maxTokens` |
| `temperature` | Supported | 0–2 range; mapped to Bedrock inference config |
| `top_p` | Supported | 0–1 range; nucleus sampling |
| `top_logprobs` | Supported | 0–20 range; token log-probability output |
| `reasoning` (effort) | Select models | Configures reasoning on models that support it |
| `metadata` | Supported | Forwarded to Bedrock `requestMetadata` |
| `prompt_cache_key` | Supported | Caches prompts to reduce costs and latency |
| `prompt_cache_retention` | Select models | Cache TTL: `in-memory` or `24h` |
| `service_tier` | Supported | Maps to the Bedrock service tier header |
| `truncation` | Unsupported | Returns 400; Bedrock manages context automatically |
| `max_tool_calls` | Unsupported | Returns 400; not supported |
| `background` | Unsupported | Returns 400; async background mode not supported |
| `store` | Unsupported | Returns 400; all responses are stateless |
| `stream_options` | Unsupported | Returns 400; not supported |
| `conversation` | Unsupported | Returns 400; use `previous_response_id` or `input` |
| `prompt` (template reference) | Unsupported | Returns 400; not supported |
| `safety_identifier` | Unsupported | Returns 400; not supported |
| **Output Format** | | |
| `text.format: "text"` | Supported | Plain text output |
| `text.format: "json_object"` | Supported | JSON object output via Bedrock `outputConfig` |
| `text.format: "json_schema"` | Supported | Structured JSON output with schema validation |
| **Multi-Turn** | | |
| `previous_response_id` | Unsupported | Not supported; pass full conversation history in `input` |
| **Streaming** | | |
| `stream: true` | Supported | SSE stream with full lifecycle events |
| `response.created` | Supported | Emitted at stream start |
| `response.in_progress` | Supported | Emitted after `created` |
| `response.output_text.delta` | Supported | Text token deltas |
| `response.output_text.done` | Supported | Final text for each content part |
| `response.function_call_arguments.delta` | Supported | Tool call argument deltas |
| `response.function_call_arguments.done` | Supported | Finalized tool call arguments |
| `response.completed` | Supported | Final complete response at stream end |

**Legend:**

- **Supported** — Fully compatible with the OpenAI API
- **Select models** — Available on select models; check your model's capabilities
- **Unsupported** — Not available in this implementation
## Advanced Features

### System Prompt (`instructions`)

Use `instructions` to define the assistant's behavior — it is injected as a Bedrock `system` block.

```bash
curl -X POST "$BASE/v1/responses" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "amazon.nova-micro-v1:0",
    "instructions": "You are a helpful assistant that answers in French.",
    "input": "Say hello."
  }'
```
### Function Tool Calling

Define function tools and submit results in a round-trip conversation.
**Multi-Turn Conversations**

All responses are stateless. Response IDs are generated for compatibility, but `previous_response_id` is not supported. For multi-turn conversations, pass the full message history in the `input` array.
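Because the API keeps no state, the client carries the conversation forward itself. A minimal sketch of assembling the `input` array for a second turn; the `build_next_turn` helper is hypothetical, and the message shapes follow the structured message array format from the compatibility table:

```python
def build_next_turn(history, assistant_reply, user_message):
    """Append the assistant's last reply and the new user turn to the
    conversation history, returning the `input` array for the next request.

    Hypothetical helper: the API itself keeps no state, so the full
    history travels in every request body.
    """
    history = list(history)  # don't mutate the caller's list
    history.append({"role": "assistant", "content": assistant_reply})
    history.append({"role": "user", "content": user_message})
    return history

# First turn as it would appear in the request body
turn1 = [{"role": "user", "content": "Name a French city."}]

# After receiving the model's reply, build the second-turn payload
turn2 = build_next_turn(turn1, "Lyon is a French city.", "What is it known for?")

payload = {"model": "amazon.nova-micro-v1:0", "input": turn2}
```

Each request therefore grows with the conversation; trimming old turns to stay within the model's context window is left to the client.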
**Unsupported Built-In Tools**

The `file_search`, `computer`, `computer_use_preview`, `mcp`, `local_shell`, `shell`, `custom`, `namespace`, `tool_search`, and `apply_patch` tools are not supported. Requests that include any of these tools receive a 400 error.
Step 1 — Request a tool call:

```bash
curl -X POST "$BASE/v1/responses" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "amazon.nova-micro-v1:0",
    "input": "What'\''s the weather in Paris?",
    "tool_choice": "required",
    "tools": [
      {
        "type": "function",
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    ]
  }'
```
Step 2 — Submit the tool result:

```bash
curl -X POST "$BASE/v1/responses" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "amazon.nova-micro-v1:0",
    "input": [
      {
        "type": "function_call_output",
        "call_id": "<call_id from step 1>",
        "output": "{\"temperature\": \"18°C\", \"condition\": \"cloudy\"}"
      }
    ],
    "tools": [
      {
        "type": "function",
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    ]
  }'
```
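The wiring between the two steps can be sketched in client code. The step-1 response below is a mock with a hypothetical `call_id`, not captured API output; the key point is that the `function_call_output` item must echo the exact `call_id` the model issued:

```python
import json

# Mocked step-1 response: the model asked to call get_weather
step1_response = {
    "output": [
        {
            "type": "function_call",
            "call_id": "call_123",          # hypothetical ID
            "name": "get_weather",
            "arguments": "{\"city\": \"Paris\"}",
        }
    ]
}

# Pull the first function call out of the output array
call = next(item for item in step1_response["output"]
            if item["type"] == "function_call")
args = json.loads(call["arguments"])

# Run the tool locally, then echo its result back with the same call_id
result = {"temperature": "18°C", "condition": "cloudy"}  # stand-in tool result
step2_input = [{
    "type": "function_call_output",
    "call_id": call["call_id"],
    "output": json.dumps(result),
}]
```

Note that `output` is a JSON-encoded string, not a nested object, matching the step-2 curl example above.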
### Streaming

Real-time token streaming with granular SSE lifecycle events.

```bash
curl -N -X POST "$BASE/v1/responses" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "amazon.nova-micro-v1:0",
    "input": "Tell me a short story.",
    "stream": true
  }'
```

The stream emits events in order: `response.created` → `response.in_progress` → `response.output_text.delta` (repeated) → `response.output_text.done` → `response.completed`.
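A consumer reassembles the text by concatenating the delta events. A sketch over mocked events; the event names follow the order documented above, but the payload fields are simplified placeholders:

```python
def assemble_text(events):
    """Concatenate output_text deltas from an SSE event sequence.

    `events` is an iterable of (event_type, data) pairs; only the
    delta events contribute to the assembled text.
    """
    parts = []
    for event_type, data in events:
        if event_type == "response.output_text.delta":
            parts.append(data["delta"])
    return "".join(parts)

# Mocked lifecycle, in the documented order
mock_events = [
    ("response.created", {}),
    ("response.in_progress", {}),
    ("response.output_text.delta", {"delta": "Once upon "}),
    ("response.output_text.delta", {"delta": "a time."}),
    ("response.output_text.done", {"text": "Once upon a time."}),
    ("response.completed", {}),
]

story = assemble_text(mock_events)
```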
### Structured JSON Output

Request machine-readable output using `text.format`.

JSON object:

```bash
curl -X POST "$BASE/v1/responses" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "amazon.nova-micro-v1:0",
    "input": "Return the current date and day of week as JSON.",
    "text": {"format": {"type": "json_object"}}
  }'
```
JSON schema:

```bash
curl -X POST "$BASE/v1/responses" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "amazon.nova-micro-v1:0",
    "input": "What is 2 + 2? Reply with answer and confidence.",
    "text": {
      "format": {
        "type": "json_schema",
        "name": "MathResult",
        "schema": {
          "type": "object",
          "properties": {
            "answer": {"type": "number"},
            "confidence": {"type": "number"}
          },
          "required": ["answer", "confidence"]
        }
      }
    }
  }'
```
### Extended Reasoning

Enable chain-of-thought reasoning on supported models (e.g. Amazon Nova 2, Anthropic Claude 3.7+) via `reasoning.effort`.

```bash
curl -X POST "$BASE/v1/responses" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic.claude-3-7-sonnet-20250219-v1:0",
    "input": "Solve: if a train travels 120 km in 90 minutes, what is its speed?",
    "reasoning": {"effort": "low"},
    "max_output_tokens": 4096
  }'
```
### Prompt Caching

**Cache Creation Costs**

Cache creation incurs a higher cost than regular token processing. Only use prompt caching when you expect a high cache hit ratio across multiple requests with similar prompts.

Prompt caching reduces latency and costs by caching repetitive prompt components. Set the `prompt_cache_key` parameter to enable it:

```bash
curl -X POST "$BASE/v1/responses" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic.claude-3-7-sonnet-20250219-v1:0",
    "instructions": "You are a helpful assistant.",
    "input": "What is Python?",
    "prompt_cache_key": "default"
  }'
```
**Granular Cache Control:**

Use dot-separated values to cache specific components:

- `"system"` — Cache system messages only
- `"messages"` — Cache conversation history
- `"tools"` — Cache tool/function definitions (Anthropic Claude only)
- `"system.messages"` — Cache both system and messages
- `"system.tools"` — Cache system and tools
- `"messages.tools"` — Cache messages and tools
- `"system.messages.tools"` — Cache all components
- Any other non-empty value — Cache all components
**Custom Cache Keys Not Supported**

Custom cache hash keys are not supported. The parameter only controls which sections are cached; it is not used as a cache identifier.
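The section-selection rules above can be expressed as a small function. A sketch of the documented mapping; the source does not say how a mix of valid and unknown tokens is treated, so falling back to "cache everything" in that case is an assumption:

```python
VALID_SECTIONS = {"system", "messages", "tools"}

def cached_sections(prompt_cache_key):
    """Return the set of sections a given prompt_cache_key selects.

    Dot-separated combinations of system/messages/tools select those
    sections; any other non-empty value caches all components.
    (Assumption: a key mixing valid and unknown tokens also falls
    back to caching everything.)
    """
    if not prompt_cache_key:
        return frozenset()              # empty key: caching not enabled
    parts = set(prompt_cache_key.split("."))
    if parts <= VALID_SECTIONS:
        return frozenset(parts)
    return frozenset(VALID_SECTIONS)    # e.g. "default" caches everything
```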
Example — Cache system and tools:

```bash
curl -X POST "$BASE/v1/responses" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic.claude-3-7-sonnet-20250219-v1:0",
    "instructions": "You are a data analysis assistant.",
    "input": "Analyze this dataset: ...",
    "tools": [{"type": "function", "name": "run_sql", ...}],
    "prompt_cache_key": "system.tools"
  }'
```
**Benefits:**
- Cost Reduction: Cached tokens are billed at a lower rate than regular input tokens
- Lower Latency: Cached prompts eliminate reprocessing time
- Automatic Management: The API handles cache invalidation and updates
**Cache Retention (TTL):**

Control how long cached prompts persist using the `prompt_cache_retention` parameter:

```bash
curl -X POST "$BASE/v1/responses" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic.claude-3-7-sonnet-20250219-v1:0",
    "input": "Hello",
    "prompt_cache_key": "default",
    "prompt_cache_retention": "24h"
  }'
```
Valid values: `in-memory` (default) or `24h`.
**Model Support**

Cache retention configuration is only available on select models. See AWS Bedrock Prompt Caching - Supported Models for details on which models support a configurable TTL.
Cached token usage is reported in the response:
```json
{
  "usage": {
    "input_tokens": 1500,
    "input_tokens_details": {
      "cached_tokens": 1200
    },
    "output_tokens": 300,
    "total_tokens": 1800
  }
}
```
In this example, 1,200 of the 1,500 input tokens were served from the cache; only 300 input tokens required fresh processing.
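The split between cached and freshly processed tokens follows directly from the usage block:

```python
usage = {
    "input_tokens": 1500,
    "input_tokens_details": {"cached_tokens": 1200},
    "output_tokens": 300,
    "total_tokens": 1800,
}

cached = usage["input_tokens_details"]["cached_tokens"]
fresh_input = usage["input_tokens"] - cached   # input tokens billed at the full rate
hit_ratio = cached / usage["input_tokens"]     # fraction of the prompt served from cache
```

Tracking `hit_ratio` across requests is a reasonable way to check whether caching is paying for its higher creation cost.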
## OpenAI Integrated Tools
The Responses API supports OpenAI's built-in tool types, automatically mapped to the target model's native tools.
### Nova Tools
Nova models support web search and code execution as integrated tools.
**Web Search** (`web_search`, `web_search_preview`):

```bash
curl -X POST "$BASE/v1/responses" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "amazon.nova-premier-v1:0",
    "input": "What is the current version of Python?",
    "tools": [{"type": "web_search"}]
  }'
```
**Code Interpreter** (`code_interpreter`):

```bash
curl -X POST "$BASE/v1/responses" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "amazon.nova-2-lite-v1:0",
    "input": "Calculate the first 10 Fibonacci numbers",
    "tools": [{"type": "code_interpreter"}]
  }'
```
**Streaming Sources**

`action.sources` (citation URLs) is only populated in non-streaming responses. In streaming mode the field is `null`, though all lifecycle events (`web_search_call.in_progress`, `web_search_call.completed`) are still emitted.
**Region Compatibility**

`web_search` is only available on Nova Premier in US regions; it is not available on EU inference profiles.
### Image Generation

The `image_generation` integrated tool works with all text models — Claude, Nova, and any future model. The gateway intercepts the tool, lets the LLM compose the image prompt and parameters via a synthetic function call, then generates the image against a configured Bedrock image model and returns an `image_generation_call` output item to the client. Intermediate `function_call` items are suppressed.

**Configuration Required**

Set the `IMAGE_GENERATION_MODEL` environment variable to a Bedrock image model ID (e.g. `amazon.nova-canvas-v1:0`). The tool definition may also specify a `model` field to override the default per request.
Example — Generate an image:

```bash
curl -X POST "$BASE/v1/responses" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "amazon.nova-micro-v1:0",
    "input": "Generate a photorealistic image of a red panda sitting on a tree branch.",
    "tools": [{"type": "image_generation"}],
    "tool_choice": "required"
  }'
```
The response contains an `image_generation_call` output item:

```json
{
  "output": [
    {
      "type": "image_generation_call",
      "id": "img_abc123",
      "status": "completed",
      "result": "<base64-encoded PNG>"
    }
  ]
}
```
You can also specify image parameters in the tool definition:

```json
{
  "type": "image_generation",
  "size": "1024x1024",
  "quality": "high",
  "output_format": "png"
}
```
### Computer Use Not Supported

The `computer` and `computer_use_preview` integrated tools are not supported. Requests that include these tools receive a 400 error.
## Available Request Headers

This endpoint supports standard Bedrock headers for enhanced control over your requests. All headers are optional and can be combined as needed.

### Content Safety (Guardrails)

| Header | Purpose | Valid Values |
|---|---|---|
| `X-Amzn-Bedrock-GuardrailIdentifier` | Guardrail ID for content filtering | Your guardrail identifier |
| `X-Amzn-Bedrock-GuardrailVersion` | Guardrail version | Version number (e.g., `1`) |
| `X-Amzn-Bedrock-Trace` | Guardrail trace level | `disabled`, `enabled`, `enabled_full` |
### Performance Optimization

| Header | Purpose | Valid Values |
|---|---|---|
| `X-Amzn-Bedrock-Service-Tier` | Service tier selection | `priority`, `default`, `flex` |
| `X-Amzn-Bedrock-PerformanceConfig-Latency` | Latency optimization | `standard`, `optimized` |
Example with all headers:

```bash
curl -X POST "$BASE/v1/responses" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -H "X-Amzn-Bedrock-GuardrailIdentifier: your-guardrail-id" \
  -H "X-Amzn-Bedrock-GuardrailVersion: 1" \
  -H "X-Amzn-Bedrock-Trace: enabled" \
  -H "X-Amzn-Bedrock-Service-Tier: priority" \
  -H "X-Amzn-Bedrock-PerformanceConfig-Latency: optimized" \
  -d '{
    "model": "amazon.nova-micro-v1:0",
    "input": "Hello!"
  }'
```
**Detailed Documentation**

For complete information about these headers, configuration options, and use cases, see the dedicated request headers documentation.
## Try It Now
Basic response:

```bash
curl -X POST "$BASE/v1/responses" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "amazon.nova-micro-v1:0",
    "input": "Say hello world"
  }'
```
Streaming response:

```bash
curl -N -X POST "$BASE/v1/responses" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "amazon.nova-micro-v1:0",
    "input": "Write a haiku about the sea.",
    "stream": true
  }'
```
Multi-modal with image:

```json
{
  "model": "amazon.nova-micro-v1:0",
  "input": [
    {
      "role": "user",
      "content": [
        {"type": "input_text", "text": "Describe this image"},
        {"type": "input_image", "image_url": "https://example.com/photo.jpg"}
      ]
    }
  ]
}
```
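For local files, the same `input_image` part accepts a base64 data URI instead of an HTTP URL (see the Feature Compatibility table). A sketch that builds one from raw bytes; the image bytes here are a placeholder, not a real JPEG:

```python
import base64

def data_uri(image_bytes, mime="image/jpeg"):
    """Encode raw image bytes as a base64 data URI for `input_image`."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{b64}"

placeholder = b"\xff\xd8\xff..."   # stand-in bytes, not a valid JPEG
payload = {
    "model": "amazon.nova-micro-v1:0",
    "input": [{
        "role": "user",
        "content": [
            {"type": "input_text", "text": "Describe this image"},
            {"type": "input_image", "image_url": data_uri(placeholder)},
        ],
    }],
}
```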
With reasoning:

```bash
curl -X POST "$BASE/v1/responses" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic.claude-3-7-sonnet-20250219-v1:0",
    "input": "Solve 12 × 13",
    "reasoning": {"effort": "low"},
    "max_output_tokens": 4096
  }'
```
Ready to build with AI? Check out the Models API to see all available foundation models!