
Responses API

Generate model responses with AWS Bedrock foundation models through an OpenAI Responses API-compatible interface. Supports text, images, tool calling, and streaming.

Why Choose the Responses API?

  • Tool Calling
    Define function tools and get structured tool calls back. Full round-trip support with function_call_output.

  • Structured Output
    Request JSON object or JSON schema output via text.format to get machine-readable responses.

  • Streaming
    Real-time token streaming with granular events for text deltas, tool calls, and lifecycle milestones.

  • Extended Reasoning
    Enable chain-of-thought reasoning on supported models via reasoning.effort.

Quick Start: Available Endpoint

| Endpoint | Method | What It Does | Powered By |
|---|---|---|---|
| /v1/responses | POST | Create a model response | AWS Bedrock Converse API |

Feature Compatibility

Input

| Feature | Status | Notes |
|---|---|---|
| Plain text (input as string) | Supported | Simple string shorthand for a single user message |
| Structured message array | Supported | Array of EasyInputMessage / InputMessage items |
| instructions (system prompt) | Supported | Injected as a Bedrock system block |
| system / developer role | Supported | Treated as a system instruction |
| Image input (input_image) | Supported | HTTP URLs and base64 data URIs supported |
| File input (input_file) | Supported | File URLs and base64 data supported |
| function_call_output | Supported | Submit tool results as input for round-trip tool calling |

Tool Calling

| Feature | Status | Notes |
|---|---|---|
| Function tools (type: "function") | Supported | Full schema mapping to Bedrock toolSpec |
| tool_choice: "auto" | Supported | Model selects among available tools |
| tool_choice: "required" | Supported | Model must call at least one tool |
| tool_choice: "none" | Supported | Prevents tool calls |
| Named tool_choice (force) | Supported | Forces a specific function to be called |
| parallel_tool_calls | Supported | Echoed in response; not transmitted to Bedrock |
| Built-in tools (code_interpreter, web_search, image_generation) | Available on Select Models | See OpenAI Integrated Tools |
| file_search tool | Unsupported | Returns 400; no Bedrock equivalent |
| computer / computer_use_preview tools | Unsupported | Returns 400; see Computer Use Not Supported |
| mcp tool | Unsupported | Returns 400; MCP not supported |
| local_shell / shell tools | Unsupported | Returns 400; local shell not supported |
| custom / namespace / tool_search / apply_patch tools | Unsupported | Returns 400; not supported |

Generation Control

| Feature | Status | Notes |
|---|---|---|
| max_output_tokens | Supported | Maps to Bedrock maxTokens |
| temperature | Supported | 0–2 range; mapped to Bedrock inference config |
| top_p | Supported | 0–1 range; nucleus sampling |
| top_logprobs | Supported | 0–20 range; token log-probability output |
| reasoning (effort) | Available on Select Models | Configures reasoning on models that support it |
| metadata | Supported | Forwarded to Bedrock requestMetadata |
| prompt_cache_key | Supported | Caches prompts to reduce costs and latency |
| prompt_cache_retention | Available on Select Models | Cache TTL: in-memory or 24h |
| service_tier | Supported | Maps to Bedrock service tier header |
| truncation | Unsupported | Returns 400; Bedrock manages context automatically |
| max_tool_calls | Unsupported | Returns 400; not supported |
| background | Unsupported | Returns 400; async background mode not supported |
| store | Unsupported | Returns 400; all responses are stateless |
| stream_options | Unsupported | Returns 400; not supported |
| conversation | Unsupported | Returns 400; pass the conversation history in input instead |
| prompt (template reference) | Unsupported | Returns 400; not supported |
| safety_identifier | Unsupported | Returns 400; not supported |

Output Format

| Feature | Status | Notes |
|---|---|---|
| text.format: "text" | Supported | Plain text output |
| text.format: "json_object" | Supported | JSON object output via Bedrock outputConfig |
| text.format: "json_schema" | Supported | Structured JSON output with schema validation |

Multi-Turn

| Feature | Status | Notes |
|---|---|---|
| previous_response_id | Unsupported | Pass the full conversation history in input instead |

Streaming

| Feature | Status | Notes |
|---|---|---|
| stream: true | Supported | SSE stream with full lifecycle events |
| response.created | Supported | Emitted at stream start |
| response.in_progress | Supported | Emitted after created |
| response.output_text.delta | Supported | Text token deltas |
| response.output_text.done | Supported | Final text for each content part |
| response.function_call_arguments.delta | Supported | Tool call argument deltas |
| response.function_call_arguments.done | Supported | Finalized tool call arguments |
| response.completed | Supported | Final complete response at stream end |

Legend:

  • Supported — Fully compatible with OpenAI API
  • Available on Select Models — Check your model's capabilities
  • Unsupported — Not available in this implementation
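The supported generation-control parameters can be combined freely in a single request. A sketch (model ID and values are illustrative, not recommendations):

```shell
curl -X POST "$BASE/v1/responses" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "amazon.nova-micro-v1:0",
    "input": "Name three capitals of European countries.",
    "max_output_tokens": 256,
    "temperature": 0.7,
    "top_p": 0.9,
    "metadata": {"request_source": "docs-example"}
  }'
```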

Advanced Features

System Prompt (instructions)

Use instructions to define the assistant's behavior — it is injected as a Bedrock system block.

curl -X POST "$BASE/v1/responses" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "amazon.nova-micro-v1:0",
    "instructions": "You are a helpful assistant that answers in French.",
    "input": "Say hello."
  }'

Function Tool Calling

Define function tools and submit results in a round-trip conversation.

Multi-Turn Conversations

All responses are stateless. Response IDs are generated for compatibility but previous_response_id is not supported. For multi-turn conversations, pass the full message history in the input array.
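Because no state is kept server-side, a follow-up turn must repeat the earlier messages. A sketch (the assistant message is whatever text the first call returned):

```shell
curl -X POST "$BASE/v1/responses" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "amazon.nova-micro-v1:0",
    "input": [
      {"role": "user", "content": "My name is Ada."},
      {"role": "assistant", "content": "Nice to meet you, Ada!"},
      {"role": "user", "content": "What is my name?"}
    ]
  }'
```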

Unsupported Built-In Tools

file_search, computer, computer_use_preview, mcp, local_shell, shell, custom, namespace, tool_search, and apply_patch tools are not supported. Requests that include any of these tools will receive a 400 error.

Step 1 — Request a tool call:

curl -X POST "$BASE/v1/responses" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "amazon.nova-micro-v1:0",
    "input": "What'\''s the weather in Paris?",
    "tool_choice": "required",
    "tools": [
      {
        "type": "function",
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    ]
  }'

Step 2 — Submit the tool result:

curl -X POST "$BASE/v1/responses" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "amazon.nova-micro-v1:0",
    "input": [
      {
        "type": "function_call_output",
        "call_id": "<call_id from step 1>",
        "output": "{\"temperature\": \"18°C\", \"condition\": \"cloudy\"}"
      }
    ],
    "tools": [
      {
        "type": "function",
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    ]
  }'
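To force one particular function rather than leaving the choice to the model (the named tool_choice row in the compatibility table), pass an object naming it. A sketch reusing the get_weather tool from Step 1; the object shape follows OpenAI's named tool_choice convention:

```shell
curl -X POST "$BASE/v1/responses" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "amazon.nova-micro-v1:0",
    "input": "What'\''s the weather in Tokyo?",
    "tool_choice": {"type": "function", "name": "get_weather"},
    "tools": [
      {
        "type": "function",
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    ]
  }'
```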

Streaming

Real-time token streaming with granular SSE lifecycle events.

curl -N -X POST "$BASE/v1/responses" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "amazon.nova-micro-v1:0",
    "input": "Tell me a short story.",
    "stream": true
  }'

The stream emits events in order: response.created → response.in_progress → response.output_text.delta (repeated) → response.output_text.done → response.completed.
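On the wire these arrive as standard server-sent events. An abridged illustration (payloads truncated; field shapes follow the OpenAI streaming event format):

```
event: response.created
data: {"type":"response.created","response":{...}}

event: response.output_text.delta
data: {"type":"response.output_text.delta","delta":"Once"}

event: response.output_text.done
data: {"type":"response.output_text.done","text":"Once upon a time..."}

event: response.completed
data: {"type":"response.completed","response":{...}}
```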

Structured JSON Output

Request machine-readable output using text.format.

JSON object:

curl -X POST "$BASE/v1/responses" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "amazon.nova-micro-v1:0",
    "input": "Return the current date and day of week as JSON.",
    "text": {"format": {"type": "json_object"}}
  }'

JSON schema:

curl -X POST "$BASE/v1/responses" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "amazon.nova-micro-v1:0",
    "input": "What is 2 + 2? Reply with answer and confidence.",
    "text": {
      "format": {
        "type": "json_schema",
        "name": "MathResult",
        "schema": {
          "type": "object",
          "properties": {
            "answer": {"type": "number"},
            "confidence": {"type": "number"}
          },
          "required": ["answer", "confidence"]
        }
      }
    }
  }'

Extended Reasoning

Enable chain-of-thought reasoning on supported models (e.g. Amazon Nova 2, Anthropic Claude 3.7+) via reasoning.effort.

curl -X POST "$BASE/v1/responses" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic.claude-3-7-sonnet-20250219-v1:0",
    "input": "Solve: if a train travels 120 km in 90 minutes, what is its speed?",
    "reasoning": {"effort": "low"},
    "max_output_tokens": 4096
  }'

Prompt Caching

Cache Creation Costs

Cache creation incurs a higher cost than regular token processing. Only use prompt caching when you expect a high cache hit ratio across multiple requests with similar prompts.

Prompt caching reduces latency and costs by caching repetitive prompt components. Set the prompt_cache_key parameter to enable:

curl -X POST "$BASE/v1/responses" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic.claude-3-7-sonnet-20250219-v1:0",
    "instructions": "You are a helpful assistant.",
    "input": "What is Python?",
    "prompt_cache_key": "default"
  }'

Granular Cache Control:

Use dot-separated values to cache specific components:

  • "system" — Cache system messages only
  • "messages" — Cache conversation history
  • "tools" — Cache tool/function definitions (Anthropic Claude only)
  • "system.messages" — Cache both system and messages
  • "system.tools" — Cache system and tools
  • "messages.tools" — Cache messages and tools
  • "system.messages.tools" — Cache all components
  • Any other non-empty value — Cache all components

Custom Cache Keys Not Supported

Custom cache hash keys are not supported. The parameter is used only to control which sections are cached, not as a cache identifier.

Example — Cache system and tools:

curl -X POST "$BASE/v1/responses" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic.claude-3-7-sonnet-20250219-v1:0",
    "instructions": "You are a data analysis assistant.",
    "input": "Analyze this dataset: ...",
    "tools": [{"type": "function", "name": "run_sql", ...}],
    "prompt_cache_key": "system.tools"
  }'

Benefits:

  • Cost Reduction: Cached tokens are billed at a lower rate than regular input tokens
  • Lower Latency: Cached prompts eliminate reprocessing time
  • Automatic Management: The API handles cache invalidation and updates

Cache Retention (TTL):

Control how long cached prompts persist using the prompt_cache_retention parameter:

curl -X POST "$BASE/v1/responses" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic.claude-3-7-sonnet-20250219-v1:0",
    "input": "Hello",
    "prompt_cache_key": "default",
    "prompt_cache_retention": "24h"
  }'

Valid values: in-memory (default) or 24h.

Model Support

Cache retention configuration is only available on select models. See AWS Bedrock Prompt Caching - Supported Models for details on which models support configurable TTL.

Cached token usage is reported in the response:

{
  "usage": {
    "input_tokens": 1500,
    "input_tokens_details": {
      "cached_tokens": 1200
    },
    "output_tokens": 300,
    "total_tokens": 1800
  }
}

In this example, 1,200 of the 1,500 input tokens were served from the cache; only the remaining 300 input tokens required fresh processing.

OpenAI Integrated Tools

The Responses API supports OpenAI's built-in tool types, automatically mapped to the target model's native tools.

Amazon Nova Tools

Nova models support web search and code execution as integrated tools.

Web Search (web_search, web_search_preview):

curl -X POST "$BASE/v1/responses" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "amazon.nova-premier-v1:0",
    "input": "What is the current version of Python?",
    "tools": [{"type": "web_search"}]
  }'

Code Interpreter (code_interpreter):

curl -X POST "$BASE/v1/responses" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "amazon.nova-2-lite-v1:0",
    "input": "Calculate the first 10 Fibonacci numbers",
    "tools": [{"type": "code_interpreter"}]
  }'

Streaming sources

action.sources (citation URLs) is only populated in non-streaming responses. In streaming mode the field is null, though all lifecycle events (web_search_call.in_progress, web_search_call.completed) are still emitted.

Region Compatibility

web_search is only available on Nova Premier in US regions. Not available on EU inference profiles.

Image Generation

The image_generation integrated tool works with all text models — Claude, Nova, and any future model. The gateway intercepts the tool, lets the LLM compose the image prompt and parameters via a synthetic function call, then generates the image against a configured Bedrock image model and returns an image_generation_call output item to the client. Intermediate function_call items are suppressed.

Configuration Required

Set the IMAGE_GENERATION_MODEL environment variable to a Bedrock image model ID (e.g. amazon.nova-canvas-v1:0). The tool definition may also specify a model field to override the default per request.
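A minimal configuration sketch for the gateway process, using the image model ID from the example above:

```shell
# Default Bedrock image model used by the image_generation tool
export IMAGE_GENERATION_MODEL="amazon.nova-canvas-v1:0"
```

A per-request override, where needed, goes in the tool definition itself, e.g. {"type": "image_generation", "model": "<bedrock-image-model-id>"}.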

Example — Generate an image:

curl -X POST "$BASE/v1/responses" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "amazon.nova-micro-v1:0",
    "input": "Generate a photorealistic image of a red panda sitting on a tree branch.",
    "tools": [{"type": "image_generation"}],
    "tool_choice": "required"
  }'

The response contains an image_generation_call output item:

{
  "output": [
    {
      "type": "image_generation_call",
      "id": "img_abc123",
      "status": "completed",
      "result": "<base64-encoded PNG>"
    }
  ]
}

You can also specify image parameters in the tool definition:

{
  "type": "image_generation",
  "size": "1024x1024",
  "quality": "high",
  "output_format": "png"
}

Computer Use Not Supported

The computer and computer_use_preview integrated tools are not supported. Requests that include these tools will receive a 400 error.

Available Request Headers

This endpoint supports standard Bedrock headers for enhanced control over your requests. All headers are optional and can be combined as needed.

Content Safety (Guardrails)

| Header | Purpose | Valid Values |
|---|---|---|
| X-Amzn-Bedrock-GuardrailIdentifier | Guardrail ID for content filtering | Your guardrail identifier |
| X-Amzn-Bedrock-GuardrailVersion | Guardrail version | Version number (e.g., 1) |
| X-Amzn-Bedrock-Trace | Guardrail trace level | disabled, enabled, enabled_full |

Performance Optimization

| Header | Purpose | Valid Values |
|---|---|---|
| X-Amzn-Bedrock-Service-Tier | Service tier selection | priority, default, flex |
| X-Amzn-Bedrock-PerformanceConfig-Latency | Latency optimization | standard, optimized |

Example with all headers:

curl -X POST "$BASE/v1/responses" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -H "X-Amzn-Bedrock-GuardrailIdentifier: your-guardrail-id" \
  -H "X-Amzn-Bedrock-GuardrailVersion: 1" \
  -H "X-Amzn-Bedrock-Trace: enabled" \
  -H "X-Amzn-Bedrock-Service-Tier: priority" \
  -H "X-Amzn-Bedrock-PerformanceConfig-Latency: optimized" \
  -d '{
    "model": "amazon.nova-micro-v1:0",
    "input": "Hello!"
  }'

Detailed Documentation

For complete information about these headers, configuration options, and use cases, see the dedicated request-header documentation.

Try It Now

Basic response:

curl -X POST "$BASE/v1/responses" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "amazon.nova-micro-v1:0",
    "input": "Say hello world"
  }'

Streaming response:

curl -N -X POST "$BASE/v1/responses" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "amazon.nova-micro-v1:0",
    "input": "Write a haiku about the sea.",
    "stream": true
  }'

Multi-modal with image:

{
  "model": "amazon.nova-micro-v1:0",
  "input": [
    {
      "role": "user",
      "content": [
        {"type": "input_text", "text": "Describe this image"},
        {"type": "input_image", "image_url": "https://example.com/photo.jpg"}
      ]
    }
  ]
}
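Base64 image input works the same way via a data URI in image_url (the base64 payload is truncated here for brevity):

```
{
  "model": "amazon.nova-micro-v1:0",
  "input": [
    {
      "role": "user",
      "content": [
        {"type": "input_text", "text": "What text appears in this image?"},
        {"type": "input_image", "image_url": "data:image/png;base64,iVBORw0KGgo..."}
      ]
    }
  ]
}
```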

With reasoning:

curl -X POST "$BASE/v1/responses" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic.claude-3-7-sonnet-20250219-v1:0",
    "input": "Solve 12 × 13",
    "reasoning": {"effort": "low"},
    "max_output_tokens": 4096
  }'

Ready to build with AI? Check out the Models API to see all available foundation models!