Speech to English API
Upload audio in any language and get English transcriptions for international content, customer support, or multilingual applications.
Why Choose Speech to English?
- Automatic Language Detection: Upload audio in any language. AWS automatically detects the source language and translates the speech to English.
- Multiple Output Formats: Choose from text, JSON, verbose JSON with timestamps, or translated subtitle files (SRT/VTT).
Quick Start: Available Endpoint
| Endpoint | Method | What It Does | Powered By |
|---|---|---|---|
| `/v1/audio/translations` | POST | Transcribe any language and translate to English | AWS Transcribe + AWS Translate |
Feature Compatibility
| Feature | Status | Notes |
|---|---|---|
| Input | | |
| Audio file upload | Supported | Multipart file upload |
| Auto language detection | Supported | Automatic source detection |
| Output Formats | | |
| `json` | Supported | Structured translation |
| `text` | Supported | Plain English text |
| `verbose_json` | Supported | With timestamps |
| `srt` | Supported | English subtitles with timing |
| `vtt` | Supported | English WebVTT subtitles |
| Translation | | |
| Translation to English | Supported | Using AWS Translate |
| Advanced | | |
| `prompt` | Unsupported | Not available |
| `temperature` | Unsupported | Not available |
| Usage tracking | | |
| Input audio duration | Supported | Seconds (billing unit) |
Legend:
- Supported — Fully compatible with OpenAI API
- Unsupported — Not available in this implementation
Advanced Features
OpenAI-Compatible with AWS Power
Model & Features:
- Use `amazon.transcribe` (instead of `whisper-1`) with the same interface
- Automatic transcription + translation pipeline in one request
- Multiple output formats: `text`, `json`, `verbose_json`, `srt`, `vtt`
- Automatic source language detection (zero configuration)
- Smart Subtitle Translation: Preserves original timing using intelligent HTML span processing
Note: The `prompt` and `temperature` parameters are not supported. Leaving them out keeps translation output consistent.
Configuration Required
To use this endpoint, you must set the `AWS_S3_BUCKET` or `AWS_TRANSCRIBE_S3_BUCKET` environment
variable to a bucket in the main AWS region. The bucket is used
for temporary storage during transcription processing.
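Because the endpoint depends on one of these two variables, a deployment script can fail early when neither is set. The following is a minimal sketch and not part of the service: the function name `require_transcribe_bucket` is hypothetical, and it only checks that a variable is non-empty. It does not verify that the bucket exists, that it is in the right region, or which variable takes precedence when both are set.

```shell
# Hypothetical preflight helper (an illustration, not part of the service).
# Succeeds if AWS_TRANSCRIBE_S3_BUCKET or AWS_S3_BUCKET is set to a non-empty value.
require_transcribe_bucket() {
  if [ -n "${AWS_TRANSCRIBE_S3_BUCKET:-}" ] || [ -n "${AWS_S3_BUCKET:-}" ]; then
    return 0
  fi
  echo "Set AWS_S3_BUCKET or AWS_TRANSCRIBE_S3_BUCKET before using /v1/audio/translations" >&2
  return 1
}
```

A startup script could call `require_transcribe_bucket || exit 1` before launching the service.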
Try It Now
Translate foreign audio to English text:
```shell
curl -X POST "$BASE/v1/audio/translations" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -F file=@spanish-interview.mp3 \
  -F model=amazon.transcribe \
  -F response_format=json
```
Translate foreign audio to English subtitles:
```shell
curl -OJ -X POST "$BASE/v1/audio/translations" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -F file=@spanish-interview.mp3 \
  -F model=amazon.transcribe \
  -F response_format=srt
```
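The two requests above differ only in `response_format`, so they can be wrapped in a small shell function. This is a sketch under assumptions: `translate_audio` is a hypothetical helper name, `BASE` and `OPENAI_API_KEY` are the same variables used in the examples above, and the list of accepted formats is copied from the compatibility table rather than queried from the service.

```shell
# Hypothetical wrapper around the /v1/audio/translations endpoint.
# Usage: translate_audio <audio-file> [response_format]
# Expects BASE and OPENAI_API_KEY to be set, as in the curl examples.
translate_audio() {
  file=$1
  format=${2:-json}   # default to json when no format is given

  # Reject formats that the compatibility table does not list.
  case "$format" in
    json|text|verbose_json|srt|vtt) ;;
    *)
      echo "unsupported response_format: $format" >&2
      return 2
      ;;
  esac

  # -sS hides the progress meter but still prints errors.
  curl -sS -X POST "$BASE/v1/audio/translations" \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -F "file=@$file" \
    -F "model=amazon.transcribe" \
    -F "response_format=$format"
}
```

For example, `translate_audio spanish-interview.mp3 vtt > spanish-interview.en.vtt` would write the English subtitles to a file name of your choosing instead of relying on `-OJ`.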
Ready to translate multilingual audio? Explore available models in the Models API.