Speech to English API
Translate audio from any language to English text with AWS Transcribe + Translate or AWS Bedrock audio-capable models through an OpenAI-compatible interface.
Why Choose Speech to English?
- Automatic Language Detection
  Upload audio in any language. AWS automatically detects the source language and translates to English text.
- Multiple Translation Options
  Choose AWS Transcribe + Translate for a traditional pipeline, or use Bedrock audio models with built-in translation capabilities.
- Multiple Output Formats
  Choose from text, JSON, verbose JSON with timestamps, or translated subtitle files (SRT/VTT).
- Subtitle Translation
  Generate translated SRT and VTT subtitle files directly with precise timing for international video content.
Quick Start: Available Endpoint
| Endpoint | Method | What It Does | Powered By |
|---|---|---|---|
| /v1/audio/translations | POST | Transcribe any language and translate to English | AWS Transcribe + Translate or AWS Bedrock Audio Models |
Feature Compatibility
| Feature | Status | Notes |
|---|---|---|
| Input | | |
| Audio file upload | Supported | Multipart file upload |
| Auto language detection | Supported | Automatic source detection |
| Output Formats | | |
| json | Supported | Structured translation |
| text | Supported | Plain English text |
| verbose_json | Supported | With timestamps |
| srt | Supported | English subtitles with timing |
| vtt | Supported | English WebVTT subtitles |
| Translation | | |
| Translation to English | Supported | Using AWS Translate |
| Advanced | | |
| prompt | Unsupported | Not available |
| temperature | Unsupported | Not available |
| Usage tracking | | |
| Input audio duration | Supported | Seconds (billing unit) |
Legend:
- Supported — Fully compatible with OpenAI API
- Unsupported — Not available in this implementation
Model Support
Amazon Models
| Model | Supported Languages | Notes |
|---|---|---|
| amazon.transcribe | 100+ | Full-featured transcription with speaker diarization and subtitle generation at the cost of higher latency |
Configuration Required
You must configure the AWS_S3_BUCKET or AWS_TRANSCRIBE_S3_BUCKET environment variable with a bucket in the main AWS region to use this model. This bucket is used for temporary storage during transcription processing.
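A deployment script can verify this requirement before the service starts. The following is a minimal sketch of such a pre-flight check; the helper name `configured_bucket_vars` is hypothetical, and the sketch only reports which of the two variables are set. It does not assume which one takes precedence when both are present, since that is not stated here.

```python
import os

# The two variable names come from the requirement above.
BUCKET_VARS = ("AWS_S3_BUCKET", "AWS_TRANSCRIBE_S3_BUCKET")


def configured_bucket_vars(env=None):
    """Return the bucket variables that are set to a non-empty value.

    Hypothetical helper: raises RuntimeError when neither variable is set.
    """
    env = os.environ if env is None else env
    found = {name: env[name] for name in BUCKET_VARS if env.get(name, "").strip()}
    if not found:
        raise RuntimeError(
            "Set AWS_S3_BUCKET or AWS_TRANSCRIBE_S3_BUCKET before using amazon.transcribe"
        )
    return found
```

Passing a plain dictionary instead of the real environment makes the check easy to exercise in isolation.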
Mistral Models
| Model | Supported Languages | Notes |
|---|---|---|
| mistral.voxtral-mini-3b-2507 | 100+ | Compact model for fast transcription |
| mistral.voxtral-small-24b-2507 | 100+ | Larger model for enhanced accuracy |
Mistral Voxtral Limitations
Mistral Voxtral models have the following restrictions when running on AWS Bedrock:
- File size limit: ~2MB maximum input file size
- Audio channels: Mono channel audio only (single channel)
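Both restrictions can be checked locally before uploading. The sketch below assumes the input is a WAV file, because Python's standard `wave` module reads only WAV; other formats would need a different probe. It also treats "~2MB" as 2 MiB, which is an assumption since the exact threshold is not given. The function name `check_voxtral_input` is hypothetical.

```python
import os
import wave

# Assumption: "~2MB" is interpreted as 2 MiB; the exact limit is not documented here.
MAX_BYTES = 2 * 1024 * 1024


def check_voxtral_input(path: str) -> list[str]:
    """Return a list of problems; an empty list means both checks passed."""
    problems = []

    size = os.path.getsize(path)
    if size > MAX_BYTES:
        problems.append(f"file is {size} bytes, above the ~2MB limit")

    # The wave module only understands WAV containers.
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
    if channels != 1:
        problems.append(f"audio has {channels} channels, expected mono")

    return problems
```

A file that fails either check could be downmixed or compressed first, or sent to amazon.transcribe instead.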
Advanced Features
Amazon Transcribe Features
Model & Features:
- Use amazon.transcribe with the same interface as OpenAI's Whisper API
- Or use the OpenAI model name directly: whisper-1 works out of the box (maps to amazon.transcribe)
- Automatic transcription + translation pipeline in one request
- Multiple output formats: text, json, verbose_json, srt, vtt
- Automatic source language detection (zero configuration)
- Smart Subtitle Translation: Preserves original timing using intelligent HTML span processing
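Once a response in srt format has been saved, the cue timing can be inspected with a few lines of Python. This is a minimal sketch that assumes the service returns conventional SubRip layout (counter line, `start --> end` line, text lines, blank separator); the names `Cue` and `parse_srt` are illustrative, not part of the API.

```python
import re
from dataclasses import dataclass

# Matches HH:MM:SS,mmm (SubRip) and also HH:MM:SS.mmm.
_TIME = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


@dataclass
class Cue:
    start: float  # seconds from the beginning of the audio
    end: float    # seconds from the beginning of the audio
    text: str


def _seconds(stamp: str) -> float:
    h, m, s, ms = map(int, _TIME.fullmatch(stamp.strip()).groups())
    return h * 3600 + m * 60 + s + ms / 1000


def parse_srt(body: str) -> list[Cue]:
    """Parse SubRip text into a list of cues, skipping malformed blocks."""
    cues = []
    # Cues are separated by blank lines; normalise Windows line endings first.
    for block in re.split(r"\n\s*\n", body.replace("\r\n", "\n").strip()):
        lines = block.split("\n")
        # Drop the numeric counter line when it is present.
        if lines and "-->" not in lines[0]:
            lines = lines[1:]
        if not lines or "-->" not in lines[0]:
            continue
        start, end = lines[0].split("-->")
        cues.append(Cue(_seconds(start), _seconds(end), "\n".join(lines[1:])))
    return cues
```

Converting the timestamps to seconds makes it straightforward to compare cue boundaries against the source video.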
OpenAI Model Compatibility
stdapi.ai includes a built-in model alias that maps the OpenAI model name to AWS Transcribe:
whisper-1 → amazon.transcribe
This alias enables seamless compatibility with OpenAI-based tools and applications without any configuration changes. You can also customize or override this alias to suit your needs.
Note: The prompt and temperature parameters are not supported, to ensure consistent translation accuracy.
Try It Now
Translate foreign audio to English text:
curl -X POST "$BASE/v1/audio/translations" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-F file=@spanish-interview.mp3 \
-F model=amazon.transcribe \
-F response_format=json
Translate foreign audio to English subtitles:
curl -OJ -X POST "$BASE/v1/audio/translations" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-F file=@spanish-interview.mp3 \
-F model=amazon.transcribe \
-F response_format=srt
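The same multipart request can be assembled without curl. The sketch below uses only the Python standard library and builds the request without sending it; the helper name `build_translation_request` is hypothetical, and the form fields (file, model, response_format) mirror the curl commands above.

```python
import urllib.request
import uuid


def build_translation_request(base_url, api_key, filename, audio_bytes,
                              model="amazon.transcribe", response_format="json"):
    """Build (but do not send) a multipart POST to /v1/audio/translations."""
    boundary = uuid.uuid4().hex
    parts = []
    # Plain form fields, equivalent to curl's -F name=value.
    for name, value in (("model", model), ("response_format", response_format)):
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode()
        )
    # The audio file, equivalent to curl's -F file=@...
    parts.append(
        (f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
         f'filename="{filename}"\r\nContent-Type: application/octet-stream'
         "\r\n\r\n").encode() + audio_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/audio/translations",
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


# Usage (requires a reachable server and a real audio file):
#   with open("spanish-interview.mp3", "rb") as f:
#       req = build_translation_request(base, key, "spanish-interview.mp3", f.read())
#   with urllib.request.urlopen(req) as resp:
#       print(resp.read().decode())
```

Keeping request construction separate from sending makes the payload easy to inspect before any audio leaves the machine.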
Ready to translate multilingual audio? Explore available models in the Models API.