Speech to English API

Translate audio from any language to English text with AWS Transcribe + Translate or AWS Bedrock audio-capable models through an OpenAI-compatible interface.

Why Choose Speech to English?

Automatic Language Detection
Upload audio in any supported language. AWS automatically detects the source language and translates the speech into English text.
Multiple Translation Options
Choose AWS Transcribe + Translate for a traditional two-step pipeline, or use Bedrock audio models with built-in translation capabilities.
Multiple Output Formats
Choose from text, JSON, verbose JSON with timestamps, or translated subtitle files (SRT/VTT).
Subtitle Translation
Generate translated SRT and VTT subtitle files directly with precise timing for international video content.

Quick Start: Available Endpoint

Endpoint	Method	What It Does	Powered By
`/v1/audio/translations`	POST	Transcribe any language and translate to English	AWS Transcribe + Translate or AWS Bedrock Audio Models

Feature Compatibility

Feature	Status	Notes
Input
Audio file upload	Supported	Multipart file upload
Auto language detection	Supported	Automatic source detection
Output Formats
`json`	Supported	Structured translation
`text`	Supported	Plain English text
`verbose_json`	Supported	With timestamps
`srt`	Supported	English subtitles with timing
`vtt`	Supported	English WebVTT subtitles
Translation
Translation to English	Supported	Using AWS Translate
Advanced
`prompt`	Unsupported	Not available
`temperature`	Unsupported	Not available
Usage tracking
Input audio duration	Supported	Seconds (billing unit)

Legend:

Supported: Fully compatible with the OpenAI API
Unsupported: Not available in this implementation
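
On the client side, the output formats in the table fall into two groups. The following is a minimal sketch of handling them, assuming response bodies follow OpenAI's conventions: a JSON object with a `text` field for `json` and `verbose_json`, and a plain-text body for `text`, `srt`, and `vtt`. The helper name `extract_english_text` is hypothetical and not part of any SDK.

```python
import json

JSON_FORMATS = {"json", "verbose_json"}
PLAIN_FORMATS = {"text", "srt", "vtt"}


def extract_english_text(body: str, response_format: str) -> str:
    """Return the translated text (or subtitle document) from a response body."""
    if response_format in JSON_FORMATS:
        # Assumption: JSON responses carry the translation in a "text" field,
        # as OpenAI's translation responses do.
        return json.loads(body)["text"]
    if response_format in PLAIN_FORMATS:
        # Plain-text formats are returned unchanged (SRT/VTT keep their cue markup).
        return body
    raise ValueError(f"unsupported response_format: {response_format!r}")
```

Rejecting unknown formats early keeps a typo in `response_format` from being mistaken for an empty translation.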

Model Support

Amazon Models

Model	Supported Languages	Notes
amazon.transcribe	100+	Full-featured transcription with speaker diarization and subtitle generation at the cost of higher latency

Configuration Required

You must configure the AWS_S3_BUCKET or AWS_TRANSCRIBE_S3_BUCKET environment variable with a bucket in the main AWS region to use this model. This bucket is used for temporary storage during transcription processing.
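
A deployment script can check for this setting before the service starts. The sketch below only reports which of the two variables are set. It does not assume which one takes precedence when both are present, because that is not stated here. The function name `configured_bucket_vars` is hypothetical.

```python
import os
from typing import List, Mapping, Optional

# The two variable names documented above.
BUCKET_VARS = ("AWS_TRANSCRIBE_S3_BUCKET", "AWS_S3_BUCKET")


def configured_bucket_vars(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names of the bucket variables that are set and non-empty."""
    env = os.environ if env is None else env
    return [name for name in BUCKET_VARS if env.get(name, "").strip()]
```

An empty result means `amazon.transcribe` has no bucket available for temporary storage.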

Mistral Models

Model	Supported Languages	Notes
mistral.voxtral-mini-3b-2507	100+	Compact model for fast transcription
mistral.voxtral-small-24b-2507	100+	Larger model for enhanced accuracy

Mistral Voxtral Limitations

Mistral Voxtral models have the following restrictions when running on AWS Bedrock:

File size limit: input files up to ~2 MB
Audio channels: mono (single-channel) audio only
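
Both restrictions can be checked locally before uploading. This is a minimal sketch for WAV input only, using Python's standard `wave` module. It treats "~2MB" as 2 * 1024 * 1024 bytes, which is an assumption because the exact cutoff is not documented. The function name `voxtral_preflight` is hypothetical.

```python
import os
import wave
from typing import List

# Assumption: "~2MB" is read as 2 MiB; the exact cutoff is not documented.
APPROX_MAX_BYTES = 2 * 1024 * 1024


def voxtral_preflight(path: str) -> List[str]:
    """Return a list of problems found; an empty list means the file looks OK."""
    problems = []
    size = os.path.getsize(path)
    if size > APPROX_MAX_BYTES:
        problems.append(f"file is {size} bytes, above the ~2 MB limit")
    # The wave module reads WAV files only; other containers need a different tool.
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
    if channels != 1:
        problems.append(f"audio has {channels} channels, expected mono")
    return problems
```

Files that fail the check can be downmixed to mono or re-encoded at a lower bitrate before being sent to a Voxtral model.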

Advanced Features

Amazon Transcribe Features

Model & Features:

Use amazon.transcribe with the same interface as OpenAI's Whisper API
Or use the OpenAI model name directly: whisper-1 works out of the box (maps to amazon.transcribe)
Automatic transcription + translation pipeline in one request
Multiple output formats: text, json, verbose_json, srt, vtt
Automatic source language detection (zero configuration)
Smart Subtitle Translation: Preserves original timing using intelligent HTML span processing

OpenAI Model Compatibility

stdapi.ai includes a built-in model alias that maps the OpenAI model name to AWS Transcribe:

whisper-1 → amazon.transcribe

This alias enables seamless compatibility with OpenAI-based tools and applications without any configuration changes. You can also customize or override this alias to suit your needs.
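
Conceptually, an alias is a lookup performed before the model is selected. The sketch below illustrates that idea only. It is not stdapi.ai's implementation, and it does not show how aliases are actually configured. The names `MODEL_ALIASES` and `resolve_model` are hypothetical.

```python
from typing import Dict

# Illustrative only: mirrors the single mapping documented above.
MODEL_ALIASES: Dict[str, str] = {"whisper-1": "amazon.transcribe"}


def resolve_model(name: str, aliases: Dict[str, str] = MODEL_ALIASES) -> str:
    """Map an alias to its target; names without an alias pass through unchanged."""
    return aliases.get(name, name)
```

Passing a different `aliases` dictionary models the effect of overriding the default mapping.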

Note: To ensure consistent translation accuracy, the prompt and temperature parameters are not supported.

Try It Now

Translate foreign audio to English text:

curl -X POST "$BASE/v1/audio/translations" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -F file=@spanish-interview.mp3 \
  -F model=amazon.transcribe \
  -F response_format=json
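
The same multipart request can be assembled in Python with only the standard library. This is a sketch, not an official client. `build_translation_request` is a hypothetical helper, and the base URL and API key are placeholders standing in for `$BASE` and `$OPENAI_API_KEY`. The function builds the request without sending it, so it can be inspected first.

```python
import io
import urllib.request
import uuid


def build_translation_request(base_url: str, api_key: str, filename: str,
                              audio_bytes: bytes,
                              model: str = "amazon.transcribe",
                              response_format: str = "json") -> urllib.request.Request:
    """Build (but do not send) a multipart POST to /v1/audio/translations."""
    boundary = uuid.uuid4().hex
    body = io.BytesIO()
    # Plain form fields, equivalent to curl's -F model=... -F response_format=...
    for name, value in (("model", model), ("response_format", response_format)):
        body.write(f"--{boundary}\r\n".encode())
        body.write(f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode())
        body.write(f"{value}\r\n".encode())
    # File field, equivalent to curl's -F file=@<path>
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    body.write(file_header.encode())
    body.write(audio_bytes)
    body.write(f"\r\n--{boundary}--\r\n".encode())
    return urllib.request.Request(
        f"{base_url}/v1/audio/translations",
        data=body.getvalue(),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

To send it, read the audio file as bytes, build the request, and pass it to `urllib.request.urlopen`.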

Translate foreign audio to English subtitles:

curl -OJ -X POST "$BASE/v1/audio/translations" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -F file=@spanish-interview.mp3 \
  -F model=amazon.transcribe \
  -F response_format=srt
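
Once the subtitle file is saved, its cues can be read back for review or post-processing. The sketch below is a small parser for the standard SRT layout (index line, timing line, then text lines, with blank lines between cues). It assumes the returned file follows that layout. `Cue` and `parse_srt` are hypothetical names.

```python
import re
from typing import List, NamedTuple


class Cue(NamedTuple):
    index: int
    start: str  # e.g. "00:00:02,500"
    end: str
    text: str


TIMING = re.compile(r"(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})")


def parse_srt(document: str) -> List[Cue]:
    """Parse an SRT document into cues, skipping blocks that are malformed."""
    cues = []
    # Cues are separated by blank lines; normalize Windows line endings first.
    for block in re.split(r"\n\s*\n", document.replace("\r\n", "\n").strip()):
        lines = block.split("\n")
        if len(lines) < 2 or not lines[0].strip().isdigit():
            continue
        match = TIMING.match(lines[1].strip())
        if not match:
            continue
        cues.append(Cue(int(lines[0].strip()), match.group(1), match.group(2),
                        "\n".join(lines[2:])))
    return cues
```

Each returned `Cue` keeps the original timing strings, so the English text can be edited and written back without shifting the subtitles.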

Ready to translate multilingual audio? Explore available models in the Models API.