Speech to English API

Translate audio from any language to English text with AWS Transcribe + Translate or AWS Bedrock audio-capable models through an OpenAI-compatible interface.

Why Choose Speech to English?

  • Automatic Language Detection
    Upload audio in any language. AWS automatically detects the source language and translates it to English text.

  • Multiple Translation Options
    Choose AWS Transcribe + Translate for a traditional pipeline, or use Bedrock audio models with built-in translation capabilities.

  • Multiple Output Formats
    Choose from text, JSON, verbose JSON with timestamps, or translated subtitle files (SRT/VTT).

  • Subtitle Translation
    Generate translated SRT and VTT subtitle files directly with precise timing for international video content.

Quick Start: Available Endpoint

| Endpoint | Method | What It Does | Powered By |
|---|---|---|---|
| /v1/audio/translations | POST | Transcribe any language and translate to English | AWS Transcribe + Translate or AWS Bedrock Audio Models |

Feature Compatibility

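| Feature | Status | Notes |
|---|---|---|
| Input | | |
| Audio file upload | Supported | Multipart file upload |
| Auto language detection | Supported | Automatic source detection |
| Output Formats | | |
| json | Supported | Structured translation |
| text | Supported | Plain English text |
| verbose_json | Supported | With timestamps |
| srt | Supported | English subtitles with timing |
| vtt | Supported | English WebVTT subtitles |
| Translation | | |
| Translation to English | Supported | Using AWS Translate |
| Advanced | | |
| prompt | Unsupported | Not available |
| temperature | Unsupported | Not available |
| Usage tracking | | |
| Input audio duration | Supported | Seconds (billing unit) |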

Legend:

  • Supported — Fully compatible with OpenAI API
  • Unsupported — Not available in this implementation

Model Support

Amazon Models

| Model | Supported Languages | Notes |
|---|---|---|
| amazon.transcribe | 100+ | Full-featured transcription with speaker diarization and subtitle generation, at the cost of higher latency |

Configuration Required

You must configure the AWS_S3_BUCKET or AWS_TRANSCRIBE_S3_BUCKET environment variable with a bucket in the main AWS region to use this model. This bucket is used for temporary storage during transcription processing.
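As a minimal sketch of this configuration, the snippet below sets one of the two variables and fails fast when neither is present. It assumes the variable is read from the environment where the stdapi.ai service runs and that its value is a bucket name. The bucket name and the `require_transcribe_bucket` helper are hypothetical and only for illustration.

```shell
# Hypothetical bucket name: replace it with a bucket you own in the main AWS region.
export AWS_TRANSCRIBE_S3_BUCKET="my-transcribe-temp-bucket"

# Hypothetical helper: fail fast if neither documented variable is set.
require_transcribe_bucket() {
  if [ -z "${AWS_TRANSCRIBE_S3_BUCKET:-}" ] && [ -z "${AWS_S3_BUCKET:-}" ]; then
    echo "Set AWS_S3_BUCKET or AWS_TRANSCRIBE_S3_BUCKET before using amazon.transcribe" >&2
    return 1
  fi
  echo "bucket configured"
}

require_transcribe_bucket
```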

Mistral Models

| Model | Supported Languages | Notes |
|---|---|---|
| mistral.voxtral-mini-3b-2507 | 100+ | Compact model for fast transcription |
| mistral.voxtral-small-24b-2507 | 100+ | Larger model for enhanced accuracy |

Mistral Voxtral Limitations

Mistral Voxtral models have the following restrictions when running on AWS Bedrock:

  • File size limit: approximately 2 MB per input file
  • Audio channels: Mono audio only (single channel)
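A pre-flight check along these lines can catch both restrictions before a request is sent. This is a sketch under assumptions: the "~2MB" limit is interpreted as 2 × 1024 × 1024 bytes, `ffmpeg` is assumed to be installed for the downmix step, and the function names `fits_voxtral_limit` and `to_mono` are hypothetical.

```shell
# Assumed threshold: the documented "~2MB" read as 2 * 1024 * 1024 bytes.
MAX_BYTES=$((2 * 1024 * 1024))

# Return 0 when the file is within the assumed size limit, 1 otherwise.
fits_voxtral_limit() {
  size=$(( $(wc -c < "$1") ))
  [ "$size" -le "$MAX_BYTES" ]
}

# Downmix any input to a single audio channel (-ac 1) using ffmpeg.
to_mono() {
  ffmpeg -y -loglevel error -i "$1" -ac 1 "$2"
}

# Example usage (commented out because it needs ffmpeg and a real file):
# to_mono spanish-interview.mp3 spanish-interview-mono.mp3
# fits_voxtral_limit spanish-interview-mono.mp3 || echo "file too large for Voxtral" >&2
```

If the mono file is still too large, re-encoding at a lower bitrate or splitting the recording are the usual options. Which one is acceptable depends on the audio.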

Advanced Features

Amazon Transcribe Features

Model & Features:

  • Use amazon.transcribe with the same interface as OpenAI's Whisper API
  • Or use OpenAI model name directly: whisper-1 works out of the box (maps to amazon.transcribe)
  • Automatic transcription + translation pipeline in one request
  • Multiple output formats: text, json, verbose_json, srt, vtt
  • Automatic source language detection (zero configuration)
  • Smart Subtitle Translation: Preserves original timing using intelligent HTML span processing

OpenAI Model Compatibility

stdapi.ai includes a built-in model alias that maps the OpenAI model name to AWS Transcribe:

  • whisper-1 → amazon.transcribe

This alias enables seamless compatibility with OpenAI-based tools and applications without any configuration changes. You can also customize or override this alias to suit your needs.

Note: The prompt and temperature parameters are not supported to ensure consistent translation accuracy.

Try It Now

Translate foreign audio to English text:

curl -X POST "$BASE/v1/audio/translations" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -F file=@spanish-interview.mp3 \
  -F model=amazon.transcribe \
  -F response_format=json

Translate foreign audio to English subtitles:

curl -OJ -X POST "$BASE/v1/audio/translations" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -F file=@spanish-interview.mp3 \
  -F model=amazon.transcribe \
  -F response_format=srt
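
The two requests above differ only in their model and response_format fields, so they can be wrapped in a small shell function. This is a sketch, not part of stdapi.ai: `translate_audio` is a hypothetical helper name, and it assumes `$BASE` and `$OPENAI_API_KEY` are set as in the examples above.

```shell
# Hypothetical helper: POST an audio file to the translations endpoint.
# Usage: translate_audio FILE [MODEL] [RESPONSE_FORMAT]
# MODEL defaults to amazon.transcribe, RESPONSE_FORMAT to json.
translate_audio() {
  curl -sS -X POST "$BASE/v1/audio/translations" \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -F "file=@$1" \
    -F "model=${2:-amazon.transcribe}" \
    -F "response_format=${3:-json}"
}

# Example calls (commented out because they need a running server):
# translate_audio spanish-interview.mp3 whisper-1 text
# translate_audio spanish-interview.mp3 amazon.transcribe vtt > spanish-interview.en.vtt
```

The first example call uses the whisper-1 alias described earlier. The second redirects the WebVTT response to a file chosen by the caller instead of relying on the server-provided filename.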

Ready to translate multilingual audio? Explore available models in the Models API.