Speech to Text API

Convert spoken words into text for voice assistants, meeting transcription, or accessibility features, using AWS Transcribe.

Why Choose Speech to Text?

  • 100+ Languages
    Transcribe audio in more than 100 languages, with automatic detection or manual specification, for global applications and multilingual content.

  • Real-Time or Batch
    Stream transcriptions in real time via SSE, or process complete audio files in batch.

  • Subtitle Generation
    Generate SRT and VTT subtitle files directly from AWS Transcribe with precise timing.

  • Word-Level Timestamps
    Get word-level or segment-level timestamps with verbose_json for video editing, searchable transcripts, and accessibility features.

Quick Start: Available Endpoint

Endpoint | Method | What It Does | Powered By
/v1/audio/transcriptions | POST | Convert spoken audio to written text | AWS Transcribe

Feature Compatibility

Feature | Status | Notes

Input
Audio file upload | | Multipart file upload

Output Formats
json | | Structured transcription
text | | Plain text output
verbose_json | | With timestamps and details
srt | | Subtitle format with timing
vtt | | WebVTT subtitle format

Language
Language specification | | ISO-639-1 language codes
Auto language detection | | Automatic identification

Streaming
SSE streaming | | Event-based streaming

Advanced
Timestamp granularity | | Word or segment level
chunking_strategy | Partial | Only auto is supported
temperature | Unsupported | Not available
prompt | Unsupported | Not available
logprobs | Unsupported | Not available

Usage tracking
Input audio duration | | Seconds (billing unit)
Output text tokens | | Estimated token count for reference

Legend:

  • Supported — Fully compatible with OpenAI API
  • Partial — Supported with limitations
  • Unsupported — Not available in this implementation
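
The compatibility table above can be turned into a small client-side pre-flight check. The sketch below is only an illustration: `check_field` is a hypothetical helper, not part of the API, and the field names are taken from the table as written. The server's own validation remains authoritative.

```shell
# Hypothetical pre-flight check derived from the compatibility table.
# Takes one "name=value" pair; prints "ok" when the field is accepted,
# otherwise writes a message to stderr and returns 1.
check_field() {
  name="${1%%=*}"
  value="${1#*=}"
  case "$name" in
    response_format)
      case "$value" in
        json|text|verbose_json|srt|vtt) ;;
        *) echo "unsupported response_format: $value" >&2; return 1 ;;
      esac ;;
    chunking_strategy)
      # The table lists only "auto" as supported.
      if [ "$value" != "auto" ]; then
        echo "chunking_strategy only accepts auto" >&2
        return 1
      fi ;;
    temperature|prompt|logprobs)
      echo "$name is not available" >&2
      return 1 ;;
  esac
  echo "ok"
}
```

For example, `check_field response_format=srt` prints `ok`, while `check_field temperature=0.2` fails with a message on stderr.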

Advanced Features

AWS Transcribe: OpenAI-Compatible with AWS Power

Model & Features:

  • Use amazon.transcribe (instead of whisper-1) with the same interface
  • Auto-detect language or specify it for faster processing
  • Word-level or segment-level timestamps with verbose_json
  • Native subtitles: SRT/VTT files generated directly by AWS Transcribe with precise timing

Note: The prompt and temperature parameters are not supported, and chunking_strategy accepts only auto, to ensure consistent transcription accuracy.

Performance Tips: Optimize Speed & Cost

  • Specify the language if you know it: this skips auto-detection, which speeds up processing and lowers AWS costs
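
As a sketch of that tip, the hypothetical wrapper below adds the `language` field only when a code is passed, so one function covers both auto-detection and a fixed language. The `transcribe` name and the `CURL` override are assumptions of this example (the override lets you inspect the command line without sending a request); `$BASE` and `$OPENAI_API_KEY` are the same variables used in the Try It Now examples.

```shell
# transcribe FILE [FORMAT] [LANGUAGE]
# Hypothetical wrapper; LANGUAGE is an ISO-639-1 code such as "en".
transcribe() {
  file="$1"
  format="${2:-json}"
  lang="${3:-}"
  # Rebuild the positional parameters as the curl argument list.
  set -- -X POST "$BASE/v1/audio/transcriptions" \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -F "file=@$file" \
    -F "model=amazon.transcribe" \
    -F "response_format=$format"
  # Omitting the field leaves language detection to the service.
  if [ -n "$lang" ]; then
    set -- "$@" -F "language=$lang"
  fi
  # CURL=echo prints the arguments instead of sending the request.
  "${CURL:-curl}" "$@"
}
```

Calling `transcribe meeting-recording.mp3 srt en` sends the language hint, and `transcribe meeting-recording.mp3` relies on auto-detection.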

Configuration Required

You must configure the AWS_S3_BUCKET or AWS_TRANSCRIBE_S3_BUCKET environment variable with a bucket in the main AWS region to use this endpoint. This bucket is used for temporary storage during transcription processing.
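
A deployment script might verify this before starting the service. The following is a minimal sketch, not the service's own logic: `resolve_transcribe_bucket` is a hypothetical helper, and the precedence it applies when both variables are set (the Transcribe-specific one wins) is an assumption.

```shell
# Hypothetical startup check. The precedence (the Transcribe-specific
# variable wins when both are set) is an assumption of this sketch.
resolve_transcribe_bucket() {
  bucket="${AWS_TRANSCRIBE_S3_BUCKET:-${AWS_S3_BUCKET:-}}"
  if [ -z "$bucket" ]; then
    echo "set AWS_S3_BUCKET or AWS_TRANSCRIBE_S3_BUCKET" >&2
    return 1
  fi
  printf '%s\n' "$bucket"
}
```

The function prints the bucket name on success and returns a non-zero status when neither variable is set, so it can gate a startup script.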

Try It Now

Transcribe audio to JSON:

curl -X POST "$BASE/v1/audio/transcriptions" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -F file=@meeting-recording.mp3 \
  -F model=amazon.transcribe \
  -F response_format=json

Generate subtitles with streaming:

curl -N -X POST "$BASE/v1/audio/transcriptions" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -F file=@video-audio.mp3 \
  -F model=amazon.transcribe \
  -F response_format=srt \
  -F language=en
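
When consuming a streamed response in a script, the event payloads have to be separated from the SSE framing. This page does not describe the payloads themselves, so the sketch below assumes only standard SSE framing (payload lines beginning with `data:`); `sse_data` is a hypothetical helper name.

```shell
# Print the payload of each "data:" line from an SSE stream on stdin.
# Event names, comment lines, and blank separator lines are skipped.
sse_data() {
  # Strip carriage returns so CRLF-terminated streams are handled too.
  tr -d '\r' | while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      "data: "*) printf '%s\n' "${line#data: }" ;;
      "data:"*)  printf '%s\n' "${line#data:}" ;;
    esac
  done
}
```

It can be placed at the end of a pipeline, for example `curl -N ... | sse_data`.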

Ready to transcribe audio? Explore available transcription models in the Models API.