Speech to Text API
Convert spoken audio into text using AWS Transcribe, for voice assistants, meeting transcription, or accessibility features.
Why Choose Speech to Text?
- 100+ Languages: Transcribe audio in more than 100 languages, with automatic detection or manual specification, for global applications and multilingual content.
- Real-Time or Batch: Stream transcriptions in real time via Server-Sent Events (SSE), or process complete files as a batch.
- Subtitle Generation: Generate SRT and VTT subtitle files directly from AWS Transcribe with precise timing.
- Word-Level Timestamps: Get word-level or segment-level timestamps with `verbose_json` for video editing, searchable transcripts, and accessibility features.
Quick Start: Available Endpoint
| Endpoint | Method | What It Does | Powered By |
|---|---|---|---|
| `/v1/audio/transcriptions` | POST | Convert spoken audio to written text | AWS Transcribe |
Feature Compatibility
| Feature | Status | Notes |
|---|---|---|
| Input | | |
| Audio file upload | Supported | Multipart file upload |
| Output Formats | | |
| `json` | Supported | Structured transcription |
| `text` | Supported | Plain text output |
| `verbose_json` | Supported | With timestamps and details |
| `srt` | Supported | Subtitle format with timing |
| `vtt` | Supported | WebVTT subtitle format |
| Language | | |
| Language specification | Supported | ISO-639-1 language codes |
| Auto language detection | Supported | Automatic identification |
| Streaming | | |
| SSE streaming | Supported | Event-based streaming |
| Advanced | | |
| Timestamp granularity | Supported | Word or segment level |
| `chunking_strategy` | Partial | Only `auto` is supported |
| `temperature` | Unsupported | Not available |
| `prompt` | Unsupported | Not available |
| `logprobs` | Unsupported | Not available |
| Usage tracking | | |
| Input audio duration | Supported | Seconds (billing unit) |
| Output text tokens | Supported | Estimated token count for reference |
Legend:
- Supported — Fully compatible with OpenAI API
- Partial — Supported with limitations
- Unsupported — Not available in this implementation
Advanced Features
OpenAI-Compatible with AWS Power
Model & Features:
- Use `amazon.transcribe` (instead of `whisper-1`) with the same interface
- Auto-detect language or specify it for faster processing
- Word-level or segment-level timestamps with `verbose_json`
- Native Subtitles: SRT/VTT files generated directly by AWS Transcribe with precise timing
Note: To ensure consistent transcription accuracy, the `prompt` and `temperature` parameters are not supported, and `chunking_strategy` accepts only `auto`.
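The restrictions listed in the Feature Compatibility table can be checked on the client before a request is sent. The following is a minimal sketch in POSIX shell; `check_transcription_field` is a hypothetical helper written for illustration, not something the API provides, and how the server itself reacts to an unsupported field is not described here.

```shell
# check_transcription_field NAME VALUE
# Hypothetical client-side check based on the Feature Compatibility table.
# Returns 0 when the form field looks usable, 1 otherwise.
check_transcription_field() {
  name=$1
  value=$2
  case "$name" in
    prompt|temperature|logprobs)
      # Listed as "Not available" in the table.
      echo "unsupported field: $name" >&2
      return 1
      ;;
    chunking_strategy)
      # Listed as partially supported: only "auto" is accepted.
      if [ "$value" != "auto" ]; then
        echo "chunking_strategy only accepts 'auto'" >&2
        return 1
      fi
      ;;
  esac
  return 0
}
```

A script could call `check_transcription_field temperature 0.2` and skip the request when the check fails, rather than relying on the server to reject or ignore the field.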
Performance Tips: Optimize Speed & Cost
- Specify the language if you know it. This skips auto-detection, which gives faster processing and lower AWS costs.
Configuration Required
You must configure the AWS_S3_BUCKET or AWS_TRANSCRIBE_S3_BUCKET environment variable with a bucket in the main AWS region to use this endpoint. This bucket is used for temporary storage during transcription processing.
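As a rough illustration, a deployment script could verify that one of the two variables is set before the service starts. This is a sketch under assumptions: `transcribe_bucket_configured` is a hypothetical helper, and it only checks that a variable is non-empty, not that the bucket exists or sits in the right region. How the variable reaches your deployment (shell profile, container environment, infrastructure template) depends on your setup.

```shell
# Hypothetical pre-flight check: succeeds if either bucket variable
# named above is set to a non-empty value.
transcribe_bucket_configured() {
  [ -n "${AWS_TRANSCRIBE_S3_BUCKET:-}" ] || [ -n "${AWS_S3_BUCKET:-}" ]
}

# Warn early instead of failing on the first transcription request.
transcribe_bucket_configured ||
  echo "Set AWS_S3_BUCKET or AWS_TRANSCRIBE_S3_BUCKET to enable /v1/audio/transcriptions" >&2
```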
Try It Now
Transcribe audio to JSON:
```shell
curl -X POST "$BASE/v1/audio/transcriptions" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -F file=@meeting-recording.mp3 \
  -F model=amazon.transcribe \
  -F response_format=json
```
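For word-level timestamps, a request along the following lines may work. It is a sketch, not a confirmed recipe: `transcribe_words` is a hypothetical wrapper, and the field name `timestamp_granularities[]` is taken from the OpenAI transcription API on the assumption that this endpoint accepts the same name. `BASE` and `OPENAI_API_KEY` are expected to be set as in the example above.

```shell
# transcribe_words FILE
# Hypothetical wrapper: request verbose_json with word-level timestamps.
# Assumption: the OpenAI-style field timestamp_granularities[] is accepted.
transcribe_words() {
  curl -X POST "$BASE/v1/audio/transcriptions" \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -F "file=@$1" \
    -F model=amazon.transcribe \
    -F response_format=verbose_json \
    -F "timestamp_granularities[]=word" \
    -F language=en
}

# Usage (performs a real request):
#   transcribe_words meeting-recording.mp3
```

Setting `language=en` follows the performance tip about skipping auto-detection; drop that field if the spoken language is not known in advance.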
Generate subtitles with streaming:
```shell
curl -N -X POST "$BASE/v1/audio/transcriptions" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -F file=@video-audio.mp3 \
  -F model=amazon.transcribe \
  -F response_format=srt \
  -F language=en
```
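Once the SRT output has been saved to a file, a quick sanity check is to count its cues. The sketch below assumes standard SubRip timing lines of the form `00:00:00,000 --> 00:00:02,500`; `count_srt_cues` is a hypothetical helper, and the sample text in the comment is made up for illustration rather than taken from real API output.

```shell
# count_srt_cues: read SRT text on stdin and print the number of cues,
# counted as lines that start with an SRT timing range.
count_srt_cues() {
  grep -c -E '^[0-9]+:[0-9]{2}:[0-9]{2},[0-9]{3} --> '
}

# Usage with a saved subtitle file (file name is a placeholder):
#   count_srt_cues < video-audio.srt
```

A count of zero on a non-empty recording suggests the response was not SRT, for example because a different `response_format` was sent.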
Ready to transcribe audio? Explore available transcription models in the Models API.