Generate Subtitles
The Generate Subtitles task runs speech-to-text recognition on your media's audio tracks and generates timed subtitle files from scratch.
It is intended for files that have no subtitle tracks, so you can create them automatically and make your content accessible and searchable.
Configuration Settings
Concurrent Tasks
- Description: The maximum number of subtitle generation tasks that can run simultaneously.
- Usage: Speech-to-text processing is highly resource-intensive. If you are generating subtitles using local hardware, keeping this set to 1 or 2 prevents your system from becoming unresponsive.
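A concurrency cap of this kind can be pictured as a worker pool with a fixed size. The following is a minimal sketch only, not the task's actual scheduler: `run_with_limit` and its `concurrent_tasks` parameter are hypothetical names chosen to mirror the setting.

```python
from concurrent.futures import ThreadPoolExecutor


def run_with_limit(jobs, concurrent_tasks=1):
    """Run zero-argument callables with at most `concurrent_tasks` active at once.

    Hypothetical helper: `concurrent_tasks` stands in for the Concurrent Tasks
    setting. Results are returned in the same order as `jobs`.
    """
    with ThreadPoolExecutor(max_workers=concurrent_tasks) as pool:
        futures = [pool.submit(job) for job in jobs]
        # Jobs beyond the limit wait in the pool's queue until a worker is free.
        return [future.result() for future in futures]
```

With `concurrent_tasks=1` the jobs run strictly one after another; raising the value trades system responsiveness for throughput.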
Faster-Whisper Model
- Description: Selects the size and capability of the AI speech recognition model used to listen to the audio track.
- Options: Ranges from lightweight, fast options (like Base) to larger, highly precise models.
- Usage: Larger models offer significantly better accuracy and foreign-language detection but take longer to process and require more system memory.
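For orientation, this is roughly how a model size is passed to the open-source faster-whisper Python library. It is a sketch under assumptions: how this task maps the setting onto the library is not documented here, and the `transcribe` wrapper and its `model_factory` parameter are hypothetical, added so the example can be exercised without downloading a model.

```python
from typing import Callable, Optional


def transcribe(audio_path: str, model_size: str = "base",
               model_factory: Optional[Callable] = None):
    """Return (language, language_probability, [(start, end, text), ...]).

    Hypothetical wrapper: `model_factory` defaults to faster-whisper's
    WhisperModel and can be swapped for a stub.
    """
    if model_factory is None:
        # Imported lazily so the sketch loads even if the package is absent.
        from faster_whisper import WhisperModel
        model_factory = WhisperModel
    model = model_factory(model_size)
    segments, info = model.transcribe(audio_path)
    # `segments` is a generator; the audio is processed as it is consumed.
    rows = [(seg.start, seg.end, seg.text.strip()) for seg in segments]
    return info.language, info.language_probability, rows
```

Switching `model_size` from a small name such as `"base"` to a larger one changes only the argument; the cost shows up as longer processing time and higher memory use.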
Custom Faster-Whisper Model
- Description: Allows you to use a specialized or fine-tuned external Whisper model instead of the default built-in options.
- Usage: To use a custom model, place the model weights (supported formats include .bin, .onnx, or .pt) inside your data/faster-whisper directory and enter the exact model folder name here.
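Before entering a folder name, it can help to confirm that the folder exists and contains a weights file in one of the listed formats. The check below is an illustrative sketch: `custom_model_weights` is a hypothetical helper, and the real task may validate models differently.

```python
from pathlib import Path

# File extensions listed as supported in the setting's description.
SUPPORTED_SUFFIXES = {".bin", ".onnx", ".pt"}


def custom_model_weights(base_dir, model_folder):
    """List weight files found in base_dir/model_folder (empty if none).

    Hypothetical helper; `base_dir` would be your data/faster-whisper directory.
    """
    folder = Path(base_dir) / model_folder
    if not folder.is_dir():
        return []
    return sorted(
        p.name for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED_SUFFIXES
    )
```

An empty result means either the folder name does not match exactly or no supported weights file is present.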
Verbose Debug Output
- Description: When enabled, outputs detailed, line-by-line transcription segments and language confidence scores directly into the processing logs.
- Usage: Keep this disabled during normal use to keep your logs clean, and toggle it on only when troubleshooting audio tracking or missing dialogue issues.
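The effect of the toggle can be sketched with Python's standard logging module. This is an assumption-laden illustration, not the task's real log format: the logger name, the `log_transcription` function, and the message layout are all hypothetical.

```python
import logging

logger = logging.getLogger("subtitle_generation")  # hypothetical logger name


def log_transcription(segments, language, language_probability, verbose=False):
    """Log a one-line summary, plus per-segment detail when `verbose` is set.

    `segments` is a list of (start_seconds, end_seconds, text) tuples.
    """
    logger.info("Transcribed %d segments", len(segments))
    if not verbose:
        return
    logger.debug("Detected language %s (confidence %.2f)",
                 language, language_probability)
    for start, end, text in segments:
        logger.debug("[%7.2fs -> %7.2fs] %s", start, end, text)
```

With `verbose=False` only the summary line is written, which keeps logs short during normal use.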
Key Features
- Smart Line Clustering: The generation engine listens to natural speech patterns and groupings, automatically determining when a sentence should be broken up across the timeline to ensure subtitles read comfortably on screen.
- Non-Speech Filtering: The pipeline automatically detects and filters out empty background noise, audio artifacts, and repetitive non-speech murmurs (like "uh-huh" or "hmm") so your final subtitle track contains only meaningful text.
- Forced Subtitles Creation: It can automatically isolate and generate only the dialogue lines that don't match the primary language of the video, which is useful for providing translation overlays only when foreign languages are spoken in a film.
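Two of the features above, non-speech filtering and forced subtitles, can be approximated with simple post-processing over transcribed segments. The sketch below is illustrative only and is not the pipeline's actual logic: the `Segment` type, the `FILLERS` list, and the per-segment `language` tag are assumptions made for the example.

```python
import re
from dataclasses import dataclass

# Illustrative filler list; the real pipeline's rules are not documented here.
FILLERS = {"uh", "um", "hmm", "mm", "mhm", "uh-huh"}


@dataclass
class Segment:
    start: float
    end: float
    text: str
    language: str  # assumed per-segment language tag


def is_meaningful(text):
    """True if the text contains at least one word that is not a filler."""
    words = re.findall(r"\w+(?:[-']\w+)*", text.lower())
    return any(word not in FILLERS for word in words)


def drop_fillers(segments):
    """Remove segments made up solely of filler sounds or punctuation."""
    return [s for s in segments if is_meaningful(s.text)]


def forced_only(segments, primary_language):
    """Keep only segments spoken in a language other than the primary one."""
    return [s for s in segments if s.language != primary_language]
```

Applying `drop_fillers` first and `forced_only` second yields a track that contains only meaningful foreign-language lines.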