Generate Subtitles

The Generate Subtitles task uses speech-to-text recognition to transcribe your media's audio tracks and produce fully timed subtitle files from scratch.

It is ideal for files that have no subtitle tracks at all. The task creates those tracks automatically, making your content accessible and searchable.
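A timed subtitle file is essentially a list of text segments, each with a start and end time. The sketch below shows that final step in isolation. It assumes SubRip (.srt) output and segments given as hypothetical `(start_seconds, end_seconds, text)` tuples; both are illustrative assumptions, not a description of this task's actual output code.

```python
from typing import Iterable, Tuple

# Hypothetical segment shape: (start_seconds, end_seconds, text)
Segment = Tuple[float, float, str]


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, e.g. 3723.5 -> '01:02:03,500'."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: Iterable[Segment]) -> str:
    """Render timed segments as a SubRip document with 1-based cue numbers."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    # Each block already ends in a newline; joining with "\n" adds the blank separator line.
    return "\n".join(blocks)
```

Each cue is a sequence number, a `start --> end` timing line, and the text, with cues separated by a blank line.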

Configuration Settings

Concurrent Tasks

Description: The maximum number of subtitle generation tasks that can run simultaneously.
Usage: Speech-to-text processing is highly resource-intensive. If you are generating subtitles using local hardware, keeping this set to 1 or 2 prevents your system from becoming unresponsive.
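One common way to enforce a setting like Concurrent Tasks is a worker pool whose size is the cap. The following is a minimal sketch under that assumption; `run_with_limit` and the `job` callable are hypothetical names, not this task's internals.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def run_with_limit(files: List[str], job: Callable[[str], str],
                   concurrent_tasks: int = 1) -> List[str]:
    """Run `job` on every file, with at most `concurrent_tasks` running at once."""
    if concurrent_tasks < 1:
        raise ValueError("concurrent_tasks must be at least 1")
    # The pool's worker count is the hard cap on simultaneous jobs;
    # extra files wait in the queue until a worker frees up.
    with ThreadPoolExecutor(max_workers=concurrent_tasks) as pool:
        # map() returns results in the same order as the input files.
        return list(pool.map(job, files))
```

With `concurrent_tasks=1`, files are processed strictly one after another, which is the gentlest setting for a machine doing local speech-to-text.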

Faster-Whisper Model

Description: Selects the size and capability of the AI speech recognition model used to listen to the audio track.
Options: Range from lightweight, fast models (such as Base) to larger, more accurate models.
Usage: Larger models offer significantly better transcription accuracy and more reliable language detection, but they take longer to process and require more system memory.

Custom Faster-Whisper Model

Description: Allows you to use a specialized or fine-tuned external Whisper model instead of the default built-in options.
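Usage: To use a custom model, place its model folder inside your data/faster-whisper directory and enter the exact folder name here. Faster-Whisper loads models in the CTranslate2 format (a folder containing model.bin alongside its configuration and tokenizer files), so weights distributed in other formats, such as .pt or .onnx, typically need to be converted first.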
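Both model settings end up feeding the same place: faster-whisper's `WhisperModel`, which accepts either a built-in size name (such as `"base"`) or a path to a converted model folder. The sketch below shows how the two settings might be combined. The helper names `resolve_model` and `load_model`, the `"base"` default, and the relative `data/faster-whisper` root are hypothetical, chosen only for illustration.

```python
from pathlib import Path
from typing import Optional, Union

# Hypothetical defaults for illustration only.
DEFAULT_MODEL = "base"
CUSTOM_MODEL_ROOT = Path("data/faster-whisper")


def resolve_model(builtin: str = DEFAULT_MODEL,
                  custom_name: Optional[str] = None,
                  root: Path = CUSTOM_MODEL_ROOT) -> Union[str, Path]:
    """Return what to pass to WhisperModel: a size name or a custom model folder."""
    if custom_name:
        # A custom model name wins over the built-in selection.
        folder = root / custom_name
        if not folder.is_dir():
            raise FileNotFoundError(f"custom model folder not found: {folder}")
        return folder
    return builtin


def load_model(builtin: str = DEFAULT_MODEL, custom_name: Optional[str] = None):
    # Imported lazily so resolve_model() works even without faster-whisper installed.
    from faster_whisper import WhisperModel
    source = resolve_model(builtin, custom_name)
    # int8 on CPU reduces memory use; larger models still need more RAM.
    return WhisperModel(str(source), device="cpu", compute_type="int8")
```

A loaded model is then used as `segments, info = model.transcribe("audio.wav")`, where `info.language` and `info.language_probability` report the detected language and its confidence.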

Verbose Debug Output

Description: When enabled, outputs detailed, line-by-line transcription segments and language confidence scores directly into the processing logs.
Usage: Keep this disabled during normal use to keep your logs clean. Turn it on only when troubleshooting audio track selection, subtitle timing, or missing dialogue.
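A verbose switch like this usually just gates extra debug-level log lines. The following sketch assumes Python's standard `logging` module, a hypothetical logger name, and a `Segment` stand-in that mirrors the `start`, `end`, and `text` fields of a faster-whisper segment. It is meant only to show the kind of per-segment output such a toggle adds.

```python
import logging
from dataclasses import dataclass
from typing import Iterable

logger = logging.getLogger("generate_subtitles")  # hypothetical logger name


@dataclass
class Segment:
    # Stand-in mirroring the start/end/text fields of a faster-whisper segment.
    start: float
    end: float
    text: str


def log_transcription(segments: Iterable[Segment], language: str,
                      probability: float, verbose: bool = False) -> int:
    """Log per-segment details only when verbose is on; always log a summary."""
    if verbose:
        logger.debug("Detected language %s (confidence %.2f)", language, probability)
    count = 0
    for seg in segments:
        count += 1
        if verbose:
            logger.debug("[%.2fs -> %.2fs] %s", seg.start, seg.end, seg.text.strip())
    logger.info("Transcribed %d segments", count)
    return count
```

With verbose output off, only the one-line summary reaches the log.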

Key Features

Smart Line Clustering: The generation engine follows natural speech patterns and groupings to decide where a sentence should be split across the timeline, so each subtitle reads comfortably on screen.
Non-Speech Filtering: The pipeline automatically detects and filters out silent stretches, background noise, audio artifacts, and repetitive non-speech murmurs (such as "uh-huh" or "hmm"), so your final subtitle track contains only meaningful text.
Forced Subtitles Creation: The task can automatically isolate the dialogue lines that are not in the video's primary language and generate subtitles for only those lines. This is ideal for providing translation overlays only when a foreign language is spoken in a film.
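The three features above can be approximated as small passes over word-level timings. This is a simplified sketch, not the task's actual pipeline. The filler vocabulary, the 0.6-second pause threshold, the 42-character line limit, and the per-cue `language` tag are all assumptions made for illustration. faster-whisper can supply word timings through `transcribe(..., word_timestamps=True)`, but the per-cue language tag here is hypothetical and would have to come from a separate detection step.

```python
from dataclasses import dataclass
from typing import List

FILLERS = {"uh", "um", "hmm", "uh-huh", "mm"}  # assumed filler vocabulary


@dataclass
class Word:
    start: float
    end: float
    text: str


@dataclass
class Cue:
    start: float
    end: float
    text: str
    language: str = "en"  # hypothetical per-cue language tag


def drop_fillers(words: List[Word]) -> List[Word]:
    """Non-speech filtering: remove empty words and filler-only murmurs."""
    kept = []
    for w in words:
        core = w.text.strip(" .,!?").lower()
        if core and core not in FILLERS:
            kept.append(w)
    return kept


def cluster(words: List[Word], max_gap: float = 0.6, max_chars: int = 42) -> List[Cue]:
    """Line clustering: start a new cue at a long pause or when a line would get too long."""
    cues: List[Cue] = []
    for w in words:
        token = w.text.strip()
        if cues:
            last = cues[-1]
            pause = w.start - last.end
            if pause <= max_gap and len(last.text) + 1 + len(token) <= max_chars:
                # Short pause and room left on the line: extend the current cue.
                last.text += " " + token
                last.end = w.end
                continue
        cues.append(Cue(w.start, w.end, token))
    return cues


def forced_only(cues: List[Cue], primary_language: str) -> List[Cue]:
    """Forced subtitles: keep only cues whose language differs from the primary one."""
    return [c for c in cues if c.language != primary_language]
```

Running `drop_fillers`, then `cluster`, then optionally `forced_only` yields the cues that would be written to the subtitle file. The pause and length thresholds are the main knobs for how aggressively lines are split.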
