
- WhisperX github.com/m-bain/whisperX
WhisperX performs forced alignment with an external phoneme/CTC aligner, typically a wav2vec2-based model, to align a known transcript to the waveform and recover word timestamps.

- whisper-char-alignment github.com/30stomercury/w…
whisper-char-alignment uses Whisper’s own decoder cross-attention maps: it teacher-forces the reference text at character level, then applies DTW plus attention-head aggregation to infer word boundaries.

- OpenAI Whisper timing github.com/openai/whisper
OpenAI Whisper timing uses Whisper’s internal alignment heads and decoder cross-attention, then applies DTW over the token-to-frame attention matrix to derive word timestamps from the token sequence.
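
To make the DTW step concrete, here is a minimal pure-Python sketch of aligning tokens to frames over a toy attention matrix. It is not code from any of the repos above: the names `dtw_path` and `token_start_frames` are made up for illustration, the matrix values are invented, and the cost `1 - weight` is a simplifying assumption that only makes sense for weights in [0, 1].

```python
def dtw_path(cost):
    """Return the cheapest monotonic path through cost[token][frame]."""
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    # acc[i][j] = cheapest cumulative cost of a path ending at cell (i-1, j-1)
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1])
            acc[i][j] = cost[i - 1][j - 1] + best_prev

    # Backtrack from the last cell to the first, always taking the cheapest predecessor.
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (acc[i - 1][j - 1], i - 1, j - 1),  # diagonal: next token, next frame
            (acc[i - 1][j], i - 1, j),          # vertical: next token, same frame
            (acc[i][j - 1], i, j - 1),          # horizontal: same token, next frame
        )
    path.reverse()
    return path


def token_start_frames(attention):
    """First frame assigned to each token by DTW over a token-by-frame matrix."""
    # Assumption: weights lie in [0, 1], so 1 - weight is a non-negative cost.
    cost = [[1.0 - w for w in row] for row in attention]
    starts = {}
    for token, frame in dtw_path(cost):
        starts.setdefault(token, frame)  # keep only the first frame per token
    return [starts[t] for t in range(len(attention))]


# Toy matrix (invented values): 3 tokens x 6 frames.
attention = [
    [0.9, 0.8, 0.1, 0.0, 0.0, 0.0],
    [0.1, 0.1, 0.8, 0.9, 0.1, 0.1],
    [0.0, 0.1, 0.1, 0.1, 0.9, 0.9],
]
print(token_start_frames(attention))  # [0, 2, 4]
```

Multiplying a start frame by the frame duration gives a timestamp; the first token of each word then marks that word's start. For the character-level variant, the rows would be characters instead of tokens.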
