multimodal-extraction

Given a local video or video URL, downloads the media if needed, extracts slide frames and key moments, transcribes the audio, and writes a Markdown timeline that interleaves screenshots with the transcript at the associated timestamps. Use when asked to turn a video into a multimodal notes file, slide-synced transcript, screenshot-enhanced transcript, or talk recap with images.
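
The interleaving step described above can be illustrated with a minimal sketch. This is not the skill's actual implementation: the `Segment` and `Frame` types, the `build_timeline` and `fmt_ts` helpers, and the file paths are all hypothetical, and the sketch assumes transcription and frame extraction have already produced timestamped results.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcript chunk (hypothetical shape, not the skill's own type)."""
    start: float  # seconds from the start of the video
    text: str


@dataclass
class Frame:
    """One extracted screenshot (hypothetical shape)."""
    time: float  # seconds from the start of the video
    path: str    # illustrative path to the saved image


def fmt_ts(seconds: float) -> str:
    """Format a number of seconds as HH:MM:SS, truncating fractions."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def build_timeline(segments, frames):
    """Merge frames and transcript segments into one Markdown timeline.

    Events are ordered by timestamp; when a frame and a segment share a
    timestamp, the frame is placed first so the image precedes its text.
    """
    events = [(f.time, 0, f) for f in frames]
    events += [(s.start, 1, s) for s in segments]
    events.sort(key=lambda e: (e[0], e[1]))

    lines = []
    for ts, kind, item in events:
        if kind == 0:
            lines.append(f"![Frame at {fmt_ts(ts)}]({item.path})")
        else:
            lines.append(f"**[{fmt_ts(ts)}]** {item.text}")
        lines.append("")  # blank line between Markdown blocks
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    demo = build_timeline(
        [Segment(0.0, "Welcome"), Segment(65.0, "Next slide")],
        [Frame(60.0, "frames/0001.png")],
    )
    print(demo)
```

Sorting a single merged event list keeps the logic simple and makes the tie-breaking rule explicit; a real pipeline might instead attach each frame to the nearest transcript segment.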

Maintainer: swyxio
Updated: 3/28/2026
Stars: 26
Forks: 0
quick start

Installation and usage

Installation
$ install --globalskills.sh
Usage

After installing, you can use this skill by running the following command in the terminal:

skills use multimodal-extraction