media-processing
Process multimedia files with FFmpeg (video/audio encoding, conversion, streaming, filtering, hardware acceleration), ImageMagick (image manipulation, format conversion, batch processing, effects, composition), and RMBG (AI-powered background removal). Use when converting media formats, encoding videos with specific codecs (H.264, H.265, VP9), resizing/cropping images, removing backgrounds from images, extracting audio from video, applying filters and effects, optimizing file sizes, creating streaming manifests (HLS/DASH), generating thumbnails, batch processing images, creating composite images, or implementing media processing pipelines. Supports 100+ formats, hardware acceleration (NVENC, QSV), and complex filtergraphs.
xstranscriber
音声ファイルをテキストに文字起こしするスキル。mp3/wav/m4a/ogg/flac形式に対応。whisperベースのtranscriber_toolを使用し、tiny/base/small/medium/largeの5つのモデルから精度と速度のバランスを選択可能。長時間音声はバックグラウンド実行に対応。「文字起こしして」「音声をテキストに変換して」で使用。
optimizing-io-operations
Optimizes standard I/O and file operations for high-performance data processing in .NET. Use when building high-throughput file processing or competitive programming solutions.
ai-multimodal
Process and generate multimedia content using Google Gemini API for better vision capabilities. Capabilities include analyze audio files (transcription with timestamps, summarization, speech understanding, music/sound analysis up to 9.5 hours), understand images (better image analysis than Claude models, captioning, reasoning, object detection, design extraction, OCR, visual Q&A, segmentation, handle multiple images), process videos (scene detection, Q&A, temporal analysis, YouTube URLs, up to 6 hours), extract from documents (PDF tables, forms, charts, diagrams, multi-page), generate images (text-to-image with Imagen 4, editing, composition, refinement), generate videos (text-to-video with Veo 3, 8-second clips with native audio). Use when working with audio/video files, analyzing images or screenshots (instead of default vision capabilities of Claude, only fallback to Claude's vision capabilities if needed), processing PDF documents, extracting structured data from media, creating images/videos from text pr
nano-banana-pro
Generate or edit images via Gemini 3 Pro Image (Nano Banana Pro).
compress-images
Compress images for web/SEO performance using cwebp. Use when optimizing images for faster page loads, reducing file sizes, or converting JPG/PNG to WebP format.
omnicaptions-convert
Use when converting between caption formats (SRT, VTT, ASS, TTML, Gemini MD, etc.). Supports 30+ caption formats.
youtube-data-api
YouTube Data API v3 complete wrapper - search videos, get video/channel/playlist details, get comments, download subtitles, etc. Supports filtering and sorting by time/views/rating and more.
transcribe-video
Generate subtitles (SRT/VTT) and plain text transcripts from video or audio files using AWS Transcribe. Use when creating captions, extracting spoken content, generating transcripts for notes, or making video content searchable.
download-video
Download videos from social media URLs (X/Twitter, YouTube, Instagram, TikTok, etc.) using yt-dlp. Use when saving a video locally, extracting content for transcription, or archiving video references.
omnicaptions-transcribe
Use when transcribing audio/video to text with timestamps, speaker labels, and chapters. Supports YouTube URLs and local files. Produces structured markdown output.
omnicaptions-laicut
Use when user needs accurate/precise caption timing, or aligning captions with audio/video using forced alignment. Corrects caption timing to match actual speech. Uses LattifAI Lattice-1 model.
omnicaptions-download
Use when downloading videos, audio, or captions from YouTube and other video platforms. Supports quality selection.
file-converter
Convert & transform files - images (resize, format, HEIC), markdown (PDF/HTML), data (CSV/JSON/YAML/TOML/XML), SVG, base64, text encoding. Cross-platform, single & batch mode. This skill should be used when converting file formats, resizing images, generating PDFs from markdown, or transforming data between formats.
youmind-youtube-transcript
Extract and summarize YouTube video transcripts via YouMind API — no yt-dlp, no proxy needed. Batch extract up to 5 videos at once with parallel processing. Saves videos to your YouMind board with timestamped transcripts in markdown. Automatically summarizes video content after extraction. Works from any IP (cloud, VPS, CI/CD, corporate networks). Use when user wants to "get YouTube transcript", "extract video subtitles", "transcribe YouTube video", "batch transcribe videos", "get video captions", "summarize YouTube video", "YouTube video summary", "summarize this video", "what does this video say", "YouTube 字幕", "YouTube 总结", "视频总结", "YouTube 文字起こし", "YouTube 자막", or "download YouTube transcript".
yt-dlp
Download videos, extract audio, and get transcripts from YouTube and 1000+ sites. Use when asked to download a video, extract audio, get a transcript, rip subtitles, or fetch media. Triggers on "download video", "yt-dlp", "extract audio", "YouTube download", "get transcript", "subtitles", or any media download request.
video-editing
Automated video editing skill for talk/vlog/standup videos. Use when: cutting video, splitting video into sentences, merging video clips, extracting audio, transcribing speech, auto-editing oral presentation videos, combining selected sentence clips into a final video, generating video cover/thumbnail with title, B-roll cutaway editing, persistent video overlay/watermark, blinking REC indicator, ending title cards, multi-source audio mixing, exporting to JianYing/CapCut project for further editing, generating voiceover videos with Remotion (audio-only to video with animated visuals/subtitles). Requires ffmpeg and whisper. Remotion workflow additionally requires Node.js and npm.
oral-history-tools
Use this Skill to process oral history recordings: Whisper transcription with timestamps, pyannote speaker diarization, OHMS metadata XML, and speaker anonymization.
compression-codecs
Configure and optimize compression for Zarr arrays. Covers all numcodecs compressors (Blosc, Zstd, LZ4, Gzip, LZMA, BZ2), pre-compression filters (Delta, Quantize, FixedScaleOffset, PackBits), codec pipelines, Blosc thread safety, and the trade-offs between compression speed and ratio.
seo-images
Image optimization analysis for SEO and performance. Checks alt text, file sizes, formats, responsive images, lazy loading, and CLS prevention. Use when user says "image optimization", "alt text", "image SEO", "image size", or "image audit".