video-downloader
Download videos from 1000+ websites (YouTube, Bilibili, Twitter/X, TikTok, etc.) using yt-dlp. Use this skill when users provide video URLs and want to download videos, extract audio, or need help with video download issues.
gif-splitter
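GIF splitting tool that automatically splits GIF files exceeding a specified frame count into multiple smaller files. Useful when uploading a GIF to platforms such as WeChat Official Accounts fails with a "frame count exceeds limit" error.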
minimax-multimodal-toolkit
MiniMax-native multimodal workflow for image, video, voice, music, and media-processing tasks. Use when the user asks to generate image/video/audio assets, wants MiniMax-specific media APIs, needs TTS or voice workflows, wants reproducible local media outputs, or needs FFmpeg-style processing around generated media.
video-prompting
Draft and refine prompts for video generation models (text-to-video and image-to-video), and create character-sheet prompts for image models when the goal is character consistency before image-to-video. Use when a user asks for a "video prompt", a model-specific prompt such as Seedance 2.0, Ovi, Sora, Veo 3, Wan 2.2, LTX-2, or LTX-2.3, or a consistent-character prompt such as "character sheet prompt", "character turnaround", "character reference sheet", or "photographic identity sheet".
video-frames
Extract frames, thumbnails, or clips from video files using ffmpeg. Use when analyzing video content or creating previews.
nano-banana-pro
Generate or edit images via Gemini 3 Pro Image (Nano Banana Pro).
omk-youtube
Extract and summarize YouTube video content via subtitle extraction. Trigger when the user shares a YouTube URL (youtube.com or youtu.be), says 'summarize this video', 'watch this', '看看这个视频' ("take a look at this video"), or wants to understand video content without watching. Also trigger for video transcript extraction.
master-asset-converter
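Find PNG files in the moorestech_master repository and convert them to JPEG to unify the asset image format. Use when: (1) PNG images are mixed into moorestech_master; (2) the user asks to "convert PNG to JPEG" or "unify the image format"; (3) asset formats need to be aligned after new asset images are added.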
image-batch
Batch process images for marketing. Use when: resizing images for social media; compressing images for web; removing backgrounds; adding watermarks; converting formats to WebP; optimizing for Core Web Vitals
whisper-transcription
Transcribe audio and video files to text using OpenAI Whisper. Use when: converting podcasts to blog posts; creating video subtitles; extracting quotes from interviews; repurposing video content to text; building searchable audio archives
video-processing
Process video files with ffmpeg automation. Use when: compressing videos for upload; extracting audio from video; resizing for social formats; clipping segments; merging multiple videos; generating thumbnails
videodb
See, understand, and act on video and audio. See: ingest from local files, URLs, RTSP/live feeds, or live desktop recording; return real-time context and playable stream links. Understand: extract frames, build visual/semantic/temporal indexes, and search moments with timestamps and auto-clips. Act: transcode and normalize (codec, fps, resolution, aspect ratio), perform timeline edits (subtitles, text/image overlays, branding, audio overlays, dubbing, translation), generate media assets (image, audio, video), and create real-time alerts for events from live streams or desktop capture.
fennec-image-compression
Use this skill when asked to compress, resize, or analyze images in Go using the Fennec library, or when modifying the Fennec codebase itself.
audio-editing
Master the essential audio post-production techniques—normalization, compression, EQ, and noise reduction—using the correct processing order to achieve professional-quality audio. Use when: Editing podcast episodes or video soundtracks; Cleaning up recorded voiceovers; Improving audio quality for marketing content; Preparing audio files for distribution; Troubleshooting common audio issues
pydub-automation
Automate repetitive audio tasks with Python using PyDub for batch processing, format conversion, normalization, and content assembly. Use when: Processing large numbers of audio files consistently; Converting between audio formats at scale; Normalizing loudness across a batch of files; Assembling intros/outros automatically to episodes; Trimming silence or extracting segments programmatically
ai-multimodal
Process and generate multimedia content using the Google Gemini API for better vision capabilities. Capabilities: analyze audio files (transcription with timestamps, summarization, speech understanding, music/sound analysis up to 9.5 hours), understand images (better image analysis than Claude models, captioning, reasoning, object detection, design extraction, OCR, visual Q&A, segmentation, multiple-image handling), process videos (scene detection, Q&A, temporal analysis, YouTube URLs, up to 6 hours), extract from documents (PDF tables, forms, charts, diagrams, multi-page), generate images (text-to-image with Imagen 4, editing, composition, refinement), generate videos (text-to-video with Veo 3, 8-second clips with native audio). Use when working with audio/video files, analyzing images or screenshots (instead of Claude's default vision capabilities, falling back to them only if needed), processing PDF documents, extracting structured data from media, creating images/videos from text pr
media-processing
Process multimedia files with FFmpeg (video/audio encoding, conversion, streaming, filtering, hardware acceleration), ImageMagick (image manipulation, format conversion, batch processing, effects, composition), and RMBG (AI-powered background removal). Use when converting media formats, encoding videos with specific codecs (H.264, H.265, VP9), resizing/cropping images, removing backgrounds from images, extracting audio from video, applying filters and effects, optimizing file sizes, creating streaming manifests (HLS/DASH), generating thumbnails, batch processing images, creating composite images, or implementing media processing pipelines. Supports 100+ formats, hardware acceleration (NVENC, QSV), and complex filtergraphs.
acestep-simplemv
Render music videos from audio files and lyrics using Remotion. Accepts audio + LRC/JSON lyrics + title to produce MP4 videos with waveform visualization and synced lyrics display. Use when users mention MV generation, music video rendering, creating video from audio/lyrics, or visualizing songs.
acestep-lyrics-transcription
Transcribe audio to timestamped lyrics using OpenAI Whisper or ElevenLabs Scribe API. Outputs LRC, SRT, or JSON with word-level timestamps. Use when users want to transcribe songs, generate LRC files, or extract lyrics with timestamps from audio.
tts-script-generator
Intelligently compress and rewrite documents into TTS-friendly scripts. Uses Claude AI to analyze content, compress to target duration, convert to spoken style with emotional language, and auto-segment. Perfect for video narration.