image-handling
Image handling for Claude API constraints (5MB max, 8000px max dimension). Use when working with images, screenshots, or MCP browser tools.
Extract frames or short clips from videos using ffmpeg.
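Volcano Video Understanding - Analyzes video content using the Volcano Ark (火山方舟) video understanding API. Uploads videos via the Files API (recommended), supports large files (up to 512MB), and can be used for video content analysis, object recognition, action understanding, and similar tasks. Activate this skill when the user needs to analyze a video, understand its content, or extract information from it.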
Compresses images to WebP (default) or PNG with automatic tool selection. Use when user asks to "compress image", "optimize image", "convert to webp", or reduce image file size.
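Provides multimedia processing based on FFmpeg and ImageMagick, supporting format conversion, resolution adjustment, compression, and other operations on video and images.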
Audio-driven sparse-frame video dubbing tool. Supports audio-driven Video-to-Video and Image-to-Video generation with accurate lip, head, and body-pose synchronization, and can generate videos of unlimited length.
This skill should be used when the user asks to "optimize for Instagram", "YouTube Shorts format", "make it 9:16", "square video", "TikTok format", "Reels format", "prepare for social media", "encode for Twitter", "optimize for Facebook", "LinkedIn video", "crop for portrait", or mentions any platform-specific video format or upload requirements.
This skill should be used when the user asks to "compress this video", "reduce file size", "make this video smaller", "optimize for web", "shrink this video", "compress to under X MB", "reduce bitrate", "make it smaller without losing quality", "encode with H.265", or "re-encode this video".
This skill should be used when the user asks to "convert this video", "change format to mp4", "trim from X to Y", "cut the first X seconds", "speed up this video", "slow motion", "timelapse", "extract frames", "resize video", "scale down", "rotate video", "flip video", "remux", or any general FFmpeg video manipulation not covered by compress-video, make-gif, share-social, or extract-audio.
This skill should be used when the user asks to "extract audio", "get the mp3", "strip audio from video", "rip audio", "save audio from video", "convert to audio", "get the soundtrack", "pull the audio track", "save as mp3", "export audio", or "separate audio from video".
Manage Nikon Z5 II photo and video libraries. Batch resize, convert to JPEG XL or optimized JPEG via mozjpeg, create contact sheets, manage EXIF metadata, prepare share-ready albums, and process 4K video. Uses Python scripts via uv run for all batch operations. Use when working with JPG, NEF, HEIF, or MOV photo/video files, or when the user mentions photos, camera, Nikon, resize, sharing pictures, photo library, image optimization, JPEG XL, or contact sheet.
Download Douyin (抖音) videos from share links. Parse Douyin share text/links, download watermark-free videos, and transcribe audio to text using Volcano Engine ASR (Doubao Speech). Uses Python for iSH compatibility.
Download text, images, GIFs, and videos from Twitter/X posts via fxtwitter API. Trigger when users share any twitter.com or x.com link, or ask to download or see media from a tweet (e.g., '下载推特视频' ("download the Twitter video"), '把这条推文的图存下来' ("save the images from this tweet"), 'what's in this tweet').
Processes videos to identify engaging moments, generate transcripts, and create highlight clips with artistic titles and custom cover images. Use when user needs to: extract highlights from long videos or livestreams, clip or cut best moments from videos, cut video highlights, process Bilibili/YouTube URLs or local video files, generate transcripts via Whisper, analyze content for engaging moments, create short-form clips with styled titles and covers, adjust cover text position and colors, find and export memorable scenes from recordings, burn subtitles into clips (with optional translation), guide clip selection with user intent, or identify speakers in multi-person conversations.
Use when you need to display and browse image collections.
Generate or edit images via Gemini 3 Pro Image (Nano Banana Pro).
Use when implementing video playback with controls.
Transcribe YouTube videos and local audio/video files with speaker diarization. Use when user asks to transcribe a YouTube URL, podcast, video, or audio file. Outputs clean speaker-labeled transcripts ready for LLM analysis.
Automatically process unprocessed audio and image files in Gastrohem daily WhatsApp folders. This skill should be used when the user asks to transcribe audio files, perform OCR on images, or process media in daily folders (e.g., "Process media in today's folder", "Transcribe audio and OCR images in 24.10 folder"). Handles audio transcription using insanely-fast-whisper (parallelized, creates .json) and image OCR using Claude's vision capabilities (creates natural .md summaries with Gastrohem-relevant info).
FFmpeg video and audio processing patterns. Use when transcoding video/audio, extracting clips, adding filters, merging media, creating thumbnails, or batch processing media files.