image-tools
CLI image manipulation — convert PNG/JPG to SVG, remove watermarks, resize, crop, and edit raster images using ImageMagick and vtracer
CMS, document processing, and media generation.
Transcribe audio files to text using OpenAI Whisper CLI — supports voice messages, audio recordings, and multiple languages.
AI-assisted video editing workflows for cutting, structuring, and augmenting real footage. Covers the full pipeline from raw capture through FFmpeg, Remotion, ElevenLabs, fal.ai, and final polish in Descript or CapCut. Use when the user wants to edit video, cut footage, create vlogs, or build video content.
Orchestrate multi-clip AI video projects — style anchors, chaining patterns, frame-level QA, montage assembly. Not for video analysis, research, provider settings, or FFmpeg encoding.
Enhances image generation prompts with Subject-Context-Style structure, style anchors, character consistency, and mcp-image workflows. Not for video generation, TTS, FFmpeg, audio, or design-to-code.
FFmpeg video/audio processing — conversion, scaling, compression, trimming, concatenation, AI post-processing. Not for audio ducking/voice mixing (tts-production) or Remotion rendering.
Downloads videos and audio from YouTube, Bilibili, Twitter, and other platforms using yt-dlp. Supports quality selection, format conversion, and audio extraction.
This skill should be used when users need to rotate images by 90 degrees. It handles image rotation tasks for common formats (PNG, JPG, JPEG, GIF, BMP, TIFF) using a reliable Python script that preserves image quality and supports both clockwise and counter-clockwise rotation.
Download YouTube videos with customizable quality and format options. Use this skill when the user asks to download, save, or grab YouTube videos. Supports various quality settings (best, 1080p, 720p, 480p, 360p), multiple formats (mp4, webm, mkv), and audio-only downloads as MP3.
Generate or edit images via Gemini 3 Pro Image (Nano Banana Pro).
Extract frames or short clips from videos using ffmpeg.
Transcribe audio and video files to text using OpenAI Whisper.
Universal media gallery — browse images/videos from any local folder with copy-path, enlarge, and video playback. Reusable across all gen projects.
Unified media generation via fal.ai MCP — image, video, and audio. Covers text-to-image (Nano Banana), text/image-to-video (Seedance, Kling, Veo 3), text-to-speech (CSM-1B), and video-to-audio (ThinkSound). Use when the user wants to generate images, videos, or audio with AI.
MiniMax multimodal model skill — use MiniMax Multi-Modal models for speech, music, video, and image. Create voice, music, video, and images with MiniMax AI: TTS (text-to-speech, voice cloning, voice design, multi-segment), music (songs, instrumentals), video (text-to-video, image-to-video, start-end frame, subject reference, templates, long-form multi-scene), image (text-to-image, image-to-image with character reference), and media processing (convert, concat, trim, extract). Use when the user mentions MiniMax, multimodal generation, or wants speech/music/video/image AI, MiniMax APIs, or FFmpeg workflows alongside MiniMax outputs.
Capture and understand camera images using the robot's head camera and VLM.
Analyzes target audience demographics, psychographics, behaviors, and platform preferences to inform influencer selection and campaign strategy. Essential foundation for effective influencer marketing.