youtube-content
YouTube research and content operations — search, download, transcript extraction, and audio processing via yt-dlp.
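Audio recording transcription. Converts audio files into structured text (with timestamps and speaker labels). This is a scenario-capability skill, similar to web-access: it is only responsible for "turning sound into text" and does not decide what the text is used for. Trigger scenarios: the user provides a recording file, asks to transcribe audio, or needs a voice file processed.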
Upscale and enhance image resolution using AI. Use when the user requests "Upscale image", "Enhance resolution", "Make image bigger", "Increase quality", or similar upscaling tasks.
Generate or edit images with Nano Banana Pro (Gemini 3 Pro Image). Use for image creation and modification requests, including edits. Supports text-to-image and image-to-image at 1K, 2K, or 4K; use --input-image to supply an input image.
Generate or edit images via Gemini 3 Pro Image (Nano Banana Pro).
Expands and recomposes images into an IMAX 70mm portrait style (1.43:1 aspect ratio) with high-fidelity Christopher Nolan-esque aesthetics.
Transforms anime, art, or 3D rendering images into photorealistic cosplay-style photographs.
Restores vintage and blurry photos to high-definition 8K images while preserving the subject's identity.
Process videos with FFmpeg — improve quality, auto-contrast, downsample, denoise, or crop using the video_ffmpeg_process tool.
Extracts transcripts from video files using local WhisperX (preferred) or faster-whisper with GPU acceleration. Use when the user needs to transcribe a video, get captions, extract audio text, or convert video/audio to text. Triggers on "transcript", "transcribe", "speech to text", "video to text", "extract captions".
Extract and intelligently reframe clips from long-form 16:9 videos into 9:16 portrait or 1:1 square formats with face-tracking crop. Use for 'extract clips', 'reframe video', 'clip extractor', 'portrait crop', 'face tracking', '16:9 to 9:16', 'smart crop', 'make shorts from video', 'auto reframe'.
Local visual regression check for layout or rendering changes. Renders all gallery examples, pixel-diffs against main, and opens changed renders as BEFORE/AFTER pairs. In most cases the CI render preview on a PR is sufficient - use this skill only for pre-push confidence on risky changes or when the user explicitly asks for a local diff.
Video editing orchestrator and router. Detects video format (long-form vs short-form) and routes to the correct editing skill. Also provides the shared component library, brand assets, and rules used by all editing processes. Use when the user wants to edit a video, create a composition, add effects, make it polished, or finish the video.
Edit short-form videos (under 90 seconds) using Remotion compositions. Handles pipeline clips, standalone demos, and announcements with pop-outs, captions, SFX, and CTAs. Use for "short-form editing", "edit clip", "edit short", "pipeline clip editing", "edit demo", "short video editing", "edit announcement", "reels editing", "shorts editing", "tiktok editing".
Upload and compress videos for YouTube publishing. Handles local compression via HandBrake and upload to Zernio storage.
Extracts YouTube video transcripts and saves them as structured markdown files with metadata and timestamped content. When a user shares a YouTube URL, IMMEDIATELY runs the extraction script, creates a local folder, and saves the transcript. Handles both manual and auto-generated captions.
Transcribe audio and video files to text using the Whisper speech-to-text API at {{WHISPER_HOST}}:{{WHISPER_PORT}}.
Convert images between formats (PNG, JPG, WebP) using ImageMagick in the shared volume at {{SHARED_VOLUME}}.
Resize and crop images using ImageMagick in the shared volume at {{SHARED_VOLUME}}.