nano-banana-pro
Generate or edit images via Gemini 3 Pro Image (Nano Banana Pro).
CMS, document processing, and media generation.
Extract frames or short clips from videos using ffmpeg.
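Here is a minimal sketch of the two ffmpeg invocations this skill describes, written as Python helpers that only build the argument list. The names `frame_extract_cmd` and `clip_extract_cmd` are hypothetical and not part of the skill. The sketch assumes `ffmpeg` is on `PATH` when the command is eventually run.

```python
from typing import List, Optional


def frame_extract_cmd(src: str, out_pattern: str, fps: float = 1.0,
                      start: Optional[str] = None) -> List[str]:
    """Build an ffmpeg command that samples `fps` frames per second from `src`."""
    cmd = ["ffmpeg", "-hide_banner"]
    if start is not None:
        # -ss placed before -i seeks the input before decoding (fast input seeking).
        cmd += ["-ss", start]
    # The fps filter resamples the stream; out_pattern is e.g. "frame_%04d.png".
    cmd += ["-i", src, "-vf", f"fps={fps:g}", out_pattern]
    return cmd


def clip_extract_cmd(src: str, dst: str, start: str, duration: str) -> List[str]:
    """Build an ffmpeg command that copies a short clip without re-encoding."""
    # -c copy avoids re-encoding, but cuts are only keyframe-accurate,
    # so the clip boundaries may be slightly off from `start`/`duration`.
    return ["ffmpeg", "-hide_banner", "-ss", start, "-i", src,
            "-t", duration, "-c", "copy", dst]
```

Run a command with `subprocess.run(frame_extract_cmd("in.mp4", "frame_%04d.png"), check=True)`. Stream copy is fast, but when exact clip boundaries matter, drop `-c copy` so ffmpeg re-encodes.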
Speech-to-text (transcripts/transcription; 逐字稿/转写) in Docker using faster-whisper (local, no API key). Use when you already have an audio file (e.g. from `media-audio-download`) and need a transcript, optionally with timestamps, for summarization.
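faster-whisper's `WhisperModel.transcribe()` returns segments that carry `start`, `end`, and `text`. The sketch below turns those segments into an SRT transcript. It assumes SRT is an acceptable timestamped format for the summarization step. The helper names `srt_timestamp` and `to_srt` are hypothetical.

```python
from typing import Iterable, Tuple


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: Iterable[Tuple[float, float, str]]) -> str:
    """Render (start, end, text) segments as a numbered SRT document."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        # Whisper segment text usually has a leading space; strip it.
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    # SRT separates cues with a blank line.
    return "\n".join(blocks)
```

With faster-whisper installed, a typical call chain is `segments, info = WhisperModel("small").transcribe("audio.mp3")`, followed by `to_srt((s.start, s.end, s.text) for s in segments)`.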
Download audio tracks from video links for transcription/summarization. Docker-first (no host Python): uses yt-dlp and ffmpeg for Bilibili and Playwright extraction for Xiaohongshu note pages. Use when a platform skill needs an audio file for STT (e.g. Bilibili “No subtitles found”, Xiaohongshu video notes), or when the user asks to “download this video's audio / make a transcript” (“把这个视频音频下载下来/做逐字稿”).
Process and transform images with Sharp for Node.js. Use when a user asks to resize images, convert image formats (WebP, AVIF, PNG, JPEG), compress images, crop or rotate photos, generate thumbnails, add watermarks, optimize images for web, batch process images, create responsive image variants, extract image metadata, or build image processing pipelines. Covers resizing, format conversion, compression, cropping, compositing, and metadata extraction.
Process audio files with SoX (Sound eXchange). Use when a user asks to apply audio effects, mix and combine audio tracks, convert audio formats, batch process audio files, normalize volume, trim silence, add reverb or echo, change tempo or pitch, split audio files, create spectrograms, generate test tones, resample audio, or build audio processing pipelines. Covers all SoX effects, format conversion, mixing, and batch workflows.
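The sketch below builds a SoX cleanup chain as an argument list. The chain trims leading and trailing silence, normalizes the level, and optionally resamples. `sox_cleanup_cmd` is a hypothetical helper, and the 0.1 s / 1% silence thresholds are illustrative assumptions you would tune per recording. The `silence` effect trims from the start of the audio, so trailing silence is handled with one common idiom: reverse, trim again, reverse back.

```python
from typing import List, Optional


def sox_cleanup_cmd(src: str, dst: str, target_db: float = -3.0,
                    rate: Optional[int] = None) -> List[str]:
    """Build `sox src dst <effects...>` that trims silence and normalizes."""
    cmd = ["sox", src, dst]
    # silence 1 0.1 1%: drop audio from the start until 0.1 s stays above 1% amplitude.
    trim = ["silence", "1", "0.1", "1%"]
    # Trim the head, reverse, trim the (former) tail, reverse back.
    cmd += trim + ["reverse"] + trim + ["reverse"]
    # norm: scale so the peak sits at target_db dBFS.
    cmd += ["norm", f"{target_db:g}"]
    if rate is not None:
        # rate: resample the output to the given sample rate in Hz.
        cmd += ["rate", str(rate)]
    return cmd
```

For batch processing, loop over `pathlib.Path(...).glob("*.wav")` and call `subprocess.run(sox_cleanup_cmd(...), check=True)` for each file.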
Download video and audio from YouTube and other platforms with yt-dlp. Use when a user asks to download YouTube videos, extract audio from videos, download playlists, get subtitles, download specific formats or qualities, batch download, archive channels, extract metadata, embed thumbnails, download from social media platforms (Twitter, Instagram, TikTok), or build media ingestion pipelines. Covers format selection, audio extraction, playlists, subtitles, metadata, and automation.
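yt-dlp can also be driven from Python through `yt_dlp.YoutubeDL`, which takes an options dict that mirrors the CLI flags. The sketch below builds options for audio extraction. It assumes ffmpeg is installed, because the `FFmpegExtractAudio` postprocessor depends on it. `audio_download_opts` is a hypothetical helper.

```python
from typing import Any, Dict, Optional


def audio_download_opts(outdir: str, codec: str = "mp3",
                        archive: Optional[str] = None) -> Dict[str, Any]:
    """Build a YoutubeDL options dict that downloads audio only."""
    opts: Dict[str, Any] = {
        # Prefer an audio-only stream; fall back to the best combined format.
        "format": "bestaudio/best",
        # Output template: title plus video ID to avoid name collisions.
        "outtmpl": f"{outdir}/%(title)s [%(id)s].%(ext)s",
        # Convert the download to `codec` with ffmpeg after it finishes.
        "postprocessors": [{"key": "FFmpegExtractAudio", "preferredcodec": codec}],
        # For a watch URL that also names a playlist, fetch just the video.
        "noplaylist": True,
    }
    if archive is not None:
        # Record finished IDs so reruns skip already-downloaded items.
        opts["download_archive"] = archive
    return opts
```

Pass the result to `with yt_dlp.YoutubeDL(opts) as ydl: ydl.download([url])`. The `download_archive` file is what makes repeated runs over a channel or playlist incremental.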
Transcode, convert, edit, and process audio and video with FFmpeg. Use when a user asks to convert video formats (webm to mp4, mkv to mp4), extract audio from video, compress video files, trim or cut clips, concatenate videos, add subtitles, create thumbnails, apply filters, change resolution or bitrate, re-encode media, create GIFs from video, add watermarks, normalize audio, stream media, or build automated media processing pipelines.
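The sketch below shows a typical webm/mkv to MP4 conversion as an argument list. `transcode_to_mp4_cmd` is hypothetical. The H.264/AAC codec choice, CRF 23, and 128k audio bitrate are common defaults assumed here, not requirements.

```python
from typing import List, Optional


def transcode_to_mp4_cmd(src: str, dst: str, crf: int = 23, preset: str = "medium",
                         height: Optional[int] = None) -> List[str]:
    """Build an ffmpeg command that re-encodes `src` to H.264/AAC in MP4."""
    cmd = ["ffmpeg", "-hide_banner", "-i", src,
           "-c:v", "libx264", "-preset", preset, "-crf", str(crf)]
    if height is not None:
        # scale=-2:H sets the height and picks an even width that keeps the aspect ratio.
        cmd += ["-vf", f"scale=-2:{height}"]
    # AAC audio; +faststart moves the MP4 index to the front for web playback.
    cmd += ["-c:a", "aac", "-b:a", "128k", "-movflags", "+faststart", dst]
    return cmd
```

Lower `crf` values raise both quality and file size. The even width from `scale=-2:H` matters because yuv420p H.264 output needs even dimensions.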
Use when the user needs to build file upload functionality for a web application, or mentions "file upload," "image upload," "upload endpoint," "multipart upload," "presigned URL," "S3 upload," "file validation," "upload to cloud storage," or "accept user files." Handles upload endpoints, file validation (type, size, magic bytes), cloud storage integration, and upload status tracking. For image/video processing after upload, see media-transcoder.
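Validating by magic bytes means trusting the file's leading bytes rather than the client-supplied extension or `Content-Type`. The sketch below uses a deliberately small signature table. `sniff_mime` and `validate_upload` are hypothetical names, and the table covers only a few common formats.

```python
from typing import AbstractSet, Optional

# Leading-byte signatures for a few common upload types (not exhaustive).
_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"%PDF-", "application/pdf"),
]


def sniff_mime(head: bytes) -> Optional[str]:
    """Guess a MIME type from the first bytes of a file, or return None."""
    for magic, mime in _SIGNATURES:
        if head.startswith(magic):
            return mime
    # WebP is a RIFF container: "RIFF", a 4-byte size, then "WEBP".
    if len(head) >= 12 and head[:4] == b"RIFF" and head[8:12] == b"WEBP":
        return "image/webp"
    return None


def validate_upload(head: bytes, size: int, allowed: AbstractSet[str],
                    max_bytes: int) -> str:
    """Reject oversized or disallowed files; return the detected MIME type."""
    if size > max_bytes:
        raise ValueError(f"file too large: {size} > {max_bytes} bytes")
    mime = sniff_mime(head)
    if mime is None or mime not in allowed:
        raise ValueError(f"disallowed or unrecognized file type: {mime}")
    return mime
```

Reading only the first 12 bytes of the upload stream is enough for every signature in this table.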
Generate waveform visualizations from audio files. Use when a user asks to create waveform images, build audio player visualizations, generate waveform data for web players, create podcast episode previews, build audio thumbnails, render waveform PNGs for social media, extract peak data as JSON, or integrate waveform generation into audio processing pipelines. Covers audiowaveform CLI, JSON/binary data output, and web player integration.
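Waveform peak data is usually a minimum and maximum sample value for each fixed-size bucket of samples (audiowaveform calls the bucket size samples per pixel). The sketch below applies that idea to 16-bit PCM WAV input using only the standard library, and it reads only the first channel. `wav_peaks` is a hypothetical helper, not audiowaveform's API.

```python
import array
import sys
import wave
from typing import List, Tuple


def wav_peaks(wav_file, samples_per_pixel: int = 256) -> List[Tuple[int, int]]:
    """Return one (min, max) pair per bucket of first-channel samples."""
    # wave.open accepts a path or a binary file-like object.
    with wave.open(wav_file, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        channels = w.getnchannels()
        raw = w.readframes(w.getnframes())
    samples = array.array("h", raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    # Frames are interleaved across channels; keep channel 0 only.
    mono = samples[::channels]
    peaks = []
    for i in range(0, len(mono), samples_per_pixel):
        chunk = mono[i:i + samples_per_pixel]
        peaks.append((min(chunk), max(chunk)))
    return peaks
```

Flattening the pairs with `[v for pair in peaks for v in pair]` gives an interleaved min/max list, similar in shape to the `data` array in audiowaveform's JSON output.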
Edit and compose video with Python using MoviePy. Use when a user asks to programmatically edit videos, create video montages, add text overlays, build automated video pipelines, composite multiple clips, apply video effects, generate social media videos from templates, concatenate clips, extract audio, create GIFs, build slideshows, add transitions, resize and crop videos, or integrate video editing into Python applications. Covers MoviePy 2.x for compositing, effects, text, and rendering.
You are an expert in OpenCV (Open Source Computer Vision Library), a widely used library for real-time computer vision. You help developers working in Python, C++, or JavaScript build image processing pipelines, object detection systems, video analysis tools, augmented reality, and document processing using OpenCV's 2,500+ algorithms for image manipulation, feature detection, camera calibration, 3D reconstruction, and DNN inference.
Generate comprehensive Product Requirements Documents (PRDs) for product managers. Use this skill when users ask to "create a PRD", "write product requirements", "document a feature", or need help structuring product specifications.
Creates Mermaid diagrams for flowcharts, sequence diagrams, ERDs, and architecture visualizations in markdown. Use when users request "Mermaid diagram", "flowchart", "sequence diagram", "ERD diagram", or "architecture diagram".
PolicyEngine design system: tokens, typography, colors, charts, and branding for all project types. Triggers: "brand colors", "design tokens", "PolicyEngine colors", "typography", "font", "color palette", "CSS variables", "design system", "branding guidelines".
Create distinctive, on-brand frontend interfaces for Ascent Training. Use this skill when building landing pages, marketing sites, app UI, React components, or any visual artifact for Ascent. Combines the mountain/guide brand identity with topographic map aesthetics and emerald/charcoal color system. Generates production-grade code that is immediately recognizable as Ascent.
Creates modern CSS gradients using Tailwind CSS including linear, radial, conic, mesh gradients, animated gradients, glassmorphism, and gradient text effects. Use when users request "gradient background", "tailwind gradient", "modern gradient", "mesh gradient", or "animated gradient".
Provides reusable interaction patterns and motion presets that make UI feel polished. Includes hover effects, transitions, entrance animations, gesture feedback, and reduced-motion support. Use when adding "animations", "transitions", "micro-interactions", or "motion design".
Visual development with Patchright screenshots. Use when building, fixing, reviewing, or creating new UI pages and dashboard components. Use when the user wants to build a dashboard, create a new page, verify UI works correctly, or iterate on visual design. Takes screenshots, analyzes layout, iterates until correct.
Create beautiful visual art in .png and .pdf documents using design philosophy. You should use this skill when the user asks to create a poster, piece of art, design, or other static piece. Create original visual designs, never copying existing artists' work to avoid copyright violations.
Generate a premium mockup website for a prospect using the buildinamsterdam.com template style. Use when user asks to design a website, create a mockup, or build a prospect website.
Generate visual prototype prompts from SCR- entries for Google Stitch (or equivalent UI generation tool). Triggers on: 'make a prototype', 'visualize screens', 'generate Stitch prompt', 'I need a visual demo', 'prototype the workflow', 'show me what this looks like', 'get this to a demo', 'visual gate'. Consumes SCR- (Screen Flow Definition), PER- (Personas), UJ- (User Journeys), DES- (Design Components). Outputs Stitch prompt blocks per SCR- entry + Feedback Capture Template. No new SoT IDs created — this skill makes existing SCR- entries visual and routes feedback back to them.