audio-transcription
Transcribe uploaded audio files (MP3, WAV, M4A) to text using OpenAI Whisper. Generate clean text or SRT subtitles. Works offline after initial setup.
Transcribe uploaded audio files (MP3, WAV, M4A) to text using OpenAI Whisper. Generate clean text or SRT subtitles. Works offline after initial setup.
Image optimization analysis for SEO and performance. Checks alt text, file sizes, formats, responsive images, lazy loading, and CLS prevention. Use when user says "image optimization", "alt text", "image SEO", "image size", or "image audit".
Fixed 1080x1920 pixel video format with percentage-based positioning. Use this when laying out video compositions, positioning elements on the canvas, or calculating dimensions. All videos render at exactly 9:16 aspect ratio for TikTok/Instagram Reels.
Adaptive Bitrate (ABR) streaming automatically adjusts video quality based on network conditions. This guide covers HLS, DASH, and player implementation for building video streaming solutions that pro
Generate video using Google Veo (Veo 3.1 / Veo 3.0). Use when: creating video clips from text prompts, generating B-roll, making animated content. DON'T use when: editing existing videos (use ffmpeg/video-frames), extracting frames from video (use video-frames skill), or adding subtitles (use video-subtitles skill).
Detect and extract hidden data embedded in images, audio, and other media files using steganalysis tools to uncover covert communication channels.
Optimize and compress images for web use. Reduces file sizes of JPEG, PNG, GIF images using lossy/lossless compression. Can resize images to maximum dimensions, convert to WebP format, and process entire directories recursively. Use when images are too large for web, need compression, or need format conversion.
ffmpeg recipes and best practices: convert, concatenate, merge, resize, compress, GIF creation, audio extraction, subtitles, optimize for social platforms.
Expert guide for CV181X/CV182X/CV180X (SG200X) multimedia development using CVI MPI API. Use this skill when working with: VI (video input/camera/ISP), VPSS (video processing/scaling/crop), VENC (H.264/H.265/JPEG encoding), VDEC (decoding), VB (video buffer pools), SYS binding, or any CVI_* API calls. Covers camera pipeline setup, offline VPSS processing, VB pool planning, and error diagnosis (ERR_VPSS_NOBUF, ERR_VB_NOBUF). API details in references/.
Extract a YouTube video transcript from a URL and summarize it into important content, learnings, and suggestions. Use when the user provides a YouTube link and wants a clean transcript and structured notes without timestamps, with fallback extraction methods if the primary tool is unavailable.
Zoom Video SDK for Web - JavaScript/TypeScript integration for browser-based video sessions, real-time communication, screen sharing, recording, and live transcription
FFmpeg automation for cutting, trimming, concatenating videos. Audio mixing, timeline editing, transitions, effects. Export optimization for YouTube, social media. Subtitle handling, color grading, batch processing. Use for videogen projects, content creation, automated video production. Activate on "video editing", "FFmpeg", "trim video", "concatenate", "transitions", "export optimization". NOT for real-time video editing UI, 3D compositing, or motion graphics.
Intelligent video processor for downloading media and extracting transcripts from YouTube and 1000+ supported sites. Automatically handles format selection, subtitle extraction, and post-processing.
Remove the visible Doubao (豆包) AI watermark from images. Use when asked to remove Doubao watermarks, clean Doubao-generated images, or process images with the "豆包AI生成" watermark.
Generate professional YouTube thumbnails with AI-powered expression cutouts and Remotion rendering. Perfect for content creators who want consistent, high-quality thumbnails at scale.
Summarize YouTube videos by extracting and analyzing their transcripts. Use when a user shares a YouTube URL and asks to summarize, explain, analyze, or extract information from the video. Triggers on YouTube links (youtube.com, youtu.be) with requests like "summarize this video", "what's this video about", "give me the key points", or "explain this video".
Process video files with audio extraction, format conversion (mp4, webm), and Whisper transcription. Use when user mentions video conversion, audio extraction, transcription, mp4, webm, ffmpeg, or whisper transcription.
Automated video processing: metadata extraction, thumbnails, transcoding, audio extraction with DuckDB tracking
Downloads videos from YouTube and other platforms for offline viewing, editing, or archival. Handles various formats and quality options.
FFmpeg media processing. Video/audio transcoding, stream manipulation, and filter graphs.