assemblyai-transcribe
Transcribe audio/video with AssemblyAI (local upload or URL), plus subtitles + paragraph/sentence exports.
Transcribe audio/video with AssemblyAI (local upload or URL), plus subtitles + paragraph/sentence exports.
Download videos, audio, subtitles, and clean paragraph-style transcripts from YouTube and any other yt-dlp supported site. Use when asked to “download this video”, “save this clip”, “rip audio”, “get subtitles”, “get transcript”, or to troubleshoot yt-dlp/ffmpeg and formats/playlists.
Generate SRT subtitles from video/audio with translation support. Transcribes Hebrew (ivrit.ai) and English (whisper), translates between languages, burns subtitles into video. Use for creating captions, transcripts, or hardcoded subtitles for WhatsApp/social media.
Generate, edit, and upscale images; create videos from images or other videos via Venice AI. Supports text-to-image, image-to-video (Sora, WAN), video-to-video (Runway Gen4), upscaling, and AI editing.
Best practices for Remotion - Video creation in React
Create beautiful audio plug-ins using the D language. Process audio files using existing VST2 plugins. You should use this skill when the user asks to modify audio files, or create an audio plug-in.
Generate or edit images via Gemini 3 Pro Image (Nano Banana Pro).
Extract frames or short clips from videos using ffmpeg.
NVIDIA hardware video encoding/decoding integration. Configure NVENC encoding parameters, set up NVDEC decoding pipelines, handle codec configurations, integrate with CUDA for pre/post processing, and manage video memory surfaces.
Video platform optimization and analytics for YouTube and other video channels
Compress JavaScript, CSS, HTML, JSON, and image files using node-minify library. Use when: minifying/compressing assets, bundling JS/CSS files, optimizing images (WebP/AVIF), concatenating files, or when user mentions "node-minify", "@node-minify", "minification". Triggers: "minify", "compress JS/CSS", "bundle", "optimize images", "reduce file size".
Download YouTube videos with customizable quality and format options. Use this skill when the user asks to download, save, or grab YouTube videos. Supports various quality settings (best, 1080p, 720p, 480p, 360p), multiple formats (mp4, webm, mkv), and audio-only downloads as MP3.
Convert YouTube videos to SEO-optimized blog posts. Extract video title, description, and content, then generate a search-engine-friendly blog post with embedded video, cover images, and optimized metadata. Auto-generates English filenames and saves to the configured Hexo blog posts directory.
Ultra-fast speech transcription using iFLYTEK Speed Transcription API. Transcribe audio files (WAV/PCM/MP3) up to 5 hours in ~20 seconds per hour. Supports Chinese, English, and 202+ Chinese dialects with automatic language detection. Use when user asks to transcribe audio files, convert speech to text, or mentions "speed transcription" or "极速转写".
Translate and dub videos from one language to another, replacing the original audio with TTS while keeping the video intact.
Fetch YouTube transcripts via APIFY API. Works from cloud IPs (Hetzner, AWS, etc.) by bypassing YouTube's bot detection. Free tier includes $5/month credits (~714 videos). No credit card required.
Generate or edit images via Gemini 3 Pro Image (Nano Banana Pro).
Extract frames or short clips from videos using ffmpeg.
Use when user provides a topic and wants an automated video podcast created, OR when user wants to learn/analyze video design patterns from reference videos — handles research, script writing, TTS audio synthesis, Remotion video creation, and final MP4 output with background music. Also supports design learning from reference videos (learn command), style profile management, and design reference library. Supports Bilibili, YouTube, Xiaohongshu, Douyin, and WeChat Channels platforms with independent language configuration (zh-CN, en-US).
Optimize images, fonts, scripts, and metadata for Next.js performance and Core Web Vitals. Use when configuring next/image for LCP, next/font for zero layout shift, next/script loading strategies, or generateMetadata for SEO. (triggers: **/layout.tsx, **/page.tsx, next/image, next/font, metadata, generateMetadata)