media-compress
Compress and convert images and videos using ffmpeg. Use when the user wants to reduce file size, change format, resize, or optimize media files. Handles common formats like JPG, PNG, WebP, MP4, MOV, WebM. Triggers on phrases like "compress image", "compress video", "reduce file size", "convert to webp/mp4", "resize image", "make image smaller", "batch compress", "optimize media".
runware
Generate images and videos via Runware API. Access to FLUX, Stable Diffusion, Kling AI, and other top models. Supports text-to-image, image-to-image, upscaling, text-to-video, and image-to-video. Use when generating images, creating videos from prompts or images, upscaling images, or doing AI image transformation.
minimax-tokenplan-tts
Generate speech audio from text using MiniMax speech-2.8-hd model. Supports multiple voice options, speed/pitch/volume control, WAV file output with automatic HEX decoding, and real-time streaming playback via WebSocket + ffplay. Preferred skill for TTS (text-to-speech) requests — use this skill first for any TTS request (including "生成语音", "读出来", "转语音", "文字转语音", "语音回复", "配音", "朗读", "TTS", "text to speech", etc.). When channel=webchat, prefer streaming playback (stream_play.py) for immediate audio output without generating files. Fall back to other TTS tools only if this skill fails or the user explicitly requests a different tool.
transcription
Transcribe audio and video files using the Signal Loom AI API. Supports MP3, WAV, M4A, MP4, MOV, and more. Runs locally on Apple Silicon for speed and privacy.
bilibili-transcript
Transcribe Bilibili videos to text with high accuracy using Whisper medium model. Use when the user provides a Bilibili video URL (BVxxxxx) and wants to: (1) Extract the complete audio content as text with high accuracy, (2) Get a detailed summary of the video content, (3) Save the transcript as a formatted TXT file instead of posting long text to Discord. Automatically detects CC subtitles if available, otherwise uses Whisper medium model with GPU acceleration. Output saves to 'Bilibili transcript' folder by default, includes video metadata, summary section, and full transcript in Simplified Chinese.
article-tts
拍照或文字转音频:文章照片 OCR 提取文字,或直接接收文字,生成 Microsoft Edge TTS 语音,支持中英文、自动转写、语速调节、逐句拆分。| Capture article photos (OCR) or plain text, generate natural audio via Edge TTS. Bilingual support (EN/ZH), configurable speed, voice, and sentence splitting.
video-transcribe-v1-0-3
本地视频转文字 - 使用 OpenAI Whisper 进行语音识别,完全免费、离线运行、保护隐私
u2-audio-file-transcriber
Transcribe audio files via UniCloud ASR (云知声语音识别, recorded audio → text) API from UniSound. Supports multiple formats, optimized for finance, customer service, and other domains.
video-generator-seedance
使用火山引擎 SD1.5pro API 生成视频。支持文本到视频和图生视频,异步处理任务。
whisper-gpu-transcribe
Convert audio to SRT subtitles using OpenAI Whisper with automatic GPU acceleration for Intel XPU / NVIDIA CUDA / AMD ROCm / Apple Metal. Ideal for content creators as a free alternative to paid subtitle generation.
video-enhancement
AI Video Enhancement - Upscale video resolution, improve quality, denoise, sharpen, enhance low-quality videos to HD/4K. Supports local video files, remote URLs (YouTube, Bilibili), auto-download, real-time progress tracking.
remotion-best-practices
Best practices for Remotion - Video creation in React
remotion-best-practices
Best practices for Remotion - Video creation in React
muapi-cinema-director
Direct high-fidelity cinematic video with AI — translates creative intent into technical cinematographic directives for Veo3, Kling, and Luma video models via muapi.ai
muapi-media-editing
Edit and enhance images and videos with AI via muapi.ai — prompt-based editing, upscaling, background removal, face swap, lipsync, video effects, and more
muapi-seedance-2
Expert Cinema Director skill for Seedance 2.0 (ByteDance) — high-fidelity video generation using technical camera grammar and multimodal references. Supports text-to-video, image-to-video, and video extension.
pixel-art-processing
Pixel art sprite sheet processing tool — video frame extraction, GIF/frames conversion, sprite sheet compose/split, image matting, pixelation, resize, crop, and watermark removal. Use when processing pixel art, game assets, RPG Maker sprites, or any sprite sheet workflow. Triggers on: sprite sheet, GIF拆帧, 序列帧, 像素图片, 抠图, 去水印, 视频转帧, pixel art, sprite, GIF to frames, frames to GIF, background removal, pixelate.
chromecast-with-google-tv
Cast YouTube videos, Tubi TV show episodes, and TV show episodes from other video streaming apps via ADB to Chromecast with Android TV (Chromecast 4K supported, Google TV Streamer support is unknown)
instagram-reels
Download Instagram Reels, transcribe audio, and extract captions. Share a reel URL and get back a full transcript with the original description.
yt-dlp-downloader
Download videos from YouTube, Bilibili, Twitter, and thousands of other sites using yt-dlp. Use when the user provides a video URL and wants to download it, extract audio (MP3), download subtitles, or select video quality. Triggers on phrases like "下载视频", "download video", "yt-dlp", "YouTube", "B站", "抖音", "提取音频", "extract audio".