videodb-skills
Upload, stream, search, edit, transcribe, and generate AI video and audio using the VideoDB SDK.
Process and manipulate images using ImageMagick. Supports resizing, format conversion, batch processing, and retrieving image metadata. Use when working with images, creating thumbnails, resizing wallpapers, or performing batch image operations.
GraalVM Native Image expert that adds native image support to Java applications, builds the project, analyzes build errors, applies fixes, and iterates until successful compilation using Oracle best practices.
Process media files (video, audio, images, documents) using Transloadit. Use when asked to encode video to HLS/MP4, generate thumbnails, resize or watermark images, extract audio, concatenate clips, add subtitles, OCR documents, or run any media processing pipeline. Covers 86+ processing robots for file transformation at scale.
Add image vision to NanoClaw agents. Resizes and processes WhatsApp image attachments, then sends them to Claude as multimodal content blocks.
Use when the user wants a local file or image sent back, such as "send me the file" or "发给我" ("send it to me").
Use when the user wants a YouTube transcript from a single URL or video ID. Optimized for one input and one output: fetch the transcript fast, default to plain transcript text only, and avoid extra commentary unless the user asks for timestamps, JSON, or metadata. Triggers on: youtube transcript, transcript from this video, get captions, extract transcript from YouTube, summarize this YouTube transcript after fetching it.
Transcribe any audio or video file to text using Whisper (Groq or OpenAI). Use when the agent receives voice messages, audio files, video messages, or any media with speech. Triggers on: 'transcribe', 'what does this say', 'voice message', 'speech to text', 'audio', any file path ending in .ogg .mp3 .mp4 .wav .webm .m4a .flac .oga
Capture frames or clips from RTSP/ONVIF cameras. Grabs snapshots, video clips, and motion events from IP cameras, security cameras, and video streams. Use when the user wants to take a snapshot from a camera, record a clip from an RTSP stream, monitor motion on a security camera, discover ONVIF devices on the network, or configure camera access for automated surveillance capture.
Generate or edit images via Gemini 3 Pro Image (Nano Banana Pro). Use when the user asks to create an image, generate a picture, produce AI-generated artwork, edit a photo, compose multiple images, or upscale an image to higher resolution. Supports text-to-image generation, single-image editing, and multi-image composition using the Gemini API.
Extract frames or short clips from videos using ffmpeg. Use when the user asks to grab a frame, capture a screenshot from a video, extract a thumbnail, pull a still image from footage, or snapshot a specific timestamp in a video file.
Downloads videos from YouTube and other platforms for offline viewing, editing, or archival. Handles various formats and quality options.
Expert knowledge for AI video clipping: yt-dlp downloading, Whisper transcription, SRT generation, and ffmpeg processing.
Best practices for Remotion - Video creation in React
Compresses images to WebP (default) or PNG with automatic tool selection. Use when user asks to "compress image", "optimize image", "convert to webp", or reduce image file size.
Process video subtitles — transcribe speech, optimize/translate text, burn styled subtitles into video. Use when you need to add subtitles to a video, transcribe audio, translate subtitles, or customize subtitle styles.
AI-powered audio/video editing: transcription, intelligent cut detection, automated editing with crossfades, and optional cloud polish. Use when the user asks for any of: clean audio, edit audio, remove filler words, clean podcast, remove ums, fix audio, cut dead air, polish audio, clean recording, transcribe and edit.
Analyzes video content and extracts highlights. Use when user wants to analyze video, extract highlights, create video summary, generate video keywords, understand video content, find best moments, create trailer, extract exciting clips, get video insights, or identify viral moments. 视频分析、提取精彩片段、视频摘要、视频理解、精彩集锦、视频关键词、剪辑精华、内容分析、热门片段。
Combines multiple videos/images into a single video with optional background audio. Use when user wants to merge clips, concatenate videos, create slideshow from images, stitch videos together, combine media files, add background music to video, mix video with audio, create video montage, or join multiple video segments. 合并视频、拼接视频、图片合成视频、添加背景音乐、视频拼接、多图生成视频、视频混剪、素材合成。
Image editing using Sharp. Supports compositing (QR codes, logos, watermarks), resizing, cropping, rotating, flipping, brightness/contrast/saturation adjustment, blur, sharpen. 图片编辑、图片合成、添加二维码、添加Logo、添加水印、图片缩放、图片裁剪、图片旋转、图片翻转、亮度对比度饱和度调整、模糊、锐化。
Video editing using Volcengine Track structure. Supports cutting, trimming, adding text, stickers, audio, filters, effects, transitions, multi-clip compositions, speed adjustment, watermark removal. 视频剪辑、裁剪视频、添加文字、添加水印、添加音频、视频滤镜、视频特效、视频转场、多片段拼接、调整速度、去水印。
Extracts thumbnails from video URLs. Use when publishing video content that requires a cover image, when the user does not provide a cover/thumbnail, or before publishing to platforms that require cover images (Kwai, Bilibili, YouTube). 提取封面、视频封面、生成封面、缩略图提取、视频缩略图、截取封面。
Generates videos using Grok (preferred) and Google Veo 3.1 models. Supports text-to-video, image-to-video, first-last-frame, video extension, and reference images. AI视频生成、文生视频、图生视频、首尾帧生成、视频拓展。
Removes hardcoded subtitles from videos using AI inpainting. Use when user wants to remove subtitles, erase text from video, clean captions from a video, delete burned-in subtitles, remove video watermarks, clean hardcoded text, or strip embedded subtitles. 去字幕、去除字幕、删除字幕、清除字幕、去硬字幕、去水印、擦除字幕、移除字幕。