skills.homes capability registry

Media

Audio, video, and image processing.

1476 skills

Sorted by stars.

media

20

rk-binary-image-decoder

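When the user mentions "decode", "convert a file to an image", "take a look at this bin/raw/yuv file", "decode this image", "what image is this file", "open this photo", or similar, or sends a file with a .bin, .raw, or .yuv extension, **this skill must be used first for decoding**; reading .bin/.raw/.yuv files with read is prohibited. Skipping this skill and going straight to a third-party tool (such as acp) is prohibited! Other tools may be considered only after this skill has failed. The documentation must be re-read whether or not it has been read before; every invocation of rk-binary-image-decoder must re-read the documentation. Do not assume that a new file does not need a tool call.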

airockchip

content-media

media

19

gm

Enhanced interface for GraphicsMagick, providing powerful image processing and conversion tools. Core Scenario: When the user needs to convert image formats, resize photos, or perform batch image editing.

x-cmd

content-media

media

19

qwencloud-vision

[QwenCloud] Understand images and videos with Qwen vision models. TRIGGER when: user wants to analyze, describe, or extract information from images or videos, OCR text extraction, chart/table reading, visual reasoning, multi-image comparison, screenshot understanding, video comprehension, or explicitly invokes this skill by name (e.g. use qwencloud-vision). DO NOT TRIGGER when: user wants to generate/create images (use qwencloud-image-generation), generate videos (use qwencloud-video-generation), text-only tasks without visual input, or non-Qwen vision tasks.

QwenCloud

content-media

media

19

image-tools

CLI image manipulation — convert PNG/JPG to SVG, remove watermarks, resize, crop, and edit raster images using ImageMagick and vtracer

tta-lab

content-media

media

19

speech-to-text

Transcribe audio files to text using OpenAI Whisper CLI — supports voice messages, audio recordings, and multiple languages.

WalterSumbon

content-media

media

19

video-editing

AI-assisted video editing workflows for cutting, structuring, and augmenting real footage. Covers the full pipeline from raw capture through FFmpeg, Remotion, ElevenLabs, fal.ai, and final polish in Descript or CapCut. Use when the user wants to edit video, cut footage, create vlogs, or build video content.

Jamkris

content-media

media

19

video-chapter-nav

Video chapter navigation bar: adds chapter navigation at the top of a video and shows the current playback position in real time.

Leoyishou

content-media

media

19

api-asr

Volcano Engine speech recognition: converts audio/video to text, with segmented recognition for long audio.

Leoyishou

content-media

media

19

video-production

Orchestrate multi-clip AI video projects — style anchors, chaining patterns, frame-level QA, montage assembly. Not for video analysis, research, provider settings, or FFmpeg encoding.

Galbaz1

content-media

media

19

image-generation

Enhances image generation prompts with Subject-Context-Style structure, style anchors, character consistency, mcp-image workflows. Not for video generation, TTS, FFmpeg, audio, or design-to-code.

Galbaz1

content-media

media

19

ffmpeg-production

FFmpeg video/audio processing — conversion, scaling, compression, trimming, concatenation, AI post-processing. Not for audio ducking/voice mixing (tts-production) or Remotion rendering.

Galbaz1

content-media

media

19

video-downloader-skill

Downloads videos and audio from YouTube, Bilibili, Twitter, and other platforms using yt-dlp. Supports quality selection, format conversion, and audio extraction.

Leoyishou

content-media

media

19

image-rotator

This skill should be used when users need to rotate images by 90 degrees. It handles image rotation tasks for common formats (PNG, JPG, JPEG, GIF, BMP, TIFF) using a reliable Python script that preserves image quality and supports both clockwise and counter-clockwise rotation.

amkessler

content-media

media

19

youtube-downloader

Download YouTube videos with customizable quality and format options. Use this skill when the user asks to download, save, or grab YouTube videos. Supports various quality settings (best, 1080p, 720p, 480p, 360p), multiple formats (mp4, webm, mkv), and audio-only downloads as MP3.

senweaver

content-media

media

19

nano-banana-pro

Generate or edit images via Gemini 3 Pro Image (Nano Banana Pro).

senweaver

content-media

media

19

video-frames

Extract frames or short clips from videos using ffmpeg.

senweaver

content-media

media

19

audio-transcriber

Transcribe audio and video files to text using OpenAI Whisper

scalyclaw

content-media

media

19

ffmpeg

Powerful multimedia processing tool for converting, recording, and streaming audio and video. Core Scenario: When the user needs to convert media formats, extract audio, or perform complex video editing via CLI.

x-cmd

content-media

media

19

digital-human

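Generates digital-human videos with Volcano Engine OmniHuman1.5: takes an IP character image plus audio as input and outputs a video of the digital human speaking.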

Leoyishou

content-media

media

19

show-gallery

Universal media gallery — browse images/videos from any local folder with copy-path, enlarge, and video playback. Reusable across all gen projects.

ThepExcel

content-media

media

19

fal-ai-media

Unified media generation via fal.ai MCP — image, video, and audio. Covers text-to-image (Nano Banana), text/image-to-video (Seedance, Kling, Veo 3), text-to-speech (CSM-1B), and video-to-audio (ThinkSound). Use when the user wants to generate images, videos, or audio with AI.

Jamkris

content-media

media

19

minimax-multimodal-toolkit

MiniMax multimodal model skill — use MiniMax Multi-Modal models for speech, music, video, and image. Create voice, music, video, and images with MiniMax AI: TTS (text-to-speech, voice cloning, voice design, multi-segment), music (songs, instrumentals), video (text-to-video, image-to-video, start-end frame, subject reference, templates, long-form multi-scene), image (text-to-image, image-to-image with character reference), and media processing (convert, concat, trim, extract). Use when the user mentions MiniMax, multimodal generation, or wants speech/music/video/image AI, MiniMax APIs, or FFmpeg workflows alongside MiniMax outputs.

x-cmd

content-media

media

19

camsnap

Capture frames or clips from RTSP/ONVIF cameras.

senweaver

content-media

media

19

visual-inspection

Capture and understand camera images using the robot's head camera and VLM.

syswonder

content-media
