Media

Audio, video, and image processing.

1476 skills

Sorted by stars.

media

113

nano-banana-pro

Generate or edit images with Nano Banana Pro (Gemini 3 Pro Image). Use for image creation and modification requests, including edits. Supports text-to-image and image-to-image at 1K, 2K, or 4K; use --input-image.

NeverSight

content-media

media

113

qwen-asr

Transcribe audio files using Qwen ASR. Use when the user sends voice messages and wants them converted to text.

NeverSight

content-media

media

113

asciinema-record

Record a terminal session to a named .cast file using asciinema, trim the recording to marked content, and optionally convert it to a GIF using agg.

NeverSight

content-media

media

113

bilibili-downloader

Download Bilibili videos. Extracts video and audio streams separately.

NeverSight

content-media

media

113

groq-stt

Transcribe audio files using Groq API (Whisper models). Use when user needs to transcribe audio to text.

NeverSight

content-media

media

113

Crops an image to specified dimensions around a focal point. Use when you need to extract a portion of an image, create thumbnails with custom positioning, or prepare images for specific aspect ratios.

NeverSight

content-media

media

113

videodb

See, understand, and act on video and audio. See: ingest from local files, URLs, RTSP/live feeds, or live desktop recording; return real-time context and playable stream links. Understand: extract frames, build visual/semantic/temporal indexes, and search moments with timestamps and auto-clips. Act: transcode and normalize (codec, fps, resolution, aspect ratio), perform timeline edits (subtitles, text/image overlays, branding, audio overlays, dubbing, translation), generate media assets (image, audio, video), and create real-time alerts for events from live streams or desktop capture.

NeverSight

content-media

media

113

acestep-simplemv

Render music videos from audio files and lyrics using Remotion. Accepts audio + LRC/JSON lyrics + title to produce MP4 videos with waveform visualization and synced lyrics display. Use when users mention MV generation, music video rendering, creating video from audio/lyrics, or visualizing songs.

NeverSight

content-media

media

113

image-edit

Edits an existing image using a text prompt. Use when you need to modify, enhance, or transform an image based on text instructions.

NeverSight

content-media

media

113

image-convert

Converts an image to a different format (PNG, JPG, WebP). Use when you need to change image formats, optimize for web, or prepare images for specific applications.

NeverSight

content-media

media

113

image-to-video

Still-to-video conversion guide: model selection, motion prompting, and camera movement. Covers Wan 2.5 i2v, Seedance, Fabric, Grok Video with when to use each. Use for: animating images, creating video from stills, adding motion, product animations. Triggers: image to video, i2v, animate image, still to video, add motion to image, image animation, photo to video, animate still, wan i2v, image2video, bring image to life, animate photo, motion from image

NeverSight

content-media

media

113

image-remove-background

Removes the background from an image, leaving the foreground subject with transparency. Use when you need to isolate subjects, create cutouts, or prepare images for compositing.

NeverSight

content-media

media

113

promo-video

Create professional promotional videos using Remotion with AI voiceover and background music. Invoke with /promo-video.

buildatscale-tv

content-media

media

113

video-engineer

Expert in video processing, streaming protocols (HLS/DASH/WebRTC), and FFmpeg automation. Specializes in building scalable video infrastructure.

NeverSight

content-media

media

113

image-to-video

NeverSight

content-media

media

112

qwen-asr

Transcribe audio files using Qwen ASR. Use when the user sends voice messages and wants them converted to text.

aahl

content-media

media

109

funasr-transcribe

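Transcribes audio or video files into timestamped Markdown files using a local FunASR service; supports common formats such as mp4, mov, mp3, wav, and m4a. Use this skill when the user needs speech-to-text, meeting notes, video subtitles, or podcast transcription.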

cat-xierluo

content-media

media

109

minimax-image-understand

Image understanding via MiniMax MCP, for the OpenClaw platform. If you are a Claude Code user, ignore this skill.

cat-xierluo

content-media

media

107

image-optimization

Optimizes images for web performance using modern formats, responsive techniques, and lazy loading strategies. Use when improving page load times, implementing responsive images, or preparing assets for production deployment.

secondsky

content-media

media

106

m3u8-media-downloader

Use @lzwme/m3u8-dl for media download and video info parsing. Use when the user mentions video/music download (m3u8/HLS/mp4/mp3 or Douyin/Pipixia/Weibo videos), or asks to get video info or parse a video link, and a video/music URL is present.

lzwme

content-media

media

106

using-youtube-download

Download YouTube video or audio with yt-dlp and ffmpeg at highest available quality.

besoeasy

content-media

media

105

c-video

Download videos, extract audio, convert formats, and clip segments using `yt-dlp` and `ffmpeg`. Supports YouTube, Vimeo, and hundreds of other sites.

daxaur

content-media

media

105

c-screen

Capture screenshots and extract text via OCR using `peekaboo`, and capture webcam images using `camsnap`. Enables visual analysis of screen content and camera input.

daxaur

content-media

media

105

video-understand

Understand video content locally using ffmpeg frame extraction and Whisper transcription. No API keys needed. Use when: (1) Understanding what a video contains, (2) Transcribing video audio locally, (3) Extracting key frames for visual analysis, (4) Getting video content without API keys.

heygen-com

content-media
