Media

Audio, video, and image processing.

1476 skills, sorted by stars
media
113

nano-banana-pro

Generate or edit images with Nano Banana Pro (Gemini 3 Pro Image). Use for image creation and modification requests, including edits. Supports text-to-image and image-to-image at 1K, 2K, or 4K; use --input-image.

NeverSight
content-media
media
113

qwen-asr

Transcribe audio files using Qwen ASR. Use when the user sends voice messages and wants them converted to text.

NeverSight
content-media
media
113

asciinema-record

Record a terminal session to a named .cast file using asciinema, trim the recording to marked content, and optionally convert it to a GIF using agg.

NeverSight
content-media
media
113

bilibili-downloader

Download Bilibili videos. Extracts video and audio streams separately.

NeverSight
content-media
media
113

groq-stt

Transcribe audio files using Groq API (Whisper models). Use when user needs to transcribe audio to text.

NeverSight
content-media
media
113

image-crop

Crops an image to specified dimensions around a focal point. Use when you need to extract a portion of an image, create thumbnails with custom positioning, or prepare images for specific aspect ratios.

NeverSight
content-media
media
113

videodb

See, understand, and act on video and audio. See: ingest from local files, URLs, RTSP/live feeds, or live desktop recording; return real-time context and playable stream links. Understand: extract frames, build visual/semantic/temporal indexes, and search moments with timestamps and auto-clips. Act: transcode and normalize (codec, fps, resolution, aspect ratio), perform timeline edits (subtitles, text/image overlays, branding, audio overlays, dubbing, translation), generate media assets (image, audio, video), and create real-time alerts for events from live streams or desktop capture.

NeverSight
content-media
media
113

acestep-simplemv

Render music videos from audio files and lyrics using Remotion. Accepts audio + LRC/JSON lyrics + title to produce MP4 videos with waveform visualization and synced lyrics display. Use when users mention MV generation, music video rendering, creating video from audio/lyrics, or visualizing songs.

NeverSight
content-media
media
113

image-edit

Edits an existing image using a text prompt. Use when you need to modify, enhance, or transform an image based on text instructions.

NeverSight
content-media
media
113

image-convert

Converts an image to a different format (PNG, JPG, WebP). Use when you need to change image formats, optimize for web, or prepare images for specific applications.

NeverSight
content-media
media
113

image-to-video

Still-to-video conversion guide: model selection, motion prompting, and camera movement. Covers Wan 2.5 i2v, Seedance, Fabric, Grok Video with when to use each. Use for: animating images, creating video from stills, adding motion, product animations. Triggers: image to video, i2v, animate image, still to video, add motion to image, image animation, photo to video, animate still, wan i2v, image2video, bring image to life, animate photo, motion from image

NeverSight
content-media
media
113

image-remove-background

Removes the background from an image, leaving the foreground subject with transparency. Use when you need to isolate subjects, create cutouts, or prepare images for compositing.

NeverSight
content-media
media
113

promo-video

Create professional promotional videos using Remotion with AI voiceover and background music. Invoke with /promo-video.

buildatscale-tv
content-media
media
113

video-engineer

Expert in video processing, streaming protocols (HLS/DASH/WebRTC), and FFmpeg automation. Specializes in building scalable video infrastructure.

NeverSight
content-media
media
112

qwen-asr

Transcribe audio files using Qwen ASR. Use when the user sends voice messages and wants them converted to text.

aahl
content-media
media
109

funasr-transcribe

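Transcribe audio or video files into timestamped Markdown files using a local FunASR service; supports common formats such as mp4, mov, mp3, wav, and m4a. Use this skill when the user needs speech-to-text, meeting notes, video subtitles, or podcast transcription.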

cat-xierluo
content-media
media
109

minimax-image-understand

Image understanding via MiniMax MCP, intended for the OpenClaw platform. If you are a Claude Code user, ignore this skill.

cat-xierluo
content-media
media
107

image-optimization

Optimizes images for web performance using modern formats, responsive techniques, and lazy loading strategies. Use when improving page load times, implementing responsive images, or preparing assets for production deployment.

secondsky
content-media
media
106

m3u8-media-downloader

Use @lzwme/m3u8-dl for media download and video info parsing. Use when the user mentions video/music download (m3u8/HLS/mp4/mp3, or Douyin/Pipixia/Weibo videos), or asks to get video info or parse a video link, and a video/music URL is present.

lzwme
content-media
media
106

using-youtube-download

Download YouTube video or audio with yt-dlp and ffmpeg at highest available quality.

besoeasy
content-media
media
105

c-video

Download videos, extract audio, convert formats, and clip segments using `yt-dlp` and `ffmpeg`. Supports YouTube, Vimeo, and hundreds of other sites.

daxaur
content-media
media
105

c-screen

Capture screenshots and extract text via OCR using `peekaboo`, and capture webcam images using `camsnap`. Enables visual analysis of screen content and camera input.

daxaur
content-media
media
105

video-understand

Understand video content locally using ffmpeg frame extraction and Whisper transcription. No API keys needed. Use when: (1) Understanding what a video contains, (2) Transcribing video audio locally, (3) Extracting key frames for visual analysis, (4) Getting video content without API keys.

heygen-com
content-media