skills.home capability registry

Media

Audio, video, and image processing.

1476 skills

Sorted by stars.

media

29

sharp

Process and transform images with Sharp for Node.js. Use when a user asks to resize images, convert image formats (WebP, AVIF, PNG, JPEG), compress images, crop or rotate photos, generate thumbnails, add watermarks, optimize images for web, batch process images, create responsive image variants, extract image metadata, or build image processing pipelines. Covers resizing, format conversion, compression, cropping, compositing, and metadata extraction.

TerminalSkills

content-media

media

29

sox

Process audio files with SoX (Sound eXchange). Use when a user asks to apply audio effects, mix and combine audio tracks, convert audio formats, batch process audio files, normalize volume, trim silence, add reverb or echo, change tempo or pitch, split audio files, create spectrograms, generate test tones, resample audio, or build audio processing pipelines. Covers all SoX effects, format conversion, mixing, and batch workflows.

TerminalSkills

content-media

media

29

yt-dlp

Download video and audio from YouTube and other platforms with yt-dlp. Use when a user asks to download YouTube videos, extract audio from videos, download playlists, get subtitles, download specific formats or qualities, batch download, archive channels, extract metadata, embed thumbnails, download from social media platforms (Twitter, Instagram, TikTok), or build media ingestion pipelines. Covers format selection, audio extraction, playlists, subtitles, metadata, and automation.

TerminalSkills

content-media

media

29

ffmpeg

Transcode, convert, edit, and process audio and video with FFmpeg. Use when a user asks to convert video formats (webm to mp4, mkv to mp4), extract audio from video, compress video files, trim or cut clips, concatenate videos, add subtitles, create thumbnails, apply filters, change resolution or bitrate, re-encode media, create GIFs from video, add watermarks, normalize audio, stream media, or build automated media processing pipelines.

TerminalSkills

content-media

media

29

file-upload-processor

Build file upload functionality for a web application. Use when the user mentions "file upload," "image upload," "upload endpoint," "multipart upload," "presigned URL," "S3 upload," "file validation," "upload to cloud storage," or "accept user files." Handles upload endpoints, file validation (type, size, magic bytes), cloud storage integration, and upload status tracking. For image/video processing after upload, see media-transcoder.

TerminalSkills

content-media

media

29

audiowaveform

Generate waveform visualizations from audio files. Use when a user asks to create waveform images, build audio player visualizations, generate waveform data for web players, create podcast episode previews, build audio thumbnails, render waveform PNGs for social media, extract peak data as JSON, or integrate waveform generation into audio processing pipelines. Covers audiowaveform CLI, JSON/binary data output, and web player integration.

TerminalSkills

content-media

media

29

moviepy

Edit and compose video with Python using MoviePy. Use when a user asks to programmatically edit videos, create video montages, add text overlays, build automated video pipelines, composite multiple clips, apply video effects, generate social media videos from templates, concatenate clips, extract audio, create GIFs, build slideshows, add transitions, resize and crop videos, or integrate video editing into Python applications. Covers MoviePy 2.x for compositing, effects, text, and rendering.

TerminalSkills

content-media

media

29

opencv

You are an expert in OpenCV (Open Source Computer Vision Library), the most popular library for real-time computer vision. You help developers build image processing pipelines, object detection systems, video analysis tools, augmented reality, and document processing using OpenCV's 2,500+ algorithms for image manipulation, feature detection, camera calibration, 3D reconstruction, and DNN inference — in Python, C++, or JavaScript.

TerminalSkills

content-media

media

28

video-edit

Edit talking-head videos by removing silences with neural VAD and adding 3D swivel teaser transitions. Use when user asks to edit video, remove silences, add jump cuts, or create video teasers.

MalekAG

content-media

media

28

pan-3d-transition

Create 3D pan/swivel transition effects for videos using Remotion. Use when user asks to add 3D transitions, create swivel effects, or add video transitions.

MalekAG

content-media

media

28

talking-head-editor

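Fine-cut editing skill for talking-head (口播) videos. Suited to videos dominated by A-roll speech, such as single-presenter talking-head pieces, interviews, personal narration, course lectures, and podcast clips. Use when the user asks for “剪口播” (edit a talking-head video), “删停顿” (remove pauses), “去废话” (cut filler content), “删语气词” (remove filler words), “jump cut”, “按字幕精剪” (fine-cut by subtitles), “根据 transcript 粗剪” (rough-cut from the transcript), “采访精剪” (fine-cut an interview), “单人口播提速” (tighten the pace of a solo talking-head video), or “把这段 talking head 剪紧凑” (tighten up this talking-head segment).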

linyqh

content-media

media

28

ffmpeg-best-practice

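Lean FFmpeg playbook for reliable video compression, WeChat-compatible MP4 export, clip stitching, audio mixing, subtitle handling, and quick fallback decisions. Use for requests like “压缩视频” (compress video), “微信上传” (upload to WeChat), “体积太大” (file too large), “导出更小但别糊” (export smaller without blurring), “多段素材拼接” (stitch multiple clips), “字幕烧录” (burn in subtitles), and “ffmpeg 报错” (ffmpeg error).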

linyqh

content-media

media

28

camsnap

Camera snapshot skill: capture snapshots and video from RTSP/ONVIF cameras.

DotNetAge

content-media

media

28

songsee

Audio spectrum visualization skill: generate spectrograms and visualizations from audio files.

DotNetAge

content-media

media

28

video-query

Analyzes video files using Google Gemini API and answers questions about content. Triggers on keywords: video query, analyze video, video analysis, query video

WaterplanAI

content-media

media

27

anycap-media-production

Produce media assets using AnyCap: generate images, videos, and music from text or reference inputs, refine images through interactive visual annotation, and deliver finished assets. Covers the full production workflow from concept to delivery across all media types (image, video, music, audio). Use when creating images, videos, music, or any visual/audio content -- including iterative refinement with human feedback. Also use for image-to-image transformation, video generation from images, and annotation-driven precise edits. Trigger on: media production, asset generation, generate image/video/music, create visual content, produce assets, iterative image editing, annotate and refine, creative workflow, content creation, or any task requiring AI-generated media output.

anycap-ai

content-media

media

27

remove-photo-metadata

Strip all metadata from images for privacy

zocomputer

content-media

media

27

resize-image

Resize images to specific dimensions or percentages

zocomputer

content-media

media

27

convert-heic-to-jpeg

Convert HEIC or HEIF images to JPEG with `uv run` using inline script dependencies. Use this when new phone photos are added under `pictures/` and need to be converted before indexing or upload.

Azure-Samples

content-media

media

27

video-frames

Extract frames or short clips from videos using ffmpeg.

zocomputer

content-media

media

26

image-optimization

Image optimization with Next.js 15 Image, AVIF/WebP formats, blur placeholders, responsive sizes, and CDN loaders

yonatangross

content-media

media

26

asr

Implement speech-to-text (ASR/automatic speech recognition) capabilities using the z-ai-web-dev-sdk. Use this skill when the user needs to transcribe audio files, convert speech to text, build voice input features, or process audio recordings. Supports base64 encoded audio files and returns accurate text transcriptions.

AnswerZhao

content-media

media

26

transcribe-anything

Transcribes audio and video files to text using pluggable ASR backends. Default backend is local whisper CLI (openai-whisper). Supports whisperX (with diarization), insanely-fast-whisper, faster-whisper, whisper.cpp, OpenAI Whisper API, Groq Whisper API, Deepgram, AssemblyAI, Gemini, and Hugging Face models. Handles very long files (1-8+ hours) by preprocessing with ffmpeg: extracts audio from video, converts to optimal ASR format, detects and skips silence, and chunks for API size limits. Supports speaker diarization, word-level timestamps, custom vocabulary, and multiple output formats. Use this skill when someone says "transcribe this", "convert to text", "speech to text", "get the transcript", "transcribe this video/audio/podcast/recording", or provides a media file and wants text output.

swyxio

content-media

media

26

thumbnail-extraction

Extracts the most interesting frames from video files for thumbnail compositing. Detects faces, expressions, smiles, and presentation slides. Outputs full frames, face crops, and transparent cutouts. Use when asked to extract thumbnails, find interesting frames, grab screenshots from video, or create thumbnail candidates from recordings.

swyxio

content-media
