skills.homes capability registry

Media

Audio, video, and image processing.

1476 skills

media

22

tts

Text-to-speech — convert text to audio using gTTS or edge-tts. Use when: user asks to read text aloud, generate an audio file from text, or create a voiceover. NOT for: speech-to-text/transcription (use Deepgram or whisper), music generation, or audio editing.

ericwang915

content-media

media

21

media-processing

Process multimedia files with FFmpeg (video/audio encoding, conversion, streaming, filtering, hardware acceleration), ImageMagick (image manipulation, format conversion, batch processing, effects, composition), and RMBG (AI-powered background removal). Use when converting media formats, encoding videos with specific codecs (H.264, H.265, VP9), resizing/cropping images, removing backgrounds from images, extracting audio from video, applying filters and effects, optimizing file sizes, creating streaming manifests (HLS/DASH), generating thumbnails, batch processing images, creating composite images, or implementing media processing pipelines. Supports 100+ formats, hardware acceleration (NVENC, QSV), and complex filtergraphs.

binhmuc

content-media

media

21

xstranscriber

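A skill for transcribing audio files to text. Supports mp3/wav/m4a/ogg/flac formats. Uses the Whisper-based transcriber_tool and lets you choose among five models (tiny/base/small/medium/large) to balance accuracy and speed. Long recordings can be run in the background. Use with requests such as "文字起こしして" ("transcribe this") or "音声をテキストに変換して" ("convert the audio to text").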

karaage0703

content-media

media

21

optimizing-io-operations

Optimizes standard I/O and file operations for high-performance data processing in .NET. Use when building high-throughput file processing or competitive programming solutions.

christian289

content-media

media

21

gif-compress

Compresses GIF files and adds a progress bar.

shuntaka9576

content-media

media

21

ai-multimodal

Process and generate multimedia content using the Google Gemini API for better vision capabilities. Capabilities include analyzing audio files (transcription with timestamps, summarization, speech understanding, music/sound analysis up to 9.5 hours), understanding images (better image analysis than Claude models, captioning, reasoning, object detection, design extraction, OCR, visual Q&A, segmentation, handling multiple images), processing videos (scene detection, Q&A, temporal analysis, YouTube URLs, up to 6 hours), extracting from documents (PDF tables, forms, charts, diagrams, multi-page), generating images (text-to-image with Imagen 4, editing, composition, refinement), and generating videos (text-to-video with Veo 3, 8-second clips with native audio). Use when working with audio/video files, analyzing images or screenshots (instead of default vision capabilities of Claude, only fall back to Claude's vision capabilities if needed), processing PDF documents, extracting structured data from media, creating images/videos from text pr

xthanhn91

content-media

media

21

video-frames

Extract frames or short clips from videos using ffmpeg.

kaivyy

content-media

media

21

nano-banana-pro

Generate or edit images via Gemini 3 Pro Image (Nano Banana Pro).

kaivyy

content-media

media

21

camsnap

Capture frames or clips from RTSP/ONVIF cameras.

kaivyy

content-media

media

21

compress-images

Compress images for web/SEO performance using cwebp. Use when optimizing images for faster page loads, reducing file sizes, or converting JPG/PNG to WebP format.

rameerez

content-media

media

21

omnicaptions-convert

Use when converting between caption formats (SRT, VTT, ASS, TTML, Gemini MD, etc.). Supports 30+ caption formats.

lattifai

content-media

media

21

youtube-data-api

YouTube Data API v3 complete wrapper - search videos, get video/channel/playlist details, get comments, download subtitles, etc. Supports filtering and sorting by time/views/rating and more.

Yrzhe

content-media

media

21

transcribe-video

Generate subtitles (SRT/VTT) and plain text transcripts from video or audio files using AWS Transcribe. Use when creating captions, extracting spoken content, generating transcripts for notes, or making video content searchable.

rameerez

content-media

media

21

download-video

Download videos from social media URLs (X/Twitter, YouTube, Instagram, TikTok, etc.) using yt-dlp. Use when saving a video locally, extracting content for transcription, or archiving video references.

rameerez

content-media

media

21

omnicaptions-transcribe

Use when transcribing audio/video to text with timestamps, speaker labels, and chapters. Supports YouTube URLs and local files. Produces structured markdown output.

lattifai

content-media

media

21

omnicaptions-laicut

Use when user needs accurate/precise caption timing, or aligning captions with audio/video using forced alignment. Corrects caption timing to match actual speech. Uses LattifAI Lattice-1 model.

lattifai

content-media

media

21

omnicaptions-download

Use when downloading videos, audio, or captions from YouTube and other video platforms. Supports quality selection.

lattifai

content-media

media

20

file-converter

Convert & transform files - images (resize, format, HEIC), markdown (PDF/HTML), data (CSV/JSON/YAML/TOML/XML), SVG, base64, text encoding. Cross-platform, single & batch mode. This skill should be used when converting file formats, resizing images, generating PDFs from markdown, or transforming data between formats.

georgekhananaev

content-media

media

20

youmind-youtube-transcript

Extract and summarize YouTube video transcripts via YouMind API — no yt-dlp, no proxy needed. Batch extract up to 5 videos at once with parallel processing. Saves videos to your YouMind board with timestamped transcripts in markdown. Automatically summarizes video content after extraction. Works from any IP (cloud, VPS, CI/CD, corporate networks). Use when user wants to "get YouTube transcript", "extract video subtitles", "transcribe YouTube video", "batch transcribe videos", "get video captions", "summarize YouTube video", "YouTube video summary", "summarize this video", "what does this video say", "YouTube 字幕", "YouTube 总结", "视频总结", "YouTube 文字起こし", "YouTube 자막", or "download YouTube transcript".

YouMind-OpenLab

content-media

media

20

yt-dlp

Download videos, extract audio, and get transcripts from YouTube and 1000+ sites. Use when asked to download a video, extract audio, get a transcript, rip subtitles, or fetch media. Triggers on "download video", "yt-dlp", "extract audio", "YouTube download", "get transcript", "subtitles", or any media download request.

juanibiapina

content-media

media

20

video-editing

Automated video editing skill for talk/vlog/standup videos. Use when: cutting video, splitting video into sentences, merging video clips, extracting audio, transcribing speech, auto-editing oral presentation videos, combining selected sentence clips into a final video, generating video cover/thumbnail with title, B-roll cutaway editing, persistent video overlay/watermark, blinking REC indicator, ending title cards, multi-source audio mixing, exporting to JianYing/CapCut project for further editing, generating voiceover videos with Remotion (audio-only to video with animated visuals/subtitles). Requires ffmpeg and whisper. Remotion workflow additionally requires Node.js and npm.

maxazure

content-media

media

20

oral-history-tools

Use this Skill to process oral history recordings: Whisper transcription with timestamps, pyannote speaker diarization, OHMS metadata XML, and speaker anonymization.

xjtulyc

content-media

media

20

compression-codecs

Configure and optimize compression for Zarr arrays. Covers all numcodecs compressors (Blosc, Zstd, LZ4, Gzip, LZMA, BZ2), pre-compression filters (Delta, Quantize, FixedScaleOffset, PackBits), codec pipelines, Blosc thread safety, and the trade-offs between compression speed and ratio.

uw-ssec

content-media

media

20

seo-images

Image optimization analysis for SEO and performance. Checks alt text, file sizes, formats, responsive images, lazy loading, and CLS prevention. Use when user says "image optimization", "alt text", "image SEO", "image size", or "image audit".

WebDevPeterGriffin

content-media
