
Media

Audio, video, and image processing.

1476 skills across all categories, sorted by stars.
media
22

tts

Text-to-speech — convert text to audio using gTTS or edge-tts. Use when: user asks to read text aloud, generate an audio file from text, or create a voiceover. NOT for: speech-to-text/transcription (use Deepgram or whisper), music generation, or audio editing.

ericwang915
content-media
media
21

media-processing

Process multimedia files with FFmpeg (video/audio encoding, conversion, streaming, filtering, hardware acceleration), ImageMagick (image manipulation, format conversion, batch processing, effects, composition), and RMBG (AI-powered background removal). Use when converting media formats, encoding videos with specific codecs (H.264, H.265, VP9), resizing/cropping images, removing backgrounds from images, extracting audio from video, applying filters and effects, optimizing file sizes, creating streaming manifests (HLS/DASH), generating thumbnails, batch processing images, creating composite images, or implementing media processing pipelines. Supports 100+ formats, hardware acceleration (NVENC, QSV), and complex filtergraphs.

binhmuc
content-media
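As a rough illustration of the H.264 encoding this entry mentions, the sketch below builds an ffmpeg command line in Python without running it. The helper name `build_h264_encode_cmd` and the default CRF and preset values are assumptions for illustration, not settings taken from the media-processing skill; the flags themselves (`-c:v libx264`, `-crf`, `-preset`, `-c:a aac`) are standard ffmpeg options.

```python
import shlex


def build_h264_encode_cmd(src: str, dst: str, crf: int = 23, preset: str = "medium") -> list[str]:
    """Return an ffmpeg argv that re-encodes video to H.264 and audio to AAC (hypothetical helper)."""
    if not 0 <= crf <= 51:
        # libx264's CRF scale for 8-bit video runs from 0 (lossless) to 51 (lowest quality)
        raise ValueError("crf must be between 0 and 51")
    return [
        "ffmpeg",
        "-i", src,            # input file
        "-c:v", "libx264",    # H.264 video encoder
        "-crf", str(crf),     # lower CRF = higher quality and larger file
        "-preset", preset,    # slower presets compress better at the same CRF
        "-c:a", "aac",        # AAC audio encoder
        dst,
    ]


if __name__ == "__main__":
    cmd = build_h264_encode_cmd("input.mov", "output.mp4")
    print(shlex.join(cmd))  # inspect the command before running it
```

Keeping the argument list separate from execution makes it easy to log or review before passing it to `subprocess.run(cmd, check=True)`.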
media
21

xstranscriber

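A skill that transcribes audio files to text. Supports mp3/wav/m4a/ogg/flac formats. Uses the whisper-based transcriber_tool and lets you choose among five models (tiny/base/small/medium/large) to balance accuracy against speed. Long recordings can be processed in the background. Triggered by requests such as "transcribe this" or "convert the audio to text".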

karaage0703
content-media
media
21

optimizing-io-operations

Optimizes standard I/O and file operations for high-performance data processing in .NET. Use when building high-throughput file processing or competitive programming solutions.

christian289
content-media
media
21

gif-compress

Compresses GIF files and adds a progress bar.

shuntaka9576
content-media
media
21

ai-multimodal

Process and generate multimedia content using the Google Gemini API for better vision capabilities. Capabilities include analyzing audio files (transcription with timestamps, summarization, speech understanding, music/sound analysis up to 9.5 hours), understanding images (better image analysis than Claude models, captioning, reasoning, object detection, design extraction, OCR, visual Q&A, segmentation, handling multiple images), processing videos (scene detection, Q&A, temporal analysis, YouTube URLs, up to 6 hours), extracting from documents (PDF tables, forms, charts, diagrams, multi-page), generating images (text-to-image with Imagen 4, editing, composition, refinement), and generating videos (text-to-video with Veo 3, 8-second clips with native audio). Use when working with audio/video files, analyzing images or screenshots (instead of Claude's default vision capabilities, falling back to them only if needed), processing PDF documents, extracting structured data from media, creating images/videos from text pr…

xthanhn91
content-media
media
21

video-frames

Extract frames or short clips from videos using ffmpeg.

kaivyy
content-media
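A minimal sketch of the two operations this entry names, again assembled as ffmpeg argument lists in Python. The function names are hypothetical and not part of the video-frames skill. Frame sampling uses ffmpeg's `fps` video filter; clips use stream copy (`-c copy`), which avoids re-encoding but means the cut may snap to the nearest keyframe rather than landing on an exact frame.

```python
def frames_cmd(src: str, out_pattern: str = "frame_%04d.png", fps: float = 1.0) -> list[str]:
    """ffmpeg argv that samples `fps` frames per second into numbered images."""
    return ["ffmpeg", "-i", src, "-vf", f"fps={fps}", out_pattern]


def clip_cmd(src: str, dst: str, start: float, duration: float) -> list[str]:
    """ffmpeg argv that copies a `duration`-second clip starting at `start` seconds."""
    return [
        "ffmpeg",
        "-ss", str(start),       # seeking before -i is fast input seeking
        "-i", src,
        "-t", str(duration),     # clip length in seconds
        "-c", "copy",            # stream copy: no re-encode, cut may snap to a keyframe
        dst,
    ]


if __name__ == "__main__":
    print(frames_cmd("talk.mp4", fps=0.5))
    print(clip_cmd("talk.mp4", "clip.mp4", start=90, duration=10))
```

If a frame-accurate cut matters more than speed, dropping `-c copy` forces a re-encode instead.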
media
21

nano-banana-pro

Generate or edit images via Gemini 3 Pro Image (Nano Banana Pro).

kaivyy
content-media
media
21

camsnap

Capture frames or clips from RTSP/ONVIF cameras.

kaivyy
content-media
media
21

compress-images

Compress images for web/SEO performance using cwebp. Use when optimizing images for faster page loads, reducing file sizes, or converting JPG/PNG to WebP format.

rameerez
content-media
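A sketch of a single JPG/PNG to WebP conversion with cwebp, assuming the `cwebp` binary from libwebp is on the PATH. The helper `cwebp_cmd` and the default quality of 80 are illustrative choices, not the compress-images skill's own settings.

```python
from pathlib import Path

WEBP_INPUTS = {".jpg", ".jpeg", ".png"}


def cwebp_cmd(src: str, quality: int = 80) -> list[str]:
    """Return a cwebp argv that writes <name>.webp next to the source image (hypothetical helper)."""
    path = Path(src)
    if path.suffix.lower() not in WEBP_INPUTS:
        raise ValueError(f"unsupported input: {src}")
    if not 0 <= quality <= 100:
        raise ValueError("quality must be between 0 and 100")
    dst = path.with_suffix(".webp")
    # cwebp usage: cwebp [options] input_file -o output_file.webp
    return ["cwebp", "-q", str(quality), str(path), "-o", str(dst)]


if __name__ == "__main__":
    print(cwebp_cmd("hero.png"))
```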
media
21

omnicaptions-convert

Use when converting between caption formats (SRT, VTT, ASS, TTML, Gemini MD, etc.). Supports 30+ caption formats.

lattifai
content-media
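For the common SRT to WebVTT case, the conversion is mostly mechanical: add a `WEBVTT` header and switch the millisecond separator in cue timings from a comma to a period. The sketch below handles only that case (no styling, positioning, or the other formats listed) and is an illustration, not how omnicaptions-convert is implemented.

```python
import re

# SRT timings use "HH:MM:SS,mmm"; WebVTT uses "HH:MM:SS.mmm"
_SRT_TIME = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text: str) -> str:
    """Convert basic SRT subtitles to WebVTT (timings and text only)."""
    text = srt_text.lstrip("\ufeff").replace("\r\n", "\n").strip()
    out = ["WEBVTT", ""]
    for line in text.split("\n"):
        if "-->" in line:
            # Only rewrite timing lines, so commas inside caption text survive
            line = _SRT_TIME.sub(r"\1.\2", line)
        out.append(line)
    # SRT cue numbers are kept: WebVTT accepts them as cue identifiers
    return "\n".join(out) + "\n"
```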
media
21

youtube-data-api

Complete YouTube Data API v3 wrapper: search videos, get video/channel/playlist details, get comments, download subtitles, and more. Supports filtering and sorting by time, views, rating, and other criteria.

Yrzhe
content-media
media
21

transcribe-video

Generate subtitles (SRT/VTT) and plain text transcripts from video or audio files using AWS Transcribe. Use when creating captions, extracting spoken content, generating transcripts for notes, or making video content searchable.

rameerez
content-media
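Whatever the transcription backend, producing SRT comes down to formatting timed segments. The sketch assumes segments arrive as hypothetical `(start_seconds, end_seconds, text)` tuples; AWS Transcribe's actual JSON output has a different shape and would need mapping into this form first.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Turn (start_seconds, end_seconds, text) tuples into numbered SRT cues."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        cues.append(f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n")
    # A blank line separates consecutive cues
    return "\n".join(cues)
```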
media
21

download-video

Download videos from social media URLs (X/Twitter, YouTube, Instagram, TikTok, etc.) using yt-dlp. Use when saving a video locally, extracting content for transcription, or archiving video references.

rameerez
content-media
media
21

omnicaptions-transcribe

Use when transcribing audio/video to text with timestamps, speaker labels, and chapters. Supports YouTube URLs and local files. Produces structured markdown output.

lattifai
content-media
media
21

omnicaptions-laicut

Use when user needs accurate/precise caption timing, or aligning captions with audio/video using forced alignment. Corrects caption timing to match actual speech. Uses LattifAI Lattice-1 model.

lattifai
content-media
media
21

omnicaptions-download

Use when downloading videos, audio, or captions from YouTube and other video platforms. Supports quality selection.

lattifai
content-media
media
20

file-converter

Convert & transform files - images (resize, format, HEIC), markdown (PDF/HTML), data (CSV/JSON/YAML/TOML/XML), SVG, base64, text encoding. Cross-platform, single & batch mode. This skill should be used when converting file formats, resizing images, generating PDFs from markdown, or transforming data between formats.

georgekhananaev
content-media
media
20

youmind-youtube-transcript

Extract and summarize YouTube video transcripts via YouMind API — no yt-dlp, no proxy needed. Batch extract up to 5 videos at once with parallel processing. Saves videos to your YouMind board with timestamped transcripts in markdown. Automatically summarizes video content after extraction. Works from any IP (cloud, VPS, CI/CD, corporate networks). Use when user wants to "get YouTube transcript", "extract video subtitles", "transcribe YouTube video", "batch transcribe videos", "get video captions", "summarize YouTube video", "YouTube video summary", "summarize this video", "what does this video say", "YouTube 字幕", "YouTube 总结", "视频总结", "YouTube 文字起こし", "YouTube 자막", or "download YouTube transcript".

YouMind-OpenLab
content-media
media
20

yt-dlp

Download videos, extract audio, and get transcripts from YouTube and 1000+ sites. Use when asked to download a video, extract audio, get a transcript, rip subtitles, or fetch media. Triggers on "download video", "yt-dlp", "extract audio", "YouTube download", "get transcript", "subtitles", or any media download request.

juanibiapina
content-media
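A sketch of two common requests this entry lists, audio extraction and subtitle download, built as yt-dlp argument lists. The helper names are hypothetical; the flags are standard yt-dlp options, and `-x` needs ffmpeg installed to convert the audio.

```python
def ytdlp_audio_cmd(url: str, audio_format: str = "mp3") -> list[str]:
    """yt-dlp argv that downloads a video and keeps only the audio track."""
    return [
        "yt-dlp",
        "-x",                          # --extract-audio (requires ffmpeg)
        "--audio-format", audio_format,
        "-o", "%(title)s.%(ext)s",     # output filename template
        url,
    ]


def ytdlp_subs_cmd(url: str, langs: str = "en") -> list[str]:
    """yt-dlp argv that fetches manual and auto-generated subtitles without the video."""
    return [
        "yt-dlp",
        "--skip-download",
        "--write-subs",
        "--write-auto-subs",
        "--sub-langs", langs,
        url,
    ]
```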
media
20

video-editing

Automated video editing skill for talk/vlog/standup videos. Use when: cutting video, splitting video into sentences, merging video clips, extracting audio, transcribing speech, auto-editing oral presentation videos, combining selected sentence clips into a final video, generating video cover/thumbnail with title, B-roll cutaway editing, persistent video overlay/watermark, blinking REC indicator, ending title cards, multi-source audio mixing, exporting to JianYing/CapCut project for further editing, generating voiceover videos with Remotion (audio-only to video with animated visuals/subtitles). Requires ffmpeg and whisper. Remotion workflow additionally requires Node.js and npm.

maxazure
content-media
media
20

oral-history-tools

Use this Skill to process oral history recordings: Whisper transcription with timestamps, pyannote speaker diarization, OHMS metadata XML, and speaker anonymization.

xjtulyc
content-media
media
20

compression-codecs

Configure and optimize compression for Zarr arrays. Covers all numcodecs compressors (Blosc, Zstd, LZ4, Gzip, LZMA, BZ2), pre-compression filters (Delta, Quantize, FixedScaleOffset, PackBits), codec pipelines, Blosc thread safety, and the trade-offs between compression speed and ratio.

uw-ssec
content-media
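The benefit of a pre-compression filter such as Delta is easy to see on smoothly increasing data. Instead of numcodecs, this sketch uses the standard library's `zlib` and a hand-written delta encoder, so it illustrates the idea rather than the Zarr codec pipeline itself.

```python
import struct
import zlib


def delta_encode(values: list[int]) -> list[int]:
    """Store each value as its difference from the previous one."""
    prev = 0
    out = []
    for v in values:
        out.append(v - prev)
        prev = v
    return out


def delta_decode(deltas: list[int]) -> list[int]:
    """Invert delta_encode with a running sum."""
    total = 0
    out = []
    for d in deltas:
        total += d
        out.append(total)
    return out


def pack_int32(values: list[int]) -> bytes:
    """Serialize as little-endian int32, the way a numeric array chunk would be laid out."""
    return struct.pack(f"<{len(values)}i", *values)


data = list(range(0, 40_000, 4))  # 10,000 smoothly increasing values
raw_size = len(zlib.compress(pack_int32(data), 6))
delta_size = len(zlib.compress(pack_int32(delta_encode(data)), 6))
print(f"zlib only: {raw_size} bytes, delta + zlib: {delta_size} bytes")
```

On this input every delta is the same small value, so the filtered stream compresses far better; on noisy data the gain shrinks or disappears.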
media
20

seo-images

Image optimization analysis for SEO and performance. Checks alt text, file sizes, formats, responsive images, lazy loading, and CLS prevention. Use when user says "image optimization", "alt text", "image SEO", "image size", or "image audit".

WebDevPeterGriffin
content-media
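Two of the checks this entry lists, missing alt text and images without explicit dimensions (a common cause of layout shift), can be approximated with Python's built-in `html.parser`. The `audit_images` helper and its messages are hypothetical; an empty `alt=""` is treated as present because it is the accepted markup for decorative images.

```python
from html.parser import HTMLParser


class _ImgAuditor(HTMLParser):
    """Collect (src, problem) pairs for <img> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.issues: list[tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)  # HTMLParser lowercases attribute names
        src = attributes.get("src") or "(no src)"
        if "alt" not in attributes:
            self.issues.append((src, "missing alt attribute"))
        if "width" not in attributes or "height" not in attributes:
            # Without dimensions the browser may not reserve space (CLS risk)
            self.issues.append((src, "missing width/height"))


def audit_images(html: str) -> list[tuple[str, str]]:
    parser = _ImgAuditor()
    parser.feed(html)
    parser.close()
    return parser.issues
```

Self-closing `<img ... />` tags are covered too, since the parser's default `handle_startendtag` calls `handle_starttag`.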