Media

Audio, video, and image processing.

media
75

camsnap

Capture frames or clips from RTSP/ONVIF cameras.

ArgentAIOS
content-media
media
74

video-downloader

Download videos from 1000+ websites (YouTube, Bilibili, Twitter/X, TikTok, etc.) using yt-dlp. Use this skill when users provide video URLs and want to download videos, extract audio, or need help with video download issues.

isjiamu
content-media
media
74

gif-splitter

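GIF splitting tool that automatically splits GIF files exceeding a specified frame count into multiple smaller files. Useful when uploading GIFs to platforms such as WeChat Official Accounts fails with a "frame count exceeded" error.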

isjiamu
content-media
media
73

minimax-multimodal-toolkit

MiniMax-native multimodal workflow for image, video, voice, music, and media-processing tasks. Use when the user asks to generate image/video/audio assets, wants MiniMax-specific media APIs, needs TTS or voice workflows, wants reproducible local media outputs, or needs FFmpeg-style processing around generated media.

madebyaris
content-media
media
72

video-prompting

Draft and refine prompts for video generation models (text-to-video and image-to-video), and create character-sheet prompts for image models when the goal is character consistency before image-to-video. Use when a user asks for a "video prompt", a model-specific prompt such as Seedance 2.0, Ovi, Sora, Veo 3, Wan 2.2, LTX-2, or LTX-2.3, or a consistent-character prompt such as "character sheet prompt", "character turnaround", "character reference sheet", or "photographic identity sheet".

Square-Zero-Labs
content-media
media
72

video-frames

Extract frames, thumbnails, or clips from video files using ffmpeg. Use when analyzing video content or creating previews.

emanueleielo
content-media
media
72

camsnap

Capture snapshots, clips, or motion events from RTSP/ONVIF IP cameras via camsnap CLI.

emanueleielo
content-media
media
72

camsnap

Capture frames or clips from RTSP/ONVIF cameras.

openaeon
content-media
media
72

video-frames

Extract frames or short clips from videos using ffmpeg.

openaeon
content-media
media
72

nano-banana-pro

Generate or edit images via Gemini 3 Pro Image (Nano Banana Pro).

openaeon
content-media
media
69

omk-youtube

Extract and summarize YouTube video content via subtitle extraction. Trigger when the user shares a YouTube URL (youtube.com or youtu.be), says 'summarize this video', 'watch this', or '看看这个视频' ("take a look at this video"), or wants to understand video content without watching. Also trigger for video transcript extraction.

KaimingWan
content-media
media
69

master-asset-converter

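Find PNG files in the moorestech_master repository and convert them to JPEG to unify the asset image format. Use when: (1) PNG images are mixed into moorestech_master; (2) the user asks to "convert PNG to JPEG" or "unify the image format"; (3) you want to align formats after adding new asset images.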

moorestech
content-media
media
65

image-batch

Batch process images for marketing. Use when: resizing images for social media; compressing images for web; removing backgrounds; adding watermarks; converting formats to WebP; optimizing for Core Web Vitals

guia-matthieu
content-media
media
65

whisper-transcription

Transcribe audio and video files to text using OpenAI Whisper. Use when: converting podcasts to blog posts; creating video subtitles; extracting quotes from interviews; repurposing video content to text; building searchable audio archives

guia-matthieu
content-media
media
65

video-processing

Process video files with ffmpeg automation. Use when: compressing videos for upload; extracting audio from video; resizing for social formats; clipping segments; merging multiple videos; generating thumbnails

guia-matthieu
content-media
media
65

videodb

See, understand, and act on video and audio. See: ingest from local files, URLs, RTSP/live feeds, or live desktop recording; return real-time context and playable stream links. Understand: extract frames, build visual/semantic/temporal indexes, and search moments with timestamps and auto-clips. Act: transcode and normalize (codec, fps, resolution, aspect ratio), perform timeline edits (subtitles, text/image overlays, branding, audio overlays, dubbing, translation), generate media assets (image, audio, video), and create real-time alerts for events from live streams or desktop capture.

video-db
content-media
media
65

fennec-image-compression

Use this skill when asked to compress, resize, or analyze images in Go using the Fennec library, or when modifying the Fennec codebase itself.

shamspias
content-media
media
65

audio-editing

Master the essential audio post-production techniques (normalization, compression, EQ, and noise reduction), applied in the correct processing order, to achieve professional-quality audio. Use when: editing podcast episodes or video soundtracks; cleaning up recorded voiceovers; improving audio quality for marketing content; preparing audio files for distribution; troubleshooting common audio issues

guia-matthieu
content-media
media
65

pydub-automation

Automate repetitive audio tasks with Python using PyDub for batch processing, format conversion, normalization, and content assembly. Use when: processing large numbers of audio files consistently; converting between audio formats at scale; normalizing loudness across a batch of files; assembling intros/outros automatically to episodes; trimming silence or extracting segments programmatically

guia-matthieu
content-media
media
64

ai-multimodal

Process and generate multimedia content using the Google Gemini API for better vision capabilities. Capabilities include: analyzing audio files (transcription with timestamps, summarization, speech understanding, music/sound analysis, up to 9.5 hours); understanding images (better image analysis than Claude models, captioning, reasoning, object detection, design extraction, OCR, visual Q&A, segmentation, handling multiple images); processing videos (scene detection, Q&A, temporal analysis, YouTube URLs, up to 6 hours); extracting from documents (PDF tables, forms, charts, diagrams, multi-page); generating images (text-to-image with Imagen 4, editing, composition, refinement); and generating videos (text-to-video with Veo 3, 8-second clips with native audio). Use when working with audio/video files, analyzing images or screenshots (instead of Claude's default vision capabilities; fall back to Claude's vision only if needed), processing PDF documents, extracting structured data from media, creating images/videos from text pr

The1Studio
content-media
media
64

media-processing

Process multimedia files with FFmpeg (video/audio encoding, conversion, streaming, filtering, hardware acceleration), ImageMagick (image manipulation, format conversion, batch processing, effects, composition), and RMBG (AI-powered background removal). Use when converting media formats, encoding videos with specific codecs (H.264, H.265, VP9), resizing/cropping images, removing backgrounds from images, extracting audio from video, applying filters and effects, optimizing file sizes, creating streaming manifests (HLS/DASH), generating thumbnails, batch processing images, creating composite images, or implementing media processing pipelines. Supports 100+ formats, hardware acceleration (NVENC, QSV), and complex filtergraphs.

The1Studio
content-media
media
63

acestep-simplemv

Render music videos from audio files and lyrics using Remotion. Accepts audio + LRC/JSON lyrics + title to produce MP4 videos with waveform visualization and synced lyrics display. Use when users mention MV generation, music video rendering, creating video from audio/lyrics, or visualizing songs.

ace-step
content-media
media
63

acestep-lyrics-transcription

Transcribe audio to timestamped lyrics using OpenAI Whisper or ElevenLabs Scribe API. Outputs LRC, SRT, or JSON with word-level timestamps. Use when users want to transcribe songs, generate LRC files, or extract lyrics with timestamps from audio.

ace-step
content-media
media
63

tts-script-generator

Intelligently compress and rewrite documents into TTS-friendly scripts. Uses Claude AI to analyze content, compress to target duration, convert to spoken style with emotional language, and auto-segment. Perfect for video narration.

huangserva
content-media