home/categories/media

category focus

Media

Audio, video, and image processing.

1476 skillsall categories

sorting

stars

current ordering strategy

query

all entries

refine the visible subset

media

camsnap

Capture frames or clips from RTSP/ONVIF cameras.

ArgentAIOS

content-media

open

media

Download videos from 1000+ websites (YouTube, Bilibili, Twitter/X, TikTok, etc.) using yt-dlp. Use this skill when users provide video URLs and want to download videos, extract audio, or need help with video download issues.

isjiamu

content-media

open

media

gif-splitter

GIF动图切分工具，将超过指定帧数的GIF文件自动拆分成多个小文件。适用于微信公众号等平台上传GIF时遇到"帧数超限"的问题。

isjiamu

content-media

open

media

minimax-multimodal-toolkit

MiniMax-native multimodal workflow for image, video, voice, music, and media-processing tasks. Use when the user asks to generate image/video/audio assets, wants MiniMax-specific media APIs, needs TTS or voice workflows, wants reproducible local media outputs, or needs FFmpeg-style processing around generated media.

madebyaris

content-media

open

media

video-prompting

Draft and refine prompts for video generation models (text-to-video and image-to-video), and create character-sheet prompts for image models when the goal is character consistency before image-to-video. Use when a user asks for a "video prompt", a model-specific prompt such as Seedance 2.0, Ovi, Sora, Veo 3, Wan 2.2, LTX-2, or LTX-2.3, or a consistent-character prompt such as "character sheet prompt", "character turnaround", "character reference sheet", or "photographic identity sheet".

Square-Zero-Labs

content-media

open

media

video-frames

Extract frames, thumbnails, or clips from video files using ffmpeg. Use when analyzing video content or creating previews.

emanueleielo

content-media

open

media

camsnap

Capture snapshots, clips, or motion events from RTSP/ONVIF IP cameras via camsnap CLI.

emanueleielo

content-media

open

media

camsnap

Capture frames or clips from RTSP/ONVIF cameras.

openaeon

content-media

open

media

video-frames

Extract frames or short clips from videos using ffmpeg.

openaeon

content-media

open

media

nano-banana-pro

Generate or edit images via Gemini 3 Pro Image (Nano Banana Pro).

openaeon

content-media

open

media

omk-youtube

Extract and summarize YouTube video content via subtitle extraction. Trigger when user shares a YouTube URL (youtube.com or youtu.be), says 'summarize this video', 'watch this', '看看这个视频', or wants to understand video content without watching. Also trigger for video transcript extraction.

KaimingWan

content-media

open

media

master-asset-converter

moorestech_masterリポジトリ内のPNGファイルを見つけてJPEGに変換し、アセット画像フォーマットを統一する。Use when: (1) moorestech_masterにPNG画像が混在している時 (2) 「PNGをJPEGに変換して」「画像フォーマットを統一して」と依頼された時 (3) 新しいアセット画像を追加した後にフォーマットを揃えたい時

moorestech

content-media

open

media

image-batch

Batch process images for marketing. Use when: resizing images for social media; compressing images for web; removing backgrounds; adding watermarks; converting formats to WebP; optimizing for Core Web Vitals

guia-matthieu

content-media

open

media

whisper-transcription

Transcribe audio and video files to text using OpenAI Whisper. Use when: converting podcasts to blog posts; creating video subtitles; extracting quotes from interviews; repurposing video content to text; building searchable audio archives

guia-matthieu

content-media

open

media

video-processing

Process video files with ffmpeg automation. Use when: compressing videos for upload; extracting audio from video; resizing for social formats; clipping segments; merging multiple videos; generating thumbnails

guia-matthieu

content-media

open

media

videodb

See, Understand, Act on video and audio. See- ingest from local files, URLs, RTSP/live feeds, or live record desktop; return realtime context and playable stream links. Understand- extract frames, build visual/semantic/temporal indexes, and search moments with timestamps and auto-clips. Act- transcode and normalize (codec, fps, resolution, aspect ratio), perform timeline edits (subtitles, text/image overlays, branding, audio overlays, dubbing, translation), generate media assets (image, audio, video), and create real time alerts for events from live streams or desktop capture.

video-db

content-media

open

media

fennec-image-compression

Use this skill when asked to compress, resize, or analyze images in Go using the Fennec library, or when modifying the Fennec codebase itself.

shamspias

content-media

open

media

audio-editing

Master the essential audio post-production techniques—normalization, compression, EQ, and noise reduction—using the correct processing order to achieve professional-quality audio. Use when: Editing podcast episodes or video soundtracks; Cleaning up recorded voiceovers; Improving audio quality for marketing content; Preparing audio files for distribution; Troubleshooting common audio issues

guia-matthieu

content-media

open

media

pydub-automation

Automate repetitive audio tasks with Python using PyDub for batch processing, format conversion, normalization, and content assembly. Use when: Processing large numbers of audio files consistently; Converting between audio formats at scale; Normalizing loudness across a batch of files; Assembling intros/outros automatically to episodes; Trimming silence or extracting segments programmatically

guia-matthieu

content-media

open

media

ai-multimodal

Process and generate multimedia content using Google Gemini API for better vision capabilities. Capabilities include analyze audio files (transcription with timestamps, summarization, speech understanding, music/sound analysis up to 9.5 hours), understand images (better image analysis than Claude models, captioning, reasoning, object detection, design extraction, OCR, visual Q&A, segmentation, handle multiple images), process videos (scene detection, Q&A, temporal analysis, YouTube URLs, up to 6 hours), extract from documents (PDF tables, forms, charts, diagrams, multi-page), generate images (text-to-image with Imagen 4, editing, composition, refinement), generate videos (text-to-video with Veo 3, 8-second clips with native audio). Use when working with audio/video files, analyzing images or screenshots (instead of default vision capabilities of Claude, only fallback to Claude's vision capabilities if needed), processing PDF documents, extracting structured data from media, creating images/videos from text pr

The1Studio

content-media

open

media

media-processing

Process multimedia files with FFmpeg (video/audio encoding, conversion, streaming, filtering, hardware acceleration), ImageMagick (image manipulation, format conversion, batch processing, effects, composition), and RMBG (AI-powered background removal). Use when converting media formats, encoding videos with specific codecs (H.264, H.265, VP9), resizing/cropping images, removing backgrounds from images, extracting audio from video, applying filters and effects, optimizing file sizes, creating streaming manifests (HLS/DASH), generating thumbnails, batch processing images, creating composite images, or implementing media processing pipelines. Supports 100+ formats, hardware acceleration (NVENC, QSV), and complex filtergraphs.

The1Studio

content-media

open

media

acestep-simplemv

Render music videos from audio files and lyrics using Remotion. Accepts audio + LRC/JSON lyrics + title to produce MP4 videos with waveform visualization and synced lyrics display. Use when users mention MV generation, music video rendering, creating video from audio/lyrics, or visualizing songs.

ace-step

content-media

open

media

acestep-lyrics-transcription

Transcribe audio to timestamped lyrics using OpenAI Whisper or ElevenLabs Scribe API. Outputs LRC, SRT, or JSON with word-level timestamps. Use when users want to transcribe songs, generate LRC files, or extract lyrics with timestamps from audio.

ace-step

content-media

open

media

tts-script-generator

Intelligently compress and rewrite documents into TTS-friendly scripts. Uses Claude AI to analyze content, compress to target duration, convert to spoken style with emotional language, and auto-segment. Perfect for video narration.

huangserva

content-media

open

Page 27 / 62