Media

Audio, video, and image processing.

1476 skills

Sorted by stars.

media

117

animsequence

Preview, validate, bake, and manipulate animation sequences with constraint-aware bone editing

kevinpbuckley

content-media

media

117

hmt-export

Export simulation results and/or the computational mesh to VTK files for ParaView visualization. Use this when the user wants to export, visualize, or create VTK files.

psu-efd

content-media

media

116

bestblogs-transcribe-youtube

Use when the user only wants to transcribe a single YouTube video through the Gemini Gem browser workflow, rather than running the full BestBlogs video-processing pipeline.

ginobefun

content-media

media

116

bestblogs-process-videos

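Use when the user wants to batch-process BestBlogs videos awaiting analysis, including transcription, analysis, scoring, and on-demand translation of high-scoring content.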

ginobefun

content-media

media

115

Guide for using ImageMagick command-line tools to perform advanced image processing tasks including format conversion, resizing, cropping, effects, transformations, and batch operations. Use when manipulating images programmatically via shell commands.

einverne

content-media

media

115

ffmpeg

Guide for using FFmpeg - a comprehensive multimedia framework for video/audio encoding, conversion, streaming, and filtering. Use when processing media files, converting formats, extracting audio, creating streams, applying filters, or optimizing video/audio quality.

einverne

content-media

media

115

maui-media-picker

Guidance for picking photos/videos, capturing from camera, multi-select (.NET 10), MediaPickerOptions, platform permissions, and FileResult handling in .NET MAUI. USE FOR: "pick photo", "capture photo", "take picture", "pick video", "camera capture", "MediaPicker", "photo gallery", "image picker", "multi-select photos", "MediaPickerOptions". DO NOT USE FOR: general file picking (use maui-file-handling), image display or optimization (use maui-performance), or camera streaming (use maui-platform-invoke).

davidortinau

content-media

media

115

video-audio-design

Use this skill when adding audio to programmatic videos - generating narration with ElevenLabs TTS, sourcing royalty-free background music, creating SFX with FFmpeg, implementing audio ducking, or mixing multiple audio layers in Remotion. Triggers on ElevenLabs, text-to-speech, voice generation, background music, sound effects, audio mixing, and volume ducking.

AbsolutelySkilled

content-media

media

115

video-analyzer

Use this skill when analyzing existing video files using FFmpeg and AI vision, extracting frames for design system generation, detecting scene boundaries, analyzing animation timing, extracting color palettes, or understanding audio-visual sync. Triggers on video analysis, frame extraction, scene detection, ffprobe, motion analysis, and AI vision analysis of video content.

AbsolutelySkilled

content-media

media

114

youtube-transcript-extractor

Extracts timestamped transcripts from YouTube videos for translation, summarization, and content creation.

QwenLM

content-media

media

114

veo-use

Create and edit videos using Google's Veo 2 and Veo 3 models. Supports Text-to-Video, Image-to-Video, Reference-to-Video, Inpainting, and Video Extension. Available parameters: prompt, image, mask, mode, duration, aspect-ratio. Always confirm parameters with the user or explicitly state defaults before running.

cnemri

content-media

media

114

veo-build

Create and edit videos using Google's Veo 2 and Veo 3 models. Supports Text-to-Video, Image-to-Video, Inpainting, and Advanced Controls.

cnemri

content-media

media

114

processing-images

Image processing toolkit awareness. Use when: user uploads images for manipulation, requests format conversion, batch processing, compositing, resizing, optimization, analysis, effects, metadata inspection, montages, animated GIFs, color correction, or any image-related task. Also use when working with screenshots, photos, diagrams, icons, or visual assets. Triggers on 'resize', 'crop', 'convert', 'compress', 'optimize', 'thumbnail', 'watermark', 'montage', 'collage', 'gif', 'sprite sheet', 'color space', 'metadata', 'EXIF', 'compare images', 'diff', 'overlay', 'composite', 'batch process', 'image analysis', 'histogram', 'blur', 'sharpen', 'rotate', 'flip', 'border', 'shadow', 'round corners', 'favicon', 'icon set'.

oaustegard

content-media

media

114

processing-video

Audio and video processing with ffmpeg. Use when: user asks to convert, trim, merge, compress, or transcode video or audio files; extract audio from video; create GIFs or animated WebP from video; add subtitles or watermarks to video; change video resolution, framerate, or codec; normalize audio loudness; extract frames from video; concatenate clips; create thumbnails from video; strip or add audio tracks; convert between audio formats (MP3, AAC, FLAC, Opus, WAV); adjust volume; apply video filters; stabilize shaky video; generate waveform or spectrum visualizations; probe media file metadata. Triggers on 'ffmpeg', 'video', 'audio', 'transcode', 'MP4', 'MKV', 'WebM', 'MP3', 'AAC', 'FLAC', 'Opus', 'WAV', 'GIF from video', 'extract audio', 'add subtitles', 'video to gif', 'compress video', 'trim video', 'merge videos', 'normalize audio', 'framerate', 'resolution', 'bitrate', 'codec', 'ffprobe', 'waveform', 'spectrogram'.

oaustegard

content-media

media

114

seeing-images

Augmented vision tools for analyzing images beyond native visual capabilities. Use when tasked with describing images in detail, reproducing images as SVGs, identifying subtle features, comparing image regions, reading degraded text, or any task requiring careful visual inspection. Also use when the image-to-svg skill needs ground truth about colors, shapes, or boundaries.

oaustegard

content-media

media

113

agent-media

Agent-first media toolkit for image, video, and audio processing. Use when you need to resize, convert, generate images, remove backgrounds, extract audio, transcribe speech, or generate videos. All commands return deterministic JSON output.

NeverSight

content-media

media

113

image-resize

Resizes an image to specified dimensions. Use when you need to change image size, create thumbnails, or prepare images for specific display requirements.

NeverSight

content-media

media

113

audio-extract

Extracts audio track from a video file. Use when you need to get audio from video, prepare audio for transcription, or separate audio from video content. Runs locally with no API key required.

NeverSight

content-media

media

113

image-upscale

Upscales an image using AI super-resolution to increase resolution with detail generation. Use when you need to enlarge images, improve low-resolution photos, or prepare images for large-format display.

NeverSight

content-media

media

113

pan-3d-transition

Create 3D pan/swivel transition effects for videos using Remotion. Use when user asks to add 3D transitions, create swivel effects, or add video transitions.

NeverSight

content-media

media

113

deepgram-transcription

Transcribe audio and video files using the Deepgram API. This skill should be used when the user requests transcription of audio files (mp3, wav, m4a, aac) or video files (mp4, mov, avi, etc.). Handles large video files by extracting audio first to reduce upload size and processing time.

NeverSight

content-media

media

113

video-podcast-maker

Use when the user provides a topic and wants an automated video podcast created. Handles research, script writing, TTS audio synthesis, Remotion video creation, and final MP4 output with background music.

NeverSight

content-media

media

113

video-edit

Complete video editing toolkit - silence removal, auto-captions, vertical crop, YouTube clipping, 3D transitions, and social media compression. Use when user asks to edit video, remove silences, add captions/subtitles, crop to vertical/shorts, download YouTube clips, compress video, or create video teasers.

NeverSight

content-media

media

113

audio-transcribe

Transcribes audio to text with timestamps and optional speaker identification. Use when you need to convert speech to text, create subtitles, transcribe meetings, or process voice recordings.

NeverSight

content-media
