
Media

Audio, video, and image processing.

1476 skills, sorted by stars
media
1K

genmedia-audio-engineer

Expert in audio synthesis, music generation, and mixing. Use when creating podcasts, background scores, or multi-track audio layering using mcp-chirp3-go, mcp-lyria-go, mcp-gemini-go, mcp-nanobanana-go, and mcp-avtool-go.

GoogleCloudPlatform
content-media
media
1K

genmedia-image-artist

Expert in AI image generation and editing. Use when the user needs high-quality textures, character-consistent visuals, or image-to-image editing using mcp-nanobanana-go.

GoogleCloudPlatform
content-media
media
1K

genmedia-video-editor

Expert in video composition, editing, and format conversion. Use when the user wants to generate high-quality video, overlay images on video, concatenate clips, create GIFs, or sync audio to video using mcp-avtool-go and mcp-veo-go.

GoogleCloudPlatform
content-media
media
1K

image-optimization-helper

Image Optimization Helper: auto-activating skill for Frontend Development. Triggers on: image optimization helper. Part of the Frontend Development skill category.

jeremylongshore
content-media
media
987

voicemode-dj

Background music control for VoiceMode voice sessions using mpv

mbailey
content-media
media
946

image-ocr

Extract text content from images using Tesseract OCR via Python

benchflow-ai
content-media
media
946

ffmpeg-keyframe-extraction

Extract keyframes (I-frames) from video files using the FFmpeg command-line tool. Use this skill when the user needs to pull out keyframes, thumbnails, or important frames from MP4, MKV, AVI, or other video formats for analysis, previews, or processing.

benchflow-ai
content-media
media
946

ffmpeg-audio-processing

Extract, normalize, mix, and process audio tracks - audio manipulation and analysis

benchflow-ai
content-media
media
946

ffmpeg-media-info

Analyze media file properties - duration, resolution, bitrate, codecs, and stream information

benchflow-ai
content-media
media
946

ffmpeg-video-filters

Apply video filters - scale, crop, watermark, speed, blur, and visual effects

benchflow-ai
content-media
media
946

ffmpeg-video-editing

Cut, trim, concatenate, and split video files - basic video editing operations

benchflow-ai
content-media
media
946

ffmpeg-format-conversion

Convert media files between formats - video containers, audio formats, and codec transcoding

benchflow-ai
content-media
media
946

image-editing

Comprehensive command-line tools for modifying and manipulating images, such as resize, blur, crop, flip, and many more.

benchflow-ai
content-media
media
946

report-generator

Generate compression reports for video processing. Use when you need to create structured JSON reports with duration statistics, compression ratios, and segment details after video processing.

benchflow-ai
content-media
media
946

ffmpeg-video-editing

Video editing with ffmpeg including cutting, trimming, concatenating segments, and re-encoding. Use when working with video files (.mp4, .mkv, .avi) for: removing segments, joining clips, extracting portions, or any video manipulation task.

benchflow-ai
content-media
media
946

speech-to-text

Transcribe video to timestamped text using Whisper tiny model (pre-installed).

benchflow-ai
content-media
media
946

video-processor

Process videos by removing segments and concatenating remaining parts. Use when you need to remove detected pauses/openings from videos, create highlight reels, or batch process segment removals using ffmpeg filter_complex.

benchflow-ai
content-media
media
946

audio-extractor

Extract audio from video files to WAV format. Use when you need to analyze audio from video, prepare audio for energy calculation, or convert video audio to standard format for processing.

benchflow-ai
content-media
media
946

whisper-transcription

Transcribe audio/video to text with word-level timestamps using OpenAI Whisper. Use when you need speech-to-text with accurate timing information for each word.

benchflow-ai
content-media
media
946

filler-word-processing

Process filler word annotations to generate video edit lists. Use when working with timestamp annotations for removing speech disfluencies (um, uh, like, you know) from audio/video content.

benchflow-ai
content-media
media
946

multimodal-fusion-for-speaker-diarization

Combine visual features (face detection, lip movement analysis) with audio features to improve speaker diarization accuracy in video files. Use OpenCV for face detection and lip movement tracking, then fuse visual cues with audio-based speaker embeddings. Essential when processing video files with multiple visible speakers or when audio-only diarization needs visual validation.

benchflow-ai
content-media
media
946

automatic-speech-recognition-asr

Transcribe audio segments to text using Whisper models. Use larger models (base, small, medium, large-v3) for better accuracy, or faster-whisper for optimized performance. Always align transcription timestamps with diarization segments for accurate speaker-labeled subtitles.

benchflow-ai
content-media
media
946

gtts

Google Text-to-Speech (gTTS) for converting text to audio. Use when creating audiobooks, podcasts, or speech synthesis from text. Handles long text by chunking at sentence boundaries and concatenating audio segments with pydub.

benchflow-ai
content-media
Page 10 / 62