Media

Audio, video, and image processing.

1476 skills

media

genmedia-audio-engineer

Expert in audio synthesis, music generation, and mixing. Use when creating podcasts, background scores, or multi-track audio layering using mcp-chirp3-go, mcp-lyria-go, mcp-gemini-go, mcp-nanobanana-go, and mcp-avtool-go.

GoogleCloudPlatform

content-media

media

genmedia-image-artist

Expert in AI image generation and editing. Use when the user needs high-quality textures, character-consistent visuals, or image-to-image editing using mcp-nanobanana-go.

GoogleCloudPlatform

content-media

media

genmedia-video-editor

Expert in video composition, editing, and format conversion. Use when the user wants to generate high-quality video, overlay images on video, concatenate clips, create GIFs, or sync audio to video using mcp-avtool-go and mcp-veo-go.

GoogleCloudPlatform

content-media

media

image-optimization-helper

Image Optimization Helper - Auto-activating skill for Frontend Development. Triggers on: image optimization helper. Part of the Frontend Development skill category.

jeremylongshore

content-media

media

987

voicemode-dj

Background music control for VoiceMode voice sessions using mpv

mbailey

content-media

media

946

image-ocr

Extract text content from images using Tesseract OCR via Python

benchflow-ai

content-media

media

946

ffmpeg-keyframe-extraction

Extract key frames (I-frames) from video files using FFmpeg command line tool. Use this skill when the user needs to pull out keyframes, thumbnails, or important frames from MP4, MKV, AVI, or other video formats for analysis, previews, or processing.

benchflow-ai

content-media

media

946

ffmpeg-audio-processing

Extract, normalize, mix, and process audio tracks - audio manipulation and analysis

benchflow-ai

content-media

media

946

ffmpeg-media-info

Analyze media file properties - duration, resolution, bitrate, codecs, and stream information

benchflow-ai

content-media

media

946

ffmpeg-video-filters

Apply video filters - scale, crop, watermark, speed, blur, and visual effects

benchflow-ai

content-media

media

946

ffmpeg-video-editing

Cut, trim, concatenate, and split video files - basic video editing operations

benchflow-ai

content-media

media

946

ffmpeg-format-conversion

Convert media files between formats - video containers, audio formats, and codec transcoding

benchflow-ai

content-media

media

946

image-editing

Comprehensive command-line tools for modifying and manipulating images, such as resize, blur, crop, flip, and many more.

benchflow-ai

content-media

media

946

video-frame-extraction

Extract frames from video files and save them as images using OpenCV

benchflow-ai

content-media

media

946

report-generator

Generate compression reports for video processing. Use when you need to create structured JSON reports with duration statistics, compression ratios, and segment details after video processing.

benchflow-ai

content-media

media

946

ffmpeg-video-editing

Video editing with ffmpeg including cutting, trimming, concatenating segments, and re-encoding. Use when working with video files (.mp4, .mkv, .avi) for: removing segments, joining clips, extracting portions, or any video manipulation task.

benchflow-ai

content-media

media

946

speech-to-text

Transcribe video to timestamped text using Whisper tiny model (pre-installed).

benchflow-ai

content-media

media

946

video-processor

Process videos by removing segments and concatenating remaining parts. Use when you need to remove detected pauses/openings from videos, create highlight reels, or batch process segment removals using ffmpeg filter_complex.

benchflow-ai

content-media

media

946

audio-extractor

Extract audio from video files to WAV format. Use when you need to analyze audio from video, prepare audio for energy calculation, or convert video audio to standard format for processing.

benchflow-ai

content-media

media

946

whisper-transcription

Transcribe audio/video to text with word-level timestamps using OpenAI Whisper. Use when you need speech-to-text with accurate timing information for each word.

benchflow-ai

content-media

media

946

filler-word-processing

Process filler word annotations to generate video edit lists. Use when working with timestamp annotations for removing speech disfluencies (um, uh, like, you know) from audio/video content.

benchflow-ai

content-media

media

946

multimodal-fusion-for-speaker-diarization

Combine visual features (face detection, lip movement analysis) with audio features to improve speaker diarization accuracy in video files. Use OpenCV for face detection and lip movement tracking, then fuse visual cues with audio-based speaker embeddings. Essential when processing video files with multiple visible speakers or when audio-only diarization needs visual validation.

benchflow-ai

content-media

media

946

automatic-speech-recognition-asr

Transcribe audio segments to text using Whisper models. Use larger models (base, small, medium, large-v3) for better accuracy, or faster-whisper for optimized performance. Always align transcription timestamps with diarization segments for accurate speaker-labeled subtitles.

benchflow-ai

content-media

media

946

gtts

Google Text-to-Speech (gTTS) for converting text to audio. Use when creating audiobooks, podcasts, or speech synthesis from text. Handles long text by chunking at sentence boundaries and concatenating audio segments with pydub.

benchflow-ai

content-media
