Media

Audio, video, and image processing.

1476 skills

Sorted by stars.

media

3d-performance

Performance optimization for 3D web scenes — LOD strategies, frustum/occlusion culling, draw call reduction, and R3F-specific optimizations. Use when scenes run below 60fps.

wrm3

content-media

media

download-image

A robust, "Honey Badger" image downloader that bypasses strict CDNs, ignores misleading Content-Types, and falls back to scraping if a direct link fails.

Ethereal-Lemons

content-media

media

Send files and media directly to WhatsApp conversations.

Ethereal-Lemons

content-media

media

Process and generate multimedia content using the Google Gemini API for better vision capabilities. Capabilities include: analyzing audio files (transcription with timestamps, summarization, speech understanding, music/sound analysis, up to 9.5 hours); understanding images (better image analysis than Claude models, captioning, reasoning, object detection, design extraction, OCR, visual Q&A, segmentation, handling multiple images); processing videos (scene detection, Q&A, temporal analysis, YouTube URLs, up to 6 hours); extracting from documents (PDF tables, forms, charts, diagrams, multi-page); generating images (text-to-image with Imagen 4, editing, composition, refinement); and generating videos (text-to-video with Veo 3, 8-second clips with native audio). Use when working with audio/video files, analyzing images or screenshots (instead of Claude's default vision capabilities; fall back to Claude's vision only if needed), processing PDF documents, extracting structured data from media, creating images/videos from text pr

kevinnguyen271090

content-media

media

photo-composition-critic

Expert photography composition critic grounded in graduate-level visual aesthetics education, computational aesthetics research (AVA, NIMA, LAION-Aesthetics, VisualQuality-R1), and professional image analysis with custom tooling. Use for image quality assessment, composition analysis, aesthetic scoring, photo critique. Activate on "photo critique", "composition analysis", "image aesthetics", "NIMA", "AVA dataset", "visual quality". NOT for photo editing/retouching (use native-app-designer), generating images (use Stability AI directly), or basic image processing (use clip-aware-embeddings).

erichowens

content-media

media

youtube-to-docs

Comprehensive suite for processing YouTube videos. Use this when the user needs to: (1) Extract transcripts, (2) Generate visual infographics, (3) Create audio summaries (TTS) and videos, or (4) Perform full 'kitchen sink' processing of YouTube content.

DoIT-Artificial-Intelligence

content-media

media

2000s-visualization-expert

Expert in 2000s-era music visualization (Milkdrop, AVS, Geiss) and modern WebGL implementations. Specializes in Butterchurn integration, Web Audio API AnalyserNode FFT data, GLSL shaders for audio-reactive visuals, and psychedelic generative art. Activate on "Milkdrop", "music visualization", "WebGL visualizer", "Butterchurn", "audio reactive", "FFT visualization", "spectrum analyzer". NOT for simple bar charts/waveforms (use basic canvas), video editing, or non-audio visuals.

erichowens

content-media

media

video-processing-editing

FFmpeg automation for cutting, trimming, concatenating videos. Audio mixing, timeline editing, transitions, effects. Export optimization for YouTube, social media. Subtitle handling, color grading, batch processing. Use for videogen projects, content creation, automated video production. Activate on "video editing", "FFmpeg", "trim video", "concatenate", "transitions", "export optimization". NOT for real-time video editing UI, 3D compositing, or motion graphics.

erichowens

content-media

media

photo-content-recognition-curation-expert

Expert in photo content recognition, intelligent curation, and quality filtering. Specializes in face/animal/place recognition, perceptual hashing for de-duplication, screenshot/meme detection, burst photo selection, and quick indexing strategies. Activate on 'face recognition', 'face clustering', 'perceptual hash', 'near-duplicate', 'burst photo', 'screenshot detection', 'photo curation', 'photo indexing', 'NSFW detection', 'pet recognition', 'DINOHash', 'HDBSCAN faces'. NOT for GPS-based location clustering (use event-detection-temporal-intelligence-expert), color palette extraction (use color-theory-palette-harmony-expert), semantic image-text matching (use clip-aware-embeddings), or video analysis/frame extraction.

erichowens

content-media

media

summarize-youtube

High-quality YouTube video summarization using the local summarize CLI with yt-dlp + whisper.cpp transcription and Claude CLI by default. Use when asked to summarize a YouTube video, extract a transcript from audio (not captions), or run a repeatable best-quality video summary workflow.

joshp123

content-media

media

analyze

Analyze indexed videos to understand their content. Use when user asks "what is this video about?", "summarize the video", "analyze this video", or has questions about video content.

twelvelabs-io

content-media

media

embed

Create, check, and retrieve video embeddings. Use when user wants to generate embeddings from a video, check embedding status, or retrieve embeddings for a video.

twelvelabs-io

content-media

media

image-search

Search videos using an image combined with text for refined results. Use when the user wants to search with a reference image, says "find videos matching this image", "search with this picture", or wants to combine visual reference with text description.

twelvelabs-io

content-media

media

index

Index video files or URLs for AI search and analysis. Use when user wants to add a video for processing, says "index this video", "add this video for analysis", or mentions a video file they want to work with.

twelvelabs-io

content-media

media

search

Search indexed videos using natural language. Use when the user wants to find specific content, moments, or scenes in their indexed videos. Triggers on phrases like "find the part where...", "search for...", "look for...", "where does...", "when does...".

twelvelabs-io

content-media

media

instagram-transcriber-yandex-cloud-ig-yandex

Transcribes audio from Instagram Reels (and other videos) via Yandex SpeechKit. Downloads the video, uploads it to Object Storage, recognizes the speech, and returns the text.

tpitsunov

content-media

media

youtube-downloader

Download videos from YouTube and 1000+ other sites using yt-dlp.

crazynomad

content-media

media

twitter-downloader

Download videos from X (Twitter) posts using twmd — a fast, API-less downloader

crazynomad

content-media

media

vhs

VHS terminal recording best practices from Charmbracelet (formerly charmbracelet-vhs). This skill should be used when writing, reviewing, or editing VHS tape files to create professional terminal GIFs and videos. Triggers on tasks involving .tape files, VHS configuration, terminal recording, demo creation, or CLI documentation.

connorads

content-media

media

nano-banana

Generate and edit images using Google's Gemini image models (Nano Banana 2 default, Nano Banana Pro legacy). Use when the user asks to generate, create, edit, modify, change, alter, or update images. Also use when user references an existing image file and asks to modify it in any way (e.g., "modify this image", "change the background", "replace X with Y"). Supports text-to-image, image editing with up to 14 reference images, configurable resolution (0.5K-4K), aspect ratio, and adjustable thinking. DO NOT read the image file first - use this skill directly with the --input-image parameter.

connorads

content-media

media

camsnap

Capture frames or clips from RTSP/ONVIF cameras.

hashSTACS-Global

content-media

media

harmix-music-search

Search for music tracks using natural language prompts or video content via the Harmix AI API. Use when the user asks to find music, search for songs, discover tracks, needs music recommendations based on mood/genre/tempo/scene, or wants to find music for a video, video soundtrack, or music that matches video content.