Media

Audio, video, and image processing.

media
14

3d-performance

Performance optimization for 3D web scenes — LOD strategies, frustum/occlusion culling, draw call reduction, and R3F-specific optimizations. Use when scenes run below 60fps.

wrm3
content-media
media
14

download-image

A robust, "Honey Badger" image downloader that bypasses strict CDNs, ignores misleading Content-Types, and falls back to scraping if a direct link fails.

Ethereal-Lemons
content-media
media
14

whatsapp

Send files and media directly to WhatsApp conversations.

Ethereal-Lemons
content-media
media
13

ai-multimodal

Process and generate multimedia content using the Google Gemini API for better vision capabilities. Capabilities include analyzing audio files (transcription with timestamps, summarization, speech understanding, music/sound analysis up to 9.5 hours), understanding images (better image analysis than Claude models, captioning, reasoning, object detection, design extraction, OCR, visual Q&A, segmentation, handling multiple images), processing videos (scene detection, Q&A, temporal analysis, YouTube URLs, up to 6 hours), extracting from documents (PDF tables, forms, charts, diagrams, multi-page), generating images (text-to-image with Imagen 4, editing, composition, refinement), and generating videos (text-to-video with Veo 3, 8-second clips with native audio). Use when working with audio/video files, analyzing images or screenshots (instead of Claude's default vision capabilities; fall back to Claude's vision only if needed), processing PDF documents, extracting structured data from media, creating images/videos from text pr

kevinnguyen271090
content-media
media
13

photo-composition-critic

Expert photography composition critic grounded in graduate-level visual aesthetics education, computational aesthetics research (AVA, NIMA, LAION-Aesthetics, VisualQuality-R1), and professional image analysis with custom tooling. Use for image quality assessment, composition analysis, aesthetic scoring, photo critique. Activate on "photo critique", "composition analysis", "image aesthetics", "NIMA", "AVA dataset", "visual quality". NOT for photo editing/retouching (use native-app-designer), generating images (use Stability AI directly), or basic image processing (use clip-aware-embeddings).

erichowens
content-media
media
13

youtube-to-docs

Comprehensive suite for processing YouTube videos. Use this when the user needs to: (1) Extract transcripts, (2) Generate visual infographics, (3) Create audio summaries (TTS) and videos, or (4) Perform full 'kitchen sink' processing of YouTube content.

DoIT-Artificial-Intelligence
content-media
media
13

2000s-visualization-expert

Expert in 2000s-era music visualization (Milkdrop, AVS, Geiss) and modern WebGL implementations. Specializes in Butterchurn integration, Web Audio API AnalyserNode FFT data, GLSL shaders for audio-reactive visuals, and psychedelic generative art. Activate on "Milkdrop", "music visualization", "WebGL visualizer", "Butterchurn", "audio reactive", "FFT visualization", "spectrum analyzer". NOT for simple bar charts/waveforms (use basic canvas), video editing, or non-audio visuals.

erichowens
content-media
media
13

video-processing-editing

FFmpeg automation for cutting, trimming, concatenating videos. Audio mixing, timeline editing, transitions, effects. Export optimization for YouTube, social media. Subtitle handling, color grading, batch processing. Use for videogen projects, content creation, automated video production. Activate on "video editing", "FFmpeg", "trim video", "concatenate", "transitions", "export optimization". NOT for real-time video editing UI, 3D compositing, or motion graphics.

erichowens
content-media
media
13

photo-content-recognition-curation-expert

Expert in photo content recognition, intelligent curation, and quality filtering. Specializes in face/animal/place recognition, perceptual hashing for de-duplication, screenshot/meme detection, burst photo selection, and quick indexing strategies. Activate on 'face recognition', 'face clustering', 'perceptual hash', 'near-duplicate', 'burst photo', 'screenshot detection', 'photo curation', 'photo indexing', 'NSFW detection', 'pet recognition', 'DINOHash', 'HDBSCAN faces'. NOT for GPS-based location clustering (use event-detection-temporal-intelligence-expert), color palette extraction (use color-theory-palette-harmony-expert), semantic image-text matching (use clip-aware-embeddings), or video analysis/frame extraction.

erichowens
content-media
media
13

summarize-youtube

High-quality YouTube video summarization using the local summarize CLI with yt-dlp + whisper.cpp transcription and Claude CLI by default. Use when asked to summarize a YouTube video, extract a transcript from audio (not captions), or run a repeatable best-quality video summary workflow.

joshp123
content-media
media
13

analyze

Analyze indexed videos to understand their content. Use when user asks "what is this video about?", "summarize the video", "analyze this video", or has questions about video content.

twelvelabs-io
content-media
media
13

embed

Create, check, and retrieve video embeddings. Use when user wants to generate embeddings from a video, check embedding status, or retrieve embeddings for a video.

twelvelabs-io
content-media
media
13

image-search

Search videos using an image combined with text for refined results. Use when the user wants to search with a reference image, says "find videos matching this image", "search with this picture", or wants to combine visual reference with text description.

twelvelabs-io
content-media
media
13

index

Index video files or URLs for AI search and analysis. Use when user wants to add a video for processing, says "index this video", "add this video for analysis", or mentions a video file they want to work with.

twelvelabs-io
content-media
media
13

search

Search indexed videos using natural language. Use when the user wants to find specific content, moments, or scenes in their indexed videos. Triggers on phrases like "find the part where...", "search for...", "look for...", "where does...", "when does...".

twelvelabs-io
content-media
media
13

instagram-transcriber-yandex-cloud-ig-yandex

Transcribes audio from Instagram Reels (and other videos) via Yandex SpeechKit. Downloads the video, uploads it to Object Storage, runs speech recognition, and returns the text.

tpitsunov
content-media
media
13

youtube-downloader

Download videos from YouTube and 1000+ other sites using yt-dlp.

crazynomad
content-media
media
13

twitter-downloader

Download videos from X (Twitter) posts using twmd — a fast, API-less downloader

crazynomad
content-media
media
13

vhs

VHS terminal recording best practices from Charmbracelet (formerly charmbracelet-vhs). This skill should be used when writing, reviewing, or editing VHS tape files to create professional terminal GIFs and videos. Triggers on tasks involving .tape files, VHS configuration, terminal recording, demo creation, or CLI documentation.

connorads
content-media
media
13

nano-banana

Generate and edit images using Google's Gemini image models (Nano Banana 2 default, Nano Banana Pro legacy). Use when the user asks to generate, create, edit, modify, change, alter, or update images. Also use when user references an existing image file and asks to modify it in any way (e.g., "modify this image", "change the background", "replace X with Y"). Supports text-to-image, image editing with up to 14 reference images, configurable resolution (0.5K-4K), aspect ratio, and adjustable thinking. DO NOT read the image file first - use this skill directly with the --input-image parameter.

connorads
content-media
media
13

camsnap

Capture frames or clips from RTSP/ONVIF cameras.

hashSTACS-Global
content-media
media
13

harmix-music-search

Search for music tracks using natural language prompts or video content via the Harmix AI API. Use when the user asks to find music, search for songs, discover tracks, needs music recommendations based on mood/genre/tempo/scene, or wants to find music for a video, video soundtrack, or music that matches video content.

Harmix
content-media
media
13

chanjing-video-compose

Use Chanjing video synthesis APIs to create digital human videos from text or audio, with optional background upload, polling, and explicit download.

chanjing-ai
content-media
media
13

gif-maker

Converts image frame sequences or sprite sheets into high-quality GIF animations. Supports custom FPS, layout slicing, and looped playback.

guanyang
content-media