3d-performance
Performance optimization for 3D web scenes — LOD strategies, frustum/occlusion culling, draw call reduction, and R3F-specific optimizations. Use when scenes run below 60fps.
A robust, "Honey Badger" image downloader that bypasses strict CDNs, ignores misleading Content-Types, and falls back to scraping if a direct link fails.
Process and generate multimedia content using the Google Gemini API for better vision capabilities. Capabilities include analyzing audio files (transcription with timestamps, summarization, speech understanding, music/sound analysis up to 9.5 hours), understanding images (better image analysis than Claude models, captioning, reasoning, object detection, design extraction, OCR, visual Q&A, segmentation, handling multiple images), processing videos (scene detection, Q&A, temporal analysis, YouTube URLs, up to 6 hours), extracting from documents (PDF tables, forms, charts, diagrams, multi-page), generating images (text-to-image with Imagen 4, editing, composition, refinement), and generating videos (text-to-video with Veo 3, 8-second clips with native audio). Use when working with audio/video files, analyzing images or screenshots (instead of Claude's default vision capabilities; fall back to Claude's vision only if needed), processing PDF documents, extracting structured data from media, creating images/videos from text pr
Expert photography composition critic grounded in graduate-level visual aesthetics education, computational aesthetics research (AVA, NIMA, LAION-Aesthetics, VisualQuality-R1), and professional image analysis with custom tooling. Use for image quality assessment, composition analysis, aesthetic scoring, photo critique. Activate on "photo critique", "composition analysis", "image aesthetics", "NIMA", "AVA dataset", "visual quality". NOT for photo editing/retouching (use native-app-designer), generating images (use Stability AI directly), or basic image processing (use clip-aware-embeddings).
Comprehensive suite for processing YouTube videos. Use this when the user needs to: (1) Extract transcripts, (2) Generate visual infographics, (3) Create audio summaries (TTS) and videos, or (4) Perform full 'kitchen sink' processing of YouTube content.
Expert in 2000s-era music visualization (Milkdrop, AVS, Geiss) and modern WebGL implementations. Specializes in Butterchurn integration, Web Audio API AnalyserNode FFT data, GLSL shaders for audio-reactive visuals, and psychedelic generative art. Activate on "Milkdrop", "music visualization", "WebGL visualizer", "Butterchurn", "audio reactive", "FFT visualization", "spectrum analyzer". NOT for simple bar charts/waveforms (use basic canvas), video editing, or non-audio visuals.
FFmpeg automation for cutting, trimming, concatenating videos. Audio mixing, timeline editing, transitions, effects. Export optimization for YouTube, social media. Subtitle handling, color grading, batch processing. Use for videogen projects, content creation, automated video production. Activate on "video editing", "FFmpeg", "trim video", "concatenate", "transitions", "export optimization". NOT for real-time video editing UI, 3D compositing, or motion graphics.
Expert in photo content recognition, intelligent curation, and quality filtering. Specializes in face/animal/place recognition, perceptual hashing for de-duplication, screenshot/meme detection, burst photo selection, and quick indexing strategies. Activate on 'face recognition', 'face clustering', 'perceptual hash', 'near-duplicate', 'burst photo', 'screenshot detection', 'photo curation', 'photo indexing', 'NSFW detection', 'pet recognition', 'DINOHash', 'HDBSCAN faces'. NOT for GPS-based location clustering (use event-detection-temporal-intelligence-expert), color palette extraction (use color-theory-palette-harmony-expert), semantic image-text matching (use clip-aware-embeddings), or video analysis/frame extraction.
High-quality YouTube video summarization using the local summarize CLI with yt-dlp + whisper.cpp transcription and Claude CLI by default. Use when asked to summarize a YouTube video, extract a transcript from audio (not captions), or run a repeatable best-quality video summary workflow.
Search videos using an image combined with text for refined results. Use when the user wants to search with a reference image, says "find videos matching this image", "search with this picture", or wants to combine visual reference with text description.
Transcribes audio from Instagram Reels (and other videos) via Yandex SpeechKit. Downloads the video, uploads it to Object Storage, recognizes the speech, and returns the text.
Download videos from YouTube and 1000+ other sites using yt-dlp.
Download videos from X (Twitter) posts using twmd, a fast, API-less downloader.
VHS terminal recording best practices from Charmbracelet (formerly charmbracelet-vhs). This skill should be used when writing, reviewing, or editing VHS tape files to create professional terminal GIFs and videos. Triggers on tasks involving .tape files, VHS configuration, terminal recording, demo creation, or CLI documentation.
Generate and edit images using Google's Gemini image models (Nano Banana 2 default, Nano Banana Pro legacy). Use when the user asks to generate, create, edit, modify, change, alter, or update images. Also use when user references an existing image file and asks to modify it in any way (e.g., "modify this image", "change the background", "replace X with Y"). Supports text-to-image, image editing with up to 14 reference images, configurable resolution (0.5K-4K), aspect ratio, and adjustable thinking. DO NOT read the image file first - use this skill directly with the --input-image parameter.
Search for music tracks using natural language prompts or video content via the Harmix AI API. Use when the user asks to find music, search for songs, discover tracks, needs music recommendations based on mood/genre/tempo/scene, or wants to find music for a video, video soundtrack, or music that matches video content.
Use Chanjing video synthesis APIs to create digital human videos from text or audio, with optional background upload, polling, and explicit download.