audio-engineering-principles
Use for real-time audio code safety, determinism, and numeric hygiene. Required foundation for DSP, audio analysis, audio systems, and JUCE work. Not for game-audio middleware or ffmpeg/video tasks.
Use for real-time audio code safety, determinism, and numeric hygiene. Required foundation for DSP, audio analysis, audio systems, and JUCE work. Not for game-audio middleware or ffmpeg/video tasks.
Gemini 2.0 Flash APIを使用して音声ファイル(m4a, mp3, wav等)を日本語で文字起こしするスキル。講義録音や音声メモの書き起こしに使用。
Use for Godot game-audio and middleware integration — AudioBus routing, Wwise/FMOD events, adaptive music, and procedural sound systems. Not for DSP, JUCE, ffmpeg, or offline audio pipelines.
Face-swap YouTube thumbnails to feature Nick Saraev using AI. Use when user asks to recreate thumbnails, face swap images, generate YouTube thumbnails, or create thumbnail variations.
Full video production workflow for Remotion projects. Teaches how to orchestrate MCP tools (TTS, music, SFX, stock footage, video analysis) into complete Remotion compositions. Use this skill whenever producing a video that needs audio, voiceovers, music, stock footage, or analyzing existing video files.
Create 3D pan/swivel transition effects for videos using Remotion. Use when user asks to add 3D transitions, create swivel effects, or add video transitions.
Process and generate multimedia content using Google Gemini API. Capabilities include analyze audio files (transcription with timestamps, summarization, speech understanding, music/sound analysis up to 9.5 hours), understand images (captioning, object detection, OCR, visual Q&A, segmentation), process videos (scene detection, Q&A, temporal analysis, YouTube URLs, up to 6 hours), extract from documents (PDF tables, forms, charts, diagrams, multi-page), generate images (text-to-image, editing, composition, refinement). Use when working with audio/video files, analyzing images or screenshots, processing PDF documents, extracting structured data from media, creating images from text prompts, or implementing multimodal AI features. Supports multiple models (Gemini 2.5/2.0) with context windows up to 2M tokens.
Download YouTube videos with customizable quality and format options. Use this skill when the user asks to download, save, or grab YouTube videos. Supports various quality settings (best, 1080p, 720p, 480p, 360p), multiple formats (mp4, webm, mkv), and audio-only downloads as MP3.
Improves the quality of images, especially screenshots, by enhancing resolution, sharpness, and clarity. Perfect for preparing images for 簡報s, 文檔ation, or social media posts.
Download videos, audio, playlists, and channels from YouTube and 1000+ websites using yt-dlp. Supports quality selection, format conversion, subtitle download, playlist filtering, metadata extraction, thumbnail download, and batch operations. Use when downloading YouTube videos in any quality (4K, 8K, HDR), extracting audio as MP3/M4A/FLAC, downloading entire playlists/channels, getting subtitles in multiple languages, converting to specific formats, downloading live streams, archiving content, or batch processing multiple URLs. Optimized for reliability with automatic retries, rate limiting, and error handling.
Fetch and download images from the internet in various formats (JPG, PNG, GIF, WebP, BMP, SVG, etc.). Use when users ask to download images, fetch images from URLs, save images from the web, or get images for embedding in documents or chats. Supports single and batch downloads with automatic format detection.
Process multimedia files with FFmpeg (video/audio encoding, conversion, streaming, filtering, hardware acceleration) and ImageMagick (image manipulation, format conversion, batch processing, effects, composition). Use when converting media formats, encoding videos with specific codecs (H.264, H.265, VP9), resizing/cropping images, extracting audio from video, applying filters and effects, optimizing file sizes, creating streaming manifests (HLS/DASH), generating thumbnails, batch processing images, creating composite images, or implementing media processing pipelines. Supports 100+ formats, hardware acceleration (NVENC, QSV), and complex filtergraphs.
Process raster data: clip by bounding box, stack multiple bands, mosaic GeoTIFFs, or convert between raster and vector formats.
YouTube transcript ingestion into the vault. Uses yt-dlp to fetch transcripts locally — no Docker, no MCP, no screen captures. Stores in research/videos/ with analysis_depth=transcript_only for future vision upgrade.
Transcribe audio via OpenAI Audio Transcriptions API (Whisper); Don't use if you want local/offline transcription; prefer openai-whisper.
Digital pathology image processing toolkit for whole slide images (WSI). Use this skill when working with histopathology slides, processing H&E or IHC stained tissue images, extracting tiles from gigapixel pathology images, detecting tissue regions, segmenting tissue masks, or preparing datasets for computational pathology deep learning pipelines. Applies to WSI formats (SVS, TIFF, NDPI), tile-based analysis, and histological image preprocessing workflows.
MCP or full-pipeline video analysis — vault notes must match Obsidian standard. For local yt-dlp transcripts only, use g-skl-ingest-youtube.
Generates images, videos, audio, speech, and music using MuleRouter or MuleRun multimodal APIs. Text-to-Image, Image-to-Image, Text-to-Video, Image-to-Video, Reference-to-Video, Video-to-Video, video editing (VACE, keyframe interpolation), Text-to-Speech, Text-to-Music. Use when the user wants to generate, edit, or transform images, videos, speech, or music using AI models like Wan2.6, Veo3, Nano Banana Pro, Sora2, Midjourney, Kling V3, Kling V3 Omni, MiniMax Speech 2.8, MiniMax Music 2.5.
Turn a video into a premium scroll-driven animated website with GSAP, canvas frame rendering, and layered animation choreography. Use when the user wants to convert a video into an animated web experience.
Advanced motion designer with decades of After Effects and motion graphics experience, specialized in creating engaging video specifications for Remotion. Use when creating video specs, planning motion graphics, designing animations, or when asked to "create a video", "design motion graphics", "plan video content", or "spec out a video". Produces detailed scene-by-scene specifications with timing, audio, sound effects, and animation descriptions.
Master orchestrator that chains all Remotion video creation skills together in a single automated pipeline. Takes a creative brief and produces a complete, production-ready Remotion video project. Use when starting a new video from scratch, when asked to "create a video", "make a video", "build a complete video", or "video from idea to code".
YouTube transcript ingestion into the vault. Uses yt-dlp to fetch transcripts locally — no Docker, no MCP, no screen captures. Stores in research/videos/ with analysis_depth=transcript_only for future vision upgrade.