media-understand
Use AI to understand and analyze multimedia content (images, video, audio). Use when the user wants to understand an image, analyze a video, transcribe audio, ask questions about a video, understand media, describe an image, or asks what is in this video/image/audio.
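A skill that detects and removes the unnatural feel (the "AI smell") of AI-generated Japanese text, "deodorizing" it into natural, human-sounding prose. Triggered by requests such as "deodorize this text", "remove the AI smell", "make this sound human", "rewrite this in natural Japanese", or "fix the translationese". Supports both prevention (through prompt engineering) and treatment (through post-editing).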
Use when user requests translating Qt project localization files (TS files), automating translation workflows, or setting up multilingual support for Qt applications. This skill uses parallel processing with ThreadPoolExecutor to translate TS (Translation Source) files efficiently.
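A minimal sketch of the parallel step, assuming the TS file is already loaded as a string: it collects messages whose translation is empty or marked type="unfinished" and translates them concurrently with concurrent.futures.ThreadPoolExecutor. The translate_text and translate_ts functions are hypothetical; translate_text is a stub standing in for whatever machine-translation or LLM call the skill actually makes, and plural (numerus) messages are skipped for brevity.

```python
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor


def translate_text(text: str, target_lang: str) -> str:
    """Hypothetical stub: replace with a real machine-translation or LLM call."""
    return f"[{target_lang}] {text}"


def translate_ts(xml_text: str, target_lang: str, max_workers: int = 8) -> str:
    """Fill empty or unfinished <translation> elements in a Qt TS document in parallel."""
    root = ET.fromstring(xml_text)

    # Collect (source, translation) element pairs that still need work.
    pending = []
    for message in root.iter("message"):
        if message.get("numerus") == "yes":
            continue  # plural forms use <numerusform> children; skipped in this sketch
        source = message.find("source")
        translation = message.find("translation")
        if source is None or translation is None:
            continue
        if translation.get("type") in ("vanished", "obsolete"):
            continue  # strings no longer present in the code
        unfinished = translation.get("type") == "unfinished"
        if unfinished or not (translation.text or "").strip():
            pending.append((source, translation))

    # Each call is I/O-bound, so threads overlap the waiting; map() keeps input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(
            pool.map(lambda pair: translate_text(pair[0].text or "", target_lang), pending)
        )

    for (_, translation), translated in zip(pending, results):
        translation.text = translated
        translation.attrib.pop("type", None)  # drop type="unfinished"

    return ET.tostring(root, encoding="unicode")
```

Threads fit here because each translation request spends its time waiting on the network, so the GIL is not the bottleneck; pool.map returns results in input order, which keeps every translation paired with its source string.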
Transform textbook content based on the 10-dimension user profile to provide personalized learning experiences. Agent: AIEngineer
This skill is ALWAYS ACTIVE once installed. Automatically applies classical Chinese (文言文) writing style to all responses. Uses concise, elegant expressions while keeping technical terms intact. No trigger phrase needed - activates on every response.
Configure WaveCap LLM-based transcription correction. Use when the user wants to enable/disable LLM correction, change models, tune prompts, or optimize correction quality on Apple Silicon.
Document ingestion pipeline for RAG: splits documents into chunks and attaches metadata to each chunk.
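The entry names only the stages (documents, chunks, metadata). As an illustration, here is a minimal sketch of the chunking and metadata steps, assuming fixed-size character windows with overlap; the Chunk class, the chunk_document function, and the metadata fields are hypothetical, and a real pipeline may split on tokens, sentences, or headings instead.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    metadata: dict = field(default_factory=dict)


def chunk_document(text: str, source: str, chunk_size: int = 500, overlap: int = 50) -> list[Chunk]:
    """Split text into overlapping character windows and tag each with its origin."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")

    chunks: list[Chunk] = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append(
            Chunk(
                text=text[start:end],
                metadata={"source": source, "chunk_index": len(chunks), "start": start, "end": end},
            )
        )
        if end == len(text):
            break  # the last window already reaches the end of the document
        start = end - overlap  # step back so neighbouring chunks share context
    return chunks
```

The overlap keeps a sentence that straddles a boundary retrievable from either chunk; the start/end offsets let a retriever point back to the exact span in the source document.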
Process audio, video, and media on cloud GPUs. Transcribe with Whisper, clone voices, generate videos, upscale images, and run batch media processing. All results sync back to your Mac.
Configure WaveCap hallucination detection and prevention. Use when Whisper outputs gibberish, repeated phrases, or phantom text on silent audio.
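A minimal sketch of two common heuristics for flagging such output, assuming plain transcript text per segment: a zlib compression ratio (repetitive text compresses unusually well) and a count of back-to-back repeated word n-grams. The function names and the repeat threshold are hypothetical; the 2.4 ratio threshold mirrors the default compression_ratio_threshold in openai-whisper's transcribe, but this is not WaveCap's actual detector.

```python
import zlib


def compression_ratio(text: str) -> float:
    """Raw byte length divided by zlib-compressed length; repetitive text scores high."""
    data = text.encode("utf-8")
    if not data:
        return 0.0
    return len(data) / len(zlib.compress(data))


def max_consecutive_repeats(text: str, max_ngram: int = 4) -> int:
    """Longest run of any 1..max_ngram word sequence repeated back to back."""
    words = text.lower().split()
    best = 1 if words else 0
    for n in range(1, max_ngram + 1):
        for start in range(len(words) - n + 1):
            gram = words[start:start + n]
            count, pos = 1, start + n
            # Slices past the end are shorter than gram, so the loop always stops.
            while words[pos:pos + n] == gram:
                count += 1
                pos += n
            best = max(best, count)
    return best


def looks_hallucinated(text: str, ratio_threshold: float = 2.4, repeat_threshold: int = 4) -> bool:
    """Flag a segment that is suspiciously compressible or loops on one phrase."""
    return (
        compression_ratio(text) > ratio_threshold
        or max_consecutive_repeats(text) >= repeat_threshold
    )
```

The two checks complement each other: the ratio catches long loops of any shape, while the n-gram counter catches short segments that are too brief to compress well.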
Generate a Sora video from a text prompt via an Azure OpenAI endpoint, then download the resulting .mp4 locally. Use when the user asks to generate a Sora video/video.mp4 from a prompt or wants the generated video saved to disk.
Analyze videos using Google's Gemini API - describe content, answer questions, transcribe audio with visual descriptions, reference timestamps, clip videos, and process YouTube URLs. Supports 9 video formats, multiple models (Gemini 2.5/2.0), and context windows up to 2M tokens (6 hours of video).
Generates images and videos using ComfyUI node-based workflows. Use when creating AI-generated assets, text-to-image, text-to-video, image-to-video, running Stable Diffusion, Flux, HunyuanVideo, or when user mentions "comfy," "ComfyUI," "generate image," "generate video," "AI art," "diffusion model," or needs visual content for courses/projects.
Fine-tuning Speech-to-Text models like Whisper using Unsloth's optimized LoRA pipeline. Triggers: stt, whisper, transcription, audio fine-tuning, speech-to-text, audio normalization.
Combine multiple images using Gemini 2.5 Flash (Nano Banana) via OpenRouter. Use when merging 2-8 images with AI-guided composition.
This skill should be used when the user asks to "generate video with LTX-2", "create video from text", "animate an image", "use GAMMA for video generation", or needs local AI video generation on the GAMMA GPU server. Covers text-to-video, image-to-video, and auto-cataloging workflows with the LTX-2 19B model.
Generate and edit images using Google Gemini's Nano Banana image generation API via undyapi.com proxy. Use when users want to generate images from text descriptions, edit existing images with AI, or create visual content. Supports aspect ratio and quality control.
Generate and edit images using Google's Nano Banana Pro (Gemini 3 Pro Image) API. Use when the user asks to generate, create, edit, modify, change, alter, or update images. Also use when user references an existing image file and asks to modify it in any way (e.g., "modify this image", "change the background", "replace X with Y"). Supports both text-to-image generation and image-to-image editing with configurable resolution (1K default, 2K, or 4K for high resolution). DO NOT read the image file first - use this skill directly with the --input-image parameter.
Fix for Plotly FigureWidget KeyError: 'uid' when updating shapes with sliders. Trigger: interactive Plotly dashboard errors, FigureWidget batch_update issues
Matrix data model verification using ASCII diagrams. Use when working with *Progressions.ts files, defineProgression(), or testing how 2D numeric grids evolve over time. Auto-apply when editing files matching *Progressions.ts or src/test-utils/ascii*.ts.
Systematic exploratory data analysis following best practices. Use when analyzing any dataset to understand structure, identify data quality issues (duplicates, missing values, inconsistencies, outliers), examine distributions, detect correlations, and generate visualizations. Provides comprehensive data profiling with sanity checks before analysis.
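A minimal, dependency-free sketch of three of the data-quality checks listed above (duplicates, missing values, outliers), assuming the dataset is a list of dicts with hashable values. The is_missing and profile functions and the report keys are hypothetical; outliers use the common 1.5 x IQR rule.

```python
import math
import statistics
from typing import Any


def is_missing(value: Any) -> bool:
    """Treat None, empty strings, and float NaN as missing."""
    return value is None or value == "" or (isinstance(value, float) and math.isnan(value))


def profile(rows: list[dict[str, Any]]) -> dict[str, Any]:
    """Duplicate rows, missing counts per column, and 1.5*IQR outliers per numeric column."""
    columns = sorted({key for row in rows for key in row})

    # Duplicates: rows identical across every column (values must be hashable).
    seen: set = set()
    duplicates = 0
    for row in rows:
        key = tuple(row.get(col) for col in columns)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)

    # An absent key counts as missing, the same as None or "".
    missing = {col: sum(is_missing(row.get(col)) for row in rows) for col in columns}

    outliers: dict = {}
    for col in columns:
        values = [
            row[col] for row in rows
            if isinstance(row.get(col), (int, float))
            and not isinstance(row.get(col), bool)
            and not is_missing(row.get(col))
        ]
        if len(values) < 4:
            continue  # too few points for meaningful quartiles
        q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
        fence = 1.5 * (q3 - q1)
        outliers[col] = [v for v in values if v < q1 - fence or v > q3 + fence]

    return {"rows": len(rows), "duplicates": duplicates, "missing": missing, "outliers": outliers}
```

With pandas, the first two checks correspond to df.duplicated().sum() and df.isna().sum().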
Create AntV Infographic visualizations in Obsidian using fenced `infographic` code blocks. Use when the user wants to render process flows, timelines, hierarchies, charts, or comparison matrices within Obsidian markdown documents. Supports JSON and DSL syntax formats.
Create D3 questions with double number lines showing proportional relationships. Students complete missing values on parallel number lines.
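On a double number line, each tick pairs a top value t with a bottom value b, and a proportional relationship means b = k * t for a single constant k. A minimal sketch of how a question generator could compute the answer key, assuming at least one tick is fully given; the function name is hypothetical, and rendering the lines with D3 is out of scope here.

```python
from fractions import Fraction
from typing import Optional, Union

Value = Union[int, Fraction]


def complete_double_number_line(
    top: list[Optional[Value]], bottom: list[Optional[Value]]
) -> tuple[list[Fraction], list[Fraction]]:
    """Fill missing ticks assuming bottom = k * top at every tick."""
    if len(top) != len(bottom):
        raise ValueError("both lines need the same number of ticks")

    # The constant of proportionality k comes from any fully known tick with top != 0.
    k = next(
        (Fraction(b) / Fraction(t) for t, b in zip(top, bottom)
         if t is not None and b is not None and t != 0),
        None,
    )
    if k is None:
        raise ValueError("need one complete tick with a nonzero top value")

    filled_top, filled_bottom = [], []
    for t, b in zip(top, bottom):
        if t is None and b is None:
            raise ValueError("every tick needs at least one known value")
        if t is None:
            if k == 0:
                raise ValueError("top value is undetermined when k is 0")
            t = Fraction(b) / k
        if b is None:
            b = k * Fraction(t)
        filled_top.append(Fraction(t))
        filled_bottom.append(Fraction(b))
    return filled_top, filled_bottom
```

For example, if 2 cups of flour make 12 cookies, then k = 6 and the blanks on the lines [2, 4, ?, 10] and [12, ?, 36, ?] resolve to 6 cups, 24 cookies, and 60 cookies. Fractions keep non-integer ratios exact.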
This skill should be used when the user asks "Chart.js options", "Chart.js animations", "Chart.js legend", "Chart.js tooltip", "Chart.js title", "disable Chart.js animation", "customize Chart.js tooltip", "Chart.js responsive", "Chart.js aspect ratio", "Chart.js interactions", "Chart.js hover", "Chart.js click events", "Chart.js layout", "Chart.js padding", "Chart.js font", "Chart.js colors", "Chart.js external tooltip", "Chart.js custom legend", "Chart.js transitions", or needs help configuring Chart.js v4.5.1 options, plugins, and styling.