large-file-toc
Generate a table-of-contents overview for large files. When an onboarded Markdown file reaches the size threshold (default 30KB), extract its heading structure to create a navigation file. Trigger condition: Markdown file size >= 30KB.
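A minimal sketch of the size check and heading extraction described above, not the skill's actual implementation. The names `extract_headings` and `build_toc` are hypothetical, "30KB" is assumed to mean 30 * 1024 bytes of UTF-8 text, and only ATX-style (`#`) headings are recognized.

```python
import re

# Assumption: "30KB" is read as 30 * 1024 bytes of UTF-8 encoded text.
THRESHOLD_BYTES = 30 * 1024

# ATX heading: 1-6 '#' characters, whitespace, then the title text.
HEADING_RE = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")
# Opening or closing line of a fenced code block.
FENCE_RE = re.compile(r"^\s*(```|~~~)")


def extract_headings(markdown_text):
    """Return (level, title) pairs, skipping lines inside fenced code blocks."""
    headings = []
    in_fence = False
    for line in markdown_text.splitlines():
        if FENCE_RE.match(line):
            # Simple toggle; nested or mismatched fences are not handled.
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        match = HEADING_RE.match(line)
        if match:
            headings.append((len(match.group(1)), match.group(2)))
    return headings


def build_toc(markdown_text, threshold=THRESHOLD_BYTES):
    """Return an indented outline, or None when the text is below the threshold."""
    size = len(markdown_text.encode("utf-8"))
    if size < threshold:
        return None
    lines = [
        "  " * (level - 1) + "- " + title
        for level, title in extract_headings(markdown_text)
    ]
    return "\n".join(lines)
```

Calling `build_toc(text)` on a file's contents returns `None` for files under the threshold, so a caller can skip writing a navigation file in that case.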
CMS, document processing, and media generation.
AI image generation, editing, and background removal API via Bria.ai — remove backgrounds to get transparent PNGs and cutouts, generate images from text prompts, and edit photos with natural language instructions. Also create product photography and lifestyle shots, replace or blur backgrounds, upscale resolution, restyle, and batch-generate visual assets. Use this skill whenever the user wants to remove a background, create transparent PNGs, generate, edit, modify, or transform any image — including hero images, banners, social media visuals, product photos, illustrations, icons, thumbnails, ad creatives, or marketing materials. Also triggers on cutout, inpainting, outpainting, object removal or addition, photo restoration, style transfer, image enhancement, relight, reseason, sketch-to-photo, or any visual content creation. Commercially safe, royalty-free. 20+ specialized endpoints for e-commerce, web design, and content pipelines.
Create and edit images locally with no AI (programmatic operations). Use when the user wants to create a new image (blank, gradient, solid color), resize an image, draw rectangles or shapes on an image, add a watermark, paste a logo, overlay one image on another, or do any Pillow/ImageMagick-style image operations. Do not use for text-to-image generation — use generate_image (AI) instead.
Download YouTube video transcripts with automatic frame extraction for visual references. Use when analyzing YouTube videos, tutorials, or conference talks.
Download, transcribe, and summarize videos via the Inngest pipeline. Use when the user asks to grab/download/transcribe/ingest a video, save a YouTube video, or process any video URL. Also handles batch ingest of multiple URLs. This skill triggers the durable Inngest workflow — do NOT run yt-dlp, mlx-whisper, or scp manually.
Upload, manage, and embed videos via Mux. Covers direct uploads, API asset management, webhook event flow, playback embedding, and the Mux CLI. Use when uploading video, creating assets, checking encoding status, embedding playback, or handling Mux webhook events.
Classic image manipulation with Python Pillow - resize, crop, composite, format conversion, watermarks, brightness/contrast adjustments, and web optimization. Use this skill when post-processing AI-generated images, preparing images for web delivery, batch processing image directories, creating responsive image variants, or performing any deterministic pixel-level image operation. Works standalone or alongside bria-ai for post-processing generated images.
Authoritative reference for Mermaid diagram syntax. Provides diagram types, syntax patterns, examples, and platform integration guidance for generating accurate Mermaid diagrams.
Authoritative reference for PlantUML diagram syntax. Provides UML and non-UML diagram types, syntax patterns, examples, and setup guidance for generating accurate PlantUML diagrams.
Melodic Software brand identity guidelines. Use when styling projects, creating marketing materials, building UI components, or ensuring brand consistency. Covers colors (#1E90FF melodic blue primary), typography (Inter font family), logo usage, brand voice ("Building software that sings"), and component patterns.
Use when implementing responsive images, format conversion, focal point cropping, or image processing pipelines. Covers srcset generation, WebP/AVIF conversion, lazy loading, and image transformation APIs for headless CMS.
Generate and edit images with vLLM-Omni using models like FLUX, Stable Diffusion 3, Qwen-Image, GLM-Image, BAGEL, and Z-Image. Use when generating images from text, editing images, configuring diffusion parameters, or working with image generation models.
Generate game assets using AI image generation APIs (DALL-E, Replicate, fal.ai) and prepare them for Godot. Covers the full art pipeline from concept art and style guides to final sprites, sprite sheets, and import configuration. This skill should be used when creating game art, generating sprites, making tilesets, creating UI elements, or preparing assets for Godot import. Keywords: game assets, AI art, DALL-E, Replicate, fal.ai, sprite sheet, tileset, Godot, pixel art, character sprite, game art, texture, animation frames.
Generate videos with vLLM-Omni using Wan2.2 and other video generation models. Use when generating videos from text, creating videos from images, configuring video generation parameters, or working with text-to-video or image-to-video models.
Add a new diffusion model (text-to-image, text-to-video, image-to-video, text-to-audio, image editing) to vLLM-Omni, including Cache-DiT acceleration and parallelism support (TP, SP/USP, CFG-Parallel, HSDP). Use when integrating a new diffusion model, porting a diffusers pipeline or a custom model repo to vllm-omni, creating a new DiT transformer adapter, adding diffusion model support, or enabling multi-GPU parallelism and cache acceleration for an existing model.
Generate audio and speech with vLLM-Omni using Qwen3-TTS, Fish Speech S2 Pro, CosyVoice3, MiMo-Audio, and Stable-Audio models. Use when synthesizing speech from text, generating audio effects or music, configuring TTS parameters, cloning voices, adding new TTS models, or working with text-to-speech models.
Transcribe speech, generate images from prompts, analyze video content, and convert between modalities using multimodal omni-modality models like Qwen2.5-Omni and Qwen3-Omni. Use when working with multimodal models for speech recognition, image generation, video understanding, voice synthesis, or any task combining text, image, audio, and video inputs and outputs simultaneously.
Use when adding a recipe for omnimodal models (text-to-image, text-to-video, text-to-audio, image-to-video, any-to-any, diffusion transformers) to the vLLM recipes repository, or when documenting vLLM-Omni deployment.
Convert written documents to narrated video scripts with TTS audio and word-level timing. Use when preparing essays, blog posts, or articles for video narration. Outputs scene files, audio, and VTT with precise word timestamps. Keywords: narration, voiceover, TTS, scenes, audio, timing, video script, spoken.
Generates image-generation prompts for Xiaohongshu covers based on user content. It polishes the content to fit Xiaohongshu style, then applies a visual style template to produce JSON output for image generation.