Media

Audio, video, and image processing.

1476 skills

media

audio-engineering-principles

Use for real-time audio code safety, determinism, and numeric hygiene. Required foundation for DSP, audio analysis, audio systems, and JUCE work. Not for game-audio middleware or ffmpeg/video tasks.

erikstmartin

content-media

open

media

ffmpeg

Use when programmatically processing video/audio with libffmpeg C API. Not for command-line ffmpeg operations.

erikstmartin

content-media

open

media

gemini-audio-transcriber

Skill that uses the Gemini 2.0 Flash API to transcribe audio files (m4a, mp3, wav, etc.) in Japanese. Use for transcribing lecture recordings and voice memos.

kazuph

content-media

open

media

godot-audio

Use for Godot game-audio and middleware integration — AudioBus routing, Wwise/FMOD events, adaptive music, and procedural sound systems. Not for DSP, JUCE, ffmpeg, or offline audio pipelines.

erikstmartin

content-media

open

media

recreate-thumbnails

Face-swap YouTube thumbnails to feature Nick Saraev using AI. Use when user asks to recreate thumbnails, face swap images, generate YouTube thumbnails, or create thumbnail variations.

nickjwells

content-media

open

media

Full video production workflow for Remotion projects. Teaches how to orchestrate MCP tools (TTS, music, SFX, stock footage, video analysis) into complete Remotion compositions. Use this skill whenever producing a video that needs audio, voiceovers, music, stock footage, or analyzing existing video files.

DojoCodingLabs

content-media

open

media

pan-3d-transition

Create 3D pan/swivel transition effects for videos using Remotion. Use when user asks to add 3D transitions, create swivel effects, or add video transitions.

nickjwells

content-media

open

media

ai-multimodal

Process and generate multimedia content using the Google Gemini API. Capabilities include analyzing audio files (transcription with timestamps, summarization, speech understanding, music/sound analysis, up to 9.5 hours), understanding images (captioning, object detection, OCR, visual Q&A, segmentation), processing videos (scene detection, Q&A, temporal analysis, YouTube URLs, up to 6 hours), extracting from documents (PDF tables, forms, charts, diagrams, multi-page), and generating images (text-to-image, editing, composition, refinement). Use when working with audio/video files, analyzing images or screenshots, processing PDF documents, extracting structured data from media, creating images from text prompts, or implementing multimodal AI features. Supports multiple models (Gemini 2.5/2.0) with context windows up to 2M tokens.

jackspace

content-media

open

media

youtube-downloader

Download YouTube videos with customizable quality and format options. Use this skill when the user asks to download, save, or grab YouTube videos. Supports various quality settings (best, 1080p, 720p, 480p, 360p), multiple formats (mp4, webm, mkv), and audio-only downloads as MP3.

kofttlcc

content-media

open

media

image-enhancer

Improves the quality of images, especially screenshots, by enhancing resolution, sharpness, and clarity. Perfect for preparing images for presentations, documentation, or social media posts.

kofttlcc

content-media

open

media

youtube-downloader

Download videos, audio, playlists, and channels from YouTube and 1000+ websites using yt-dlp. Supports quality selection, format conversion, subtitle download, playlist filtering, metadata extraction, thumbnail download, and batch operations. Use when downloading YouTube videos in any quality (4K, 8K, HDR), extracting audio as MP3/M4A/FLAC, downloading entire playlists/channels, getting subtitles in multiple languages, converting to specific formats, downloading live streams, archiving content, or batch processing multiple URLs. Optimized for reliability with automatic retries, rate limiting, and error handling.

jackspace

content-media

open

media

image-fetcher

Fetch and download images from the internet in various formats (JPG, PNG, GIF, WebP, BMP, SVG, etc.). Use when users ask to download images, fetch images from URLs, save images from the web, or get images for embedding in documents or chats. Supports single and batch downloads with automatic format detection.

Interstellar-code

content-media

open

media

media-processing

Process multimedia files with FFmpeg (video/audio encoding, conversion, streaming, filtering, hardware acceleration) and ImageMagick (image manipulation, format conversion, batch processing, effects, composition). Use when converting media formats, encoding videos with specific codecs (H.264, H.265, VP9), resizing/cropping images, extracting audio from video, applying filters and effects, optimizing file sizes, creating streaming manifests (HLS/DASH), generating thumbnails, batch processing images, creating composite images, or implementing media processing pipelines. Supports 100+ formats, hardware acceleration (NVENC, QSV), and complex filtergraphs.

jackspace

content-media

open

media

process-raster

Process raster data: clip by bounding box, stack multiple bands, mosaic GeoTIFFs, or convert between raster and vector formats.

opengeos

content-media

open

media

g-skl-ingest-youtube

YouTube transcript ingestion into the vault. Uses yt-dlp to fetch transcripts locally — no Docker, no MCP, no screen captures. Stores in research/videos/ with analysis_depth=transcript_only for future vision upgrade.

wrm3

content-media

open

media

openai-whisper-api

Transcribe audio via the OpenAI Audio Transcriptions API (Whisper). Not for local/offline transcription; use openai-whisper for that.

unisone

content-media

open

media

histolab

Digital pathology image processing toolkit for whole slide images (WSI). Use this skill when working with histopathology slides, processing H&E or IHC stained tissue images, extracting tiles from gigapixel pathology images, detecting tissue regions, segmenting tissue masks, or preparing datasets for computational pathology deep learning pipelines. Applies to WSI formats (SVS, TIFF, NDPI), tile-based analysis, and histological image preprocessing workflows.

jackspace

content-media

open

media

youtube-video-analysis

MCP or full-pipeline video analysis — vault notes must match Obsidian standard. For local yt-dlp transcripts only, use g-skl-ingest-youtube.

wrm3

content-media

open

media

mulerouter

Generates images, videos, audio, speech, and music using MuleRouter or MuleRun multimodal APIs. Text-to-Image, Image-to-Image, Text-to-Video, Image-to-Video, Reference-to-Video, Video-to-Video, video editing (VACE, keyframe interpolation), Text-to-Speech, Text-to-Music. Use when the user wants to generate, edit, or transform images, videos, speech, or music using AI models like Wan2.6, Veo3, Nano Banana Pro, Sora2, Midjourney, Kling V3, Kling V3 Omni, MiniMax Speech 2.8, MiniMax Music 2.5.

openmule

content-media

open

media

creating-video-websites

Turn a video into a premium scroll-driven animated website with GSAP, canvas frame rendering, and layered animation choreography. Use when the user wants to convert a video into an animated web experience.

WilkoMarketing

content-media

open

media

motion-designer

Advanced motion designer with decades of After Effects and motion graphics experience, specialized in creating engaging video specifications for Remotion. Use when creating video specs, planning motion graphics, designing animations, or when asked to "create a video", "design motion graphics", "plan video content", or "spec out a video". Produces detailed scene-by-scene specifications with timing, audio, sound effects, and animation descriptions.

ncklrs

content-media

open

media

create-video-start

Master orchestrator that chains all Remotion video creation skills together in a single automated pipeline. Takes a creative brief and produces a complete, production-ready Remotion video project. Use when starting a new video from scratch, when asked to "create a video", "make a video", "build a complete video", or "video from idea to code".

ncklrs

content-media

open

media

face

Remember someone's face, save it permanently with their consent, or forget someone. You can also add more angles of a known person.

OriNachum

content-media

open
