
Media

Audio, video, and image processing.

1476 skills
media
15

audio-engineering-principles

Use for real-time audio code safety, determinism, and numeric hygiene. Required foundation for DSP, audio analysis, audio systems, and JUCE work. Not for game-audio middleware or ffmpeg/video tasks.

erikstmartin
content-media
open
media
15

ffmpeg

Use when programmatically processing video/audio with libffmpeg C API. Not for command-line ffmpeg operations.

erikstmartin
content-media
open
media
15

gemini-audio-transcriber

A skill that uses the Gemini 2.0 Flash API to transcribe audio files (m4a, mp3, wav, etc.) in Japanese. Use for transcribing lecture recordings and voice memos.

kazuph
content-media
open
media
15

godot-audio

Use for Godot game-audio and middleware integration — AudioBus routing, Wwise/FMOD events, adaptive music, and procedural sound systems. Not for DSP, JUCE, ffmpeg, or offline audio pipelines.

erikstmartin
content-media
open
media
15

recreate-thumbnails

Face-swap YouTube thumbnails to feature Nick Saraev using AI. Use when user asks to recreate thumbnails, face swap images, generate YouTube thumbnails, or create thumbnail variations.

nickjwells
content-media
open
media
15

remotion-production

Full video production workflow for Remotion projects. Teaches how to orchestrate MCP tools (TTS, music, SFX, stock footage, video analysis) into complete Remotion compositions. Use this skill whenever producing a video that needs audio, voiceovers, music, stock footage, or analyzing existing video files.

DojoCodingLabs
content-media
open
media
15

pan-3d-transition

Create 3D pan/swivel transition effects for videos using Remotion. Use when user asks to add 3D transitions, create swivel effects, or add video transitions.

nickjwells
content-media
open
media
14

ai-multimodal

Process and generate multimedia content using the Google Gemini API. Capabilities include analyzing audio files (transcription with timestamps, summarization, speech understanding, music/sound analysis up to 9.5 hours), understanding images (captioning, object detection, OCR, visual Q&A, segmentation), processing videos (scene detection, Q&A, temporal analysis, YouTube URLs, up to 6 hours), extracting data from documents (PDF tables, forms, charts, diagrams, multi-page), and generating images (text-to-image, editing, composition, refinement). Use when working with audio/video files, analyzing images or screenshots, processing PDF documents, extracting structured data from media, creating images from text prompts, or implementing multimodal AI features. Supports multiple models (Gemini 2.5/2.0) with context windows up to 2M tokens.

jackspace
content-media
open
media
14

youtube-downloader

Download YouTube videos with customizable quality and format options. Use this skill when the user asks to download, save, or grab YouTube videos. Supports various quality settings (best, 1080p, 720p, 480p, 360p), multiple formats (mp4, webm, mkv), and audio-only downloads as MP3.

kofttlcc
content-media
open
media
14

image-enhancer

Improves the quality of images, especially screenshots, by enhancing resolution, sharpness, and clarity. Perfect for preparing images for presentations, documentation, or social media posts.

kofttlcc
content-media
open
media
14

youtube-downloader

Download videos, audio, playlists, and channels from YouTube and 1000+ websites using yt-dlp. Supports quality selection, format conversion, subtitle download, playlist filtering, metadata extraction, thumbnail download, and batch operations. Use when downloading YouTube videos in any quality (4K, 8K, HDR), extracting audio as MP3/M4A/FLAC, downloading entire playlists/channels, getting subtitles in multiple languages, converting to specific formats, downloading live streams, archiving content, or batch processing multiple URLs. Optimized for reliability with automatic retries, rate limiting, and error handling.

jackspace
content-media
open
media
14

image-fetcher

Fetch and download images from the internet in various formats (JPG, PNG, GIF, WebP, BMP, SVG, etc.). Use when users ask to download images, fetch images from URLs, save images from the web, or get images for embedding in documents or chats. Supports single and batch downloads with automatic format detection.

Interstellar-code
content-media
open
media
14

media-processing

Process multimedia files with FFmpeg (video/audio encoding, conversion, streaming, filtering, hardware acceleration) and ImageMagick (image manipulation, format conversion, batch processing, effects, composition). Use when converting media formats, encoding videos with specific codecs (H.264, H.265, VP9), resizing/cropping images, extracting audio from video, applying filters and effects, optimizing file sizes, creating streaming manifests (HLS/DASH), generating thumbnails, batch processing images, creating composite images, or implementing media processing pipelines. Supports 100+ formats, hardware acceleration (NVENC, QSV), and complex filtergraphs.

jackspace
content-media
open
media
14

process-raster

Process raster data: clip by bounding box, stack multiple bands, mosaic GeoTIFFs, or convert between raster and vector formats.

opengeos
content-media
open
media
14

g-skl-ingest-youtube

YouTube transcript ingestion into the vault. Uses yt-dlp to fetch transcripts locally — no Docker, no MCP, no screen captures. Stores in research/videos/ with analysis_depth=transcript_only for future vision upgrade.

wrm3
content-media
open
media
14

openai-whisper-api

Transcribe audio via the OpenAI Audio Transcriptions API (Whisper). Not for local/offline transcription; prefer openai-whisper for that.

unisone
content-media
open
media
14

histolab

Digital pathology image processing toolkit for whole slide images (WSI). Use this skill when working with histopathology slides, processing H&E or IHC stained tissue images, extracting tiles from gigapixel pathology images, detecting tissue regions, segmenting tissue masks, or preparing datasets for computational pathology deep learning pipelines. Applies to WSI formats (SVS, TIFF, NDPI), tile-based analysis, and histological image preprocessing workflows.

jackspace
content-media
open
media
14

youtube-video-analysis

MCP or full-pipeline video analysis — vault notes must match Obsidian standard. For local yt-dlp transcripts only, use g-skl-ingest-youtube.

wrm3
content-media
open
media
14

mulerouter

Generates images, videos, audio, speech, and music using MuleRouter or MuleRun multimodal APIs. Text-to-Image, Image-to-Image, Text-to-Video, Image-to-Video, Reference-to-Video, Video-to-Video, video editing (VACE, keyframe interpolation), Text-to-Speech, Text-to-Music. Use when the user wants to generate, edit, or transform images, videos, speech, or music using AI models like Wan2.6, Veo3, Nano Banana Pro, Sora2, Midjourney, Kling V3, Kling V3 Omni, MiniMax Speech 2.8, MiniMax Music 2.5.

openmule
content-media
open
media
14

creating-video-websites

Turn a video into a premium scroll-driven animated website with GSAP, canvas frame rendering, and layered animation choreography. Use when the user wants to convert a video into an animated web experience.

WilkoMarketing
content-media
open
media
14

motion-designer

Advanced motion designer with decades of After Effects and motion graphics experience, specialized in creating engaging video specifications for Remotion. Use when creating video specs, planning motion graphics, designing animations, or when asked to "create a video", "design motion graphics", "plan video content", or "spec out a video". Produces detailed scene-by-scene specifications with timing, audio, sound effects, and animation descriptions.

ncklrs
content-media
open
media
14

create-video-start

Master orchestrator that chains all Remotion video creation skills together in a single automated pipeline. Takes a creative brief and produces a complete, production-ready Remotion video project. Use when starting a new video from scratch, when asked to "create a video", "make a video", "build a complete video", or "video from idea to code".

ncklrs
content-media
open
media
14

face

Remember someone's face, save it permanently with their consent, or forget someone. You can also add more angles of a known person.

OriNachum
content-media
open