Media

Audio, video, and image processing.

1476 skills

media

image-enhancer

Improves the quality of images, especially screenshots, by enhancing resolution, sharpness, and clarity. Perfect for preparing images for presentations, documentation, or social media posts.

christophacham

content-media

open

media

Universal dataset import for FiftyOne supporting all media types (images, videos, point clouds, 3D scenes), all label formats (COCO, YOLO, VOC, CVAT, KITTI, etc.), and multimodal grouped datasets. Use when users want to import any dataset regardless of format, automatically detect folder structure, handle autonomous driving data with multiple cameras and LiDAR, or create grouped datasets from multimodal data. Requires FiftyOne MCP server.

ComeOnOliver

content-media

open

media

deepgram-performance-tuning

Optimize Deepgram API performance for faster transcription and lower latency. Use when improving transcription speed, reducing latency, or optimizing audio processing pipelines. Trigger: "deepgram performance", "speed up deepgram", "optimize transcription", "deepgram latency", "deepgram faster", "deepgram throughput".

ComeOnOliver

content-media

open

media

video-processor

Process video files with audio extraction, format conversion (mp4, webm), and Whisper transcription. Use when user mentions video conversion, audio extraction, transcription, mp4, webm, ffmpeg, or whisper transcription.

ComeOnOliver

content-media

open

media

axiom-ios-graphics

Use when working with ANY GPU rendering, Metal, OpenGL migration, shaders, 3D content, RealityKit, AR, or display performance. Covers Metal migration, shader conversion, RealityKit ECS, RealityView, variable refresh rate, ProMotion.

ComeOnOliver

content-media

open

media

video-comparer

This skill should be used when comparing two videos to analyze compression results or quality differences. Generates interactive HTML reports with quality metrics (PSNR, SSIM) and frame-by-frame visual comparisons. Triggers when users mention "compare videos", "video quality", "compression analysis", "before/after compression", or request quality assessment of compressed videos.

ComeOnOliver

content-media

open

media

ponyflash

Generate images, videos, speech audio, and music using the PonyFlash Python SDK. Also handle local media editing with FFmpeg, including clip, concat, transcode, extract audio, frame capture, subtitle capability checks, and ASS subtitle prep. Use when the user asks to create, generate, produce, edit, trim, merge, concatenate, transcode, subtitle, or render AI-generated media content.

ComeOnOliver

content-media

open

media

java-add-graalvm-native-image-support

GraalVM Native Image expert that adds native image support to Java applications, builds the project, analyzes build errors, applies fixes, and iterates until successful compilation using Oracle best practices.

ComeOnOliver

content-media

open

media

axiom-camera-capture-diag

camera freezes, preview rotated wrong, capture slow, session interrupted, black preview, front camera mirrored, camera not starting, AVCaptureSession errors, startRunning blocks, phone call interrupts camera

ComeOnOliver

content-media

open

media

youtube-downloader

Download YouTube videos with customizable quality and format options. Use this skill when the user asks to download, save, or grab YouTube videos. Supports various quality settings (best, 1080p, 720p, 480p, 360p), multiple formats (mp4, webm, mkv), and audio-only downloads as MP3.

ComeOnOliver

content-media

open

media

youtube-transcript

Extract transcripts from YouTube videos. Use when the user asks for a transcript, subtitles, or captions of a YouTube video and provides a YouTube URL (youtube.com/watch?v=, youtu.be/, or similar). Supports output with or without timestamps.

ComeOnOliver

content-media

open

media

nano-banana-pro

Generate and edit images using Google's Nano Banana Pro (Gemini 3 Pro Image) API. Use when the user asks to generate, create, edit, modify, change, alter, or update images. Also use when user references an existing image file and asks to modify it in any way (e.g., "modify this image", "change the background", "replace X with Y"). Supports both text-to-image generation and image-to-image editing with configurable resolution (1K default, 2K, or 4K for high resolution). DO NOT read the image file first - use this skill directly with the --input-image parameter.

ComeOnOliver

content-media

open

media

nano-banana-2

Generate and edit images using Google's Nano Banana 2 (Gemini 3.1 Flash Image Preview) API. This skill should be used when the user asks to create or modify images, especially when they need fast iteration, explicit aspect-ratio control, or resolution control from 512px to 4K.

ComeOnOliver

content-media

open

media

tts-script-generator

Intelligently compress and rewrite documents into TTS-friendly scripts. Uses Claude AI to analyze content, compress to target duration, convert to spoken style with emotional language, and auto-segment. Perfect for video narration.

ComeOnOliver

content-media

open

media

transloadit-media-processing

Process media files (video, audio, images, documents) using Transloadit. Use when asked to encode video to HLS/MP4, generate thumbnails, resize or watermark images, extract audio, concatenate clips, add subtitles, OCR documents, or run any media processing pipeline. Covers 86+ processing robots for file transformation at scale.

ComeOnOliver

content-media

open

media

image-manipulation-image-magick

Process and manipulate images using ImageMagick. Supports resizing, format conversion, batch processing, and retrieving image metadata. Use when working with images, creating thumbnails, resizing wallpapers, or performing batch image operations.

ComeOnOliver

content-media

open

media

axiom-camera-capture

AVCaptureSession, camera preview, photo capture, video recording, RotationCoordinator, session interruptions, deferred processing, capture responsiveness, zero-shutter-lag, photoQualityPrioritization, front camera mirroring

ComeOnOliver

content-media

open

media

translate-video

Translate video subtitles to any language with native-quality refinement. Full pipeline: transcribe → translate → refine → embed RTL-safe subtitles. Use for: translate video, תרגם סרטון (Hebrew for "translate video"), video translation, foreign subtitles, Hebrew subtitles, translated captions.

aviz85

content-media

open

media

axiom-camera-capture-ref

Reference — AVCaptureSession, AVCapturePhotoSettings, AVCapturePhotoOutput, RotationCoordinator, photoQualityPrioritization, deferred processing, AVCaptureMovieFileOutput, session presets, capture device APIs

ComeOnOliver

content-media

open

media

remotion-best-practices

Create and edit Remotion videos with domain-specific knowledge. Triggers: use this skill when the user mentions "Remotion" in their request; references a Remotion project path (remotion-videos/*, contains remotion.config.ts); asks to create, animate, or edit video content in a Remotion context; or wants to build compositions, animations, or video sequences in React. Always invoke this skill BEFORE writing any Remotion code.

mwguerra

content-media

open

media

youtube-downloader

Download YouTube videos with quality presets. Use for: download youtube, yt download, video download, youtube to whatsapp, youtube mp3.

aviz85

content-media

open

media

veo3-prompter

Craft professional video prompts for Google Veo 3.1 using cinematic techniques, audio direction, and timestamp choreography. Use when generating AI videos, creating video prompts, or working with Veo 3.

leegonzales

content-media

open

media

slide-builder

Transform essay-to-speech output into complete presentations with multiple output formats. Use when converting talk tracks to slides, generating presentation decks, or creating video-ready content from spoken word material.

leegonzales

content-media

open
