

Media

Audio, video, and image processing.

1476 skills

Sorted by stars.

media

2.7K

smarthome-video-anomaly-benchmark

VLM evaluation suite for video anomaly detection in smart home camera footage

SharpAI

content-media


media

2.5K

image-resize

Use this skill when the task involves resizing, scaling, or compressing image files. Suitable for tasks like "resize these photos to 800px wide", "compress images to reduce file size", or "batch scale all JPEGs in a folder". Only relevant for image processing tasks — do NOT use for data files, text, or non-image tasks.

agentscope-ai

content-media

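The image-resize entry mentions requests like "resize these photos to 800px wide". Scaling to a fixed width while keeping the aspect ratio comes down to one proportion: new_height = height * target_width / width. The following is a minimal Python sketch of that arithmetic only; the function name `fit_width` is hypothetical and is not taken from the skill itself.

```python
def fit_width(width: int, height: int, target_width: int) -> tuple[int, int]:
    """Scale (width, height) to target_width while preserving the aspect ratio."""
    if width <= 0 or height <= 0 or target_width <= 0:
        raise ValueError("all dimensions must be positive")
    # new_height / target_width == height / width, rounded to whole pixels;
    # max(1, ...) keeps very wide images from collapsing to zero height.
    new_height = max(1, round(height * target_width / width))
    return target_width, new_height


if __name__ == "__main__":
    print(fit_width(4000, 3000, 800))  # (800, 600)
```

The returned tuple could then be passed to Pillow's `Image.resize` to perform the actual resampling; the sketch deliberately leaves file I/O out.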

media

2.5K

pr-demo

Use when creating animated demos (GIFs) for pull requests or documentation. Covers terminal recording with asciinema and conversion to GIF/SVG for GitHub embedding.

mikeyobrien

content-media


media

2.4K

nano-banana-pro

Generate or edit images with Nano Banana Pro (Gemini 3 Pro Image). Use for requests to create or modify images, including edits. Supports text-to-image and image-to-image at 1K, 2K, or 4K resolution; pass a source image with --input-image.

steipete

content-media


media

2.4K

video-transcript-downloader

Download videos, audio, subtitles, and clean paragraph-style transcripts from YouTube and any other yt-dlp-supported site. Use when asked to “download this video”, “save this clip”, “rip audio”, “get subtitles”, or “get transcript”, or to troubleshoot yt-dlp/ffmpeg, formats, and playlists.

steipete

content-media

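The video-transcript-downloader entry promises "clean paragraph-style transcripts". One plausible way to get there from a downloaded WebVTT subtitle file is to drop the header block, cue timings, and inline tags, then collapse the consecutive duplicate lines that auto-generated captions often contain. This is an assumed approach sketched in Python, not the skill's actual code; `vtt_to_text` is a hypothetical name.

```python
import html
import re

# Inline markup used in WebVTT cues, e.g. <c>, </c>, <00:00:01.500>.
TAG_RE = re.compile(r"<[^>]+>")


def vtt_to_text(vtt: str) -> str:
    """Flatten a WebVTT subtitle file into one paragraph of plain text."""
    kept: list[str] = []
    in_header = True
    for raw in vtt.splitlines():
        line = raw.strip()
        if in_header:
            # The header block ("WEBVTT" plus optional metadata lines such as
            # "Kind: captions") ends at the first blank line.
            if not line:
                in_header = False
            continue
        # Skip blank separators, timing lines, and numeric cue identifiers.
        # (A caption consisting only of digits is dropped too; fine for a sketch.)
        if not line or "-->" in line or line.isdigit():
            continue
        text = html.unescape(TAG_RE.sub("", line)).strip()
        # Rolling auto-captions repeat the previous line; keep only the first copy.
        if text and (not kept or kept[-1] != text):
            kept.append(text)
    return " ".join(kept)


if __name__ == "__main__":
    sample = (
        "WEBVTT\nKind: captions\nLanguage: en\n\n"
        "00:00:00.000 --> 00:00:02.000\nhello <c>world</c>\n\n"
        "00:00:02.000 --> 00:00:04.000\nhello world\nhow are you\n"
    )
    print(vtt_to_text(sample))  # hello world how are you
```

Splitting the result into several paragraphs (for example on long pauses between cues) would be a natural next step, but it needs the timing data this sketch throws away.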

media

2.3K

glmv-caption

Generate captions (descriptions) for images, videos, and documents using ZhiPu GLM-V multimodal model series. Use this skill whenever the user wants to describe, caption, summarize, or interpret the content of images, videos, or files. Supports single/multiple inputs, URLs, local paths, and base64 (images only).

zai-org

content-media


media

bio-vcf-manipulation

Merge, concatenate, sort, intersect, and subset VCF files using bcftools. Use when combining variant files, comparing call sets, or restructuring VCF data.

FreedomIntelligence

content-media

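The bio-vcf-manipulation skill uses bcftools for sorting. To show what sorting means for the format (header lines starting with `#` stay on top, data records are ordered by CHROM, then POS), here is a pure-Python illustration. It assumes a simple natural chromosome order (1, 2, ..., 10, then non-numeric names such as X and Y), and the helper names are hypothetical; it is not a replacement for `bcftools sort`, which remains the right tool for real VCF files.

```python
import re


def chrom_key(chrom: str) -> tuple[int, int, str]:
    """Natural order: numbered chromosomes first (1..22), then names like X, Y, MT."""
    name = re.sub(r"^chr", "", chrom, flags=re.IGNORECASE)
    if name.isdigit():
        return (0, int(name), "")
    return (1, 0, name)


def sort_vcf(text: str) -> str:
    """Keep '#' header lines first and sort data records by (CHROM, POS)."""
    header: list[str] = []
    records: list[str] = []
    for line in text.splitlines():
        if not line:
            continue
        (header if line.startswith("#") else records).append(line)

    def record_key(record: str) -> tuple[tuple[int, int, str], int]:
        # CHROM and POS are the first two tab-separated columns.
        chrom, pos = record.split("\t", 2)[:2]
        return chrom_key(chrom), int(pos)

    records.sort(key=record_key)
    return "\n".join(header + records) + "\n"


if __name__ == "__main__":
    vcf = (
        "##fileformat=VCFv4.2\n"
        "#CHROM\tPOS\tID\tREF\tALT\n"
        "chr10\t5\t.\tA\tG\n"
        "chrX\t1\t.\tT\tC\n"
        "chr2\t100\t.\tC\tT\n"
        "chr2\t7\t.\tG\tA\n"
    )
    print(sort_vcf(vcf), end="")
```

Sorting POS as an integer matters: a plain string sort would put position 100 before position 7.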

media

ai-multimodal

Process and generate multimedia content using the Google Gemini API. Capabilities include analyzing audio files up to 9.5 hours long (transcription with timestamps, summarization, speech understanding, music/sound analysis), understanding images (captioning, object detection, OCR, visual Q&A, segmentation), processing videos up to 6 hours long (scene detection, Q&A, temporal analysis, YouTube URLs), extracting content from documents (PDF tables, forms, charts, diagrams, multi-page files), and generating images (text-to-image, editing, composition, refinement). Use when working with audio/video files, analyzing images or screenshots, processing PDF documents, extracting structured data from media, creating images from text prompts, or implementing multimodal AI features. Supports multiple models (Gemini 2.5/2.0) with context windows of up to 2M tokens.

mrgoonie

content-media


media

media-processing

Process multimedia files with FFmpeg (video/audio encoding, conversion, streaming, filtering, hardware acceleration) and ImageMagick (image manipulation, format conversion, batch processing, effects, composition). Use when converting media formats, encoding videos with specific codecs (H.264, H.265, VP9), resizing/cropping images, extracting audio from video, applying filters and effects, optimizing file sizes, creating streaming manifests (HLS/DASH), generating thumbnails, batch processing images, creating composite images, or implementing media processing pipelines. Supports 100+ formats, hardware acceleration (NVENC, QSV), and complex filtergraphs.

mrgoonie

content-media

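The media-processing entry lists encoding with specific codecs and generating thumbnails. As a sketch, assuming FFmpeg with libx264 is installed, the Python helpers below only build argument lists for those two tasks. The flags (`-c:v libx264`, `-crf`, `-preset`, `-ss`, `-frames:v`) are standard FFmpeg options; the function names and default values (CRF 23, 128k AAC audio, a frame at 5 seconds) are illustrative choices, not the skill's settings.

```python
import shlex


def h264_command(src: str, dst: str, crf: int = 23, preset: str = "medium") -> list[str]:
    """Re-encode video to H.264 (libx264) with AAC audio."""
    if not 0 <= crf <= 51:
        raise ValueError("8-bit libx264 expects a CRF between 0 and 51")
    return [
        "ffmpeg", "-y",          # -y: overwrite dst without prompting
        "-i", src,
        "-c:v", "libx264",
        "-crf", str(crf),        # lower CRF = higher quality, larger file
        "-preset", preset,       # slower presets compress better
        "-c:a", "aac", "-b:a", "128k",
        dst,
    ]


def thumbnail_command(src: str, dst: str, at_seconds: float = 5.0) -> list[str]:
    """Grab a single frame at the given timestamp as an image."""
    return [
        "ffmpeg", "-y",
        "-ss", str(at_seconds),  # seeking before -i is fast input seeking
        "-i", src,
        "-frames:v", "1",
        dst,
    ]


if __name__ == "__main__":
    # Print the commands; run them with subprocess.run(cmd, check=True).
    print(shlex.join(h264_command("input.mov", "output.mp4")))
    print(shlex.join(thumbnail_command("output.mp4", "thumb.jpg")))
```

Building an argument list instead of a shell string avoids quoting problems with file names that contain spaces.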

media

1.9K

granola-performance-tuning

Optimize Granola transcription accuracy, note quality, and processing speed. Use when improving transcription quality, reducing processing time, optimizing templates for better AI output, or tuning audio setup. Trigger: "granola performance", "granola accuracy", "granola quality", "improve granola", "granola transcription better".

jeremylongshore

content-media


media

1.9K

klingai-image-to-video

Animate static images into video using Kling AI. Use when converting images to video, adding motion to stills, or building I2V pipelines. Trigger with phrases like 'klingai image to video', 'kling ai animate image', 'klingai img2vid', 'animate picture klingai'.

jeremylongshore

content-media


media

1.9K

deepgram-performance-tuning

Optimize Deepgram API performance for faster transcription and lower latency. Use when improving transcription speed, reducing latency, or optimizing audio processing pipelines. Trigger: "deepgram performance", "speed up deepgram", "optimize transcription", "deepgram latency", "deepgram faster", "deepgram throughput".

jeremylongshore

content-media


media

1.9K

processing-computer-vision-tasks

Process images using object detection, classification, and segmentation. Use when the user asks to "analyze image" or mentions "object detection", "image classification", "computer vision", or related phrases.

jeremylongshore

content-media


media

1.9K

twinmind-performance-tuning

Optimize TwinMind transcription accuracy and speed with Ear-3 model configuration, audio quality tuning, and caching strategies. Use when tuning performance or managing TwinMind meeting AI operations. Trigger with phrases like "twinmind performance tuning".

jeremylongshore

content-media


media

1.9K

ltx-video

Generate videos via LTX-2.3 API (ltx.video). Supports text-to-video, image-to-video, audio-to-video (lip-sync from audio + image), extend, and retake. Use when: generating AI video from text/image/audio, animating a portrait, creating lip-sync video from an existing image + audio recording.

LeoYeAI

content-media


media

1.8K

speech-rough-cut-skill

[WORKFLOW SKILL] Produces a rough cut of a talking-head (narration) video based on the audio information in the input video.

FireRedTeam

content-media


media

1.8K

animation-performance-retro

Optimize 8-bit animations for smooth performance. Apply when creating animated pixel art, game UI effects, or any retro-styled animations.

TheOrcDev

content-media


media

1.8K

bilibili-video-download

Execute end-to-end Bilibili downloads with yutto. Use this whenever the user wants you to actually download Bilibili uploaded videos, bangumi (anime series), courses, favorites folders, Watch Later items, collections, lists, or audio for them, or wants you to install/configure yutto and complete the download instead of merely explaining commands. This skill should verify the yutto and FFmpeg installation, check authentication status, collect any missing required inputs such as the link and download directory, and then run the download.

yutto-dev

content-media

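The bilibili-video-download entry describes a preflight sequence: verify that yutto and FFmpeg are installed and collect missing inputs before downloading. A minimal Python sketch of those two checks follows, using `shutil.which` to look tools up on PATH. The function names are hypothetical, and the authentication check is omitted because the entry does not say how yutto reports auth status.

```python
import shutil
from typing import Optional

# Tools the entry says must be present before a download runs.
REQUIRED_TOOLS = ("yutto", "ffmpeg")


def missing_tools(tools: tuple[str, ...] = REQUIRED_TOOLS) -> list[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def missing_inputs(link: Optional[str], out_dir: Optional[str]) -> list[str]:
    """Return the names of required inputs the user has not supplied yet."""
    missing = []
    if not link:
        missing.append("link")
    if not out_dir:
        missing.append("download directory")
    return missing


def preflight(link: Optional[str], out_dir: Optional[str]) -> list[str]:
    """Collect every problem to report before starting the download."""
    problems = [f"not installed: {tool}" for tool in missing_tools()]
    problems += [f"missing input: {name}" for name in missing_inputs(link, out_dir)]
    return problems


if __name__ == "__main__":
    for problem in preflight(link=None, out_dir="~/Downloads"):
        print(problem)
```

Reporting all problems at once lets the agent ask the user for everything it needs in a single turn rather than failing one check at a time.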

media

1.6K

compose-video

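Video post-processing and compositing. Use when the user says "add background music", "merge videos", or "add an intro/outro", wants to add BGM to a finished video, or needs to concatenate multiple episodes.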

ArcReel

content-media

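compose-video covers concatenating episodes and adding background music. A common FFmpeg route, assumed here rather than taken from the skill, is the concat demuxer (a text file of `file '...'` lines read with `-f concat`) for joining clips, and stream mapping to lay a music track under a video. The Python helpers below are hypothetical and only construct the list-file contents and argument lists.

```python
def concat_list(paths: list[str]) -> str:
    """Build the text file read by FFmpeg's concat demuxer."""
    lines = []
    for path in paths:
        # Inside single quotes, a literal ' is written as '\'' (close, escaped quote, reopen).
        escaped = path.replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def concat_command(list_file: str, dst: str) -> list[str]:
    """Join the listed clips without re-encoding (-c copy)."""
    # -safe 0 permits absolute paths and other "unsafe" file names in the list.
    return ["ffmpeg", "-y", "-f", "concat", "-safe", "0", "-i", list_file, "-c", "copy", dst]


def add_bgm_command(video: str, music: str, dst: str) -> list[str]:
    """Replace the video's audio with a music track, stopping at the shorter input."""
    return [
        "ffmpeg", "-y",
        "-i", video, "-i", music,
        "-map", "0:v", "-map", "1:a",   # video from input 0, audio from input 1
        "-c:v", "copy", "-c:a", "aac",
        "-shortest",
        dst,
    ]


if __name__ == "__main__":
    print(concat_list(["ep1.mp4", "ep2.mp4", "ep3.mp4"]), end="")
    print(concat_command("episodes.txt", "full.mp4"))
    print(add_bgm_command("full.mp4", "bgm.mp3", "final.mp4"))
```

`-c copy` joins without re-encoding, which only works when all clips share the same codecs and parameters. The BGM helper replaces the original audio instead of mixing it; keeping the dialogue under the music would need a filter such as `amix`.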

media

1.6K

generate-video

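Generate video clips for script scenes. Use when the user says "generate video" or "turn the storyboard images into video", wants to regenerate the video for a particular scene, or needs to resume an interrupted video generation. Supports batch generation for a whole episode, single-scene generation, and resuming from a checkpoint.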

ArcReel

content-media


media

1.6K

openakita-skills-image-understander

Analyze images using GPT-4 Vision for detailed description, OCR text extraction, object recognition, and visual Q&A. Use when the user needs to understand image content, extract text from screenshots, identify objects in photos, or ask questions about images via OpenAI GPT-4 Vision API.

openakita

content-media


media

1.6K

get-image-file

Gets the local file path of an image sent by the user. When the user sends an image, the system downloads it automatically. Use when you need to process the user's image or analyze its content.

openakita

content-media


media

1.6K

get-voice-file

Gets the local file path of a voice message sent by the user. When the user sends a voice message, the system downloads it automatically. Use when you need to process the user's voice message or transcribe it to text.

openakita

content-media


media

1.6K

openakita-skills-video-downloader

Download YouTube videos with customizable quality and format options. Use this skill when the user asks to download, save, or grab YouTube videos. Supports various quality settings (best, 1080p, 720p, 480p, 360p), multiple formats (mp4, webm, mkv), and audio-only downloads as MP3.

openakita

content-media

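The openakita video downloader exposes quality presets (best, 1080p, 720p, 480p, 360p), containers (mp4, webm, mkv), and MP3 audio-only downloads. Assuming yt-dlp is the underlying tool (the entry does not say), those options map onto yt-dlp's format selectors roughly as in this Python sketch; `ytdlp_args` is a hypothetical helper, while `-f`, `--merge-output-format`, `-x`, and `--audio-format` are real yt-dlp options.

```python
# Height caps for the entry's named quality presets.
HEIGHT_CAPS = {"1080p": 1080, "720p": 720, "480p": 480, "360p": 360}
CONTAINERS = ("mp4", "webm", "mkv")


def ytdlp_args(url: str, quality: str = "best", container: str = "mp4",
               audio_only: bool = False) -> list[str]:
    """Translate the preset options into a yt-dlp argument list."""
    if audio_only:
        # -x extracts audio; --audio-format converts it (FFmpeg required).
        return ["yt-dlp", "-x", "--audio-format", "mp3", url]
    if quality == "best":
        fmt = "bestvideo+bestaudio/best"
    elif quality in HEIGHT_CAPS:
        cap = HEIGHT_CAPS[quality]
        # Prefer separate streams up to the cap, else a single pre-merged file.
        fmt = f"bestvideo[height<={cap}]+bestaudio/best[height<={cap}]"
    else:
        raise ValueError(f"unknown quality preset: {quality!r}")
    if container not in CONTAINERS:
        raise ValueError(f"unsupported container: {container!r}")
    return ["yt-dlp", "-f", fmt, "--merge-output-format", container, url]


if __name__ == "__main__":
    print(ytdlp_args("https://www.youtube.com/watch?v=VIDEO_ID", quality="720p"))
```

The `/best[height<=N]` fallback covers sites that only offer pre-merged streams. Requesting webm when the selected streams use codecs that container cannot hold may make the merge fail, so a stricter selector would likely be needed in practice.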
