
Media

Audio, video, and image processing.

1476 skills
Sorted by stars
media
2.5K

image-resize

Use this skill when the task involves resizing, scaling, or compressing image files. Suitable for tasks like "resize these photos to 800px wide", "compress images to reduce file size", or "batch scale all JPEGs in a folder". Only relevant for image processing tasks — do NOT use for data files, text, or non-image tasks.

agentscope-ai
content-media
media
2.5K

pr-demo

Use when creating animated demos (GIFs) for pull requests or documentation. Covers terminal recording with asciinema and conversion to GIF/SVG for GitHub embedding.

mikeyobrien
content-media
media
2.4K

nano-banana-pro

Generate/edit images with Nano Banana Pro (Gemini 3 Pro Image). Use for image create/modify requests incl. edits. Supports text-to-image + image-to-image; 1K/2K/4K; use --input-image.

steipete
content-media
media
2.4K

video-transcript-downloader

Download videos, audio, subtitles, and clean paragraph-style transcripts from YouTube and any other yt-dlp supported site. Use when asked to “download this video”, “save this clip”, “rip audio”, “get subtitles”, “get transcript”, or to troubleshoot yt-dlp/ffmpeg and formats/playlists.

steipete
content-media
media
2.3K

glmv-caption

Generate captions (descriptions) for images, videos, and documents using ZhiPu GLM-V multimodal model series. Use this skill whenever the user wants to describe, caption, summarize, or interpret the content of images, videos, or files. Supports single/multiple inputs, URLs, local paths, and base64 (images only).

zai-org
content-media
media
2K

bio-vcf-manipulation

Merge, concatenate, sort, intersect, and subset VCF files using bcftools. Use when combining variant files, comparing call sets, or restructuring VCF data.

FreedomIntelligence
content-media
media
2K

ai-multimodal

Process and generate multimedia content using the Google Gemini API. Capabilities include analyzing audio files (transcription with timestamps, summarization, speech understanding, music/sound analysis, up to 9.5 hours), understanding images (captioning, object detection, OCR, visual Q&A, segmentation), processing videos (scene detection, Q&A, temporal analysis, YouTube URLs, up to 6 hours), extracting from documents (PDF tables, forms, charts, diagrams, multi-page), and generating images (text-to-image, editing, composition, refinement). Use when working with audio/video files, analyzing images or screenshots, processing PDF documents, extracting structured data from media, creating images from text prompts, or implementing multimodal AI features. Supports multiple models (Gemini 2.5/2.0) with context windows up to 2M tokens.

mrgoonie
content-media
media
2K

media-processing

Process multimedia files with FFmpeg (video/audio encoding, conversion, streaming, filtering, hardware acceleration) and ImageMagick (image manipulation, format conversion, batch processing, effects, composition). Use when converting media formats, encoding videos with specific codecs (H.264, H.265, VP9), resizing/cropping images, extracting audio from video, applying filters and effects, optimizing file sizes, creating streaming manifests (HLS/DASH), generating thumbnails, batch processing images, creating composite images, or implementing media processing pipelines. Supports 100+ formats, hardware acceleration (NVENC, QSV), and complex filtergraphs.

mrgoonie
content-media
media
1.9K

granola-performance-tuning

Optimize Granola transcription accuracy, note quality, and processing speed. Use when improving transcription quality, reducing processing time, optimizing templates for better AI output, or tuning audio setup. Trigger: "granola performance", "granola accuracy", "granola quality", "improve granola", "granola transcription better".

jeremylongshore
content-media
media
1.9K

klingai-image-to-video

Animate static images into video using Kling AI. Use when converting images to video, adding motion to stills, or building I2V pipelines. Trigger with phrases like 'klingai image to video', 'kling ai animate image', 'klingai img2vid', 'animate picture klingai'.

jeremylongshore
content-media
media
1.9K

deepgram-performance-tuning

Optimize Deepgram API performance for faster transcription and lower latency. Use when improving transcription speed, reducing latency, or optimizing audio processing pipelines. Trigger: "deepgram performance", "speed up deepgram", "optimize transcription", "deepgram latency", "deepgram faster", "deepgram throughput".

jeremylongshore
content-media
media
1.9K

processing-computer-vision-tasks

Process images using object detection, classification, and segmentation. Use when requesting "analyze image", "object detection", "image classification", or "computer vision". Trigger with relevant phrases based on skill purpose.

jeremylongshore
content-media
media
1.9K

twinmind-performance-tuning

Optimize TwinMind transcription accuracy and speed with Ear-3 model configuration, audio quality tuning, and caching strategies. Use when implementing performance tuning or managing TwinMind meeting AI operations. Trigger with phrases like "twinmind performance tuning".

jeremylongshore
content-media
media
1.9K

ltx-video

Generate videos via LTX-2.3 API (ltx.video). Supports text-to-video, image-to-video, audio-to-video (lip-sync from audio + image), extend, and retake. Use when: generating AI video from text/image/audio, animating a portrait, creating lip-sync video from an existing image + audio recording.

LeoYeAI
content-media
media
1.8K

speech-rough-cut-skill

[WORKFLOW SKILL] Produce a rough cut of narration (talking-head) footage based on the audio information in the input video.

FireRedTeam
content-media
media
1.8K

animation-performance-retro

Optimize 8-bit animations for smooth performance. Apply when creating animated pixel art, game UI effects, or any retro-styled animations.

TheOrcDev
content-media
media
1.8K

bilibili-video-download

Execute end-to-end Bilibili downloads with yutto. Use this whenever the user wants you to actually download Bilibili uploaded videos, bangumi (anime series), courses, favorites folders, watch-later lists, collections, lists, or audio for them, or wants you to install/configure yutto and complete the download instead of merely explaining commands. This skill should verify installation and FFmpeg, check auth status, collect missing required inputs such as the link and download directory, then run the download.

yutto-dev
content-media
media
1.6K

compose-video

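Video post-processing and compositing. Use when the user says "add background music", "merge videos", or "add an intro and outro", wants to add BGM to a finished video, or needs to stitch multiple episodes together.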

ArcReel
content-media
media
1.6K

generate-video

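Generate video clips for script scenes. Use when the user says "generate video" or "turn the storyboard images into video", wants to regenerate the video for a particular scene, or needs to resume after video generation was interrupted. Supports whole-episode batch, single-scene, and resume-from-checkpoint modes.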

ArcReel
content-media
media
1.6K

openakita-skills-image-understander

Analyze images using GPT-4 Vision for detailed description, OCR text extraction, object recognition, and visual Q&A. Use when the user needs to understand image content, extract text from screenshots, identify objects in photos, or ask questions about images via OpenAI GPT-4 Vision API.

openakita
content-media
media
1.6K

get-image-file

Get the local file path of an image sent by the user. When the user sends an image, the system downloads it automatically. Use when you need to process the user's image or analyze its content.

openakita
content-media
media
1.6K

get-voice-file

Get the local file path of a voice message sent by the user. When the user sends a voice message, the system downloads it automatically. Use when you need to process the user's voice message or transcribe it to text.

openakita
content-media
media
1.6K

openakita-skills-video-downloader

Download YouTube videos with customizable quality and format options. Use this skill when the user asks to download, save, or grab YouTube videos. Supports various quality settings (best, 1080p, 720p, 480p, 360p), multiple formats (mp4, webm, mkv), and audio-only downloads as MP3.

openakita
content-media