Media

Audio, video, and image processing.

1476 skills

media

1.4K

qqbot-media

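Rich-media sending and receiving for QQBot. Uses the <qqmedia> tag; the system detects the type (image/voice/video/file) automatically from the file extension.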

tencent-connect

content-media

open

media

1.4K

youtube-transcribe-skill

Extract subtitles/transcripts from YouTube videos. Triggers: "youtube transcript", "extract subtitles", "video captions", "视频字幕" (video subtitles), "字幕提取" (subtitle extraction), "YouTube转文字" (YouTube to text), "提取字幕" (extract subtitles).

feiskyer

content-media

open

media

1.4K

videocut

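Transcribes talking-head videos and detects verbal slips. Generates a review draft and a list of deletion tasks. Triggers: 剪口播 (cut talking-head video), 处理视频 (process video), 识别口误 (detect verbal slips).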

Ceeon

content-media

open

media

1.4K

videocut

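High-definition video export. Two-pass encoding plus sharpening, matching or exceeding the quality of the source footage. Triggers: 高清化 (convert to HD), 高清导出 (HD export), 导出高清 (export HD), 渲染高清 (render HD).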

Ceeon

content-media

open

media

1.4K

Process video files with audio extraction, format conversion (mp4, webm), and Whisper transcription. Use when user mentions video conversion, audio extraction, transcription, mp4, webm, ffmpeg, or whisper transcription.

disler

content-media

open

media

1.4K

panning-for-gold

Use when processing voice transcripts, brain dumps, stream-of-consciousness notes, or any raw multi-topic capture. Extracts every idea thread, then evaluates each one with deep brainstorming, then captures results to Open Brain. Trigger on transcripts, exports, "process this", "pan for gold", "brain dump", "what did I say", or multi-topic markdown files.

NateBJones-Projects

content-media

open

media

1.3K

video-studio

Use when the user asks to create or edit videos end-to-end (script→video, auto-cut/jumpcut, captions/subtitles, polishing for Shorts/Reels/TikTok). Current implemented backend: local FFmpeg (probe/render/jumpcut/burn-subtitles/polish). Planned/optional backends: Remotion (motion graphics templates), VectCutAPI (CapCut/剪映 timeline editing), and video-audio-mcp (MCP tool wrapper) when available. Produces a finished video artifact (MP4 by default) from assets + copy + a design/storyboard plan.

foryourhealth111-pixel

content-media

open

media

1.3K

bg-remove

Remove backgrounds from images using local AI (rembg). Use when removing backgrounds from character art, mascot images, photos, or any image that needs a transparent background.

peterkrueck

content-media

open

media

1.2K

shogun-screenshot

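Captures and edits screenshots. Retrieves the latest image from local screenshots, captures web pages with Playwright, crops and resizes images, and blacks out sensitive information. Activates when writing articles, creating reports, checking UI, or editing images. Triggers: 「スクショ」 (screenshot, informal), 「スクリーンショット」 (screenshot), 「画面キャプチャ」 (screen capture), 「最新のスクショ」 (latest screenshot), 「画像加工」 (image editing), 「トリミング」 (cropping), 「マスク」 (mask), 「写メ」 (snapshot), 「写メ撮った」 (took a snapshot), 「スクショ撮った」 (took a screenshot). Do NOT use for: image generation (use shogun-imagegen instead).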

yohey-w

content-media

open

media

1.2K

media-comprehension

An intelligent assistant specialized in handling media files (images/audio/video). **Only for media file analysis**, does not handle document types.

✅ Media files that can be processed:
- Images: .jpg, .jpeg, .png, .gif, .bmp, .webp, .svg
- Audio: .mp3, .wav, .m4a, .flac, .aac, .ogg
- Video: .mp4, .avi, .mov, .mkv, .webm, .flv

❌ Files that cannot be processed (please do not trigger this skill):
- Documents: .pdf, .doc, .docx, .txt, .md, .rtf
- Spreadsheets: .xlsx, .xls, .csv, .tsv
- Presentations: .pptx, .ppt, .key
- Code: .py, .js, .ts, .java, .cpp, .go, .rs
- Archives: .zip, .tar, .gz, .rar, .7z
- Executables: .exe, .bin, .app, .dmg
- Databases: .db, .sqlite, .sql
- Configuration files: .json, .xml, .yaml, .yml, .toml, .ini
- Web pages: .html, .htm, .css

**Trigger conditions**: When the user explicitly requests to analyze image/audio/video content, or when the file extension belongs to the aforementioned media types.

inclusionAI

content-media

open

media

1.2K

video-subtitles-and-audio-insert-workflow

Burn hard subtitles from UTF-8 SRT files using moviepy 2.x with CJK-capable system fonts; tune font size, placement, stroke, and encode settings (bitrate or CRF) to avoid oversized outputs. Documents ffprobe/ffmpeg workflows for inspection, encoding, and batch jobs; troubleshooting for fonts, bitrate, and pacing. Covers voiceover with edge-tts (voice selection, rate/volume/pitch), matching narration length to video with atempo/apad, and multi-scene pacing with breathing room. Targets moviepy 2.x and Python 3.x on macOS, Linux, and Windows.

inclusionAI

content-media

open
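
The video-subtitles-and-audio-insert-workflow entry above mentions matching narration length to video with atempo/apad. A minimal sketch of one way that could work follows; it is not the skill's own code. The function name `narration_fit_filter` is hypothetical, and the policy (speed up narration that runs long, pad narration that runs short) is an assumption. Stages are kept at or below 2.0 because older FFmpeg builds limit each `atempo` instance to the 0.5 to 2.0 range.

```python
def narration_fit_filter(narration_s: float, target_s: float) -> str:
    """Return an FFmpeg audio filter string that fits narration into target_s seconds.

    Assumed policy (not taken from the skill itself):
    - narration shorter than or equal to the target: pad with silence via apad
    - narration longer than the target: speed up via one or more atempo stages
    """
    if narration_s <= 0 or target_s <= 0:
        raise ValueError("durations must be positive")

    if narration_s <= target_s:
        # apad's whole_dur sets the minimum total duration of the output audio.
        return f"apad=whole_dur={target_s:.3f}"

    factor = narration_s / target_s  # > 1 means the audio must play faster
    stages = []
    while factor > 2.0:
        # Chain stages of 2.0 so each atempo stays within the conservative range.
        stages.append(2.0)
        factor /= 2.0
    stages.append(factor)
    return ",".join(f"atempo={s:.4f}" for s in stages)


print(narration_fit_filter(12.0, 10.0))  # atempo=1.2000
print(narration_fit_filter(8.0, 10.0))   # apad=whole_dur=10.000
```

The returned string would be passed to FFmpeg's `-af` option; large speed-ups will sound unnatural, so trimming the script is usually the better fix when the factor gets high.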

media

1.2K

embedded-video-pip-smooth-playback

Prevent stutter and frozen frames when embedding a child video inside a parent in code-driven pipelines (Remotion, After Effects scripting, FFmpeg filter graphs). Explains why sparse keyframes break frame-accurate seek during per-frame export, and how re-encoding with H.264 all-intra GOP (-g 1) and yuv420p makes every frame independently decodable. Includes FFmpeg command, parameter notes, file-size tradeoffs, and a reusable rule for any seek-heavy programmatic video workflow.

inclusionAI

content-media

open
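
The embedded-video-pip-smooth-playback entry above describes re-encoding with an H.264 all-intra GOP (`-g 1`) and `yuv420p`. A minimal sketch of how such a command might be assembled follows; it is not the skill's own command. The helper name `all_intra_ffmpeg_args` and the file names are hypothetical, and the choice of `libx264` as the encoder is an assumption.

```python
from __future__ import annotations


def all_intra_ffmpeg_args(src: str, dst: str) -> list[str]:
    """Build an FFmpeg argument list that re-encodes src so every frame is a keyframe."""
    return [
        "ffmpeg",
        "-i", src,
        "-c:v", "libx264",      # H.264 encoder (assumed; the entry only says H.264)
        "-g", "1",              # GOP size 1: every frame is independently decodable
        "-pix_fmt", "yuv420p",  # widely supported pixel format
        dst,
    ]


args = all_intra_ffmpeg_args("child.mp4", "child_intra.mp4")
print(" ".join(args))
# To run it: subprocess.run(args, check=True)
```

As the entry notes, the tradeoff is file size: an all-intra stream gives up inter-frame compression, so the output is considerably larger than the source.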

media

1.2K

html-to-image

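HTML-to-image skill: renders HTML files or content with agent-browser and captures the result as an image. Suited to generating infographics, social media graphics, data-visualization screenshots, and similar use cases.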

inclusionAI

content-media

open

media

1.2K

songsee

Generate spectrograms and audio feature visualizations (mel, chroma, MFCC, tempogram, etc.) from audio files via CLI. Useful for audio analysis, music production debugging, and visual documentation.

math-inc

content-media

open

media

1.2K

ascii-video

Production pipeline for ASCII art video — any format. Converts video/audio/images/generative input into colored ASCII character video output (MP4, GIF, image sequence). Covers: video-to-ASCII conversion, audio-reactive music visualizers, generative ASCII art animations, hybrid video+audio reactive, text/lyrics overlays, real-time terminal rendering. Use when users request: ASCII video, text art video, terminal-style video, character art animation, retro text visualization, audio visualizer in ASCII, converting video to ASCII art, matrix-style effects, or any animated ASCII output.

math-inc

content-media

open

media

1.2K

extract-video-frames

Extracts frames and timestamped audio segments from video files (GIF, MP4, MOV) at configurable intervals and stores them in a directory with a manifest file. Use when analyzing video content, preparing frames for visual review, extracting audio for transcription, or creating frame+audio sequences for another agent to process.

qdhenry

content-media

open

media

1.2K

elevenlabs-transcribe

Transcribes audio/video files using ElevenLabs Scribe v2 API. Use when transcribing audio files, generating transcripts, or converting speech to text.

qdhenry

content-media

open

media

1.1K

typed-ffmpeg-usage

Guide for using typed-ffmpeg, a modern Python FFmpeg wrapper with extensive typing support and comprehensive filter support. Use this when working with FFmpeg operations, video/audio processing, or filter graphs in Python.

livingbio

content-media

open

media

1.1K

image-analysis

Analyze local images using vision-capable LLM. Use when the user question depends on visual content from a local image file — visual question answering, describing images, reading text in images, identifying objects, etc.

Memento-Teams

content-media

open

media

1.1K

media-downloader

Download videos and audio from 1500+ websites including YouTube, Bilibili, TikTok, Twitter/X, Instagram, Vimeo, and more using yt-dlp. Use when the user wants to download videos, save media from social platforms, download with specific resolution (720p/1080p/4K), get subtitles, or download entire playlists. Triggers on requests like 'download this video', 'save this YouTube video', 'download in 1080p', 'download with subtitles', 'download this playlist'.

dp-archive

content-media

open

media

1.1K

audio-extractor

Extract audio from videos and download audio-only content from 1500+ websites using yt-dlp. Converts to MP3, M4A, FLAC, WAV, or OPUS with embedded metadata and cover art. Use when the user wants to extract audio from videos, download podcasts, download music from YouTube/SoundCloud/Bandcamp, convert video to audio, or batch download playlist audio. Triggers on requests like 'extract audio', 'download as MP3', 'get the audio from this video', 'download this podcast', 'download music', 'convert to FLAC'.

dp-archive

content-media

open

media

1.1K

whisper

Transcribe audio files to text using OpenAI Whisper

trpc-group

content-media

open

media

1.1K

video-toolkit

Create professional videos autonomously using claude-code-video-toolkit — AI voiceovers, image generation, music, talking heads, and Remotion rendering.

calesthio

content-media

open

media

1.1K

ltx2

AI video generation with LTX-2.3 22B — text-to-video, image-to-video clips for video production. Use when generating video clips, animating images, creating b-roll, animated backgrounds, or motion content. Triggers include video generation, animate image, b-roll, motion, video clip, text-to-video, image-to-video.

calesthio

content-media

open
