Media

Audio, video, and image processing.

1476 skills

Sorted by stars

media

565

assemblyai-transcribe

Transcribe audio/video with AssemblyAI (local upload or URL), plus subtitles + paragraph/sentence exports.

sundial-org

content-media

open

media

565

Download videos, audio, subtitles, and clean paragraph-style transcripts from YouTube and any other yt-dlp supported site. Use when asked to “download this video”, “save this clip”, “rip audio”, “get subtitles”, “get transcript”, or to troubleshoot yt-dlp/ffmpeg and formats/playlists.

sundial-org

content-media

open

media

565

video-subtitles

Generate SRT subtitles from video/audio with translation support. Transcribes Hebrew (ivrit.ai) and English (whisper), translates between languages, burns subtitles into video. Use for creating captions, transcripts, or hardcoded subtitles for WhatsApp/social media.

sundial-org

content-media

open

media

565

venice-ai-media

Generate, edit, and upscale images; create videos from images or other videos via Venice AI. Supports text-to-image, image-to-video (Sora, WAN), video-to-video (Runway Gen4), upscaling, and AI editing.

sundial-org

content-media

open

media

565

clipit

The master tool for all advanced audio/video processing. Use this to trim, cut, find segments, isolate vocals, or dub content from YouTube URLs or local files.

sundial-org

content-media

open

media

565

remotion-best-practices-2

Best practices for Remotion - Video creation in React

sundial-org

content-media

open

media

553

dplug-framework

Create beautiful audio plug-ins using the D language. Process audio files using existing VST2 plugins. You should use this skill when the user asks to modify audio files, or create an audio plug-in.

AuburnSounds

content-media

open

media

544

nano-banana-pro

Generate or edit images via Gemini 3 Pro Image (Nano Banana Pro).

smallnest

content-media

open

media

544

video-frames

Extract frames or short clips from videos using ffmpeg.

smallnest

content-media

open

media

538

nvenc-nvdec

NVIDIA hardware video encoding/decoding integration. Configure NVENC encoding parameters, set up NVDEC decoding pipelines, handle codec configurations, integrate with CUDA for pre/post processing, and manage video memory surfaces.

a5c-ai

content-media

open

media

538

video-marketing

Video platform optimization and analytics for YouTube and other video channels

a5c-ai

content-media

open

media

517

node-minify

Compress JavaScript, CSS, HTML, JSON, and image files using node-minify library. Use when: minifying/compressing assets, bundling JS/CSS files, optimizing images (WebP/AVIF), concatenating files, or when user mentions "node-minify", "@node-minify", "minification". Triggers: "minify", "compress JS/CSS", "bundle", "optimize images", "reduce file size".

srod

content-media

open

media

503

youtube-downloader

Download YouTube videos with customizable quality and format options. Use this skill when the user asks to download, save, or grab YouTube videos. Supports various quality settings (best, 1080p, 720p, 480p, 360p), multiple formats (mp4, webm, mkv), and audio-only downloads as MP3.

wlzh

content-media

open

media

503

audiocut-keyword

Audio keyword filtering tool: automatically identifies and removes specified content from audio based on a keyword configuration.

wlzh

content-media

open

media

503

voice-changer

Audio voice-changing tool: uses RVC AI models for realistic voice conversion, with support for direct video input.

wlzh

content-media

open

media

503

youtube-to-blog-post

Convert YouTube videos to SEO-optimized blog posts. Extract video title, description, and content, then generate a search-engine-friendly blog post with embedded video, cover images, and optimized metadata. Auto-generates English filenames and saves to the configured Hexo blog posts directory.

wlzh

content-media

open

media

456

ifly-speed-transcription

Ultra-fast speech transcription using iFLYTEK Speed Transcription API. Transcribe audio files (WAV/PCM/MP3) up to 5 hours in ~20 seconds per hour. Supports Chinese, English, and 202+ Chinese dialects with automatic language detection. Use when user asks to transcribe audio files, convert speech to text, or mentions "speed transcription" or "极速转写".

iflytek

content-media

open

media

454

video-translation

Translate and dub videos from one language to another, replacing the original audio with TTS while keeping the video intact.

NoizAI

content-media

open

media

422

youtube-apify-transcript

Fetch YouTube transcripts via APIFY API. Works from cloud IPs (Hetzner, AWS, etc.) by bypassing YouTube's bot detection. Free tier includes $5/month credits (~714 videos). No credit card required.

gooseworks-ai

content-media

open

media

412

camsnap

Capture frames or clips from RTSP/ONVIF cameras.

understudy-ai

content-media

open

media

412

nano-banana-pro

Generate or edit images via Gemini 3 Pro Image (Nano Banana Pro).

understudy-ai

content-media

open

media

412

video-frames

Extract frames or short clips from videos using ffmpeg.

understudy-ai

content-media

open

media

405

video-podcast-maker

Use when user provides a topic and wants an automated video podcast created, OR when user wants to learn/analyze video design patterns from reference videos — handles research, script writing, TTS audio synthesis, Remotion video creation, and final MP4 output with background music. Also supports design learning from reference videos (learn command), style profile management, and design reference library. Supports Bilibili, YouTube, Xiaohongshu, Douyin, and WeChat Channels platforms with independent language configuration (zh-CN, en-US).

Agents365-ai

content-media

open

media

389

nextjs-optimization

Optimize images, fonts, scripts, and metadata for Next.js performance and Core Web Vitals. Use when configuring next/image for LCP, next/font for zero layout shift, next/script loading strategies, or generateMetadata for SEO. (triggers: **/layout.tsx, **/page.tsx, next/image, next/font, metadata, generateMetadata)

HoangNguyen0403

content-media

open
