Media

Audio, video, and image processing.

1476 skills

Sorted by stars.

media

32.1K

videodb-skills

Upload, stream, search, edit, transcribe, and generate AI video and audio using the VideoDB SDK.

sickn33

content-media

media

29.3K

image-manipulation-image-magick

Process and manipulate images using ImageMagick. Supports resizing, format conversion, batch processing, and retrieving image metadata. Use when working with images, creating thumbnails, resizing wallpapers, or performing batch image operations.

github

content-media

media

29.3K

java-add-graalvm-native-image-support

GraalVM Native Image expert that adds native image support to Java applications, builds the project, analyzes build errors, applies fixes, and iterates until successful compilation using Oracle best practices.

github

content-media

media

29.3K

transloadit-media-processing

Process media files (video, audio, images, documents) using Transloadit. Use when asked to encode video to HLS/MP4, generate thumbnails, resize or watermark images, extract audio, concatenate clips, add subtitles, OCR documents, or run any media processing pipeline. Covers 86+ processing robots for file transformation at scale.

github

content-media

media

27.1K

add-image-vision

Add image vision to NanoClaw agents. Resizes and processes WhatsApp image attachments, then sends them to Claude as multimodal content blocks.

qwibitai

content-media

media

21.5K

weixin-file-send

Use when the user wants a local file or image sent back, such as "send me the file" or "发给我" ("send it to me").

iOfficeAI

content-media

media

19.6K

hyper-fast-youtube-transcript

Use when the user wants a YouTube transcript from a single URL or video ID. Optimized for one input and one output: fetch the transcript fast, default to plain transcript text only, and avoid extra commentary unless the user asks for timestamps, JSON, or metadata. Triggers on: youtube transcript, transcript from this video, get captions, extract transcript from YouTube, summarize this YouTube transcript after fetching it.

kortix-ai

content-media

media

19.6K

whisper

Transcribe any audio or video file to text using Whisper (Groq or OpenAI). Use when the agent receives voice messages, audio files, video messages, or any media with speech. Triggers on: 'transcribe', 'what does this say', 'voice message', 'speech to text', 'audio', any file path ending in .ogg .mp3 .mp4 .wav .webm .m4a .flac .oga.

kortix-ai

content-media

media

18.2K

camsnap

Capture frames or clips from RTSP/ONVIF cameras. Grabs snapshots, video clips, and motion events from IP cameras, security cameras, and video streams. Use when the user wants to take a snapshot from a camera, record a clip from an RTSP stream, monitor motion on a security camera, discover ONVIF devices on the network, or configure camera access for automated surveillance capture.

elizaOS

content-media

media

18.2K

nano-banana-pro

Generate or edit images via Gemini 3 Pro Image (Nano Banana Pro). Use when the user asks to create an image, generate a picture, produce AI-generated artwork, edit a photo, compose multiple images, or upscale an image to higher resolution. Supports text-to-image generation, single-image editing, and multi-image composition using the Gemini API.

elizaOS

content-media

media

18.2K

video-frames

Extract frames or short clips from videos using ffmpeg. Use when the user asks to grab a frame, capture a screenshot from a video, extract a thumbnail, pull a still image from footage, or snapshot a specific timestamp in a video file.

elizaOS

content-media

media

17.6K

video-downloader

Downloads videos from YouTube and other platforms for offline viewing, editing, or archival. Handles various formats and quality options.

davila7

content-media

media

16.5K

clip-hand-skill

Expert knowledge for AI video clipping — yt-dlp downloading, whisper transcription, SRT generation, and ffmpeg processing

RightNow-AI

content-media

media

14.1K

remotion-best-practices

Best practices for Remotion - Video creation in React

vercel-labs

content-media

media

14.1K

baoyu-compress-image

Compresses images to WebP (default) or PNG with automatic tool selection. Use when user asks to "compress image", "optimize image", "convert to webp", or reduce image file size.

JimLiu

content-media

media

14K

videocaptioner

Process video subtitles — transcribe speech, optimize/translate text, burn styled subtitles into video. Use when you need to add subtitles to a video, transcribe audio, translate subtitles, or customize subtitle styles.

WEIFENG2333

content-media

media

11.3K

audioeditor

AI-powered audio/video editing — transcription, intelligent cut detection, automated editing with crossfades, and optional cloud polish. USE WHEN clean audio, edit audio, remove filler words, clean podcast, remove ums, fix audio, cut dead air, polish audio, clean recording, transcribe and edit.

danielmiessler

content-media

media

10.6K

analyzing-videos

Analyzes video content and extracts highlights. Use when user wants to analyze video, extract highlights, create video summary, generate video keywords, understand video content, find best moments, create trailer, extract exciting clips, get video insights, or identify viral moments. Video analysis, highlight extraction, video summary, video understanding, highlight reel, video keywords, best-of cuts, content analysis, trending clips.

yikart

content-media

media

10.6K

composing-videos

Combines multiple videos/images into a single video with optional background audio. Use when user wants to merge clips, concatenate videos, create slideshow from images, stitch videos together, combine media files, add background music to video, mix video with audio, create video montage, or join multiple video segments. Merging videos, concatenating videos, composing video from images, adding background music, video stitching, multi-image video generation, video remixing, asset compositing.

yikart

content-media

media

10.6K

editing-images

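Image editing using Sharp. Supports compositing (QR codes, logos, watermarks), resizing, cropping, rotating, flipping, brightness/contrast/saturation adjustment, blur, sharpen. Image editing, image compositing, adding QR codes, adding logos, adding watermarks, image resizing, image cropping, image rotation, image flipping, brightness/contrast/saturation adjustment, blur, sharpen.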

yikart

content-media

media

10.6K

editing-videos

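Video editing using Volcengine Track structure. Supports cutting, trimming, adding text, stickers, audio, filters, effects, transitions, multi-clip compositions, speed adjustment, watermark removal. Video editing, trimming video, adding text, adding watermarks, adding audio, video filters, video effects, video transitions, multi-clip stitching, speed adjustment, watermark removal.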

yikart

content-media

media

10.6K

extracting-thumbnails

Extracts thumbnails from video URLs. Use when publishing video content that requires a cover image, when the user does not provide a cover/thumbnail, or before publishing to platforms that require cover images (Kwai, Bilibili, YouTube). Cover extraction, video cover, cover generation, thumbnail extraction, video thumbnail, cover capture.

yikart

content-media

media

10.6K

generating-videos

Generates videos using Grok (preferred) and Google Veo 3.1 models. Supports text-to-video, image-to-video, first-last-frame, video extension, and reference images. AI video generation, text-to-video, image-to-video, first-and-last-frame generation, video extension.

yikart

content-media

media

10.6K

removing-subtitles

Removes hardcoded subtitles from videos using AI inpainting. Use when user wants to remove subtitles, erase text from video, clean video from captions, delete burned-in subtitles, remove video watermarks, clean hardcoded text, or strip embedded subtitles. Subtitle removal, deleting subtitles, clearing subtitles, hard-subtitle removal, watermark removal, erasing subtitles, stripping subtitles.

yikart

content-media
