Media

Audio, video, and image processing.

1476 skills

media

381

pdf-to-video

Use when the user wants to convert a PDF document into a showcase video, extract key points from a PDF, or create a video presentation from a PDF file.

DangJin

content-media

media

377

alicloud-ai-video-wan-video-test

Minimal video generation smoke test for Model Studio Wan Video.

cinience

content-media

media

377

Use when editing videos with DashScope Wan 2.7 video editing model (wan2.7-videoedit). Use when implementing video style transfer, instruction-based video editing with optional reference images, or video content modification via the video-synthesis async API.

cinience

content-media

media

377

aliyun-videoretalk

Use when replacing lip sync in existing videos with Alibaba Cloud Model Studio VideoRetalk (`videoretalk`). Use when creating dubbed videos, replacing narration, or synchronizing a talking-head video to a new speech track.

cinience

content-media

media

377

aliyun-wan-i2v

Use when generating videos from images with DashScope Wan 2.7 image-to-video model (wan2.7-i2v). Use when implementing first-frame video generation, first+last frame interpolation, video continuation, or audio-driven video synthesis via the video-synthesis async API.

cinience

content-media

media

377

aliyun-kling-video

Use when generating videos with Kling v3 models on DashScope (kling/kling-v3-video-generation, kling/kling-v3-omni-video-generation). Use when implementing text-to-video, image-to-video, reference-to-video, smart storyboard, or video editing via the video-synthesis async API.

cinience

content-media

media

377

aliyun-video-style-repaint

Use when transforming video style with DashScope video-style-transform model. Use when converting videos to artistic styles such as Japanese manga, American comics, 3D cartoon, Chinese ink painting, paper art, or simple illustration via the video-synthesis async API.

cinience

content-media

media

377

aliyun-qwen-asr

Use when transcribing non-realtime speech with Alibaba Cloud Model Studio Qwen ASR models (`qwen3-asr-flash`, `qwen-audio-asr`, `qwen3-asr-flash-filetrans`). Use when converting recorded audio files to text, generating transcripts with timestamps, or documenting DashScope/OpenAI-compatible ASR request and response fields.

cinience

content-media

media

365

videocut

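Performs video editing. Runs FFmpeg cuts based on the confirmed deletion tasks, loops until no verbal slips remain, and generates subtitles. Trigger phrases: 执行剪辑 (perform the edit), 开始剪 (start cutting), 确认剪辑 (confirm the edit)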

Ceeon

content-media

media

359

media-downloader

Smart media downloader. Automatically searches for and downloads images and video clips based on the user's description, with auto-trimming support. Triggers: "下载图片", "找视频", "download media", "download images", "find video", "/media"

yizhiyanhua-ai

content-media

media

349

media

Play audio files, record from microphone, take photos. Use for media playback, voice recording, or camera capture.

mikeyobrien

content-media

media

336

transcribe-audio

Transcribes video audio using WhisperX, preserving original timestamps. Creates JSON transcript with word-level timing. Use when you need to generate audio transcripts for videos.

barefootford

content-media

media

336

roughcut

Creates a video rough cut YAML file for use with the Buttercut gem. Concatenates visual transcripts with file markers, creates a roughcut YAML with clip selections, then exports to XML format. Use this skill when users want a "roughcut", "sequence" or "scene" generated. These are all the same thing, just with different lengths.

barefootford

content-media

media

336

analyze-video

Adds visual descriptions to transcripts by extracting and analyzing video frames with ffmpeg. Creates visual transcript with periodic visual descriptions of the video clip. Use when all files have audio transcripts present (transcript) but don't yet have visual transcripts created (visual_transcript).

barefootford

content-media

media

328

gsap-animation

GSAP + Remotion integration for professional motion graphics video production. Timeline orchestration, text splitting, SVG morphing, advanced easing, and reusable effect presets.

notedit

content-media

media

325

video-optimization

When the user wants to optimize videos for Google Search, video sitemap, VideoObject schema, or video SEO on websites. Also use when the user mentions "video SEO," "video sitemap," "VideoObject," "video thumbnail," "video indexing," "video preview," "key moments," "Clip schema," or "embedded video optimization." For page template, use article-page-generator.

kostja94

content-media

media

325

youtube-seo

When the user wants to optimize YouTube videos for search, create video descriptions, or improve channel visibility. Also use when the user mentions "YouTube SEO," "YouTube description," "YouTube tags," "YouTube thumbnail," "YouTube title," "YouTube channel," or "video SEO." For YouTube ads, use youtube-ads.

kostja94

content-media

media

325

image-optimization

When the user wants to optimize images for search engines and performance. Also use when the user mentions "image SEO," "alt text," "image captions," "figcaption," "image optimization," "WebP," "lazy loading," "LCP," "image sitemap," "responsive images," "srcset," "image format," or "hero image optimization." For CWV, use core-web-vitals.

kostja94

content-media

media

322

image-to-video

Still-to-video conversion guide: model selection, motion prompting, and camera movement. Covers Wan 2.5 i2v, Seedance, Fabric, Grok Video with when to use each. Use for: animating images, creating video from stills, adding motion, product animations. Triggers: image to video, i2v, animate image, still to video, add motion to image, image animation, photo to video, animate still, wan i2v, image2video, bring image to life, animate photo, motion from image

inference-sh

content-media

media

322

p-video

Generate videos with Pruna P-Video and WAN models via inference.sh CLI. Models: P-Video, WAN-T2V, WAN-I2V. Capabilities: text-to-video, image-to-video, audio support, 720p/1080p, fast inference. Pruna optimizes models for speed without quality loss. Triggers: pruna video, p-video, pruna ai video, fast video generation, optimized video, wan t2v, wan i2v, economic video generation, cheap video generation, pruna text to video, pruna image to video

inference-sh

content-media

media

319

video-editing

Video editing pipeline: cut footage, assemble clips via FFmpeg and Remotion.

notque

content-media

media

319

image-to-video

FFmpeg-based video creation from image and audio.

notque

content-media

media

304

baoyu-compress-image

Compresses images to WebP (default) or PNG with automatic tool selection. Use when the user asks to "compress image", "optimize image", "convert to webp", or reduce image file size.

ECNU-ICALK

content-media

media

304

audio-transcription

Transcribe audio and video files into structured notes. Activate this skill when users want to transcribe recordings, meetings, podcasts, voice memos, or any audio/video content in their vault.

allenhutchison

content-media
