Media

Audio, video, and image processing.

1476 skills
media
50

add-diffusion-model

Add a new diffusion model (text-to-image, text-to-video, image-to-video, text-to-audio, image editing) to vLLM-Omni, including Cache-DiT acceleration and parallelism support (TP, SP/USP, CFG-Parallel, HSDP). Use when integrating a new diffusion model, porting a diffusers pipeline or a custom model repo to vllm-omni, creating a new DiT transformer adapter, adding diffusion model support, or enabling multi-GPU parallelism and cache acceleration for an existing model.

hsliuustc0106
content-media
open
media
50

vllm-omni-audio-tts

Generate audio and speech with vLLM-Omni using Qwen3-TTS, Fish Speech S2 Pro, CosyVoice3, MiMo-Audio, and Stable-Audio models. Use when synthesizing speech from text, generating audio effects or music, configuring TTS parameters, cloning voices, adding new TTS models, or working with text-to-speech models.

hsliuustc0106
content-media
open
media
50

vllm-omni-multimodal

Transcribe speech, generate images from prompts, analyze video content, and convert between modalities using omni-modality models such as Qwen2.5-Omni and Qwen3-Omni. Use when working with multimodal models for speech recognition, image generation, video understanding, voice synthesis, or any task that combines text, image, audio, and video inputs and outputs.

hsliuustc0106
content-media
open
media
50

vllm-omni-recipe

Use when adding a recipe for omnimodal models (text-to-image, text-to-video, text-to-audio, image-to-video, any-to-any, diffusion transformers) to the vLLM recipes repository, or documenting vLLM-Omni deployment

hsliuustc0106
content-media
open
media
50

document-to-narration

Convert written documents to narrated video scripts with TTS audio and word-level timing. Use when preparing essays, blog posts, or articles for video narration. Outputs scene files, audio, and VTT with precise word timestamps. Keywords: narration, voiceover, TTS, scenes, audio, timing, video script, spoken.

jwynia
content-media
open
media
49

media-processor

Process multimedia content — audio transcription, video analysis, PDF data extraction, image generation. Use for deeper image analysis when implementing from UI designs, analyzing charts for data, reading dense screenshots, or studying artworks and visual references.

avibebuilder
content-media
open
media
49

handling-attachments

File attachment handling for XMTP agents. Use when sending or receiving images, files, or any encrypted remote attachments. Triggers on file upload, image sending, or remote attachment handling.

xmtplabs
content-media
open
media
48

alibabacloud-pds-multimodal-search

Implements exact filename search, fuzzy filename search, semantic file search, and image-based image search. Triggers: "PDS drive file search", "PDS image search by image".

aliyun
content-media
open
media
48

alibabacloud-video-forge

Alibaba Cloud Media Processing Service (MPS) one-stop video processing skill. Use when users need video processing, transcoding, snapshot generation, content moderation, or video upload. For video distribution scenarios, it completes video upload, snapshot generation, multi-resolution transcoding, and content moderation in a single workflow, producing standardized video assets efficiently.

aliyun
content-media
open
media
48

alibabacloud-video-editor

Video editing tool that runs all video processing in the cloud, so no local ffmpeg installation is needed. If both input and output are URLs or Alibaba Cloud OSS, this skill is the preferred choice. Can generate a Timeline configuration based on editing requirements and material information, submit Alibaba Cloud editing tasks, wait for task completion, and output the final video URL. Use when the user wants to edit videos, mentions video editing, clipping, 剪辑 (editing), 视频制作 (video production), 视频拼接 (video stitching), 视频合成 (video compositing), or needs to process media files into videos.

aliyun
content-media
open
media
48

video-frames

Extract frames or short clips from videos using ffmpeg.

2001Haru
content-media
open
media
47

video-creator

Orchestrates end-to-end video generation through sequential workflow steps (audio, direction, assets, design, coding). Activates when the user requests video creation from a script, wants to resume video generation, mentions "create video", "generate video", or "video workflow", requests running a specific step, or asks to "create audio", "generate direction", "create assets", "generate design", or "code video components". Manages workflow state tracking and parallel scene generation.

outscal
content-media
open
media
47

video-designer

Expert video designer that generates comprehensive design specifications based on video direction. Creates precise JSON schemas for scenes including elements, animations, timing, and styling following strict design guidelines.

outscal
content-media
open
media
47

ffmpeg-cli

FFmpeg CLI reference for video and audio processing, format conversion, filtering, and media automation. Use when converting video formats, resizing or cropping video, trimming by time, replacing or extracting audio, mixing audio tracks, overlaying text or images, burning subtitles, creating GIFs, generating thumbnails, building slideshows, changing playback speed, encoding with H264/H265/VP9, setting CRF/bitrate, using GPU acceleration, creating storyboards, or running ffprobe. Covers filter_complex, stream selectors, -map, -c copy, seeking, scale, pad, crop, concat, drawtext, zoompan, xfade.

henkisdabro
content-media
open
media
47

image-enhancer

Improves the quality of images, especially screenshots, by enhancing resolution, sharpness, and clarity. Perfect for preparing images for presentations, documentation, or social media posts.

jiaxiaojunQAQ
content-media
open
media
47

shorts

Interactive longform-to-shortform video creator. Extracts viral-ready short clips from long videos using Claude as the orchestrator. Transcribes with faster-whisper (GPU); Claude scores and presents candidate segments interactively; the user picks and adjusts; Remotion renders premium animated captions (Bold/Bounce/Clean styles); FFmpeg exports platform-optimized files (YouTube Shorts, TikTok, Instagram Reels). Use when the user says "shorts", "short clips", "shortform", "extract clips", "tiktok from video", "reels from video", "vertical clips", or "create shorts".

AgriciDaniel
content-media
open
media
47

deapi

AI media generation via deAPI. Transcribe YouTube/audio/video, generate images from text, text-to-speech, OCR, remove backgrounds, upscale images, create videos, generate embeddings. 10-20x cheaper than OpenAI/Replicate.

eric861129
content-media
open
media
47

image-seo-audit

Audit image SEO. Use when: checking alt text, file sizes, WebP/AVIF formats, lazy loading, or responsive images.

indranilbanerjee
content-media
open
media
46

image-enhancement-suite

Process images for cleanup, conversion, metadata, comparison, icons, palettes, collages, and sprite sheets. Use for single-image or batch image workflows.

dkyazzentwatwa
content-media
open
media
46

fal-image-edit

Edit images using AI on fal.ai. Style transfer, object removal, background changes, and more. Use when the user requests "Edit image", "Remove object", "Change background", "Apply style", or similar image editing tasks.

fal-ai-community
content-media
open
media
46

fal-restore

Restore and fix image quality — deblur, denoise, dehaze, fix faces, restore documents. Use when the user requests "Fix blurry image", "Remove noise", "Fix face", "Restore photo", "Enhance document", "Deblur", "Denoise".

fal-ai-community
content-media
open
media
46

fal-kling-o3

Generate images and videos with Kling O3 — Kling's most powerful model family. Text-to-image, text-to-video, image-to-video, and video-to-video editing. Use when the user requests "Kling", "Kling O3", "Best quality video", "Kling image", "Kling video editing".

fal-ai-community
content-media
open
media
46

fal-upscale

Upscale and enhance image resolution using AI. Use when the user requests "Upscale image", "Enhance resolution", "Make image bigger", "Increase quality", or similar upscaling tasks.

fal-ai-community
content-media
open
media
46

fal-video-edit

Edit existing videos using AI — remix style, edit content, upscale resolution, remove background, or add audio/sound effects. Use when the user requests "Edit video", "Remix video", "Upscale video", "Remove video background", "Add sound to video", "Video to audio".

fal-ai-community
content-media
open
Page 30 / 62