
Media

Audio, video, and image processing.

1476 skills, all categories
media
26

download-video

Downloads embedded videos from web pages. Fetches the page, identifies the video hosting service (Vimeo, YouTube, etc.), resolves the correct embed/player URL, and downloads using yt-dlp. Handles private/unlisted videos that require referer headers or embed URLs. Use this skill when someone says "download this video", "save this video", "grab the video from this page", "rip this video", or provides a URL and asks to download media from it. Also trigger when someone pastes a URL to a page with an embedded video and wants the video file locally.

swyxio
content-media
media
26

multimodal-extraction

Given a local video or video URL, downloads the media if needed, extracts slide frames and key moments, transcribes the audio, and writes a Markdown timeline that interleaves screenshots with the transcript at the associated timestamps. Use when asked to turn a video into a multimodal notes file, slide-synced transcript, screenshot-enhanced transcript, or talk recap with images.

swyxio
content-media
media
26

embed-subtitles

Burn subtitles onto videos using FFmpeg. Use for: hardcode subtitles, embed captions, video subtitling.

aviz85
content-media
media
26

nano-banana-pro

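Generate and edit images using Google's Nano Banana Pro (Gemini 3 Pro Image) API. Use when the user asks to generate, create, edit, modify, or update an image. Also use when the user references an existing image file and asks to modify it in any way (e.g., "modify this image", "change the background", "replace X with Y"). Supports text-to-image generation and image-to-image editing with configurable resolution (1K by default; 2K or 4K for high resolution). Never read the image file first; use this skill directly with the --input-image parameter.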

open-deep-crew
content-media
media
26

processing-computer-vision-tasks

Process images using object detection, classification, and segmentation. Use when requesting "analyze image", "object detection", "image classification", or "computer vision". Trigger with relevant phrases based on skill purpose.

ComeOnOliver
content-media
media
26

elevenlabs-core-workflow-b

Implement ElevenLabs speech-to-speech, sound effects, audio isolation, and speech-to-text. Use when converting voice to another voice, generating sound effects from text, removing background noise, or transcribing audio. Trigger: "elevenlabs speech to speech", "voice changer", "sound effects", "audio isolation", "remove background noise", "elevenlabs transcribe".

ComeOnOliver
content-media
media
26

nextjs-optimization

Image, Font, Script, and Metadata optimization strategies. Use when optimizing Next.js images, fonts, scripts, or page metadata for performance. (triggers: **/layout.tsx, **/page.tsx, next/image, next/font, metadata, generateMetadata)

ComeOnOliver
content-media
media
26

nano-banana-pro-openrouter

Generate or edit images via OpenRouter with the Gemini 3 Pro Image model. Use for prompt-only image generation, image edits, and multi-image compositing; supports 1K/2K/4K output.

ComeOnOliver
content-media
media
26

youtube-downloader

Download YouTube videos and HLS streams (m3u8) from platforms like Mux, Vimeo, etc. using yt-dlp and ffmpeg. Use this skill when users request downloading videos, extracting audio, handling protected streams with authentication headers, or troubleshooting download issues like nsig extraction failures, 403 errors, or cookie extraction problems.

ComeOnOliver
content-media
media
26

asr-transcribe-to-text

Transcribe audio and video files to text using a remote ASR service (Qwen3-ASR or an OpenAI-compatible endpoint). Extracts audio from video, sends it to a configurable ASR endpoint, and outputs clean text. Use when the user wants to transcribe recordings, convert audio/video to text, do speech-to-text, or mentions ASR, Qwen ASR, 转录 (transcription), 语音转文字 (speech to text), 录音转文字 (recording to text), or has a meeting recording, lecture, interview, or screen recording to transcribe.

ComeOnOliver
content-media
media
26

image-enhancer

Improves the quality of images, especially screenshots, by enhancing resolution, sharpness, and clarity. Perfect for preparing images for presentations, documentation, or social media posts.

ComeOnOliver
content-media
media
26

axiom-photo-library

PHPicker, PhotosPicker, photo selection, limited library access, presentLimitedLibraryPicker, save to camera roll, PHPhotoLibrary, PHAssetCreationRequest, Transferable, PhotosPickerItem, photo permissions

ComeOnOliver
content-media
media
26

media-downloader

Smart media downloader. Automatically searches for and downloads images and video clips based on the user's description, with automatic video trimming. Triggers: "下载图片" (download images), "找视频" (find video), "download media", "download images", "find video", "/media"

ComeOnOliver
content-media
media
26

video-enhancement

AI Video Enhancement - Upscale video resolution, improve quality, denoise, sharpen, enhance low-quality videos to HD/4K. Supports local video files, remote URLs (YouTube, Bilibili), auto-download, real-time progress tracking.

ComeOnOliver
content-media
media
26

faceswap

AI Face Swap - Swap face in video, deepfake face replacement, face swap for portraits. Use from command line. Supports local video files, YouTube, Bilibili URLs, auto-download, real-time progress tracking.

ComeOnOliver
content-media
media
26

mulerouter

Generates images and videos using MuleRouter or MuleRun multimodal APIs. Text-to-Image, Image-to-Image, Text-to-Video, Image-to-Video, video editing (VACE, keyframe interpolation). Use when the user wants to generate, edit, or transform images and videos using AI models like Wan2.6 or Nano Banana.

ComeOnOliver
content-media
media
26

videocut

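Executes video cuts. Runs FFmpeg edits according to the confirmed deletion tasks, loops until no verbal slips remain, and generates subtitles. Trigger phrases: 执行剪辑 (execute the edit), 开始剪 (start cutting), 确认剪辑 (confirm the edit)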

ComeOnOliver
content-media
media
26

conference-transcribe

Transcribe a multi-talk conference livestream or long YouTube video into separate per-talk transcripts. Parses timestamps from the video description to split talks, downloads audio/video, transcribes each segment, then uses an LLM to clean up and format the transcripts with key takeaways and frequent timestamps. Use when user says "transcribe this conference", "split this livestream into talks", "transcribe each talk separately", or provides a YouTube URL of a multi-hour event stream with chapter timestamps.

swyxio
content-media
media
26

videodb

See, Understand, and Act on video and audio. See: ingest from local files, URLs, RTSP/live feeds, or live desktop recording; return real-time context and playable stream links. Understand: extract frames, build visual/semantic/temporal indexes, and search moments with timestamps and auto-clips. Act: transcode and normalize (codec, fps, resolution, aspect ratio), perform timeline edits (subtitles, text/image overlays, branding, audio overlays, dubbing, translation), generate media assets (image, audio, video), and create real-time alerts for events from live streams or desktop capture.

ComeOnOliver
content-media
media
26

video-editing

AI-assisted video editing workflows for cutting, structuring, and augmenting real footage. Covers the full pipeline from raw capture through FFmpeg, Remotion, ElevenLabs, fal.ai, and final polish in Descript or CapCut. Use when the user wants to edit video, cut footage, create vlogs, or build video content.

ComeOnOliver
content-media
media
26

fal-ai-media

Unified media generation via fal.ai MCP — image, video, and audio. Covers text-to-image (Nano Banana), text/image-to-video (Seedance, Kling, Veo 3), text-to-speech (CSM-1B), and video-to-audio (ThinkSound). Use when the user wants to generate images, videos, or audio with AI.

ComeOnOliver
content-media
media
26

see-through-anime-layer-decomposition

Expertise in See-through, a framework for single-image layer decomposition of anime characters into manipulatable 2.5D PSD files using diffusion models.

Aradotso
content-media
media
26

filmkit-fujifilm-camera

Browser-based preset manager and RAW converter for Fujifilm X-series cameras using WebUSB and PTP protocol

Aradotso
content-media
media
26

video-downloader

Downloads videos from YouTube and other platforms for offline viewing, editing, or archival. Handles various formats and quality options.

christophacham
content-media