category focus

Media

Audio, video, and image processing.

1476 skills

Sorting: stars

media

304

video-summarization-via-object-tracking

Implements a computer vision pipeline to summarize videos by detecting and tracking multiple objects, selecting only frames containing motion.

ECNU-ICALK

content-media

open

media

304

c-image-component-reordering-with-deferred-rendering

Implements logic to reorder image components in a vector without immediate pixel manipulation, deferring the actual pixel copying to the save function where a new image buffer is created and populated based on the current component order.

ECNU-ICALK

content-media

open

media

304

png

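Image processing task that uses OpenCV and NumPy on PNG images with an alpha channel: expands the canvas, adds white padding based on the content contour, and draws a smooth black outer stroke.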

ECNU-ICALK

content-media

open

media

301

byted-tos-image-process

Provides image processing capabilities for objects in Bytedance TOS using the official SDK. Supports getting image info, format conversion, resizing, and watermarking. Use when you need to analyze or transform images stored in TOS.

bytedance

content-media

open

media

301

video-generate

Generates a video using the video_generate.py script. Requires a file name and a prompt; optionally accepts a first-frame image (URL or local path).

bytedance

content-media

open

media

301

byted-tos-video-process

Uses Volcengine TOS SDK object processing (e.g., `video/info`, `video/snapshot`) to fetch video metadata and extract single or multiple frame snapshots from videos stored in Bytedance TOS. Use when the user needs video info/metadata, thumbnail or frame capture, snapshot extraction, or mentions TOS video processing.

bytedance

content-media

open

media

301

byted-las-vlm-video

Video content understanding operator (las_vlm_video) via Doubao models. Use this skill when the user needs to: analyze or describe video content with natural-language prompts; ask questions about what happens in a video (objects, actions, scenes); or summarize a video, extract key events, or generate captions. Supports public or intranet-accessible video URLs and returns model responses plus compression metadata. Requires LAS_API_KEY for authentication.

bytedance

content-media

open

media

301

byted-mediakit

Volcengine AI MediaKit audio and video processing skill. It is triggered when users need to process or edit audio/video content. After processing, it automatically checks task status and returns playback links for the generated outputs. Core capabilities are grouped into five categories: 1) Video processing: multi-clip stitching, clip trimming, frame flipping, video speed adjustment, audio speed adjustment, image-to-video generation, audio-video composition, audio track extraction, and audio mixing; 2) Audio processing: vocal/accompaniment separation and audio noise reduction; 3) Video enhancement: comprehensive quality restoration, AI super-resolution, and intelligent frame interpolation; 4) AI content analysis: ASR speech-to-text, OCR text extraction, subtitle removal, subtitle embedding, intelligent scene slicing, portrait matting, green screen matting, media info query, and highlight extraction; 5) AI content generation: comic style transfer, AI video translation, AI drama recap narration, and AI drama sc

bytedance

content-media

open

media

301

byted-las-video-resize

Video resolution resize operator (las_video_resize). Use this skill when the user needs to: resize video resolution into a target range (min/max width/height); preserve aspect ratio with increase/decrease/disable strategies; or control encoding quality options for GPU NVENC (cq/rc). Supports input from a public URL, intranet URL, or TOS, and outputs to TOS. If the user provides local video files or requires local outputs, use byted-tosfile-access to upload/download as a TOS bridge. Requires LAS_API_KEY for authentication.

bytedance

content-media

open

media

301

byted-las-audio-extract-and-split

Audio extract and split operator. Use this skill when the user needs to: extract audio from video files (mp4, wmv, etc.); split audio into segments of a specific duration; or convert audio format (wav, mp3, flac). Supports input from TOS and output to TOS. Requires LAS_API_KEY for authentication.

bytedance

content-media

open

media

301

byted-las-image-resample

Image resampling operator for downsampling images. Use this skill when the user needs to: resize or downsample images to a target size; change image DPI settings; or convert between JPG and PNG formats. Supports four interpolation methods: nearest, bilinear, bicubic, lanczos. Supports input from URL, TOS, base64, or binary. Requires LAS_API_KEY for authentication.

bytedance

content-media

open

media

301

video-breakdown

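Video shot-breakdown skill (self-contained, no external backend required). Uses FFmpeg to preprocess the video and generate shot data. (1) For a video URL, process it directly with `python scripts/process_video.py "<video_url>"`; (2) for a local file, first upload it with `python scripts/video_upload.py "<file_path>"` to obtain a URL, then process that URL. Requires FFmpeg to be installed on the local machine.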

bytedance

content-media

open

media

301

byted-las-audio-convert

Audio format conversion operator. Use this skill when the user needs to: convert audio files between formats (wav, mp3, flac); or change audio properties (sample rate, bit rate) using ffmpeg params. Supports input from TOS and output to TOS. Requires LAS_API_KEY for authentication.

bytedance

content-media

open

media

301

byted-las-video-edit

Extract video clips from long videos based on natural-language descriptions. Use this skill when the user needs to: extract highlights or specific scenes from videos; find specific people or objects in videos using reference images; split long videos into meaningful clips; or generate video summaries with timestamps. Supports reference images for target identification and outputs TOS clip URLs. Requires LAS_API_KEY for authentication.

bytedance

content-media

open

media

301

byted-las-video-inpaint

Video inpainting operator (las_video_inpaint) for removing watermarks, subtitles, and logos from videos. Use this skill when the user needs to: remove watermarks, subtitles, or scrolling subtitles from a video; repair a video by inpainting fixed regions (fixed_bboxes) or auto-detected regions; or run video restoration and get the output TOS path plus an optional subtitle bbox. Supports input from a public URL, intranet URL, or TOS, and outputs to TOS. If the user provides local video files or requires local outputs, use byted-tosfile-access to upload/download as a TOS bridge. Requires LAS_API_KEY for authentication.

bytedance

content-media

open

media

281

uloop-screenshot

Capture screenshots of Unity Editor windows as PNG files. Use when you need to: (1) Screenshot Game View, Scene View, Console, Inspector, or other windows, (2) Capture current visual state for debugging or documentation, (3) Save editor window appearance as image files.

hatayama

content-media

open

media

278

refine

Transform a brief or prompt into a structured, production-ready prompt via prompt-optimizer. File or text mode.

automagik-dev

content-media

open

media

276

ascii-image-to-ascii

Convert an image into ASCII art (readable + detail variants, width/charset controls, optional ANSI), for terminal previews and plain-text image substitutes.

partme-ai

content-media

open

media

276

threejs-audio

three.js audio spatialization: AudioListener attached to camera rig, Audio and PositionalAudio sources, AudioAnalyser for FFT/time-domain data, and integration with Web Audio API contexts; AudioLoader is referenced from threejs-loaders for file decoding. Use when placing 3D sound, configuring panner parameters, or building audio visualization; not a replacement for full game audio middleware.

partme-ai

content-media

open

media

273

video-processor

Download and process videos from YouTube and other platforms. Supports video download, audio extraction, format conversion (mp4, webm), and Whisper transcription. Use when user mentions YouTube download, video conversion, audio extraction, transcription, mp4, webm, ffmpeg, yt-dlp, or whisper transcription.

iamzhihuix

content-media

open

media

273

image-file

Guidelines for handling image files

agentscope-ai

content-media

open

media

273

video-frames

Extract frames or short clips from videos using ffmpeg.

eggent-ai

content-media

open

media

268

speech-to-text

Transcribe audio to text using ElevenLabs Scribe v2. Use when converting audio/video to text, generating subtitles, transcribing meetings, or processing spoken content.

tadaspetra

content-media

open
