video-summarization-via-object-tracking
Implements a computer vision pipeline to summarize videos by detecting and tracking multiple objects, selecting only frames containing motion.
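The description does not say how "motion" is measured. A minimal sketch, assuming frames arrive as grayscale 2D lists and using simple frame differencing (a swapped-in technique, not the object detection and tracking named above), keeps only frames whose mean absolute difference from the previous frame exceeds a threshold. `select_motion_frames` and its default threshold are hypothetical.

```python
from typing import List

Frame = List[List[int]]  # grayscale frame: rows of pixel intensities 0-255


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Mean absolute per-pixel difference between two equally sized frames."""
    total = 0
    count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count if count else 0.0


def select_motion_frames(frames: List[Frame], threshold: float = 10.0) -> List[int]:
    """Return indices of frames that differ from their predecessor by more than threshold.

    The first frame has no predecessor, so it is never selected here.
    """
    selected = []
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i - 1], frames[i]) > threshold:
            selected.append(i)
    return selected


still = [[0, 0], [0, 0]]
moved = [[255, 0], [0, 0]]
print(select_motion_frames([still, still, moved, moved]))  # [2]
```

A real pipeline would replace the global difference with per-object tracks, so that camera noise or lighting flicker does not count as motion.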
Implements logic to reorder the image components stored in a vector without touching pixel data immediately. The actual pixel copy is deferred to the save function, which allocates a new image buffer and fills it according to the current component order.
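A minimal sketch of the deferred-copy idea, assuming an interleaved buffer (for example `[R, G, B, R, G, B, ...]`). The class name `LazyComponentImage` is hypothetical, and `save` returns the new buffer instead of writing a file. `reorder` only updates an index list; pixels are copied once, in `save`.

```python
from typing import List


class LazyComponentImage:
    """Interleaved image whose component (channel) order is changed lazily."""

    def __init__(self, pixels: List[int], num_components: int):
        if len(pixels) % num_components != 0:
            raise ValueError("pixel buffer length must be a multiple of num_components")
        self.pixels = pixels
        self.num_components = num_components
        # order[i] = index of the source component that becomes output component i
        self.order = list(range(num_components))

    def reorder(self, new_order: List[int]) -> None:
        """Record a new component order; no pixel data is touched here."""
        if sorted(new_order) != list(range(self.num_components)):
            raise ValueError("new_order must be a permutation of component indices")
        # Compose with the existing order so repeated reorders accumulate.
        self.order = [self.order[i] for i in new_order]

    def save(self) -> List[int]:
        """Allocate a new buffer and copy pixels according to the current order."""
        out = [0] * len(self.pixels)
        n = self.num_components
        for base in range(0, len(self.pixels), n):
            for dst, src in enumerate(self.order):
                out[base + dst] = self.pixels[base + src]
        return out


img = LazyComponentImage([1, 2, 3, 4, 5, 6], 3)  # two RGB pixels
img.reorder([2, 1, 0])                            # RGB -> BGR, no copy yet
print(img.save())  # [3, 2, 1, 6, 5, 4]
```

Composing permutations in `reorder` means any number of swaps costs a single pass over the pixels at save time.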
Provides image processing capabilities for objects in Bytedance TOS using the official SDK. Supports getting image info, format conversion, resizing, and watermarking. Use when you need to analyze or transform images stored in TOS.
Generates a video using the video_generate.py script. Requires a filename and a prompt, and optionally accepts a first-frame image (URL or local path).
Uses Volcengine TOS SDK object processing (e.g., `video/info`, `video/snapshot`) to fetch video metadata and extract single or multiple frame snapshots from videos stored in Bytedance TOS. Use when the user needs video info/metadata, thumbnail or frame capture, snapshot extraction, or mentions TOS video processing.
Video content understanding operator (las_vlm_video) via Doubao models. Use this skill when the user needs to: - Analyze/describe video content with natural-language prompts - Ask questions about what happens in a video (objects, actions, scenes) - Summarize a video, extract key events, or generate captions. Supports public/intranet-accessible video URLs and returns model responses plus compression metadata. Requires LAS_API_KEY for authentication.
Volcengine AI MediaKit audio and video processing skill. It is triggered when users need to process or edit audio/video content. After processing, it automatically checks task status and returns playback links for the generated outputs. Core capabilities are grouped into five categories: 1) Video processing: multi-clip stitching, clip trimming, frame flipping, video speed adjustment, audio speed adjustment, image-to-video generation, audio-video composition, audio track extraction, and audio mixing; 2) Audio processing: vocal/accompaniment separation and audio noise reduction; 3) Video enhancement: comprehensive quality restoration, AI super-resolution, and intelligent frame interpolation; 4) AI content analysis: ASR speech-to-text, OCR text extraction, subtitle removal, subtitle embedding, intelligent scene slicing, portrait matting, green screen matting, media info query, and highlight extraction; 5) AI content generation: comic style transfer, AI video translation, AI drama recap narration, and AI drama sc
Video resolution resize operator (las_video_resize). Use this skill when the user needs to: - Resize video resolution into a target range (min/max width/height) - Preserve aspect ratio with increase/decrease/disable strategies - Control encoding quality options for GPU NVENC (cq/rc). Supports input from a public URL, intranet URL, or TOS, and writes output to TOS. If the user provides local video files or needs local outputs, use byted-tosfile-access to upload/download via TOS as a bridge. Requires LAS_API_KEY for authentication.
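The operator's exact parameters and rounding rules are not given here. A sketch, assuming the increase/decrease/disable strategies behave like FFmpeg's `force_original_aspect_ratio` option for the `scale` filter, shows how an output size could be derived for a single target box. `fit_resolution` is hypothetical, and rounding to even dimensions is an assumption based on common encoder requirements.

```python
from typing import Tuple


def _even(value: float) -> int:
    """Round to the nearest even integer (at least 2); many encoders need even dimensions."""
    return max(2, 2 * round(value / 2))


def fit_resolution(src_w: int, src_h: int, box_w: int, box_h: int,
                   strategy: str = "decrease") -> Tuple[int, int]:
    """Compute an output size for a box_w x box_h target.

    decrease: scale to fit inside the box, keeping the aspect ratio.
    increase: scale to cover the box, keeping the aspect ratio.
    disable:  use the box size exactly (the aspect ratio may change).
    """
    if strategy == "disable":
        return _even(box_w), _even(box_h)
    sx = box_w / src_w
    sy = box_h / src_h
    if strategy == "decrease":
        scale = min(sx, sy)
    elif strategy == "increase":
        scale = max(sx, sy)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return _even(src_w * scale), _even(src_h * scale)


print(fit_resolution(1920, 1080, 1280, 1280, "decrease"))  # (1280, 720)
print(fit_resolution(1920, 1080, 1280, 1280, "increase"))  # (2276, 1280)
```

A min/max range could be handled by applying `increase` against the minimum box and then `decrease` against the maximum box. This is not shown here.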
Audio extraction and splitting operator. Use this skill when the user needs to: - Extract audio from video files (mp4, wmv, etc.) - Split audio into segments of a specific duration - Convert audio format (wav, mp3, flac). Supports input from TOS and output to TOS. Requires LAS_API_KEY for authentication.
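The operator performs the split server-side. The arithmetic behind "segments of a specific duration" is shown locally in this sketch, assuming every segment has the requested length except possibly the last, shorter one. `segment_bounds` is a hypothetical helper.

```python
import math
from typing import List, Tuple


def segment_bounds(total_seconds: float, segment_seconds: float) -> List[Tuple[float, float]]:
    """Split [0, total_seconds) into consecutive (start, end) pairs of at most segment_seconds."""
    if segment_seconds <= 0:
        raise ValueError("segment_seconds must be positive")
    count = math.ceil(total_seconds / segment_seconds)
    bounds = []
    for i in range(count):
        # Compute each boundary from the index to avoid accumulating float error.
        start = i * segment_seconds
        end = min((i + 1) * segment_seconds, total_seconds)
        bounds.append((start, end))
    return bounds


print(segment_bounds(125, 60))  # [(0, 60), (60, 120), (120, 125)]
```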
Image resampling operator for downsampling images. Use this skill when the user needs to: - Resize/downsample images to a target size - Change image DPI settings - Convert between JPG and PNG formats. Supports four interpolation methods: nearest, bilinear, bicubic, and lanczos. Supports input from a URL, TOS, base64, or binary. Requires LAS_API_KEY for authentication.
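This sketch shows the simplest of the four listed methods, nearest-neighbor downsampling, on a grayscale image stored as nested lists. It is not the operator's implementation. Each destination pixel center is mapped back into source coordinates, and the closest source pixel is copied. The bilinear, bicubic, and lanczos methods would blend neighboring pixels instead. `downsample_nearest` is hypothetical.

```python
from typing import List

Image = List[List[int]]  # grayscale rows; one value per pixel


def downsample_nearest(img: Image, dst_w: int, dst_h: int) -> Image:
    """Resize img to dst_w x dst_h by nearest-neighbor sampling."""
    src_h = len(img)
    src_w = len(img[0])
    out = []
    for y in range(dst_h):
        # Map the center of each destination pixel back into source coordinates.
        sy = min(int((y + 0.5) * src_h / dst_h), src_h - 1)
        row = []
        for x in range(dst_w):
            sx = min(int((x + 0.5) * src_w / dst_w), src_w - 1)
            row.append(img[sy][sx])
        out.append(row)
    return out


img = [[r * 4 + c for c in range(4)] for r in range(4)]  # 4x4 test pattern
print(downsample_nearest(img, 2, 2))  # [[5, 7], [13, 15]]
```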
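Video storyboard (shot breakdown) skill (self-contained, no external backend required). Uses FFmpeg to preprocess a video and generate storyboard data. (1) For a video URL, process it directly: `python scripts/process_video.py "<video_url>"`; (2) for a local file, first upload it with `python scripts/video_upload.py "<file_path>"` to obtain a URL, then process that URL. Requires FFmpeg to be installed locally.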
Audio format conversion operator. Use this skill when the user needs to: - Convert audio files between formats (wav, mp3, flac) - Change audio properties (sample rate, bit rate) using ffmpeg parameters. Supports input from TOS and output to TOS. Requires LAS_API_KEY for authentication.
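The operator's parameter format is not documented here. As a local illustration of what "ffmpeg params" for this kind of conversion look like, this sketch builds a standard ffmpeg command line. It uses `-ar` for sample rate and `-b:a` for audio bitrate, and ffmpeg infers the output format from the file extension. `convert_cmd` is a hypothetical helper.

```python
from typing import List, Optional


def convert_cmd(src: str, dst: str, sample_rate: Optional[int] = None,
                bitrate: Optional[str] = None) -> List[str]:
    """Build an ffmpeg command converting src to dst (format taken from dst's extension)."""
    cmd = ["ffmpeg", "-y", "-i", src, "-vn"]  # -y: overwrite output, -vn: drop any video stream
    if sample_rate is not None:
        cmd += ["-ar", str(sample_rate)]
    if bitrate is not None:
        cmd += ["-b:a", bitrate]  # ignored in practice for PCM outputs such as wav
    cmd.append(dst)
    return cmd


print(convert_cmd("in.wav", "out.mp3", sample_rate=44100, bitrate="192k"))
```

To run it locally, pass the list to `subprocess.run(cmd, check=True)`.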
Extract video clips from long videos based on natural-language descriptions. Use this skill when the user needs to: - Extract highlights or specific scenes from videos - Find specific people/objects in videos using reference images - Split long videos into meaningful clips - Generate video summaries with timestamps. Supports reference images for target identification and outputs TOS URLs for the extracted clips. Requires LAS_API_KEY for authentication.
Video inpainting operator (las_video_inpaint) for removing watermarks, subtitles, and logos from videos. Use this skill when the user needs to: - Remove watermarks, subtitles, or scrolling subtitles from a video - Repair a video by inpainting fixed regions (fixed_bboxes) or auto-detected regions - Run video restoration and get the output TOS path plus an optional subtitle bbox. Supports input from a public URL, intranet URL, or TOS, and writes output to TOS. If the user provides local video files or needs local outputs, use byted-tosfile-access to upload/download via TOS as a bridge. Requires LAS_API_KEY for authentication.
Capture screenshots of Unity Editor windows as PNG files. Use when you need to: (1) Screenshot Game View, Scene View, Console, Inspector, or other windows, (2) Capture current visual state for debugging or documentation, (3) Save editor window appearance as image files.
Convert an image into ASCII art (readable and detailed variants, width/charset controls, optional ANSI color) for terminal previews and plain-text image substitutes.
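This sketch shows the core of such a converter, assuming the image is already grayscale (0 = black, 255 = white) and stored as nested lists. Each character cell samples the nearest source pixel, and brightness is mapped onto a charset ordered from densest to lightest glyph. The `char_aspect` factor compensates for terminal cells being taller than they are wide. The function name and default charset are hypothetical. ANSI color output is not shown.

```python
from typing import List

DARK_TO_LIGHT = "@%#*+=-:. "  # hypothetical default charset, densest glyph first


def to_ascii(gray: List[List[int]], width: int, charset: str = DARK_TO_LIGHT,
             char_aspect: float = 0.5) -> str:
    """Render a grayscale image as ASCII text, width characters wide."""
    src_h, src_w = len(gray), len(gray[0])
    # Terminal cells are roughly twice as tall as wide, so use fewer rows.
    height = max(1, round(src_h * (width / src_w) * char_aspect))
    last = len(charset) - 1
    lines = []
    for y in range(height):
        sy = min(int((y + 0.5) * src_h / height), src_h - 1)
        chars = []
        for x in range(width):
            sx = min(int((x + 0.5) * src_w / width), src_w - 1)
            value = gray[sy][sx]
            chars.append(charset[value * last // 255])  # 0 -> densest, 255 -> lightest
        lines.append("".join(chars))
    return "\n".join(lines)


print(to_ascii([[0, 255], [255, 0]], width=2, charset="@ ", char_aspect=1.0))
```

On a dark terminal background, reversing the charset keeps bright regions visually bright. A longer charset gives a more detailed variant, and a shorter one gives a more readable variant.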
three.js audio spatialization: AudioListener attached to camera rig, Audio and PositionalAudio sources, AudioAnalyser for FFT/time-domain data, and integration with Web Audio API contexts; AudioLoader is referenced from threejs-loaders for file decoding. Use when placing 3D sound, configuring panner parameters, or building audio visualization; not a replacement for full game audio middleware.
Download and process videos from YouTube and other platforms. Supports video download, audio extraction, format conversion (mp4, webm), and Whisper transcription. Use when user mentions YouTube download, video conversion, audio extraction, transcription, mp4, webm, ffmpeg, yt-dlp, or whisper transcription.
Extract frames or short clips from videos using ffmpeg.
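This sketch shows the two ffmpeg invocations such a skill typically relies on, using standard ffmpeg options (`-ss` to seek, `-frames:v 1` for a single frame, `-t` for clip duration, `-c copy` for stream copy). The helper names are hypothetical. Placing `-ss` before `-i` seeks the input quickly. With stream copy, the cut can only start on a keyframe, so re-encoding is the safer choice when the start must be frame-accurate.

```python
from typing import List


def frame_cmd(src: str, timestamp: float, out_image: str) -> List[str]:
    """ffmpeg command that writes the single frame at `timestamp` seconds."""
    return ["ffmpeg", "-y", "-ss", f"{timestamp:.3f}", "-i", src,
            "-frames:v", "1", out_image]


def clip_cmd(src: str, start: float, duration: float, out: str,
             reencode: bool = False) -> List[str]:
    """ffmpeg command that cuts `duration` seconds starting at `start`."""
    cmd = ["ffmpeg", "-y", "-ss", f"{start:.3f}", "-i", src, "-t", f"{duration:.3f}"]
    # Stream copy is fast but snaps to keyframes; re-encoding is frame-accurate.
    cmd += ["-c:v", "libx264", "-c:a", "aac"] if reencode else ["-c", "copy"]
    cmd.append(out)
    return cmd


print(frame_cmd("input.mp4", 12.5, "frame.png"))
print(clip_cmd("input.mp4", 5, 10, "clip.mp4"))
```

Each list can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems with file names.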
Transcribe audio to text using ElevenLabs Scribe v2. Use when converting audio/video to text, generating subtitles, transcribing meetings, or processing spoken content.