qqbot-media
QQBot 富媒体收发能力。使用 <qqmedia> 标签,系统根据文件扩展名自动识别类型(图片/语音/视频/文件)。
QQBot 富媒体收发能力。使用 <qqmedia> 标签,系统根据文件扩展名自动识别类型(图片/语音/视频/文件)。
Extract subtitles/transcripts from YouTube videos. Triggers: "youtube transcript", "extract subtitles", "video captions", "视频字幕", "字幕提取", "YouTube转文字", "提取字幕".
Process video files with audio extraction, format conversion (mp4, webm), and Whisper transcription. Use when user mentions video conversion, audio extraction, transcription, mp4, webm, ffmpeg, or whisper transcription.
Use when processing voice transcripts, brain dumps, stream-of-consciousness notes, or any raw multi-topic capture. Extracts every idea thread, then evaluates each one with deep brainstorming, then captures results to Open Brain. Trigger on transcripts, exports, "process this", "pan for gold", "brain dump", "what did I say", or multi-topic markdown files.
Use when the user asks to create or edit videos end-to-end (script→video, auto-cut/jumpcut, captions/subtitles, polishing for Shorts/Reels/TikTok). Current implemented backend: local FFmpeg (probe/render/jumpcut/burn-subtitles/polish). Planned/optional backends: Remotion (motion graphics templates), VectCutAPI (CapCut/剪映 timeline editing), and video-audio-mcp (MCP tool wrapper) when available. Produces a finished video artifact (MP4 by default) from assets + copy + a design/storyboard plan.
スクリーンショットの取得・加工を行う。ローカルスクショから最新画像を取得、 PlaywrightでWebページをキャプチャ、画像のトリミング・リサイズ、機微情報を黒塗りマスキング。 記事執筆、レポート作成、UI確認、画像加工時に起動。 「スクショ」「スクリーンショット」「画面キャプチャ」「最新のスクショ」「画像加工」「トリミング」「マスク」「写メ」「写メ撮った」「スクショ撮った」で起動。 Do NOT use for: 画像生成(shogun-imagegenを使え)。
"An intelligent assistant specialized in handling media files (images/audio/video). **Only for media file analysis**, does not handle document types.\n\n✅ Media files that can be processed:\n- Images: .jpg, .jpeg, .png, .gif, .bmp, .webp, .svg\n- Audio: .mp3, .wav, .m4a, .flac, .aac, .ogg\n- Video: .mp4, .avi, .mov, .mkv, .webm, .flv\n\n❌ Files that cannot be processed (please do not trigger this skill):\n- Documents: .pdf, .doc, .docx, .txt, .md, .rtf\n- Spreadsheets: .xlsx, .xls, .csv, .tsv\n- Presentations: .pptx, .ppt, .key\n- Code: .py, .js, .ts, .java, .cpp, .go, .rs\n- Archives: .zip, .tar, .gz, .rar, .7z\n- Executables: .exe, .bin, .app, .dmg\n- Databases: .db, .sqlite, .sql\n- Configuration files: .json, .xml, .yaml, .yml, .toml, .ini\n- Web pages: .html, .htm, .css\n\n**Trigger conditions**: When the user explicitly requests to analyze image/audio/video content, or when the file extension belongs to the aforementioned media types.". "
Burn hard subtitles from UTF-8 SRT files using moviepy 2.x with CJK-capable system fonts; tune font size, placement, stroke, and encode settings (bitrate or CRF) to avoid oversized outputs. Documents ffprobe/ffmpeg workflows for inspection, encoding, and batch jobs; troubleshooting for fonts, bitrate, and pacing. Covers voiceover with edge-tts (voice selection, rate/volume/pitch), matching narration length to video with atempo/apad, and multi-scene pacing with breathing room. Targets moviepy 2.x and Python 3.x on macOS, Linux, and Windows.
Prevent stutter and frozen frames when embedding a child video inside a parent in code-driven pipelines (Remotion, After Effects scripting, FFmpeg filter graphs). Explains why sparse keyframes break frame-accurate seek during per-frame export, and how re-encoding with H.264 all-intra GOP (-g 1) and yuv420p makes every frame independently decodable. Includes FFmpeg command, parameter notes, file-size tradeoffs, and a reusable rule for any seek-heavy programmatic video workflow.
HTML 转图片 skill - 将 HTML 文件或内容通过 agent-browser 渲染并截图为图片。适用于生成信息图、社交媒体配图、数据可视化截图等场景。
Production pipeline for ASCII art video — any format. Converts video/audio/images/generative input into colored ASCII character video output (MP4, GIF, image sequence). Covers: video-to-ASCII conversion, audio-reactive music visualizers, generative ASCII art animations, hybrid video+audio reactive, text/lyrics overlays, real-time terminal rendering. Use when users request: ASCII video, text art video, terminal-style video, character art animation, retro text visualization, audio visualizer in ASCII, converting video to ASCII art, matrix-style effects, or any animated ASCII output.
Extracts frames and timestamped audio segments from video files (GIF, MP4, MOV) at configurable intervals and stores them in a directory with a manifest file. Use when analyzing video content, preparing frames for visual review, extracting audio for transcription, or creating frame+audio sequences for another agent to process.
Transcribes audio/video files using ElevenLabs Scribe v2 API. Use when transcribing audio files, generating transcripts, or converting speech to text.
Guide for using typed-ffmpeg, a modern Python FFmpeg wrapper with extensive typing support and comprehensive filter support. Use this when working with FFmpeg operations, video/audio processing, or filter graphs in Python.
Analyze local images using vision-capable LLM. Use when the user question depends on visual content from a local image file — visual question answering, describing images, reading text in images, identifying objects, etc.
Download videos and audio from 1500+ websites including YouTube, Bilibili, TikTok, Twitter/X, Instagram, Vimeo, and more using yt-dlp. Use when the user wants to download videos, save media from social platforms, download with specific resolution (720p/1080p/4K), get subtitles, or download entire playlists. Triggers on requests like 'download this video', 'save this YouTube video', 'download in 1080p', 'download with subtitles', 'download this playlist'.
Extract audio from videos and download audio-only content from 1500+ websites using yt-dlp. Converts to MP3, M4A, FLAC, WAV, or OPUS with embedded metadata and cover art. Use when the user wants to extract audio from videos, download podcasts, download music from YouTube/SoundCloud/Bandcamp, convert video to audio, or batch download playlist audio. Triggers on requests like 'extract audio', 'download as MP3', 'get the audio from this video', 'download this podcast', 'download music', 'convert to FLAC'.
Create professional videos autonomously using claude-code-video-toolkit — AI voiceovers, image generation, music, talking heads, and Remotion rendering.
AI video generation with LTX-2.3 22B — text-to-video, image-to-video clips for video production. Use when generating video clips, animating images, creating b-roll, animated backgrounds, or motion content. Triggers include video generation, animate image, b-roll, motion, video clip, text-to-video, image-to-video.