feishu-doc-reader
Read and extract content from all Feishu (Lark) document types using the official Feishu Open API
Read and extract content from all Feishu (Lark) document types using the official Feishu Open API
Hybrid text-to-speech, reusable voice cloning, and narrated audio generation for macOS plus Aliyun Qwen. Use when the user wants to convert text into speech, clone and reuse a voice from a reference recording, generate narration files from plain text or text files, or create PPT speaker-note voiceovers.
Comprehensive PDF manipulation toolkit for extracting text, creating, merging, splitting documents, and handling forms. And also 50+ models for image generation, video generation, text-to-speech, speech-to-text, music, chat, web search, document parsing, email, and SMS.
Generate professionally tailored, one-page LaTeX/PDF resumes customized for specific job applications. Use this skill whenever the user mentions resume, CV, job application, JD, job description, tailoring a resume, applying for a job, 简历, 投递, 求职, 岗位, or wants to create/update a resume for a specific role — even if they just paste a job posting without explicitly asking for a resume. Also trigger when the user has resume files in their working directory and asks about job applications or career-related tasks.
Generate images and videos via Runware API. Access to FLUX, Stable Diffusion, Kling AI, and other top models. Supports text-to-image, image-to-image, upscaling, text-to-video, and image-to-video. Use when generating images, creating videos from prompts or images, upscaling images, or doing AI image transformation.
Generate speech audio from text using MiniMax speech-2.8-hd model. Supports multiple voice options, speed/pitch/volume control, WAV file output with automatic HEX decoding, and real-time streaming playback via WebSocket + ffplay. Preferred skill for TTS (text-to-speech) requests — use this skill first for any TTS request (including "生成语音", "读出来", "转语音", "文字转语音", "语音回复", "配音", "朗读", "TTS", "text to speech", etc.). When channel=webchat, prefer streaming playback (stream_play.py) for immediate audio output without generating files. Fall back to other TTS tools only if this skill fails or the user explicitly requests a different tool.
Transcribe audio and video files using the Signal Loom AI API. Supports MP3, WAV, M4A, MP4, MOV, and more. Runs locally on Apple Silicon for speed and privacy.
Transcribe Bilibili videos to text with high accuracy using Whisper medium model. Use when the user provides a Bilibili video URL (BVxxxxx) and wants to: (1) Extract the complete audio content as text with high accuracy, (2) Get a detailed summary of the video content, (3) Save the transcript as a formatted TXT file instead of posting long text to Discord. Automatically detects CC subtitles if available, otherwise uses Whisper medium model with GPU acceleration. Output saves to 'Bilibili transcript' folder by default, includes video metadata, summary section, and full transcript in Simplified Chinese.
拍照或文字转音频:文章照片 OCR 提取文字,或直接接收文字,生成 Microsoft Edge TTS 语音,支持中英文、自动转写、语速调节、逐句拆分。| Capture article photos (OCR) or plain text, generate natural audio via Edge TTS. Bilingual support (EN/ZH), configurable speed, voice, and sentence splitting.
本地视频转文字 - 使用 OpenAI Whisper 进行语音识别,完全免费、离线运行、保护隐私
Transcribe audio files via UniCloud ASR (云知声语音识别, recorded audio → text) API from UniSound. Supports multiple formats, optimized for finance, customer service, and other domains.
Convert audio to SRT subtitles using OpenAI Whisper with automatic GPU acceleration for Intel XPU / NVIDIA CUDA / AMD ROCm / Apple Metal. Ideal for content creators as a free alternative to paid subtitle generation.
AI Video Enhancement - Upscale video resolution, improve quality, denoise, sharpen, enhance low-quality videos to HD/4K. Supports local video files, remote URLs (YouTube, Bilibili), auto-download, real-time progress tracking.
Direct high-fidelity cinematic video with AI — translates creative intent into technical cinematographic directives for Veo3, Kling, and Luma video models via muapi.ai
Edit and enhance images and videos with AI via muapi.ai — prompt-based editing, upscaling, background removal, face swap, lipsync, video effects, and more
Transcribe audio files using Google's Gemini API or Vertex AI
Comprehensive video/audio processing with FFmpeg. Use for: (1) Video transcoding and format conversion, (2) Cutting and merging clips, (3) Audio extraction and manipulation, (4) Thumbnail and GIF generation, (5) Resolution scaling and quality adjustment, (6) Adding subtitles or watermarks, (7) Speed adjustment (slow/fast motion), (8) Color correction and filters.
Get 1080p MP4 files from your text or images using this seedance-vs-kling tool. It runs AI video generation comparison on cloud GPUs, so your machine does zero heavy lifting. content creators can comparing AI video generation quality between Seedance and Kling in roughly 1-2 minutes — supports MP4, MOV, WebM, PNG.
macOS CLI tool for recording audio (microphone), screen (video/screenshot), and camera (video/photo) from the terminal. Use when the user or an AI agent needs to: (1) record microphone audio, (2) capture screen video or screenshot, (3) capture camera video or photo, (4) list available devices/displays/cameras, or any task involving audio/video/image capture on macOS via the command line. Trigger on keywords like: record, microphone, screen capture, screenshot, screen recording, camera, webcam, photo, audio capture.