Media

media

4K

tinycompress

Compress PNG, JPEG, and WebP images using the free TinyPNG/Tinify web API. No API key or login required. Supports single-file, batch, and directory compression with automatic retry and rate-limit handling.

openclaw

content-media

open
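The listing does not describe how tinycompress retries or handles rate limits. The sketch below shows one common way to do it, exponential backoff. It assumes the API signals throttling with HTTP 429 and may send an optional Retry-After value. `with_retry`, `RetryExhaustedError`, the `(status, retry_after, body)` tuple, and the delay defaults are all hypothetical and are not taken from the skill itself.

```python
import time
from typing import Callable, Optional, Tuple

# Hypothetical response shape: (HTTP status, Retry-After in seconds or None, body bytes).
Response = Tuple[int, Optional[float], bytes]


class RetryExhaustedError(RuntimeError):
    """Raised when every attempt was throttled (429) or hit a server error (5xx)."""


def with_retry(
    call: Callable[[], Response],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Invoke `call` until it returns HTTP 200.

    429 and 5xx responses are retried. The wait is the server's Retry-After
    value when present, otherwise base_delay * 2**attempt (1 s, 2 s, 4 s, ...
    with the default base_delay). Any other status fails immediately.
    """
    for attempt in range(max_attempts):
        status, retry_after, body = call()
        if status == 200:
            return body
        if status != 429 and not 500 <= status < 600:
            raise RuntimeError(f"non-retryable HTTP status {status}")
        if attempt < max_attempts - 1:
            # Honor the server's hint first; fall back to exponential backoff.
            delay = retry_after if retry_after is not None else base_delay * 2 ** attempt
            sleep(delay)
    raise RetryExhaustedError(f"gave up after {max_attempts} attempts")


# Usage with a fake endpoint: throttled once, then a 503, then success.
responses = iter([(429, 2.0, b""), (503, None, b""), (200, None, b"compressed")])
waits: list = []
print(with_retry(lambda: next(responses), base_delay=0.5, sleep=waits.append), waits)
# prints: b'compressed' [2.0, 1.0]
```

The `sleep` parameter is injectable so that the backoff schedule can be checked without actually waiting.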

media

4K

audio-to-text-and-video-to-text

Transcribe audio and video files into text using OpenAI's Whisper API. Use this skill whenever a user wants to convert any audio or video file to text — including MP3, MP4, WAV, M4A, OGG, WEBM, MOV, AVI, FLAC, and more. Trigger this skill for any request involving: "transcribe", "convert audio to text", "speech to text", "get transcript of", "extract audio from video", "meeting notes from recording", "subtitles", "captions", or similar. Also trigger when the user uploads or references a media file and asks what was said, discussed, or mentioned in it. If unsure whether audio/video transcription is involved, use this skill.

openclaw

content-media

open

media

4K

video-transcriber

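(Verified) A powerful batch transcriber for Douyin videos that combines downloading, audio extraction, and transcription with a local Whisper model.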

openclaw

content-media

open

media

4K

universal-video-to-s3-uploader

Download videos from YouTube, Twitter/X, TikTok, Douyin, and Bilibili and upload them to S3-compatible storage. A universal video downloader with smart quality selection and audio merging.

openclaw

content-media

open

media

4K

youtube-music-cast

Download music from YouTube/YouTube Music and stream to Chromecast via Home Assistant. Complete CLI toolset with web server integration, configuration wizard, and playback controls.

openclaw

content-media

open

media

4K

aimlapi-voice

Transcribe audio files (ogg, mp3, wav, etc.) using AIMLAPI. Use when the user provides audio messages or local audio files. Provides a reliable Python script with retries and polling.

openclaw

content-media

open

media

4K

dpi-upscaler-checker

Check an image's DPI and intelligently upscale low-resolution images using super-resolution.

openclaw

content-media

open
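The arithmetic behind a DPI check fits in a few lines. This minimal sketch assumes that "DPI" means the effective pixels per inch at an intended print size, not the DPI tag embedded in the file, and that the super-resolution model offers fixed scale factors of 2x and 4x. `effective_dpi`, `upscale_factor`, the 300 DPI target, and the supported factors are hypothetical choices and do not come from the dpi-upscaler-checker skill.

```python
from typing import Sequence


def effective_dpi(width_px: int, height_px: int, print_w_in: float, print_h_in: float) -> float:
    """Pixels per inch available at the intended print size.

    The tighter of the two axes limits quality, so take the minimum.
    """
    return min(width_px / print_w_in, height_px / print_h_in)


def upscale_factor(
    width_px: int,
    height_px: int,
    print_w_in: float,
    print_h_in: float,
    target_dpi: float = 300.0,
    supported: Sequence[int] = (2, 4),
) -> int:
    """Smallest supported super-resolution factor that reaches target_dpi.

    Returns 1 when no upscaling is needed, and the largest supported factor
    when even that falls short of the target.
    """
    dpi = effective_dpi(width_px, height_px, print_w_in, print_h_in)
    if dpi >= target_dpi:
        return 1
    needed = target_dpi / dpi
    for factor in sorted(supported):
        if factor >= needed:
            return factor
    return max(supported)


# A 600 x 900 px image printed at 4 x 6 in has 150 DPI, so a 2x model reaches 300 DPI.
print(effective_dpi(600, 900, 4, 6), upscale_factor(600, 900, 4, 6))  # prints: 150.0 2
```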

media

4K

gemini-video-analyzer

Native video analysis using Google Gemini API. Upload and analyze video files — describe scenes, extract text/UI, answer questions about content, transcribe speech, identify objects and actions. Use when: (1) User sends a video file and wants it analyzed, (2) Video summarization or description needed, (3) Extracting text, UI elements, or information from screen recordings, (4) Answering questions about video content, (5) Comparing multiple videos, (6) Analyzing tutorials, demos, or walkthroughs.

openclaw

content-media

open

media

4K

qwen-asr

Transcribe audio files using Qwen ASR (Qianwen STT). Use when the user sends voice messages and wants them converted to text.

openclaw

content-media

open

media

4K

deapi

AI media generation via deAPI. Transcribe YouTube/audio/video, generate images from text, text-to-speech, OCR, remove backgrounds, upscale images, create videos, generate embeddings. 10-20x cheaper than OpenAI/Replicate.

openclaw

content-media

open

media

4K

deapi-audio

Text-to-speech, voice cloning, voice design, and transcribe audio files via deAPI GPU network. Trigger on 'text to speech', 'TTS', 'generate voice', 'read aloud', 'voice clone', 'clone voice', 'voice design', 'design voice', 'custom voice', 'transcribe audio', 'STT'. For video/YouTube transcription use deapi-video instead.

openclaw

content-media

open

media

4K

audio-mastering-cli

CLI audio mastering without a reference track using ffmpeg; accepts audio or video inputs and outputs mastered WAV/MP3 or remuxed MP4.

openclaw

content-media

open

media

3.7K

morph-apply

Fast file editing via Morph Apply API (10,500 tokens/sec, 98% accuracy)

parcadei

content-media

open

media

3.6K

gemini-watermark-remover

Remove visible Gemini image watermarks from local image files by calling the project's CLI. Use when the user wants an agent to clean one or more local Gemini-generated images and save de-watermarked output files.

GargantuaX

content-media

open

media

3.4K

evaluate-video-quality

Evaluate generated video quality using available metrics (SSIM, loss trajectory, caption consistency)

hao-ai-lab

content-media

open
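SSIM is one of the metrics listed above. It compares luminance, contrast, and structure between a reference frame and a generated frame. The sketch below computes a single-window (global) SSIM over two equally sized grayscale frames passed as flat lists, using the usual constants C1 = (0.01 L)² and C2 = (0.03 L)². The standard metric averages SSIM over local (for example, Gaussian-weighted) windows, so this global version is a simplification. `global_ssim` is hypothetical and is not the evaluate-video-quality skill's implementation.

```python
from typing import Sequence


def global_ssim(x: Sequence[float], y: Sequence[float], data_range: float = 255.0) -> float:
    """Single-window SSIM between two equally sized grayscale frames (flattened).

    SSIM = ((2*mu_x*mu_y + C1) * (2*cov_xy + C2)) /
           ((mu_x**2 + mu_y**2 + C1) * (var_x + var_y + C2))
    with C1 = (0.01*L)**2 and C2 = (0.03*L)**2, where L is the pixel value range.
    """
    if len(x) != len(y) or not x:
        raise ValueError("frames must be non-empty and the same size")
    n = len(x)
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) * (a - mu_x) for a in x) / n
    var_y = sum((b - mu_y) * (b - mu_y) for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    num = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    den = (mu_x * mu_x + mu_y * mu_y + c1) * (var_x + var_y + c2)
    return num / den


frame = [0, 64, 128, 255]
inverted = [255 - p for p in frame]
# Identical frames score 1; an inverted (anti-correlated) frame scores below 0.
print(global_ssim(frame, frame), global_ssim(frame, inverted))
```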

media

3.2K

evaluate-multimodal

Evaluate multimodal AI agents that process images, audio, PDFs, or other files. Sets up evaluations using LangWatch's LLM-as-judge with image inputs, Scenario's multimodal testing, and document parsing evaluation patterns. Use when your agent handles non-text inputs.

langwatch

content-media

open

media

3.1K

natural-language-video-search

Semantic search over video files using Gemini embeddings. Index dashcam, security camera, or any mp4 footage, then search with natural language queries to find and auto-trim matching clips.

ssrajadh

content-media

open

media

3K

muapi-media-editing

Edit and enhance images and videos with AI via muapi.ai — prompt-based editing, upscaling, background removal, face swap, lipsync, video effects, and more

SamurAIGPT

content-media

open

media

3K

muapi-cinema-director

Direct high-fidelity cinematic video with AI — translates creative intent into technical cinematographic directives for Veo3, Kling, and Luma video models via muapi.ai

SamurAIGPT

content-media

open

media

2.7K

youtube-downloader

Download YouTube videos with customizable quality and format options. Use this skill when the user asks to download, save, or grab YouTube videos. Supports various quality settings (best, 1080p, 720p, 480p, 360p), multiple formats (mp4, webm, mkv), and audio-only downloads as MP3.

davepoon

content-media

open

media

2.7K

image-enhancer

Improves the quality of images, especially screenshots, by enhancing resolution, sharpness, and clarity. Perfect for preparing images for presentations, documentation, or social media posts.

davepoon

content-media

open

media

2.7K

transcribe

Process audio recordings, meeting transcripts, podcasts, or lectures. Runs an intake interview (date, mode, speakers, language) then processes into structured notes with action items, decisions, and glossary. Triggers: EN: "transcribe", "I have a recording", "process this audio", "meeting notes from recording", "summarize the call", "lecture notes", "podcast summary". IT: "trascrivi", "ho una registrazione", "processa questo audio", "note della riunione", "riassumi la call". FR: "transcrire", "j'ai un enregistrement", "résumer l'appel". ES: "transcribir", "tengo una grabación", "resumir la llamada". DE: "transkribieren", "Aufnahme verarbeiten". PT: "transcrever", "tenho uma gravação".

gnekt

content-media

open