Media

Audio, video, and image processing.

1476 skills
media
4K

tinycompress

Compress PNG, JPEG, and WebP images using the free TinyPNG/Tinify web API. No API key or login required. Supports single-file, batch, and directory compression with automatic retry and rate-limit handling.

openclaw
content-media
media
4K

audio-to-text-and-video-to-text

Transcribe audio and video files into text using OpenAI's Whisper API. Use this skill whenever a user wants to convert any audio or video file to text — including MP3, MP4, WAV, M4A, OGG, WEBM, MOV, AVI, FLAC, and more. Trigger this skill for any request involving: "transcribe", "convert audio to text", "speech to text", "get transcript of", "extract audio from video", "meeting notes from recording", "subtitles", "captions", or similar. Also trigger when the user uploads or references a media file and asks what was said, discussed, or mentioned in it. If unsure whether audio/video transcription is involved, use this skill.

openclaw
content-media
media
4K

video-transcriber

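(Verified) A powerful batch transcriber for Douyin videos that integrates downloading, audio extraction, and transcription with a local Whisper model.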

openclaw
content-media
media
4K

universal-video-to-s3-uploader

Download videos from YouTube, Twitter/X, TikTok, Douyin, Bilibili and upload to S3-compatible storage. Universal video downloader with smart quality selection and audio merging.

openclaw
content-media
media
4K

youtube-music-cast

Download music from YouTube/YouTube Music and stream to Chromecast via Home Assistant. Complete CLI toolset with web server integration, configuration wizard, and playback controls.

openclaw
content-media
media
4K

aimlapi-voice

Transcribe audio files (ogg, mp3, wav, etc.) using AIMLAPI. Use when the user provides audio messages or local audio files. Provides a reliable Python script with retries and polling.

openclaw
content-media
media
4K

dpi-upscaler-checker

Check image DPI and intelligently upscale low-resolution images using super-resolution.

openclaw
content-media
media
4K

gemini-video-analyzer

Native video analysis using Google Gemini API. Upload and analyze video files — describe scenes, extract text/UI, answer questions about content, transcribe speech, identify objects and actions. Use when: (1) User sends a video file and wants it analyzed, (2) Video summarization or description needed, (3) Extracting text, UI elements, or information from screen recordings, (4) Answering questions about video content, (5) Comparing multiple videos, (6) Analyzing tutorials, demos, or walkthroughs.

openclaw
content-media
media
4K

qwen-asr

Transcribe audio files using Qwen ASR (千问STT). Use when the user sends voice messages and wants them converted to text.

openclaw
content-media
media
4K

deapi

AI media generation via deAPI. Transcribe YouTube/audio/video, generate images from text, text-to-speech, OCR, remove backgrounds, upscale images, create videos, generate embeddings. 10-20x cheaper than OpenAI/Replicate.

openclaw
content-media
media
4K

deapi-audio

Text-to-speech, voice cloning, voice design, and audio transcription via the deAPI GPU network. Trigger on 'text to speech', 'TTS', 'generate voice', 'read aloud', 'voice clone', 'clone voice', 'voice design', 'design voice', 'custom voice', 'transcribe audio', 'STT'. For video/YouTube transcription, use deapi-video instead.

openclaw
content-media
media
4K

audio-mastering-cli

CLI audio mastering without a reference track using ffmpeg; accepts audio or video inputs and outputs mastered WAV/MP3 or remuxed MP4.

openclaw
content-media
media
3.7K

morph-apply

Fast file editing via Morph Apply API (10,500 tokens/sec, 98% accuracy)

parcadei
content-media
media
3.6K

gemini-watermark-remover

Remove visible Gemini image watermarks from local image files by calling the project's CLI. Use when the user wants an agent to clean one or more local Gemini-generated images and save de-watermarked output files.

GargantuaX
content-media
media
3.4K

evaluate-video-quality

Evaluate generated video quality using available metrics (SSIM, loss trajectory, caption consistency)

hao-ai-lab
content-media
media
3.2K

evaluate-multimodal

Evaluate multimodal AI agents that process images, audio, PDFs, or other files. Sets up evaluations using LangWatch's LLM-as-judge with image inputs, Scenario's multimodal testing, and document parsing evaluation patterns. Use when your agent handles non-text inputs.

langwatch
content-media
media
3.1K

natural-language-video-search

Semantic search over video files using Gemini embeddings. Index dashcam, security camera, or any mp4 footage, then search with natural language queries to find and auto-trim matching clips.

ssrajadh
content-media
media
3K

muapi-media-editing

Edit and enhance images and videos with AI via muapi.ai — prompt-based editing, upscaling, background removal, face swap, lipsync, video effects, and more

SamurAIGPT
content-media
media
3K

muapi-cinema-director

Direct high-fidelity cinematic video with AI — translates creative intent into technical cinematographic directives for Veo3, Kling, and Luma video models via muapi.ai

SamurAIGPT
content-media
media
2.7K

youtube-downloader

Download YouTube videos with customizable quality and format options. Use this skill when the user asks to download, save, or grab YouTube videos. Supports various quality settings (best, 1080p, 720p, 480p, 360p), multiple formats (mp4, webm, mkv), and audio-only downloads as MP3.

davepoon
content-media
media
2.7K

image-enhancer

Improves the quality of images, especially screenshots, by enhancing resolution, sharpness, and clarity. Perfect for preparing images for presentations, documentation, or social media posts.

davepoon
content-media
media
2.7K

transcribe

Process audio recordings, meeting transcripts, podcasts, or lectures. Runs an intake interview (date, mode, speakers, language) then processes into structured notes with action items, decisions, and glossary. Triggers: EN: "transcribe", "I have a recording", "process this audio", "meeting notes from recording", "summarize the call", "lecture notes", "podcast summary". IT: "trascrivi", "ho una registrazione", "processa questo audio", "note della riunione", "riassumi la call". FR: "transcrire", "j'ai un enregistrement", "résumer l'appel". ES: "transcribir", "tengo una grabación", "resumir la llamada". DE: "transkribieren", "Aufnahme verarbeiten". PT: "transcrever", "tenho uma gravação".

gnekt
content-media