home/categories/media/hsliuustc0106-vllm-omni-skills-skills-vllm-omni-multimodal-skill-md
mediacontent-media

vllm-omni-multimodal

Transcribe speech, generate images from prompts, analyze video content, and convert between modalities using multimodal omni-modality models like Qwen2.5-Omni and Qwen3-Omni. Use when working with multimodal models for speech recognition, image generation, video understanding, voice synthesis, or any task combining text, image, audio, and video inputs and outputs simultaneously.

hsliuustc0106
maintainer
hsliuustc0106
اپ ڈیٹ ہوا 4/3/2026
اسٹارز
50
فورکس
17
quick start

Installation and usage

Transcribe speech, generate images from prompts, analyze video content, and convert between modalities using multimodal omni-modality models like Qwen2.5-Omni and Qwen3-Omni. Use when working with multimodal models for speech recognition, image generation, video understanding, voice synthesis, or any task combining text, image, audio, and video inputs and outputs simultaneously.

انسٹالیشن
$ install --globalskills.sh
استعمال

انسٹال کرنے کے بعد، آپ یہ اسکل ٹرمینل میں درج ذیل کمانڈ چلا کر استعمال کر سکتے ہیں:

skills use vllm-omni-multimodal