glmocr

Trigger when: (1) User wants to extract text, tables, formulas, or structured data from images/PDFs/scanned documents, (2) User mentions "OCR", "文字识别", "文档解析", (3) User has a document (screenshot, scanned page, invoice, paper, whiteboard photo) and needs its content in structured form, (4) User asks to parse, digitize, or extract content from a visual document.

Invokes the GLM-OCR SDK (pip install glmocr) to parse documents via Zhipu's cloud API. No GPU required. Returns structured JSON (regions with labels + bounding boxes) and Markdown. The agent can operate entirely via the CLI; no YAML files are needed.

NOT for: real-time camera feeds, audio transcription, or non-document images (photos, illustrations).
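As a rough illustration of what "regions with labels + bounding boxes" could look like downstream, here is a minimal sketch that groups regions by label and computes a box area. The JSON shape is an assumption made for illustration only: the `regions`, `label`, `bbox`, and `text` field names, the `[x0, y0, x1, y1]` box layout, and the sample values are all hypothetical and are not taken from the GLM-OCR SDK. Check the SDK's actual output before relying on any of them.

```python
from __future__ import annotations

import json

# Hypothetical sample payload. The field names and bbox layout are
# assumptions for illustration, not the SDK's documented schema.
SAMPLE = """
{
  "regions": [
    {"label": "title", "bbox": [40, 30, 560, 70], "text": "Invoice"},
    {"label": "table", "bbox": [40, 120, 560, 400], "text": "| Item | Price |"},
    {"label": "text", "bbox": [40, 420, 560, 460], "text": "Thank you."}
  ]
}
"""


def regions_by_label(payload: str) -> dict[str, list[dict]]:
    """Group region dicts by their (assumed) 'label' field."""
    data = json.loads(payload)
    grouped: dict[str, list[dict]] = {}
    for region in data.get("regions", []):
        grouped.setdefault(region.get("label", "unknown"), []).append(region)
    return grouped


def bbox_area(bbox: list[float]) -> float:
    """Area of a box given as [x0, y0, x1, y1] (assumed layout)."""
    x0, y0, x1, y1 = bbox
    return max(0, x1 - x0) * max(0, y1 - y0)


grouped = regions_by_label(SAMPLE)
for label, regions in grouped.items():
    print(label, [bbox_area(r["bbox"]) for r in regions])
```

Grouping by label first makes it easy to route tables, formulas, and plain text to different handlers, which is the usual reason to want structured regions rather than Markdown alone.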

Maintainer: zai-org
Updated: 3/18/2026
Stars: 5796
Forks: 530
quick start

Installation and usage

Installation
$ install --globalskills.sh
Usage

After installing, you can use this skill by running this command in the terminal:

skills use glmocr