home/categories/documents/zai-org-glm-ocr-skills-sdk-skill-md
documentscontent-media

glmocr

Trigger when: (1) User wants to extract text, tables, formulas, or structured data from images/PDFs/scanned documents, (2) User mentions "OCR", "文字识别", "文档解析", (3) User has a document (screenshot, scanned page, invoice, paper, whiteboard photo) and needs its content in structured form, (4) User asks to parse, digitize, or extract content from a visual document. Invokes the GLM-OCR SDK (pip install glmocr) to parse documents via Zhipu's cloud API. No GPU required. Returns structured JSON (regions with labels + bounding boxes) and Markdown. Agent can operate entirely via CLI — no YAML files needed. NOT for: real-time camera feeds, audio transcription, or non-document images (photos, illustrations).

zai-org
maintainer
zai-org
更新於 3/18/2026
星標
5796
分支
530
quick start

Installation and usage

Trigger when: (1) User wants to extract text, tables, formulas, or structured data from images/PDFs/scanned documents, (2) User mentions "OCR", "文字识别", "文档解析", (3) User has a document (screenshot, scanned page, invoice, paper, whiteboard photo) and needs its content in structured form, (4) User asks to parse, digitize, or extract content from a visual document. Invokes the GLM-OCR SDK (pip install glmocr) to parse documents via Zhipu's cloud API. No GPU required. Returns structured JSON (regions with labels + bounding boxes) and Markdown. Agent can operate entirely via CLI — no YAML files needed. NOT for: real-time camera feeds, audio transcription, or non-document images (photos, illustrations).

安裝
$ install --globalskills.sh
使用

安裝後,您可以透過在終端機執行以下指令來使用此技能:

skills use glmocr