
doc-to-vector-dataset-generator

Converts documents into clean, chunked datasets suitable for embeddings and vector search. Produces chunked JSONL files with metadata, deduplication logic, and quality checks. Use when preparing "training data", "vector datasets", "document processing", or "embedding data".

Maintainer: patricio0312rev
Updated: 1/12/2026
Stars: 6
Forks: 0
Quick start

Installation and usage


Installation
$ install --globalskills.sh
Usage

After installation, you can use this skill by running the following command in your terminal:

skills use doc-to-vector-dataset-generator
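
For orientation, the kind of output the description refers to (chunked JSONL with metadata, deduplication, and quality checks) could look roughly like the following. This is a minimal Python sketch under stated assumptions, not the skill's actual implementation: the function names (`chunk_text`, `build_records`, `to_jsonl`), the record fields, the character-based chunking, and the default sizes are all hypothetical choices made for illustration.

```python
import hashlib
import json


def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping character windows (hypothetical helper)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size].strip()
        if piece:
            chunks.append((start, piece))
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


def build_records(docs, chunk_size=200, overlap=50, min_chars=20):
    """Turn a {doc_id: raw_text} mapping into a list of chunk records.

    Assumed pipeline: normalize whitespace, chunk, drop very short
    chunks (a simple quality check), drop exact duplicates by hash.
    """
    seen = set()
    records = []
    for doc_id, text in docs.items():
        normalized = " ".join(text.split())  # collapse runs of whitespace
        for start, piece in chunk_text(normalized, chunk_size, overlap):
            if len(piece) < min_chars:  # quality check: too short to embed
                continue
            digest = hashlib.sha256(piece.lower().encode("utf-8")).hexdigest()
            if digest in seen:  # exact-duplicate chunk (case-insensitive)
                continue
            seen.add(digest)
            records.append({
                "id": f"{doc_id}-{start}",
                "text": piece,
                "metadata": {
                    "source": doc_id,
                    "char_start": start,
                    "char_len": len(piece),
                    "sha256": digest,
                },
            })
    return records


def to_jsonl(records):
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


if __name__ == "__main__":
    sample = {
        "guide": "Vector search retrieves passages by meaning. " * 10,
        "copy": "Vector search retrieves passages by meaning. " * 10,
    }
    print(to_jsonl(build_records(sample)))
```

In this sketch the second document is an exact copy of the first, so all of its chunks are dropped by the hash check. A real pipeline would likely chunk by tokens or sentences and add near-duplicate detection; those are left out to keep the example short.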