
generate-synthetic-data

Create diverse synthetic test inputs for LLM pipeline evaluation using dimension-based tuple generation. Use when bootstrapping an eval dataset, when real user data is sparse, or when stress-testing specific failure hypotheses. Do NOT use when you already have 100+ representative real traces (use stratified sampling instead), or when the task is collecting production logs.
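As a rough illustration of the dimension-based tuple generation named above, the sketch below enumerates combinations of a few dimensions and samples a subset to turn into generation prompts. The dimension names and values, the helper names `generate_tuples` and `tuple_to_prompt`, and the prompt wording are all hypothetical, chosen only for this example; they are not taken from the skill itself.

```python
import itertools
import random

# Hypothetical dimensions for an imagined support-assistant pipeline.
# Replace them with the axes along which your own inputs vary.
DIMENSIONS = {
    "persona": ["new user", "power user", "frustrated customer"],
    "intent": ["refund request", "feature question", "bug report"],
    "complexity": ["simple", "multi-step", "ambiguous"],
}


def generate_tuples(dimensions, n, seed=0):
    """Sample up to n distinct dimension tuples from the full cross product."""
    names = list(dimensions)
    combos = list(itertools.product(*(dimensions[name] for name in names)))
    rng = random.Random(seed)  # seeded so the sample is reproducible
    picked = rng.sample(combos, min(n, len(combos)))
    return [dict(zip(names, combo)) for combo in picked]


def tuple_to_prompt(t):
    """Turn one tuple into an instruction for an LLM that writes the test input."""
    return (
        f"Write one realistic user message from a {t['persona']} "
        f"whose intent is a {t['intent']} and whose complexity is {t['complexity']}."
    )


tuples = generate_tuples(DIMENSIONS, n=10)
prompts = [tuple_to_prompt(t) for t in tuples]
```

Sampling from the cross product rather than asking a model for "diverse inputs" directly keeps coverage explicit: each test input can be traced back to the dimension values that produced it.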

hamelsmu
maintainer
Updated 3/3/2026
Stars: 1105
Forks: 117
quick start

Installation and usage

Installation
$ install --globalskills.sh
Usage

After installation, you can use this skill by running the following command in your terminal:

skills use generate-synthetic-data