category focus

Data Eng.

ETL pipelines and big data infrastructure.

1541 skills

all categories

data-engineering

156

rnaseq-deseq2

ToolUniverse workflow: RNA-seq DESeq2

lamm-mit

data-ai

open

data-engineering

156

Use this skill for processing and analyzing large tabular datasets (billions of rows) that exceed available RAM. Vaex excels at out-of-core DataFrame operations, lazy evaluation, fast aggregations, efficient visualization of big data, and machine learning on large datasets. Apply when users need to work with large CSV/HDF5/Arrow/Parquet files, perform fast statistics on massive datasets, create visualizations of big data, or build ML pipelines that do not fit in memory.

lamm-mit

data-ai

open

data-engineering

156

zarr-python

Chunked N-D arrays for cloud storage. Compressed arrays, parallel I/O, S3/GCS integration, NumPy/Dask/Xarray compatible, for large-scale scientific computing pipelines.

lamm-mit

data-ai

open

data-engineering

155

digikey

Search DigiKey for electronic components and download datasheets. DigiKey is the primary source for prototype orders and the preferred API method for fetching datasheets. Find parts by keyword or MPN, check pricing and stock, download datasheets via API, and analyze specifications. Sync and maintain a local datasheets directory: extract components from schematics, download missing datasheets, and keep them up to date. Use when the user asks about electronic components, part specs, datasheets, pricing, stock, or footprints, or needs to download a datasheet, even without mentioning "DigiKey". Also use for "sync datasheets" or "download datasheets for my board/project" requests, or when the user mentions a datasheets directory. DigiKey is the default distributor for prototyping. For BOM workflows, see the bom skill.

aklofas

data-ai

open

data-engineering

155

task-decomposer

Produces structured phased task boards from feature requests: dependency-mapped work items with parallelization flags, risk flags, edge case tables, and test strategy matrices. Triggers on: "decompose this feature", "task breakdown with dependencies", "phased implementation plan", "dependency map for", "break this into tasks with phases", "work breakdown structure". The differentiator is the structured output format (phased tables, parallelization flags, dependency chains) — use this skill when you need a formal task board, not ad-hoc decomposition the model handles natively. NOT for effort estimates, PERT calculations, or confidence intervals — use estimate-calibrator instead.

Mathews-Tom

data-ai

open

data-engineering

154

distill-pr-intent-orchestrator

Run PR intent distillation across a PR set and persist outputs + sidecar metadata to /memory/pr-intent.

openclaw

data-ai

open

data-engineering

152

repo-article

Write an article about the current state, progress, and vision of the watched repo.

aaronjmars

data-ai

open

data-engineering

143

firmographic-analysis

Use when interpreting company-level enrichment data to segment accounts, spot buying triggers, and tailor outreach.

gtmagents

data-ai

open

data-engineering

143

update-dataset

End-to-end dataset update workflow with PR creation, snapshot, meadow, garden, and grapher steps. Use when user wants to update a dataset, refresh data, run ETL update, or mentions updating dataset versions.

owid

data-ai

open

data-engineering

143

data-contract-framework

Operating model for defining, enforcing, and auditing BI data contracts.

gtmagents

data-ai

open

data-engineering

143

community-insight-taxonomy

Tagging schema for classifying community signals by persona, journey, and business impact.

gtmagents

data-ai

open

data-engineering

143

signal-taxonomy

Use to define schemas, topic tags, and lineage metadata for enriched signals.

gtmagents

data-ai

open

data-engineering

143

deal-review

Use to run structured opportunity inspections that align pipeline data with buyer reality.

gtmagents

data-ai

open

data-engineering

143

comp-mechanics

Use to assemble rate tables, accelerator logic, and plan governance templates.

gtmagents

data-ai

open

data-engineering

143

golden-dataset

Golden dataset lifecycle patterns for curation, versioning, quality validation, and CI integration. Use when building evaluation datasets, managing dataset versions, validating quality scores, or integrating golden tests into pipelines.

yonatangross

data-ai

open

data-engineering

143

ct-epic-architect

Epic planning and task decomposition for breaking down large initiatives into atomic, executable tasks. Provides dependency analysis, wave-based parallel execution planning, hierarchy management, and research linking. Use when creating epics, decomposing initiatives into task trees, planning parallel workflows, or analyzing task dependencies. Triggers on epic creation, task decomposition requests, or planning phase work.

kryptobaseddev

data-ai

open

data-engineering

143

migrate-dataset

Migrate a legacy OWID dataset (no catalogPath) into the ETL pipeline. Use when user wants to migrate, backport, or convert a legacy dataset by ID, or mentions datasets without catalogPath.

owid

data-ai

open

data-engineering

143

create-multidim

Create multi-dimensional (multidim/MDIM) chart configurations in the OWID ETL pipeline. Use this skill when the user wants to create a new multidim, build a multi-dimensional chart, combine multiple charts into one with dimension toggles, or mentions 'multidim' or 'MDIM'.

owid

data-ai

open

data-engineering

143

ct-orchestrator

Pipeline-aware orchestration skill for managing complex workflows through subagent delegation. Use when the user asks to "orchestrate", "orchestrator mode", "run as orchestrator", "delegate to subagents", "coordinate agents", "spawn subagents", "multi-agent workflow", "context-protected workflow", "agent farm", "HITL orchestration", "pipeline management", or needs to manage complex workflows by delegating work to subagents while protecting the main context window. Enforces ORC-001 through ORC-009 constraints. Provider-neutral — works with any AI agent runtime.

kryptobaseddev

data-ai

open

data-engineering

143

ct-adr-recorder

Records Architecture Decision Records from accepted consensus verdicts. Use when promoting a consensus outcome to a formal ADR: drafts the document in the proposed-then-accepted HITL lifecycle, links to the originating consensus manifest, persists the decision to the canonical SQLite decisions table, and triggers downstream invalidation when an accepted ADR is later superseded. Triggers on phrases like 'write ADR', 'record architecture decision', 'formalize this decision', 'lock in the choice', 'create ADR-XXX', or when a consensus task reaches completed status and needs formalization.

kryptobaseddev

data-ai

open

data-engineering

142

kql-query-authoring

Use this skill when asked to write, create, or help with KQL (Kusto Query Language) queries for Microsoft Sentinel, Defender XDR, or Azure Data Explorer. Triggers on keywords like "write KQL", "create KQL query", "help with KQL", "query [table]", "KQL for [scenario]", or when a user requests queries for specific data analysis scenarios. This skill uses schema validation, Microsoft Learn documentation, and community examples to generate production-ready KQL queries.