category focus

Data Engineering

ETL pipelines and big data infrastructure.

1541 skills
data-engineering
156

rnaseq-deseq2

ToolUniverse workflow: RNA-seq DESeq2

lamm-mit
data-ai
data-engineering
156

vaex

Use this skill for processing and analyzing large tabular datasets (billions of rows) that exceed available RAM. Vaex excels at out-of-core DataFrame operations, lazy evaluation, fast aggregations, efficient visualization of big data, and machine learning on large datasets. Apply when users need to work with large CSV/HDF5/Arrow/Parquet files, perform fast statistics on massive datasets, create visualizations of big data, or build ML pipelines that do not fit in memory.

lamm-mit
data-ai
data-engineering
156

zarr-python

Chunked N-D arrays for cloud storage. Compressed arrays, parallel I/O, S3/GCS integration, NumPy/Dask/Xarray compatible, for large-scale scientific computing pipelines.

lamm-mit
data-ai
data-engineering
155

digikey

Search DigiKey for electronic components and download datasheets — primary source for prototype orders and the preferred API method for fetching datasheets. Find parts by keyword or MPN, check pricing/stock, download datasheets via API, analyze specifications. Sync and maintain a local datasheets directory — extract components from schematics, download missing datasheets, keep them up to date. Use when the user asks about electronic components, part specs, datasheets, pricing, stock, footprints, or needs to download a datasheet — even without mentioning "DigiKey". Also for "sync datasheets", "download datasheets for my board/project", or mentions a datasheets directory. DigiKey is the default distributor for prototyping. For BOM workflows, see the bom skill.

aklofas
data-ai
data-engineering
155

task-decomposer

Produces structured phased task boards from feature requests: dependency-mapped work items with parallelization flags, risk flags, edge case tables, and test strategy matrices. Triggers on: "decompose this feature", "task breakdown with dependencies", "phased implementation plan", "dependency map for", "break this into tasks with phases", "work breakdown structure". The differentiator is the structured output format (phased tables, parallelization flags, dependency chains) — use this skill when you need a formal task board, not ad-hoc decomposition the model handles natively. NOT for effort estimates, PERT calculations, or confidence intervals — use estimate-calibrator instead.

Mathews-Tom
data-ai
data-engineering
154

distill-pr-intent-orchestrator

Run PR intent distillation across a PR set and persist outputs + sidecar metadata to /memory/pr-intent.

openclaw
data-ai
data-engineering
152

repo-article

Write an article about the current state, progress, and vision of the watched repo.

aaronjmars
data-ai
data-engineering
143

firmographic-analysis

Use when interpreting company-level enrichment data to segment accounts, spot buying triggers, and tailor outreach.

gtmagents
data-ai
data-engineering
143

update-dataset

End-to-end dataset update workflow with PR creation, snapshot, meadow, garden, and grapher steps. Use when user wants to update a dataset, refresh data, run ETL update, or mentions updating dataset versions.

owid
data-ai
data-engineering
143

community-insight-taxonomy

Tagging schema for classifying community signals by persona, journey, and business impact.

gtmagents
data-ai
data-engineering
143

signal-taxonomy

Use to define schemas, topic tags, and lineage metadata for enriched signals.

gtmagents
data-ai
data-engineering
143

deal-review

Use to run structured opportunity inspections that align pipeline data with buyer reality.

gtmagents
data-ai
data-engineering
143

comp-mechanics

Use to assemble rate tables, accelerator logic, and plan governance templates.

gtmagents
data-ai
data-engineering
143

golden-dataset

Golden dataset lifecycle patterns for curation, versioning, quality validation, and CI integration. Use when building evaluation datasets, managing dataset versions, validating quality scores, or integrating golden tests into pipelines.

yonatangross
data-ai
data-engineering
143

ct-epic-architect

Epic planning and task decomposition for breaking down large initiatives into atomic, executable tasks. Provides dependency analysis, wave-based parallel execution planning, hierarchy management, and research linking. Use when creating epics, decomposing initiatives into task trees, planning parallel workflows, or analyzing task dependencies. Triggers on epic creation, task decomposition requests, or planning phase work.

kryptobaseddev
data-ai
data-engineering
143

migrate-dataset

Migrate a legacy OWID dataset (no catalogPath) into the ETL pipeline. Use when user wants to migrate, backport, or convert a legacy dataset by ID, or mentions datasets without catalogPath.

owid
data-ai
data-engineering
143

create-multidim

Create multi-dimensional (multidim/MDIM) chart configurations in the OWID ETL pipeline. Use this skill when the user wants to create a new multidim, build a multi-dimensional chart, combine multiple charts into one with dimension toggles, or mentions 'multidim' or 'MDIM'.

owid
data-ai
data-engineering
143

ct-orchestrator

Pipeline-aware orchestration skill for managing complex workflows through subagent delegation. Use when the user asks to "orchestrate", "orchestrator mode", "run as orchestrator", "delegate to subagents", "coordinate agents", "spawn subagents", "multi-agent workflow", "context-protected workflow", "agent farm", "HITL orchestration", "pipeline management", or needs to manage complex workflows by delegating work to subagents while protecting the main context window. Enforces ORC-001 through ORC-009 constraints. Provider-neutral — works with any AI agent runtime.

kryptobaseddev
data-ai
data-engineering
143

ct-adr-recorder

Records Architecture Decision Records from accepted consensus verdicts. Use when promoting a consensus outcome to a formal ADR: drafts the document in the proposed-then-accepted HITL lifecycle, links to the originating consensus manifest, persists the decision to the canonical SQLite decisions table, and triggers downstream invalidation when an accepted ADR is later superseded. Triggers on phrases like 'write ADR', 'record architecture decision', 'formalize this decision', 'lock in the choice', 'create ADR-XXX', or when a consensus task reaches completed status and needs formalization.

kryptobaseddev
data-ai
data-engineering
142

kql-query-authoring

Use this skill when asked to write, create, or help with KQL (Kusto Query Language) queries for Microsoft Sentinel, Defender XDR, or Azure Data Explorer. Triggers on keywords like "write KQL", "create KQL query", "help with KQL", "query [table]", "KQL for [scenario]", or when a user requests queries for specific data analysis scenarios. This skill uses schema validation, Microsoft Learn documentation, and community examples to generate production-ready KQL queries.

SCStelz
data-ai
data-engineering
141

archaeology

Transform narratives into a queryable decision graph.

notactuallytreyanastasio
data-ai
data-engineering
140

csv-parser

Parse and analyze CSV files with data validation.

maxvaega
data-ai
data-engineering
139

gke-reliability

Workflows for ensuring high availability and reliability of GKE workloads.

GoogleCloudPlatform
data-ai