category focus

Data Engineering

ETL pipelines and big data infrastructure.

1541 skills

data-engineering

cfn-loop-output-processing

Type-safe output processing for Loop 2 validators and Loop 3 implementers. Use when parsing agent confidence scores, feedback, or calculating consensus from multiple validators.

masharratt

data-ai

open

data-engineering

bigquery

Instructions for querying Google BigQuery using the bq command-line tool. Useful for running SQL queries, exploring datasets, and exporting results.

sourcegraph

data-ai

open

data-engineering

Modifies DAG structure during execution in response to failures, new requirements, or runtime discoveries. Supports node insertion, removal, and dependency rewiring. Activate on 'replan dag', 'modify workflow', 'add node', 'remove node', 'dynamic modification'. NOT for initial DAG building (use dag-graph-builder) or scheduling (use dag-task-scheduler).

erichowens

data-ai

open

data-engineering

dag-result-aggregator

Combines and synthesizes outputs from parallel DAG branches. Handles merge strategies, conflict resolution, and result formatting. Activate on 'aggregate results', 'combine outputs', 'merge branches', 'synthesize results', 'fan-in'. NOT for execution (use dag-parallel-executor) or scheduling (use dag-task-scheduler).

erichowens

data-ai

open

data-engineering

dag-executor

End-to-end DAG execution orchestrator that decomposes arbitrary tasks into agent graphs and executes them in parallel. The intelligence layer that makes DAG Framework operational.

erichowens

data-ai

open

data-engineering

convex-mutations

This skill should be used when implementing Convex mutation functions. It provides comprehensive guidelines for defining, registering, calling, and scheduling mutations, including database operations, transactions, and scheduled job patterns.

Sstobo

data-ai

open

data-engineering

dag-dependency-resolver

Validates DAG structures, performs topological sorting, detects cycles, and resolves dependency conflicts. Uses Kahn's algorithm for optimal execution ordering. Activate on 'resolve dependencies', 'topological sort', 'cycle detection', 'dependency order', 'validate dag'. NOT for building DAGs (use dag-graph-builder) or scheduling execution (use dag-task-scheduler).

erichowens

data-ai

open

data-engineering

dag-parallel-executor

Executes DAG waves with controlled parallelism using the Task tool. Manages concurrent agent spawning, resource limits, and execution coordination. Activate on 'execute dag', 'parallel execution', 'concurrent tasks', 'run workflow', 'spawn agents'. NOT for scheduling (use dag-task-scheduler) or building DAGs (use dag-graph-builder).

erichowens

data-ai

open

data-engineering

agent-data-engineer

Expert data engineer specializing in building scalable data pipelines, ETL/ELT processes, and data infrastructure. Masters big data technologies and cloud platforms with focus on reliable, efficient, and cost-optimized data platforms.

Tony363

data-ai

open

data-engineering

harness-step-schema

Creates or updates pipeline step schemas in the harness-schema repository. ALWAYS use this skill when user mentions ANY of these:
- Step operations: "create step", "add step", "new step", "update step", "modify step", "edit step", "implement step", "define step", "build step"
- Schema files: "step-info.yaml", "step-node.yaml", "StepInfo", "StepNode", "step schema", "step definition", "step spec"
- Stage availability: "add to stage", "enable in stage", "available in stage", "register step", "execution-wrapper-config"
- Template support: "step template", "template_config.yaml", "enable as template"
- JSON updates: "pipeline.json", "template.json", "add definition", "update schema"
- Field work: "add field", "add property", "new field", "step property", "step field", "modify field"
- Expression support: "runtime input", "JEXL", "expression support", "<+input>", "common-jexl"
- Category mentions: "CD step", "CI step", "custom step", "FME step", "feature flag step", "approval step"
If user is working with files

harness

data-ai

open

data-engineering

stream-processing

Use when designing real-time data processing systems, choosing stream processing frameworks, or implementing event-driven architectures. Covers Kafka, Flink, and streaming patterns.

melodic-software

data-ai

open

data-engineering

data-governance-check

Review data handling for privacy and retention. Use when a senior developer needs governance validation.

proflead

data-ai

open

data-engineering

tidbx-serverless-driver

Guidance for using the TiDB Cloud Serverless Driver (Beta) in Node.js, serverless, and edge environments. Use when connecting to TiDB Cloud Starter/Essential over HTTP with @tidbcloud/serverless, or when integrating with Prisma/Kysely/Drizzle serverless adapters in Vercel/Cloudflare/Netlify/Deno/Bun. Use this skill for serverless driver setup and edge runtime guidance.

pingcap

data-ai

open

data-engineering

pytidb

PyTiDB (pytidb) setup and usage for TiDB from Python. Covers connecting, table modeling (TableModel), CRUD, raw SQL, transactions, vector/full-text/hybrid search, auto-embedding, custom embedding functions, and reference templates/snippets (vector/hybrid/image) plus agent-oriented examples (RAG/memory/text2sql).

pingcap

data-ai

open

data-engineering

prepare-dataset

Process and validate datasets for training. Use when setting up data pipelines.

mvillmow

data-ai

open

data-engineering

etl-elt-patterns

Use when designing data pipelines, choosing between ETL and ELT approaches, or implementing data transformation patterns. Covers modern data pipeline architecture.

melodic-software

data-ai

open

data-engineering

data-schema-knowledge-modeling

Use when designing database schemas, modeling domain entities and relationships, building knowledge graphs or ontologies, creating API data models, defining system boundaries and invariants, migrating between data models, or establishing taxonomies or hierarchies; when the user mentions "schema", "data model", "entities", "relationships", "ontology", or "knowledge graph"; or when scattered or inconsistent data structures need formalization.

lyndonkl

data-ai

open

data-engineering

validate-incremental-sync

Validate that a connector's CDC/incremental sync implementation correctly tracks offsets and filters records.

databrickslabs

data-ai

open

data-engineering

implement-connector

Implement a Python connector that conforms to the LakeflowConnect interface for data ingestion.

databrickslabs

data-ai

open

data-engineering

scaffold-write

Guide for adding a new write operation (command/mutation) to the Backend. Focuses on Event Sourcing, Wolverine commands, and Projections.

aalmada

data-ai

open

data-engineering

senior-data-engineer

World-class data engineering skill for building scalable data pipelines, ETL/ELT systems, real-time streaming, and data infrastructure. Expertise in Python, SQL, Spark, Airflow, dbt, Kafka, Flink, Kinesis, and modern data stack. Includes data modeling, pipeline orchestration, data quality, streaming quality monitoring, and DataOps. Use when designing data architectures, building batch or streaming data pipelines, optimizing data workflows, or implementing data governance.

rickydwilson-dcs

data-ai

open

data-engineering

logseq-db-plugin-api

Essential knowledge for developing Logseq plugins for DB (database) graphs. Covers core APIs, event-driven updates with DB.onChanged, multi-layered tag detection, property value iteration, advanced query patterns (tag inheritance, or-join), and production-tested plugin architecture patterns. References production-validated code from logseq-checklist v1.0.0.

kerim

data-ai

open

data-engineering

databricks-2025

Databricks Job activity and 2025 Azure Data Factory connectors

JosiahSiegel

data-ai

open

data-engineering

troubleshooting-assistant

Diagnoses and resolves MCP server registration failures, GPU detection, BigQuery authentication, index build failures, import errors, search quality issues, and performance problems.

RobThePCGuy

data-ai

open
