Data Engineering

ETL pipelines and big data infrastructure.

1541 skills
data-engineering
14

cfn-loop-output-processing

Type-safe output processing for Loop 2 validators and Loop 3 implementers. Use when parsing agent confidence scores, feedback, or calculating consensus from multiple validators.

masharratt
data-ai
data-engineering
13

bigquery

Instructions for querying Google BigQuery using the bq command-line tool. Useful for running SQL queries, exploring datasets, and exporting results.

sourcegraph
data-ai
data-engineering
13

dag-dynamic-replanner

Modifies DAG structure during execution in response to failures, new requirements, or runtime discoveries. Supports node insertion, removal, and dependency rewiring. Activate on 'replan dag', 'modify workflow', 'add node', 'remove node', 'dynamic modification'. NOT for initial DAG building (use dag-graph-builder) or scheduling (use dag-task-scheduler).

erichowens
data-ai
data-engineering
13

dag-result-aggregator

Combines and synthesizes outputs from parallel DAG branches. Handles merge strategies, conflict resolution, and result formatting. Activate on 'aggregate results', 'combine outputs', 'merge branches', 'synthesize results', 'fan-in'. NOT for execution (use dag-parallel-executor) or scheduling (use dag-task-scheduler).

erichowens
data-ai
data-engineering
13

dag-executor

End-to-end DAG execution orchestrator that decomposes arbitrary tasks into agent graphs and executes them in parallel. It is the intelligence layer that makes the DAG Framework operational.

erichowens
data-ai
data-engineering
13

convex-mutations

This skill should be used when implementing Convex mutation functions. It provides comprehensive guidelines for defining, registering, calling, and scheduling mutations, including database operations, transactions, and scheduled job patterns.

Sstobo
data-ai
data-engineering
13

dag-dependency-resolver

Validates DAG structures, performs topological sorting, detects cycles, and resolves dependency conflicts. Uses Kahn's algorithm for optimal execution ordering. Activate on 'resolve dependencies', 'topological sort', 'cycle detection', 'dependency order', 'validate dag'. NOT for building DAGs (use dag-graph-builder) or scheduling execution (use dag-task-scheduler).

erichowens
data-ai
data-engineering
13

dag-parallel-executor

Executes DAG waves with controlled parallelism using the Task tool. Manages concurrent agent spawning, resource limits, and execution coordination. Activate on 'execute dag', 'parallel execution', 'concurrent tasks', 'run workflow', 'spawn agents'. NOT for scheduling (use dag-task-scheduler) or building DAGs (use dag-graph-builder).

erichowens
data-ai
data-engineering
12

agent-data-engineer

Expert data engineer specializing in building scalable data pipelines, ETL/ELT processes, and data infrastructure. Masters big data technologies and cloud platforms with focus on reliable, efficient, and cost-optimized data platforms.

Tony363
data-ai
data-engineering
12

harness-step-schema

Creates or updates pipeline step schemas in the harness-schema repository. ALWAYS use this skill when user mentions ANY of these:
- Step operations: "create step", "add step", "new step", "update step", "modify step", "edit step", "implement step", "define step", "build step"
- Schema files: "step-info.yaml", "step-node.yaml", "StepInfo", "StepNode", "step schema", "step definition", "step spec"
- Stage availability: "add to stage", "enable in stage", "available in stage", "register step", "execution-wrapper-config"
- Template support: "step template", "template_config.yaml", "enable as template"
- JSON updates: "pipeline.json", "template.json", "add definition", "update schema"
- Field work: "add field", "add property", "new field", "step property", "step field", "modify field"
- Expression support: "runtime input", "JEXL", "expression support", "<+input>", "common-jexl"
- Category mentions: "CD step", "CI step", "custom step", "FME step", "feature flag step", "approval step"
If user is working with files

harness
data-ai
data-engineering
11

stream-processing

Use when designing real-time data processing systems, choosing stream processing frameworks, or implementing event-driven architectures. Covers Kafka, Flink, and streaming patterns.

melodic-software
data-ai
data-engineering
11

data-governance-check

Review data handling for privacy and retention. Use when a senior developer needs governance validation.

proflead
data-ai
data-engineering
11

tidbx-serverless-driver

Guidance for using the TiDB Cloud Serverless Driver (Beta) in Node.js, serverless, and edge environments. Use when connecting to TiDB Cloud Starter/Essential over HTTP with @tidbcloud/serverless, or when integrating with Prisma/Kysely/Drizzle serverless adapters in Vercel/Cloudflare/Netlify/Deno/Bun. Use this skill for serverless driver setup and edge runtime guidance.

pingcap
data-ai
data-engineering
11

pytidb

PyTiDB (pytidb) setup and usage for TiDB from Python. Covers connecting, table modeling (TableModel), CRUD, raw SQL, transactions, vector/full-text/hybrid search, auto-embedding, custom embedding functions, and reference templates/snippets (vector/hybrid/image) plus agent-oriented examples (RAG/memory/text2sql).

pingcap
data-ai
data-engineering
11

prepare-dataset

Process and validate datasets for training. Use when setting up data pipelines.

mvillmow
data-ai
data-engineering
11

etl-elt-patterns

Use when designing data pipelines, choosing between ETL and ELT approaches, or implementing data transformation patterns. Covers modern data pipeline architecture.

melodic-software
data-ai
data-engineering
11

data-schema-knowledge-modeling

Use when designing database schemas, modeling domain entities and relationships, building knowledge graphs or ontologies, creating API data models, defining system boundaries and invariants, migrating between data models, or establishing taxonomies or hierarchies. Also use when the user mentions "schema", "data model", "entities", "relationships", "ontology", or "knowledge graph", or when scattered or inconsistent data structures need formalization.

lyndonkl
data-ai
data-engineering
10

validate-incremental-sync

Validate that a connector's CDC/incremental sync implementation correctly tracks offsets and filters records.

databrickslabs
data-ai
data-engineering
10

implement-connector

Implement a Python connector that conforms to the LakeflowConnect interface for data ingestion.

databrickslabs
data-ai
data-engineering
9

scaffold-write

Guide for adding a new write operation (command/mutation) to the Backend. Focuses on Event Sourcing, Wolverine commands, and Projections.

aalmada
data-ai
data-engineering
9

senior-data-engineer

World-class data engineering skill for building scalable data pipelines, ETL/ELT systems, real-time streaming, and data infrastructure. Expertise in Python, SQL, Spark, Airflow, dbt, Kafka, Flink, Kinesis, and modern data stack. Includes data modeling, pipeline orchestration, data quality, streaming quality monitoring, and DataOps. Use when designing data architectures, building batch or streaming data pipelines, optimizing data workflows, or implementing data governance.

rickydwilson-dcs
data-ai
data-engineering
8

logseq-db-plugin-api

Essential knowledge for developing Logseq plugins for DB (database) graphs. Covers core APIs, event-driven updates with DB.onChanged, multi-layered tag detection, property value iteration, advanced query patterns (tag inheritance, or-join), and production-tested plugin architecture patterns. References production-validated code from logseq-checklist v1.0.0.

kerim
data-ai
data-engineering
7

databricks-2025

Databricks Job activity and 2025 Azure Data Factory connectors

JosiahSiegel
data-ai
data-engineering
7

troubleshooting-assistant

Diagnoses and resolves MCP server registration failures, GPU detection, BigQuery authentication, index build failures, import errors, search quality issues, and performance problems.

RobThePCGuy
data-ai