category focus

Data Engineering

ETL pipelines and big data infrastructure.

1541 skills

Sorted by stars.

data-engineering

414

airflow-dag-patterns

Build production Apache Airflow DAGs with best practices for operators, sensors, testing, and deployment. Use when creating data pipelines, orchestrating workflows, or scheduling batch jobs.

Dokhacgiakhoa

data-ai

open

data-engineering

414

Build scalable data pipelines, modern data warehouses, and real-time streaming architectures. Implements Apache Spark, dbt, Airflow, and cloud-native data platforms. Use PROACTIVELY for data pipeline design, analytics infrastructure, or modern data stack implementation.

Dokhacgiakhoa

data-ai

open

data-engineering

414

data-engineering-data-pipeline

You are a data pipeline architecture expert specializing in scalable, reliable, and cost-effective data pipelines for batch and streaming data processing.

Dokhacgiakhoa

data-ai

open

data-engineering

414

data-quality-frameworks

Implement data quality validation with Great Expectations, dbt tests, and data contracts. Use when building data quality pipelines, implementing validation rules, or establishing data contracts.

Dokhacgiakhoa

data-ai

open

data-engineering

414

database-admin

Expert database administrator specializing in modern cloud databases, automation, and reliability engineering. Masters AWS/Azure/GCP database services, Infrastructure as Code, high availability, disaster recovery, performance optimization, and compliance. Handles multi-cloud strategies, container databases, and cost optimization. Use PROACTIVELY for database architecture, operations, or reliability engineering.

Dokhacgiakhoa

data-ai

open

data-engineering

414

database-architect

Expert database architect specializing in data layer design from scratch, technology selection, schema modeling, and scalable database architectures. Masters SQL/NoSQL/TimeSeries database selection, normalization strategies, migration planning, and performance-first design. Handles both greenfield architectures and re-architecture of existing systems. Use PROACTIVELY for database architecture, technology selection, or data modeling decisions.

Dokhacgiakhoa

data-ai

open

data-engineering

414

database-migration

MASTER DB: Zero-Downtime, Schema Design (3NF), SQL/NoSQL.

Dokhacgiakhoa

data-ai

open

data-engineering

414

dbt-transformation-patterns

Master dbt (data build tool) for analytics engineering with model organization, testing, documentation, and incremental strategies. Use when building data transformations, creating data models, or implementing analytics engineering best practices.

Dokhacgiakhoa

data-ai

open

data-engineering

414

error-debugging-error-analysis

You are an expert error analysis specialist with deep expertise in debugging distributed systems, analyzing production incidents, and implementing comprehensive observability solutions.

Dokhacgiakhoa

data-ai

open

data-engineering

414

error-diagnostics-error-analysis

You are an expert error analysis specialist with deep expertise in debugging distributed systems, analyzing production incidents, and implementing comprehensive observability solutions.

Dokhacgiakhoa

data-ai

open

data-engineering

414

modern-web-performance

High-Performance Web Engineering.

Dokhacgiakhoa

data-ai

open

data-engineering

414

nosql-expert

Expert guidance for distributed NoSQL databases (Cassandra, DynamoDB). Focuses on mental models, query-first modeling, single-table design, and avoiding hot partitions in high-scale systems.

Dokhacgiakhoa

data-ai

open

data-engineering

414

production-code-audit

Autonomously deep-scan the entire codebase line by line, understand its architecture and patterns, then systematically transform it to production-grade, corporate-level professional quality with optimizations.

Dokhacgiakhoa

data-ai

open

data-engineering

414

scala-pro

Master enterprise-grade Scala development with functional programming, distributed systems, and big data processing. Expert in Apache Pekko, Akka, Spark, ZIO/Cats Effect, and reactive architectures. Use PROACTIVELY for Scala system design, performance optimization, or enterprise integration.

Dokhacgiakhoa

data-ai

open

data-engineering

414

spark-optimization

Optimize Apache Spark jobs with partitioning, caching, shuffle optimization, and memory tuning. Use when improving Spark performance, debugging slow jobs, or scaling data processing pipelines.

Dokhacgiakhoa

data-ai

open

data-engineering

414

tdd-orchestrator

Master TDD orchestrator specializing in red-green-refactor discipline, multi-agent workflow coordination, and comprehensive test-driven development practices. Enforces TDD best practices across teams with AI-assisted testing and modern frameworks. Use PROACTIVELY for TDD implementation and governance.

Dokhacgiakhoa

data-ai

open

data-engineering

414

tdd-workflows-tdd-cycle

Use when working with the TDD cycle in TDD workflows.

Dokhacgiakhoa

data-ai

open

data-engineering

414

tdd-workflows-tdd-refactor

Use when working with the TDD refactor step in TDD workflows.

Dokhacgiakhoa

data-ai

open

data-engineering

414

temporal-python-pro

Master Temporal workflow orchestration with Python SDK. Implements durable workflows, saga patterns, and distributed transactions. Covers async/await, testing strategies, and production deployment. Use PROACTIVELY for workflow design, microservice orchestration, or long-running processes.

Dokhacgiakhoa

data-ai

open

data-engineering

411

commit

Stage and commit changes with intelligent Conventional Commits message generation and commit-splitting analysis.

joshukraine

data-ai

open

data-engineering

401

work

Coordinate non-trivial engineering work with the right level of analysis, planning, delegation, implementation, and validation. Use for multi-step requests, multi-file changes, or work that benefits from explicit execution strategy.

TechDufus

data-ai

open

data-engineering

397

table-filler

Fill `outline/tables_index.md` from `outline/table_schema.md` + evidence packs (NO PROSE in cells; citation-backed rows). **Trigger**: table filler, fill tables, evidence-first tables, index tables, 表格填充, 索引表. **Use when**: table schema exists and evidence packs are ready; you want a compact, citation-backed index table to support later writing and Appendix table curation. **Skip if**: `outline/tables_index.md` already exists and is refined (>=2 tables; citations in rows; no placeholders). **Network**: none. **Guardrail**: do not invent facts; every row must include citations; do not write paragraph cells.

WILLOSCAR

data-ai

open

data-engineering

397

extraction-form

Extract study data into a structured table (`papers/extraction_table.csv`) using the protocol’s extraction schema. **Trigger**: extraction form, extraction table, data extraction, 信息提取, 提取表. **Use when**: a systematic review moves into extraction (C3) after screening, and the included papers need to be recorded field by field in a CSV to support later synthesis. **Skip if**: `papers/screening_log.csv` does not exist yet or the protocol is not locked. **Network**: none. **Guardrail**: fill fields strictly according to the schema; do not write narrative synthesis at this stage (that is the job of `synthesis-writer`).

WILLOSCAR

data-ai

open
