category focus

Data Eng.

ETL pipelines and big data infrastructure.

1541 skills

data-engineering

lakehouse-patterns

Comprehensive guide to data lakehouse architecture combining data lake flexibility with data warehouse performance using Delta Lake, Iceberg, and Hudi

AmnadTaowsoam

data-ai

data-engineering

oracle

Use the @steipete/oracle CLI to bundle a prompt plus the right files and get a second-model review (API or browser) for debugging, refactors, design checks, or cross-validation.

doubleflannel

data-ai

data-engineering

nf-pipeline-to-galaxy-workflow

Convert a complete Nextflow pipeline to Galaxy

galaxyproject

data-ai

data-engineering

pm-04-clean-filter

Apply cleaning and filtering actions based on data quality decisions and generate filtered log artefacts.

Wattysaid

data-ai

data-engineering

data-catalog-entry

Create standardized metadata for data assets. Use when documenting new datasets, building data catalogs, improving data discoverability, or creating data dictionaries for teams.

nimrodfisher

data-ai

data-engineering

High-performance JSON and CSV parsing library for Clojure. Use when working with JSON or CSV data in Clojure and you need fast, efficient parsing/writing with a clojure.data.json/clojure.data.csv-compatible API.

Ramblurr

data-ai

data-engineering

clickhouse-cloud-connection

Test and validate ClickHouse Cloud connection using clickhouse-connect for gapless-crypto-clickhouse. Use when validating connectivity, troubleshooting connection issues, or verifying environment configuration. Includes version check and query validation.

terrylica

data-ai

data-engineering

awkward-array

Guidance for working with Awkward Array 2.0 jagged arrays and records in Python. Use when building or debugging `awkward` workflows, including record construction with `ak.zip`, adding fields with `ak.with_field`, filtering/aggregation, combinatorics (`ak.cartesian`/`ak.combinations`), `argmin`/`argmax` slicing, flattening, sorting, and NumPy interop or common Awkward pitfalls.

gordonwatts

data-ai

data-engineering

data-quality-checks

Comprehensive guide to data quality validation, testing frameworks, anomaly detection, and data observability for production data pipelines

AmnadTaowsoam

data-ai

data-engineering

data-modeler

Data modeling automation skill based on the immutable data model. Uses the blackboard pattern to work step by step from entity extraction through ER diagram generation.

tis-abe-akira

data-ai

data-engineering

agentdb-state-manager

Persistent state management using AgentDB (DuckDB) for workflow analytics and checkpoints. Provides a read-only analytics cache synchronized from TODO_*.md files, enabling complex dependency graph queries, historical workflow metrics, context checkpoint storage/recovery, and state transition analysis. Use when: data gathering and analysis for workflow state tracking. Triggers: "analyze workflow", "query state", "checkpoint", "workflow metrics".

stharrold

data-ai

data-engineering

cql-type-system-schema-handling

Implement and deserialize all CQL types including primitives (int, text, timestamp, uuid, varint, decimal), collections (list, set, map), tuples, UDTs (user-defined types), and frozen types. Use when working with CQL type deserialization, schema validation, collection parsing, UDT handling, or type-correct data generation.

pmcfadin

data-ai

data-engineering

data-engineer

Data Engineer Agent. Responsible for building ETL pipelines, data warehouses, and data lakes.

shaul1991

data-ai

data-engineering

data-pipeline

GenStage, Broadway, and Flow for Elixir data pipelines

layeddie

data-ai

data-engineering

duckdb-remote-parquet-query

Query remote Parquet files via HTTP without downloading using DuckDB httpfs. Leverage column pruning, row filtering, and range requests for efficient bandwidth usage. Use for crypto/trading data distribution and analytics.

terrylica

data-ai

data-engineering

data-migration-expert

Use this agent when reviewing database migrations, schema changes, or data transformations. Specializes in validating ID mappings, checking for swapped values, and verifying rollback safety. Triggers on requests like "migration review", "schema change validation".

jovermier

data-ai

data-engineering

coding-conventions

Field naming conventions for the Job Aggregator project. Use this skill when encountering type errors related to field names (camelCase vs snake_case), database constraint violations, or data mapping issues between Python/TypeScript/PostgreSQL.

beetz12

data-ai

data-engineering

openspec-sync-specs

Sync delta specs from a change to main specs. Use when the user wants to update main specs with changes from a delta spec, without archiving the change.

austinmoody

data-ai

data-engineering

implementing-io-pipelines

Implements high-performance streaming using System.IO.Pipelines in .NET. Use when building network protocols, parsing binary data, or processing large streams efficiently.

christian289

data-ai

data-engineering

memory-delta

Auto-executes when the "[MEMORY_KEEPER_DELTA]" trigger is detected.

ZipperBagCoffee

data-ai

data-engineering

bigquery-ethereum-data-acquisition

Workflow for acquiring historical Ethereum blockchain data using Google BigQuery free tier. Empirically validated for cost estimation, streaming downloads, and DuckDB integration. Use when planning bulk historical data acquisition or comparing data source options for blockchain network metrics.

terrylica

data-ai

data-engineering