category focus

Data Eng.

ETL pipelines and big data infrastructure.

1541 skills

all categories

sorted by stars

data-engineering

349

411-frameworks-quarkus-jdbc

Use when you need programmatic JDBC in Quarkus: Agroal DataSource, parameterized SQL, transactions, batching, and Dev Services. Part of the skills-for-java project.

jabrena

data-ai

open

data-engineering

349

Use when you need programmatic JDBC in Micronaut: pooled DataSource, parameterized SQL, io.micronaut.transaction.annotation.Transactional, batching, and domain exception translation. Part of the skills-for-java project.

jabrena

data-ai

open

data-engineering

347

data-analysis

High-performance data analysis using Polars - load, transform, aggregate, visualize and export tabular data. Use for CSV/JSON/Parquet processing, statistical analysis, time series, and creating charts.

ArtificialAnalysis

data-ai

open

data-engineering

342

migrating-json-schemas

Migrates JSON Schemas between draft versions for use with z-schema. Use when the user wants to upgrade schemas from draft-04 to draft-2020-12, convert between draft formats, update deprecated keywords, replace id with $id, convert definitions to $defs, migrate items to prefixItems, replace dependencies with dependentRequired or dependentSchemas, adopt unevaluatedProperties or unevaluatedItems, or adapt schemas to newer JSON Schema features.

zaggino

data-ai

open

data-engineering

341

hl-build-pipeline-app

Build a complete GStreamer pipeline app for real-time video processing on Hailo-8/8L/10H.

hailo-ai

data-ai

open

data-engineering

333

transforming-data

Transform raw data into analytical assets using ETL/ELT patterns, SQL (dbt), Python (pandas/polars/PySpark), and orchestration (Airflow). Use when building data pipelines, implementing incremental models, migrating from pandas to polars, or orchestrating multi-step transformations with testing and quality checks.

ancoleman

data-ai

open

data-engineering

333

ingesting-data

Data ingestion patterns for loading data from cloud storage, APIs, files, and streaming sources into databases. Use when importing CSV/JSON/Parquet files, pulling from S3/GCS buckets, consuming API feeds, or building ETL pipelines.

ancoleman

data-ai

open

data-engineering

333

streaming-data

Build event streaming and real-time data pipelines with Kafka, Pulsar, Redpanda, Flink, and Spark. Covers producer/consumer patterns, stream processing, event sourcing, and CDC across TypeScript, Python, Go, and Java. Use when building real-time systems, microservices communication, or data integration pipelines.

ancoleman

data-ai

open

data-engineering

332

walkeros-create-transformer

Use when creating a new walkerOS transformer. Example-driven workflow for validation, enrichment, or redaction transformers.

elbwalker

data-ai

open

data-engineering

332

walkeros-understanding-transformers

Use when working with walkerOS transformers, understanding event validation/enrichment/redaction, or learning about transformer chaining. Covers interface, return values, and pipeline integration.

elbwalker

data-ai

open

data-engineering

332

nw-command-design-patterns

Best practices for command definition files - size targets, declarative template, anti-patterns, and canonical examples based on research evidence

nWave-ai

data-ai

open

data-engineering

332

nw-data-architecture-patterns

Data architecture patterns (warehouse, lake, lakehouse, mesh), ETL/ELT pipelines, streaming architectures, scaling strategies, and schema design patterns

nWave-ai

data-ai

open

data-engineering

332

nw-database-technology-selection

Database comparison catalogs, RDBMS vs NoSQL selection criteria, CAP/ACID/BASE theory, OLTP vs OLAP, and technology-specific characteristics

nWave-ai

data-ai

open

data-engineering

332

nw-deliver

Orchestrates the full DELIVER wave end-to-end (roadmap > execute-all > finalize). Use when all prior waves are complete and the feature is ready for implementation.

nWave-ai

data-ai

open

data-engineering

332

nw-der-review-criteria

Evaluation criteria and scoring for data engineering artifact reviews

nWave-ai

data-ai

open

data-engineering

332

nw-devops

Designs CI/CD pipelines, infrastructure, observability, and deployment strategy. Use when preparing platform readiness for a feature.

nWave-ai

data-ai

open

data-engineering

332

nw-divio-framework

DIVIO/Diataxis four-quadrant documentation framework - type definitions, classification decision tree, and signal catalog

nWave-ai

data-ai

open

data-engineering

332

nw-research-methodology

Research output templates, distillation workflow, and quality standards for evidence-driven research

nWave-ai

data-ai

open

data-engineering

325

differential-expression

Bulk transcriptomics differential expression with count-aware modeling, design validation, contrast handling, thresholded exports, and publication-ready DE figures.

Runchuan-BU

data-ai

open

data-engineering

319

meeting-transcript

Process meeting recordings and notes into structured decisions, action items, and team dynamics with intelligent noise filtering

huytieu

data-ai

open

data-engineering

317

message-bus

File-based message queue for inter-agent coordination. Used by workers AND board directors to communicate. Provides: progress updates, task completion signals, file locking, board deliberation. Core infrastructure for parallel execution.

Ibrahim-3d

data-ai

open

data-engineering

317

parallel-dispatch

Parallel execution engine for dispatching worker agents. Used by conductor-orchestrator to spawn multiple workers simultaneously from DAG parallel groups. Handles dispatch, monitoring, aggregation, and failure recovery.

Ibrahim-3d

data-ai

open

data-engineering

314

scenario-scaffolding

Assists with creating complete ITBench scenarios by applying fault mechanisms to specific services, populating scenario files, and generating groundtruth DSL with fault propagations and alert predictions.

itbench-hub

data-ai

open

data-engineering

314

sf-data

Salesforce data operations with 130-point scoring. TRIGGER when: user creates test data, performs bulk import/export, uses sf data CLI commands, or needs data factory patterns for Apex tests. DO NOT TRIGGER when: SOQL query writing only (use sf-soql), Apex test execution (use sf-testing), or metadata deployment (use sf-deploy).

Jaganpro

data-ai

open

Page 28 / 65