Data Eng.

ETL pipelines and big data infrastructure.

1541 skills
data-engineering
2

reduce-orchestrator

MapReduce root/orchestrator with a mandatory parallel Verify phase, narrative-first reduction, deterministic artifact lifecycle management (.rlm run/archives), and concurrency safety (per-run locks + cleanup lock). Use when coordinating many parallel map-worker tasks under optional hint_paths, then synthesizing narrative reports into a decision to iterate or finish.

hyophyop
data-ai
data-engineering
2

spring-kafka-integration

[Extends backend-developer] Kafka specialist for Spring/Reactor. Use for Kafka producers/consumers, DLT, retry mechanisms, transactional outbox, event sourcing. Covers Spring Kafka 4.x and Reactor Kafka 1.3.x. Invoke alongside backend-developer.

olehsvyrydov
data-ai
data-engineering
2

koan-performance

Streaming, pagination, count strategies, bulk operations

sylin-org
data-ai
data-engineering
2

say-ducklake-xor

Parallel thread/DuckLake discovery with XOR uniqueness from gay_seed. Finds "say" or MCP usage, cross-refs with all DuckDB sources, launches bounded parallel ops.

plurigrid
data-ai
data-engineering
2

lcp-execplan

Create and maintain ExecPlans for complex work (design-to-implementation) following the repo's ExecPlan standard.

YusukeShimizu
data-ai
data-engineering
2

golden-dataset-validation

Validation rules, schema checks, duplicate detection, and coverage analysis for golden dataset integrity

yonatangross
data-ai
data-engineering
2

parquet-optimization

Proactively analyzes Parquet file operations and suggests optimization improvements for compression, encoding, row group sizing, and statistics. Activates when users are reading or writing Parquet files or discussing Parquet performance.

EmilLindfors
data-ai
data-engineering
2

golden-dataset-management

Backup, restore, and validate golden datasets for AI/ML systems - ensuring test data integrity and preventing catastrophic data loss

yonatangross
data-ai
data-engineering
2

amp-api-awareness

Extract hidden Amp API patterns from local thread data via DuckDB analysis

plurigrid
data-ai
data-engineering
2

fswatch-duckdb

FileSystemWatcher over /tmp with DuckDB/DuckLake persistence. Auto-starts on Amp sessions for resilient file monitoring with temporal queries.

plurigrid
data-ai
data-engineering
2

fujitsu-mainframe

Analyzes and assists with Fujitsu mainframe systems including FACOM, PRIMERGY, BS2000/OSD, OSIV/MSP, OSIV/XSP, NetCOBOL, PowerCOBOL, and Fujitsu JCL. Extracts business logic from Fujitsu COBOL programs, analyzes Fujitsu JCL jobs, migrates Fujitsu mainframe applications to modern platforms (Java, cloud, containers), and creates migration strategies. Use when working with Fujitsu mainframe migration, FACOM systems, BS2000, OSIV platforms, NetCOBOL, PowerCOBOL, Fujitsu-specific COBOL extensions, Fujitsu JCL, or when users mention Fujitsu mainframe modernization, analyzing Fujitsu COBOL/JCL, SYMFOWARE database, or planning migration from Fujitsu legacy systems.

DauQuangThanh
data-ai
data-engineering
2

db-migration

Automatically triggered for database migrations, schema changes, and data transformations. Use when working with database structure, migrations, or ORM models.

scotthavird
data-ai
data-engineering
2

filtered-data

Passthrough filter agent. Calls data sub-agents, validates responses, returns only clean data.

faisalanjum
data-ai
data-engineering
2

claude-restart-compact

Compact context at natural breakpoints to free tokens and continue working. Use PROACTIVELY at phase boundaries, after commits, or when token usage >150k. Better than random auto-compact. Supports custom compaction prompts.

ManuelKugelmann
data-ai
data-engineering
2

duck-time-travel

DuckDB time-travel queries for temporal versioning and causality tracking

plurigrid
data-ai
data-engineering
1

looker-expert

Expert-level Looker BI, LookML, explores, dimensions, measures, dashboards, and data modeling

personamanagmentlayer
data-ai
data-engineering
1

data-engineer

Data pipelines and analytics infrastructure

violetio
data-ai
data-engineering
1

aggregating-event-datasets

Aggregate and summarize event datasets (logs) using OPAL statsby. Use when you need to count, sum, or calculate statistics across log events. Covers make_col for derived columns, statsby for aggregation, group_by for grouping, aggregation functions (count, sum, avg, percentile), and topk for top N results. Returns single summary row per group across entire time range. For time-series trends, see time-series-analysis skill.

rustomax
data-ai
data-engineering
1

data-validation-reporter

Generate interactive validation reports with quality scoring, missing data analysis, and type checking. Combines Pandas validation, Plotly visualization, and YAML configuration for comprehensive data quality reporting.

vamseeachanta
data-ai
data-engineering
1

python-polars

This skill should be used when the user asks to "work with polars", "create a dataframe", "use lazy evaluation", "migrate from pandas", "optimize data pipelines", "read parquet files", "group by operations", or needs guidance on Polars DataFrame operations, expression API, performance optimization, or data transformation workflows.

tbhb
data-ai
data-engineering
1

nixtla-schema-mapper

Transform data sources to Nixtla schema (unique_id, ds, y) with column inference. Use when preparing data for forecasting. Trigger with 'map to Nixtla schema' or 'transform data'.

intent-solutions-io
data-ai
data-engineering
1

vaex

Use this skill for processing and analyzing large tabular datasets (billions of rows) that exceed available RAM. Vaex excels at out-of-core DataFrame operations, lazy evaluation, fast aggregations, efficient visualization of big data, and machine learning on large datasets. Apply when users need to work with large CSV/HDF5/Arrow/Parquet files, perform fast statistics on massive datasets, create visualizations of big data, or build ML pipelines that do not fit in memory.

hxk622
data-ai
data-engineering
1

dask

Distributed computing for larger-than-RAM pandas/NumPy workflows. Use when you need to scale existing pandas/NumPy code beyond memory or across clusters. Best for parallel file processing, distributed ML, and integration with existing pandas code. For out-of-core analytics on a single machine, use vaex; for in-memory speed, use polars.

hxk622
data-ai