category focus

Data Eng.

ETL pipelines and big data infrastructure.

1541 skills

data-engineering

mova-skill-ingest-store-episode-basic-wrapper

Persists ds.episode_skill_ingest_run_v1 in the lab’s genetic file store.

Leryk1981

data-ai

open

data-engineering

backend--queries

Apply the Agent OS standard for backend queries.

tlabs-xyz

data-ai

open

data-engineering

ef-core-patterns

Use this skill when implementing data access, repositories, or query logic with Entity Framework Core.

michaellperry

data-ai

open

data-engineering

Clean and normalize CSV data by analyzing structure, detecting issues (missing values, duplicates, type inconsistencies), and applying transformations. Use when users need to prepare messy CSV files for analysis or import.

elertan

data-ai

open

data-engineering

refactorpandas

Refactor Pandas code to improve maintainability, readability, and performance. Identifies and fixes loops/.iterrows() that should be vectorized, overuse of .apply() where vectorized alternatives exist, chained indexing patterns, inplace=True usage, inefficient dtypes, missing method chaining opportunities, complex filters, merge operations without validation, and SettingWithCopyWarning patterns. Applies Pandas 2.0+ features including PyArrow backend, Copy-on-Write, vectorized operations, method chaining, .query()/.eval(), optimized dtypes, and pipeline patterns.

SnakeO

data-ai

open

data-engineering

hive-endpoint

How to create API endpoints in the Hive framework.

paralect

data-ai

open

data-engineering

output-error-missing-schemas

Fix missing schema definitions in Output SDK steps. Use when seeing type errors, undefined properties at step boundaries, validation failures, or when step inputs/outputs aren't being properly typed.

growthxai

data-ai

open

data-engineering

data-import-parsers

Implement or refactor data import/parsing so it streams files sequentially (memory-safe), validates/coerces types explicitly, skips irreparable records while logging them to an error CSV (full original columns + timestamp/file/line/error), emits rows_ok/rows_skipped/parse_errors metrics, and guarantees idempotent DB writes.

janjaszczak

data-ai

open

data-engineering

data-integrity-guardian

Maintain database quality, ensure data alignment, verify mission compliance, and prepare for JusticeHub syndication.

Acurioustractor

data-ai

open

data-engineering

dataclass-optimization

Python dataclass best practices: slots, frozen, validation. Trigger when optimizing dataclasses or creating config classes.

smith6jt-cop

data-ai

open

data-engineering

skogai-jq

Use when performing JSON transformations, manipulating nested JSON structures, filtering arrays, extracting values, validating JSON schemas, or composing multi-step JSON operations. This skill provides 60+ schema-driven jq transformations optimized for AI agent discoverability.

SkogAI

data-ai

open

data-engineering

pm-03-data-quality

Assess data quality for the normalised log and recommend remediation thresholds and privacy handling.

Wattysaid

data-ai

open

data-engineering

ontology-phase-4-generate

Phase 4 of Ontology Builder Pipeline. Generates final Ontology artifacts (entity definitions, workflow catalog, concept guides) from DRD. Use after Phase 3 DRD is complete.

a4b-corporation

data-ai

open

data-engineering

decision-log-audit

Audits Decision JSONL logs for schema compliance, required metadata, and invariants across recorded decisions.

Nepopams

data-ai

open

data-engineering

airflow-dag

Apache Airflow DAG development with TaskFlow API, Google Cloud operators (BigQuery, GCS), dbt integration, and dynamic DAG generation. Use when creating or modifying Airflow DAGs, implementing data pipeline orchestration, setting up cross-DAG dependencies with ExternalTaskSensor, adding deferrable operators, or configuring error handling and retries.

ilorozco11

data-ai

open

data-engineering

synthetic-data-generator

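Synthetic data generation skill for testing. Generates realistic user data, transactions, logs, API responses, and more. Supports GDPR compliance, Faker integration, schema-based generation, time-series data, and anomalous data generation.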

ntaksh42

data-ai

open

data-engineering

data-source-connect

Connect your own data source to replace the demo unicorns data. Use when the user wants to use their own database URL or CSV file instead of the sample data. Triggers on requests to connect database, import CSV, change data source, use own data, or switch from demo data.

rebyteai-template

data-ai

open

data-engineering

nf-process-to-galaxy-tool

Convert a single Nextflow process to a Galaxy tool XML

galaxyproject

data-ai

open

data-engineering

altinity-expert-clickhouse-logs

Analyze ClickHouse system log table health including TTL configuration, disk usage, freshness, and cleanup. Use for system log issues and TTL configuration.

Altinity

data-ai

open

data-engineering

Load when working on data pipelines, datasets, reproducibility, or data infrastructure topics. Contains best practices for data engineering, ETL/ELT patterns, and ensuring reproducible data workflows.

chekos

data-ai

open

data-engineering

snowflake-query

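Takes the user's natural-language instructions, generates and runs a SQL query on Snowflake, and outputs the results in CSV format. Triggers automatically on requests such as "get ... from Snowflake" or "show the data for ...".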

0tarof

data-ai

open

data-engineering

fastapi-backend-template

FastAPI with PostgreSQL, async SQLAlchemy 2.0, Alembic, and Docker.

rebyteai-template

data-ai

open

data-engineering

data-engineer

Build scalable data pipelines, ETL/ELT processes, and data infrastructure. Use when: (1) designing data architectures or lakehouse patterns, (2) building Spark/Kafka/Flink/Beam pipelines, (3) optimizing Snowflake/BigQuery/Redshift queries, (4) implementing Airflow/Prefect/Dagster orchestration, (5) setting up data quality frameworks, (6) cost-optimizing data platforms.

robertlupo1997

data-ai

open

data-engineering

ontology-phase-1-ingest

Phase 1 of Ontology Builder Pipeline. Ingests and catalogs all input materials from _input/ folder. Use when starting ontology building process or when processing new input documents for domain analysis.

a4b-corporation

data-ai

open
