

Data Eng.

ETL pipelines and big data infrastructure.

1541 skills

Sorted by: stars


data-engineering

creating-bauplan-pipelines

Creates bauplan data pipeline projects with SQL and Python models. Use when starting a new pipeline, defining DAG transformations, writing models, or setting up bauplan project structure from scratch.

aiskillstore

data-ai


data-engineering

extend-signal-schema

Safely extend or refine AFI signal schemas and closely related validators in afi-core, while preserving determinism, respecting PoI/PoInsight design, and obeying the AFI Droid Charter and AFI Core AGENTS.md boundaries.

aiskillstore

data-ai


data-engineering

spark-optimization

Optimize Apache Spark jobs with partitioning, caching, shuffle optimization, and memory tuning. Use when improving Spark performance, debugging slow jobs, or scaling data processing pipelines.

aiskillstore

data-ai


data-engineering

when-chaining-agent-pipelines-use-stream-chain

Chain agent outputs as inputs to downstream agents in sequential or parallel pipelines for data-flow orchestration.

aiskillstore

data-ai


data-engineering

data-processor

Process and transform arrays of data with common operations such as filtering, mapping, and aggregation.

aiskillstore

data-ai


data-engineering

data-validator

Validate data against schemas, business rules, and data quality standards.

aiskillstore

data-ai


data-engineering

data-module

myfy DataModule for database access with async SQLAlchemy. Use when working with DataModule, AsyncSession, database connections, connection pooling, migrations, or SQLAlchemy models.

psincraian

data-ai


data-engineering

geoparquet

Convert spatial data (GeoJSON, Shapefile, etc.) to optimized GeoParquet using the gpio CLI. Analyzes files, recommends settings, and publishes to cloud storage.

geoparquet

data-ai


data-engineering

aws-sdk-java-v2-dynamodb

Amazon DynamoDB patterns using AWS SDK for Java 2.x. Use when creating, querying, scanning, or performing CRUD operations on DynamoDB tables, working with indexes, batch operations, transactions, or integrating with Spring Boot applications.

giuseppe-trisciuoglio

data-ai


data-engineering

tensorflow-data-pipelines

Create efficient data pipelines with tf.data.

TheBushidoCollective

data-ai


data-engineering

python-data-classes

Use for Python data modeling with dataclasses, attrs, and Pydantic, e.g. when creating data structures and models.

TheBushidoCollective

data-ai


data-engineering

scala-collections

Use when working with Scala collections, including immutable/mutable variants, List, Vector, Set, and Map operations, collection transformations, lazy evaluation with views, parallel collections, and custom collection builders for efficient data processing.

TheBushidoCollective

data-ai


data-engineering

java-streams-api

Use when applying the Java Streams API for functional-style data processing, such as processing collections with streams.

TheBushidoCollective

data-ai


data-engineering

ecto-changesets

Use when validating and casting data with Ecto changesets, including field validation, constraints, nested changesets, and data transformation. Use to ensure data integrity before database operations.

TheBushidoCollective

data-ai


data-engineering

sanity-schema-manager

Manages schema lifecycle including scaffolding, safe field deprecation, and data migrations. Use when modifying existing schemas.

sanity-io

data-ai


data-engineering

ln-723-mockdata-migrator

Migrates mock data from Drizzle ORM schemas to C# MockData classes.

levnikolaevich

data-ai


data-engineering

jpa-entity-creator

Creates JPA entities following best practices.

sivaprasadreddy

data-ai


data-engineering

data-policies

This skill should be used when the user asks about "data policy", "dictionary", "field validation", "mandatory field", "read only", "data model", "table schema", "column attributes", or any ServiceNow Data Policy and Dictionary development.

groeimetai

data-ai


data-engineering

ebfe-validator

Validate an organized eBFE/BLE model using ras-commander dataframes. Uses init_ras_project(), then checks plan_df, boundary_df, and rasmap_df to verify that:
- All plan files exist
- All DSS files exist with relative paths
- All terrain files exist with relative paths
- All HDF results are accessible
- No absolute paths are used (they would cause GUI popups)
Use after organizing an eBFE model to verify that it is actually runnable. Generates a validation report and a script for user re-verification.

gpt-cmdr

data-ai


data-engineering

analyzing-aorc-precipitation

Retrieves and processes AORC precipitation data for HEC-RAS/HMS models. Handles spatial averaging over watersheds, temporal aggregation, DSS export, and Atlas 14 design storms. Use when working with historical precipitation, AORC data, calibration workflows, design storm generation, rainfall analysis, SCS Type II distributions, AEP events, 100-year storms, or generating precipitation boundary conditions for rain-on-grid models. Triggers: precipitation, AORC, Atlas 14, design storm, rainfall, SCS Type II, AEP, 100-year, rain-on-grid, hyetograph, temporal distribution, areal reduction, calibration, historical precipitation.

gpt-cmdr

data-ai


data-engineering

analyze-bigquery-usage

Comprehensive analysis of BigQuery usage patterns, costs, and query performance.

openshift-eng

data-ai


data-engineering

bigquery

Use the bigquery CLI (instead of `bq`) for all Google BigQuery and GCP data warehouse operations, including SQL query execution, data ingestion (streaming insert, bulk load, JSONL/CSV/Parquet), data extraction/export, dataset/table/view management, external tables, schema operations, query templates, cost estimation with dry-run, authentication with gcloud, data pipelines, ETL workflows, and MCP/LSP server integration for AI-assisted querying and editor support. A modern Rust-based replacement for the Python `bq` CLI with faster startup, better cost awareness, and streaming support. Handles both small-scale streaming inserts (<1000 rows) and large-scale bulk loading (>10MB files), with support for Cloud Storage integration.

lanej

data-ai


data-engineering

multi-source-data-merger

This skill provides guidance for merging data from multiple heterogeneous sources (CSV, JSON, Parquet, XML, etc.) into unified output formats with conflict detection and resolution. Use when tasks involve combining data from different file formats, field mapping between schemas, priority-based conflict resolution, or generating merged datasets with conflict reports.

letta-ai

data-ai


data-engineering

apache-airflow-orchestration

Complete guide to Apache Airflow orchestration, including DAGs, operators, sensors, XComs, task dependencies, dynamic workflows, and production deployment.

manutej

data-ai

