
Data Eng.

ETL pipelines and big data infrastructure.

1541 skills
Sorted by: stars
data-engineering
90

creating-bauplan-pipelines

Creates bauplan data pipeline projects with SQL and Python models. Use when starting a new pipeline, defining DAG transformations, writing models, or setting up bauplan project structure from scratch.

aiskillstore
data-ai
data-engineering
90

extend-signal-schema

Safely extend or refine AFI signal schemas and closely-related validators in afi-core, while preserving determinism, respecting PoI/PoInsight design, and obeying the AFI Droid Charter and AFI Core AGENTS.md boundaries.

aiskillstore
data-ai
data-engineering
90

spark-optimization

Optimize Apache Spark jobs with partitioning, caching, shuffle optimization, and memory tuning. Use when improving Spark performance, debugging slow jobs, or scaling data processing pipelines.

aiskillstore
data-ai
data-engineering
90

data-processor

Process and transform arrays of data with common operations like filtering, mapping, and aggregation.

aiskillstore
data-ai
data-engineering
90

data-validator

Validate data against schemas, business rules, and data quality standards.

aiskillstore
data-ai
data-engineering
83

data-module

myfy DataModule for database access with async SQLAlchemy. Use when working with DataModule, AsyncSession, database connections, connection pooling, migrations, or SQLAlchemy models.

psincraian
data-ai
data-engineering
64

geoparquet

Convert spatial data (GeoJSON, Shapefile, etc.) to optimized GeoParquet using the gpio CLI. Analyzes files, recommends settings, and publishes to cloud storage.

geoparquet
data-ai
data-engineering
60

aws-sdk-java-v2-dynamodb

Amazon DynamoDB patterns using AWS SDK for Java 2.x. Use when creating, querying, scanning, or performing CRUD operations on DynamoDB tables, working with indexes, batch operations, transactions, or integrating with Spring Boot applications.

giuseppe-trisciuoglio
data-ai
data-engineering
59

python-data-classes

Use for Python data modeling with dataclasses, attrs, and Pydantic, and when creating data structures and models.

TheBushidoCollective
data-ai
data-engineering
59

scala-collections

Use when working with Scala collections, including immutable/mutable variants, List, Vector, Set, and Map operations, collection transformations, lazy evaluation with views, parallel collections, and custom collection builders for efficient data processing.

TheBushidoCollective
data-ai
data-engineering
59

java-streams-api

Use when working with the Java Streams API for functional-style data processing, such as processing collections with streams.

TheBushidoCollective
data-ai
data-engineering
59

ecto-changesets

Use when validating and casting data with Ecto changesets including field validation, constraints, nested changesets, and data transformation. Use for ensuring data integrity before database operations.

TheBushidoCollective
data-ai
data-engineering
50

sanity-schema-manager

Manages schema lifecycle including scaffolding, safe field deprecation, and data migrations. Use when modifying existing schemas.

sanity-io
data-ai
data-engineering
41

data-policies

This skill should be used when the user asks about "data policy", "dictionary", "field validation", "mandatory field", "read only", "data model", "table schema", or "column attributes", or about any ServiceNow Data Policy and Dictionary development.

groeimetai
data-ai
data-engineering
38

ebfe-validator

Validate an organized eBFE/BLE model using ras-commander dataframes. Uses init_ras_project(), then checks plan_df, boundary_df, and rasmap_df to verify that:
- all plan files exist
- all DSS files exist with relative paths
- all terrain files exist with relative paths
- all HDF results are accessible
- no absolute paths are present (they would cause GUI popups)
Use after organizing an eBFE model to verify that it is actually runnable. Generates a validation report and a script for user re-verification.

gpt-cmdr
data-ai
data-engineering
38

analyzing-aorc-precipitation

Retrieves and processes AORC precipitation data for HEC-RAS/HMS models. Handles spatial averaging over watersheds, temporal aggregation, DSS export, and Atlas 14 design storms. Use when working with historical precipitation, AORC data, calibration workflows, design storm generation, rainfall analysis, SCS Type II distributions, AEP events, 100-year storms, or generating precipitation boundary conditions for rain-on-grid models. Triggers: precipitation, AORC, Atlas 14, design storm, rainfall, SCS Type II, AEP, 100-year, rain-on-grid, hyetograph, temporal distribution, areal reduction, calibration, historical precipitation.

gpt-cmdr
data-ai
data-engineering
34

analyze-bigquery-usage

Comprehensive analysis of BigQuery usage patterns, costs, and query performance.

openshift-eng
data-ai
data-engineering
34

bigquery

Use the bigquery CLI (instead of `bq`) for all Google BigQuery and GCP data warehouse operations, including SQL query execution, data ingestion (streaming insert, bulk load, JSONL/CSV/Parquet), data extraction/export, dataset/table/view management, external tables, schema operations, query templates, cost estimation with dry-run, authentication with gcloud, data pipelines, ETL workflows, and MCP/LSP server integration for AI-assisted querying and editor support. A modern Rust-based replacement for the Python `bq` CLI with faster startup, better cost awareness, and streaming support. Handles both small-scale streaming inserts (<1000 rows) and large-scale bulk loading (>10MB files), with support for Cloud Storage integration.

lanej
data-ai
data-engineering
31

multi-source-data-merger

This skill provides guidance for merging data from multiple heterogeneous sources (CSV, JSON, Parquet, XML, etc.) into unified output formats with conflict detection and resolution. Use when tasks involve combining data from different file formats, field mapping between schemas, priority-based conflict resolution, or generating merged datasets with conflict reports.

letta-ai
data-ai
data-engineering
31

apache-airflow-orchestration

Complete guide to Apache Airflow orchestration, including DAGs, operators, sensors, XComs, task dependencies, dynamic workflows, and production deployment.

manutej
data-ai
Page 47 of 65