creating-bauplan-pipelines
Creates bauplan data pipeline projects with SQL and Python models. Use when starting a new pipeline, defining DAG transformations, writing models, or setting up bauplan project structure from scratch.
Safely extend or refine AFI signal schemas and closely-related validators in afi-core, while preserving determinism, respecting PoI/PoInsight design, and obeying the AFI Droid Charter and AFI Core AGENTS.md boundaries.
Optimize Apache Spark jobs with partitioning, caching, shuffle optimization, and memory tuning. Use when improving Spark performance, debugging slow jobs, or scaling data processing pipelines.
Chain agent outputs as inputs in sequential or parallel pipelines for data flow orchestration
Process and transform arrays of data with common operations like filtering, mapping, and aggregation
Validate data against schemas, business rules, and data quality standards.
myfy DataModule for database access with async SQLAlchemy. Use when working with DataModule, AsyncSession, database connections, connection pooling, migrations, or SQLAlchemy models.
Convert spatial data (GeoJSON, Shapefile, etc.) to optimized GeoParquet using the gpio CLI. Analyzes files, recommends settings, and publishes to cloud storage.
Amazon DynamoDB patterns using AWS SDK for Java 2.x. Use when creating, querying, scanning, or performing CRUD operations on DynamoDB tables, working with indexes, batch operations, transactions, or integrating with Spring Boot applications.
Create efficient data pipelines with tf.data
Use for Python data modeling with dataclasses, attrs, and Pydantic. Use when creating data structures and models.
Use when working with Scala collections, including immutable and mutable variants, List, Vector, Set, and Map operations, collection transformations, lazy evaluation with views, parallel collections, and custom collection builders for efficient data processing.
Use for functional-style data processing with the Java Streams API. Use when processing collections with streams.
Use when validating and casting data with Ecto changesets including field validation, constraints, nested changesets, and data transformation. Use for ensuring data integrity before database operations.
Manages schema lifecycle including scaffolding, safe field deprecation, and data migrations. Use when modifying existing schemas.
Migrates mock data from Drizzle ORM schemas to C# MockData classes
Creates JPA entities following best practices.
This skill should be used when the user asks about "data policy", "dictionary", "field validation", "mandatory field", "read only", "data model", "table schema", "column attributes", or any ServiceNow Data Policy and Dictionary development.
Validate an organized eBFE/BLE model using ras-commander dataframes. Uses init_ras_project(), then checks plan_df, boundary_df, and rasmap_df to verify that all plan files exist, all DSS files exist with relative paths, all terrain files exist with relative paths, all HDF results are accessible, and no absolute paths remain (these would cause GUI popups). Use after organizing an eBFE model to verify that it is actually runnable. Generates a validation report and a script for user re-verification.
Retrieves and processes AORC precipitation data for HEC-RAS/HMS models. Handles spatial averaging over watersheds, temporal aggregation, DSS export, and Atlas 14 design storms. Use when working with historical precipitation, AORC data, calibration workflows, design storm generation, rainfall analysis, SCS Type II distributions, AEP events, 100-year storms, or generating precipitation boundary conditions for rain-on-grid models. Triggers: precipitation, AORC, Atlas 14, design storm, rainfall, SCS Type II, AEP, 100-year, rain-on-grid, hyetograph, temporal distribution, areal reduction, calibration, historical precipitation.
Comprehensive analysis of BigQuery usage patterns, costs, and query performance
Use bigquery CLI (instead of `bq`) for all Google BigQuery and GCP data warehouse operations including SQL query execution, data ingestion (streaming insert, bulk load, JSONL/CSV/Parquet), data extraction/export, dataset/table/view management, external tables, schema operations, query templates, cost estimation with dry-run, authentication with gcloud, data pipelines, ETL workflows, and MCP/LSP server integration for AI-assisted querying and editor support. Modern Rust-based replacement for the Python `bq` CLI with faster startup, better cost awareness, and streaming support. Handles both small-scale streaming inserts (<1000 rows) and large-scale bulk loading (>10MB files), with support for Cloud Storage integration.
This skill provides guidance for merging data from multiple heterogeneous sources (CSV, JSON, Parquet, XML, etc.) into unified output formats with conflict detection and resolution. Use when tasks involve combining data from different file formats, field mapping between schemas, priority-based conflict resolution, or generating merged datasets with conflict reports.
Complete guide for Apache Airflow orchestration including DAGs, operators, sensors, XComs, task dependencies, dynamic workflows, and production deployment