lakehouse-patterns
Comprehensive guide to data lakehouse architecture combining data lake flexibility with data warehouse performance using Delta Lake, Iceberg, and Hudi
Convert a complete Nextflow pipeline to Galaxy
Apply cleaning and filtering actions based on data quality decisions and generate filtered log artefacts.
Create standardized metadata for data assets. Use when documenting new datasets, building data catalogs, improving data discoverability, or creating data dictionaries for teams.
High-performance JSON and CSV parsing library for Clojure. Use when working with JSON or CSV data in Clojure and you need fast, efficient parsing/writing with a clojure.data.json/clojure.data.csv compatible API.
Test and validate ClickHouse Cloud connection using clickhouse-connect for gapless-crypto-clickhouse. Use when validating connectivity, troubleshooting connection issues, or verifying environment configuration. Includes version check and query validation.
Guidance for working with Awkward Array 2.0 jagged arrays and records in Python. Use when building or debugging `awkward` workflows, including record construction with `ak.zip`, adding fields with `ak.with_field`, filtering/aggregation, combinatorics (`ak.cartesian`/`ak.combinations`), `argmin`/`argmax` slicing, flattening, sorting, and NumPy interop or common Awkward pitfalls.
Comprehensive guide to data quality validation, testing frameworks, anomaly detection, and data observability for production data pipelines
Data modeling automation Skill based on the immutable data model. Uses the blackboard pattern to proceed step by step from entity extraction through ER diagram generation.
Persistent state management using AgentDB (DuckDB) for workflow analytics and checkpoints. Provides a read-only analytics cache synchronized from TODO_*.md files, enabling complex dependency graph queries, historical workflow metrics, context checkpoint storage/recovery, and state transition analysis. Use when gathering and analyzing data for workflow state tracking. Triggers: "analyze workflow", "query state", "checkpoint", "workflow metrics".
Implement and deserialize all CQL types including primitives (int, text, timestamp, uuid, varint, decimal), collections (list, set, map), tuples, UDTs (user-defined types), and frozen types. Use when working with CQL type deserialization, schema validation, collection parsing, UDT handling, or type-correct data generation.
Data Engineer Agent. Responsible for building ETL pipelines, data warehouses, and data lakes.
GenStage, Broadway, and Flow for Elixir data pipelines
Query remote Parquet files via HTTP without downloading using DuckDB httpfs. Leverage column pruning, row filtering, and range requests for efficient bandwidth usage. Use for crypto/trading data distribution and analytics.
Use this agent when reviewing database migrations, schema changes, or data transformations. Specializes in validating ID mappings, checking for swapped values, and verifying rollback safety. Triggers on requests like "migration review", "schema change validation".
Field naming conventions for the Job Aggregator project. Use this skill when encountering type errors related to field names (camelCase vs snake_case), database constraint violations, or data mapping issues between Python/TypeScript/PostgreSQL.
Sync delta specs from a change to main specs. Use when the user wants to update main specs with changes from a delta spec, without archiving the change.
Implements high-performance streaming using System.IO.Pipelines in .NET. Use when building network protocols, parsing binary data, or processing large streams efficiently.
Auto-execute when "[MEMORY_KEEPER_DELTA]" trigger detected
Workflow for acquiring historical Ethereum blockchain data using Google BigQuery free tier. Empirically validated for cost estimation, streaming downloads, and DuckDB integration. Use when planning bulk historical data acquisition or comparing data source options for blockchain network metrics.
Ingest the event log, normalise schema, and generate an initial data profile with notebook and manifest updates.
Instructions for generating synthetic airline data with Synth CLI and loading it into SQLite.
AWS Kinesis expert. Use for stream processing, real-time data, and Kinesis patterns.