adf-master
Comprehensive Azure Data Factory knowledge base with official documentation sources, CI/CD methods, deployment patterns, and troubleshooting resources
Ecto patterns for Phoenix/Elixir apps. Covers schemas, changesets, migrations, queries, Ecto.Multi, transactions, constraints, associations, pagination, tenant partitioning, performance, and testing.
Understand MIMIC-IV table relationships, join patterns, and identifier hierarchy. Use for correct data linkage, avoiding duplicates, and proper temporal joins.
Build and validate task DAGs for Ralph parallel execution. Use when planning execution order, detecting cycles, or explaining why tasks are blocked. Triggers on: plan dag, check dependencies, why is task blocked, execution order.
Run Earthly E2E validation for YAML schema contracts. Use when validating YAML schema changes, testing schema contracts against live ClickHouse, or regenerating Python types, DDL, and docs from YAML. For SQL schema design and optimization, use clickhouse-architect skill instead.
Create a Databricks lakehouse pipeline design doc (bronze/silver/gold, DLT or Jobs), including SLAs, data quality, Unity Catalog governance, monitoring, and an implementation checklist. Use when designing or reviewing ETL/ELT pipelines, DLT pipelines, streaming ingestion, CDC, or batch jobs on Databricks.
Microsoft Fabric Lakehouse, OneLake, and Fabric Warehouse connectors for Azure Data Factory (2025)
Databricks Expert Engineer Skill - Comprehensive guide for data engineering, machine learning infrastructure, and permission design
Expert in data pipelines, ETL processes, and data infrastructure
Expert documentation generation for golden layers. Detects SCD types, documents business rules, metric definitions, aggregation logic, and data quality scoring. Use when documenting golden layer tables.
Expert documentation generation for CDP Master Segment (Parent Segment) configurations. Analyzes master segment tables using TD MCP, extracts attribute and behavior schemas, documents star schema relationships, and creates comprehensive segment documentation. Use when documenting CDP parent segments.
Generate realistic synthetic data using Faker and Spark, with non-linear distributions and integrity constraints, and save it to Databricks. Use when creating test data, demo datasets, or synthetic tables.
Displaying charts, dataframes, and metrics in Streamlit. Use when visualizing data, configuring dataframe columns, or adding sparklines to metrics. Covers native charts, Altair, and column configuration.
Generate and compare ydata-profiling EDA reports with sampling, consistent random seeds, and HTML outputs; often follows duckdb-parquet-lab-workflow when data is queried from Parquet.
Manages CDP parent segments using `tdx ps` commands with YAML configs. Covers master tables, attributes, behaviors, `tdx ps validate` for join validation, `tdx ps preview` for data preview, and schedule configuration (daily/hourly/cron). Use when creating customer master tables, validating join match rates, or troubleshooting parent segment workflows.
Analyzes impact of proposed changes on existing systems (brownfield projects) with delta spec validation. Trigger terms: change impact, impact analysis, brownfield, delta spec, change proposal, change management, existing system analysis, integration impact, breaking changes, dependency analysis, affected components, migration plan, risk assessment, brownfield change. Provides comprehensive change analysis for existing systems: affected component identification, breaking change detection, dependency graph updates, integration point impact, database migration analysis, API compatibility checks, risk assessment and mitigation strategies, and migration plan recommendations. Use when proposing changes to existing systems, analyzing brownfield integration, or validating delta specifications.
Assess data quality with checks for missing values, duplicates, type issues, and inconsistencies. Use for data validation, ETL pipelines, or dataset documentation.
Validates CDP segment YAML configurations against the TD CDP API specification. Use when reviewing segment rules for correctness, checking operator types and values, or troubleshooting segment configuration errors before pushing to Treasure Data.
Design Kafka topics, partitions, consumer groups, producers with idempotency, retry strategies, dead letter queues, exactly-once semantics, and schema registry integration
Create and configure Databricks Asset Bundles (DABs) with best practices for multi-environment deployments. Use when working with: (1) Creating new DAB projects, (2) Adding resources (dashboards, pipelines, jobs, alerts), (3) Configuring multi-environment deployments, (4) Setting up permissions, (5) Deploying or running bundle resources
Comprehensive guide for implementing Supabase Realtime features with best practices, scalable patterns, and migration strategies. Use when building realtime features in Supabase applications including messaging, notifications, presence, live updates, collaborative features, or migrating from postgres_changes to broadcast. Covers client setup, database triggers with realtime.broadcast_changes, RLS authorization, naming conventions, and performance optimization.
Validates CDP journey YAML configurations against tdx schema requirements. Use when reviewing journey structure, checking step types and parameters, verifying segment references, or troubleshooting journey configuration errors before pushing to Treasure Data.
Creates, configures, and updates Databricks Lakeflow Spark Declarative Pipelines (SDP/LDP) using serverless compute. Handles streaming tables, materialized views, CDC, SCD Type 2, and Auto Loader ingestion patterns. Use when building data pipelines, working with Delta Live Tables, ingesting streaming data, implementing change data capture, or when the user mentions SDP, LDP, DLT, Lakeflow pipelines, streaming tables, or bronze/silver/gold medallion architectures.