Data Eng.

ETL pipelines and big data infrastructure.

1541 skills
Sorted by: stars
data-engineering
7

adf-master

Comprehensive Azure Data Factory knowledge base with official documentation sources, CI/CD methods, deployment patterns, and troubleshooting resources

JosiahSiegel
data-ai
data-engineering
7

oracle

Use the @steipete/oracle CLI to bundle a prompt plus the right files and get a second-model review (API or browser) for debugging, refactors, design checks, or cross-validation.

gmickel
data-ai
data-engineering
7

ecto-patterns

Ecto patterns for Phoenix/Elixir apps. Covers schemas, changesets, migrations, queries, Ecto.Multi, transactions, constraints, associations, pagination, tenant partitioning, performance, and testing.

bobmatnyc
data-ai
data-engineering
7

mimic-table-relationships

Understand MIMIC-IV table relationships, join patterns, and identifier hierarchy. Use for correct data linkage, avoiding duplicates, and proper temporal joins.

hannesill
data-ai
data-engineering
7

dag-planner

Build and validate task DAGs for Ralph parallel execution. Use when planning execution order, detecting cycles, or explaining why tasks are blocked. Triggers on: plan dag, check dependencies, why is task blocked, execution order.

frizynn
data-ai
data-engineering
7

schema-e2e-validation

Run Earthly E2E validation for YAML schema contracts. Use when validating YAML schema changes, testing schema contracts against live ClickHouse, or regenerating Python types, DDL, and docs from YAML. For SQL schema design and optimization, use clickhouse-architect skill instead.

terrylica
data-ai
data-engineering
7

lakehouse-pipeline-design

Create a Databricks lakehouse pipeline design doc (bronze/silver/gold, DLT or Jobs), including SLAs, data quality, Unity Catalog governance, monitoring, and an implementation checklist. Use when designing or reviewing ETL/ELT pipelines, DLT pipelines, streaming ingestion, CDC, or batch jobs on Databricks.

hubert-dudek
data-ai
data-engineering
7

fabric-onelake-2025

Microsoft Fabric Lakehouse, OneLake, and Fabric Warehouse connectors for Azure Data Factory (2025)

JosiahSiegel
data-ai
data-engineering
6

databricks

Databricks Expert Engineer Skill - Comprehensive guide for data engineering, machine learning infrastructure, and permission design

i9wa4
data-ai
data-engineering
6

data-engineer

Expert in data pipelines, ETL processes, and data infrastructure

daffy0208
data-ai
data-engineering
5

aps-doc-golden

Expert documentation generation for golden layers. Detects SCD types, documents business rules, metric definitions, aggregation logic, and data quality scoring. Use when documenting golden layer tables.

treasure-data
data-ai
data-engineering
5

aps-doc-master-segment

Expert documentation generation for CDP Master Segment (Parent Segment) configurations. Analyzes master segment tables using TD MCP, extracts attribute and behavior schemas, documents star schema relationships, and creates comprehensive segment documentation. Use when documenting CDP parent segments.

treasure-data
data-ai
data-engineering
5

synthetic-data-generation

Generate realistic synthetic data using Faker and Spark, with non-linear distributions and integrity constraints, and save it to Databricks. Use when creating test data, demo datasets, or synthetic tables.

databricks-solutions
data-ai
data-engineering
5

displaying-streamlit-data

Displaying charts, dataframes, and metrics in Streamlit. Use when visualizing data, configuring dataframe columns, or adding sparklines to metrics. Covers native charts, Altair, and column configuration.

streamlit
data-ai
data-engineering
5

ydata-eda-profiling

Generate and compare ydata-profiling EDA reports with sampling, consistent random seeds, and HTML outputs; often follows duckdb-parquet-lab-workflow when data is queried from Parquet.

crossxwill
data-ai
data-engineering
5

parent-segment

Manages CDP parent segments using `tdx ps` commands with YAML configs. Covers master tables, attributes, behaviors, `tdx ps validate` for join validation, `tdx ps preview` for data preview, and schedule configuration (daily/hourly/cron). Use when creating customer master tables, validating join match rates, or troubleshooting parent segment workflows.

treasure-data
data-ai
data-engineering
5

change-impact-analyzer

Analyzes impact of proposed changes on existing systems (brownfield projects) with delta spec validation.

Trigger terms: change impact, impact analysis, brownfield, delta spec, change proposal, change management, existing system analysis, integration impact, breaking changes, dependency analysis, affected components, migration plan, risk assessment, brownfield change.

Provides comprehensive change analysis for existing systems:
- Affected component identification
- Breaking change detection
- Dependency graph updates
- Integration point impact
- Database migration analysis
- API compatibility checks
- Risk assessment and mitigation strategies
- Migration plan recommendations

Use when: proposing changes to existing systems, analyzing brownfield integration, or validating delta specifications.

nahisaho
data-ai
data-engineering
5

data-quality-auditor

Assess data quality with checks for missing values, duplicates, type issues, and inconsistencies. Use for data validation, ETL pipelines, or dataset documentation.

dkyazzentwatwa
data-ai
data-engineering
5

validate-segment

Validates CDP segment YAML configurations against the TD CDP API specification. Use when reviewing segment rules for correctness, checking operator types and values, or troubleshooting segment configuration errors before pushing to Treasure Data.

treasure-data
data-ai
data-engineering
5

kafka-stream-designer

Design Kafka topics, partitions, consumer groups, and idempotent producers, along with retry strategies, dead letter queues, exactly-once semantics, and schema registry integration.

phatpham9
data-ai
data-engineering
5

asset-bundles

Create and configure Databricks Asset Bundles (DABs) with best practices for multi-environment deployments. Use when working with: (1) Creating new DAB projects, (2) Adding resources (dashboards, pipelines, jobs, alerts), (3) Configuring multi-environment deployments, (4) Setting up permissions, (5) Deploying or running bundle resources

databricks-solutions
data-ai
data-engineering
5

supabase-realtime

Comprehensive guide for implementing Supabase Realtime features with best practices, scalable patterns, and migration strategies. Use when building realtime features in Supabase applications including messaging, notifications, presence, live updates, collaborative features, or migrating from postgres_changes to broadcast. Covers client setup, database triggers with realtime.broadcast_changes, RLS authorization, naming conventions, and performance optimization.

Raudbjorn
data-ai
data-engineering
5

validate-journey

Validates CDP journey YAML configurations against tdx schema requirements. Use when reviewing journey structure, checking step types and parameters, verifying segment references, or troubleshooting journey configuration errors before pushing to Treasure Data.

treasure-data
data-ai
data-engineering
5

spark-declarative-pipelines

Creates, configures, and updates Databricks Lakeflow Spark Declarative Pipelines (SDP/LDP) using serverless compute. Handles streaming tables, materialized views, CDC, SCD Type 2, and Auto Loader ingestion patterns. Use when building data pipelines, working with Delta Live Tables, ingesting streaming data, implementing change data capture, or when the user mentions SDP, LDP, DLT, Lakeflow pipelines, streaming tables, or bronze/silver/gold medallion architectures.

databricks-solutions
data-ai