Data Eng.

ETL pipelines and big data infrastructure.

1541 skills

data-engineering

data-engineer

Build ETL pipelines, data warehouses, and streaming architectures. Implements Spark jobs, Airflow DAGs, and Kafka streams. Use PROACTIVELY for data pipeline design or analytics infrastructure.

sidetoolco

data-ai

data-engineering

graphql-resolvers

Write efficient resolvers with DataLoader, batching, and N+1 prevention

pluginagentmarketplace

data-ai

data-engineering

Filter and search event datasets (logs) using OPAL. Use when you need to find specific log events by text search, regex patterns, or field values. Covers contains(), the tilde operator (~), field comparisons, boolean logic, and limit for sampling results. Does NOT cover aggregation (see the aggregating-event-datasets skill).

rustomax

data-ai

data-engineering

python-data-engineering

Comprehensive Python data engineering patterns for AWS Data Lake, including PySpark, Pandas, Apache Airflow, AWS Glue, ETL pipelines, data quality, schema management, performance optimization, FastAPI services, streaming with Kafka/Kinesis, data validation with Great Expectations, testing strategies, error handling, logging, and production deployment on AWS EMR and Glue.

b3-competition

data-ai

data-engineering

gcp-bq-data-loading

Use when loading data into BigQuery from CSV, JSON, Avro, Parquet files, Cloud Storage, or local files. Covers bq load command, source formats, schema detection, incremental loading, and handling parsing errors.

FunnelEnvy

data-ai

data-engineering

erpnext-errors-database

Error handling patterns for ERPNext/Frappe database operations. Use when handling DoesNotExistError, DuplicateEntryError, transaction failures, and query errors. Covers retry patterns and data integrity. V14/V15/V16 compatible. Triggers: database error, DoesNotExistError, DuplicateEntryError, transaction failed, query error.

OpenAEC-Foundation

data-ai

data-engineering

nixtla-contract-schema-mapper

Transforms prediction market data to Nixtla format (unique_id, ds, y). Maps arbitrary column names to required schema. Validates date and numeric types. Use when preparing prediction market datasets for Nixtla forecasting tools. Trigger with "convert to Nixtla format", "schema mapping", "transform data".

intent-solutions-io

data-ai

data-engineering

blockchain-data-collection-validation

Empirical validation workflow for blockchain data collection pipelines before production implementation. Use when validating data sources, testing DuckDB integration, building POC collectors, or verifying complete fetch-to-storage pipelines for blockchain data.

terrylica

data-ai

data-engineering

data-analysis

Data analysis workflows and patterns for exploring, transforming, and visualizing data. Use when working with data, creating reports, or when users mention "data analysis", "analyze data", "data exploration", or "reporting".

IHKREDDY

data-ai

data-engineering

data-analysis

Executive-grade data analysis with pandas/polars and McKinsey-quality visualizations. Use when analyzing data, building dashboards, creating investor presentations, or calculating SaaS metrics.

ScientiaCapital

data-ai

data-engineering

running-eda-process

Runs Exploratory Data Analysis (EDA) following the mandatory validation workflow. Use when performing data analysis, exploring datasets, validating data quality, or when the user mentions EDA, data exploration, sanity checks, or data validation. Always run before main analysis queries.

nimrodfisher

data-ai

data-engineering

bpa-rules

This skill should be used when the user asks to "create a BPA rule", "write a Best Practice Analyzer rule", "improve a BPA expression", "fix expression for BPA", "analyze BPA annotations", "check model for best practices", "audit BPA rules", "discover BPA rules", "list all BPA rules", "validate BPA rules", or mentions Tabular Editor BPA rules. Provides guidance for creating, improving, auditing, and understanding Best Practice Analyzer rules for Power BI semantic models.

data-goblin

data-ai

data-engineering

freshness-latency-slos

See the main Data Freshness and Latency skill for comprehensive coverage of freshness monitoring and SLO tracking.

AmnadTaowsoam

data-ai

data-engineering

data-quality-monitoring

Techniques and tools for ensuring the accuracy, completeness, and reliability of data across the pipeline.

AmnadTaowsoam

data-ai

data-engineering

data-lineage

Mapping the flow of data from source to destination for transparency, impact analysis, and troubleshooting.

AmnadTaowsoam

data-ai

data-engineering

process-mining-assistant

Perform an end-to-end process mining analysis via a command-line workflow that progressively ingests, profiles, cleans, mines and reports on event logs using PM4Py. The workflow generates stage-based artefacts (including versioned notebooks) and pauses at decision checkpoints so the user can validate findings and choose how to proceed.

Wattysaid

data-ai

data-engineering

data-freshness-and-latency

Monitoring and optimizing how quickly data flows through pipelines and ensuring it meets timeliness requirements.

AmnadTaowsoam

data-ai

data-engineering

polars

Expert guidance for Polars dataframe manipulation in Python. Use this skill when working with dataframes, data processing, ETL pipelines, or any task involving tabular data manipulation. Provides best practices, performance optimization patterns, and comprehensive API usage for the Polars library.

iKiok

data-ai

data-engineering

duckdb-data-explorer

This skill should be used when performing local data exploration, profiling, quality analysis, or transformation tasks using DuckDB. It handles CSV, Parquet, and JSON files, provides automated data quality reports, supports complex JSON transformations, and generates interactive HTML reports for data analysis.

alexismanuel

data-ai

data-engineering

dataql-analysis

Analyze data files using SQL queries with DataQL. Use when working with CSV, JSON, Parquet, Excel files or when the user mentions data analysis, filtering, aggregation, or SQL queries on files.

adrianolaselva

data-ai

data-engineering

eda

Exploratory Data Analysis for tabular data. Use this skill when analyzing value distributions, checking for missing data, computing correlations, examining class balance, or generating data quality reports.

argythana

data-ai

data-engineering

knack-data-cleaner

Ensures accuracy for HTI compliance and performance dashboards through data validation, deduplication, normalization, and integrity checks. Critica...

willsigmon

data-ai

data-engineering

polars

Use when the user mentions "Polars", "fast dataframe", "lazy evaluation", or "Arrow backend", or asks about a "pandas alternative", "parallel dataframe", "large CSV processing", "ETL pipeline", or the "expression API".

eyadsibai

data-ai

data-engineering

dc-query-building

Build semantic queries with measures, dimensions, filters, and time dimensions for Drizzle Cube.

cliftonc

data-ai
