Data Engineering

ETL pipelines and big data infrastructure.

1541 skills

Sorted by stars.

data-engineering

770

intelligent-cache

Multi-layer caching with type-specific TTLs, get-or-generate pattern, memory and database layers, and graceful invalidation without cache stampede.

dadbodgeoff

data-ai

data-engineering

770

snapshot-aggregation

Daily compression of time-series data with merge logic for multiple pipeline runs, structured aggregation for dashboards, and storage estimation for capacity planning.

dadbodgeoff

data-ai

data-engineering

770

validation-quarantine

Data validation with quality scoring and quarantine for suspicious records. Validates incoming data without blocking the pipeline, enabling manual review of edge cases.

dadbodgeoff

data-ai

data-engineering

759

bgee-skill

Submit compact Bgee SPARQL requests for healthy wild-type expression metadata and ontology-aware lookup patterns. Use when a user wants concise Bgee summaries; save raw results only on request.

openai

data-ai

data-engineering

759

Vercel Queues guidance (public beta) — durable event streaming with topics, consumer groups, retries, and delayed delivery. $0.60/1M ops. Powers Workflow DevKit. Use when building async processing, fan-out patterns, or event-driven architectures.

openai

data-ai

data-engineering

753

edge-candidate-agent

Generate and prioritize US equity long-side edge research tickets from EOD observations, then export pipeline-ready candidate specs for trade-strategy-pipeline Phase I. Use when users ask to turn hypotheses/anomalies into reproducible research tickets, convert validated ideas into `strategy.yaml` + `metadata.json`, or preflight-check interface compatibility (`edge-finder-candidate/v1`) before running pipeline backtests.

tradermonty

data-ai

data-engineering

753

edge-pipeline-orchestrator

Orchestrate the full edge research pipeline from candidate detection through strategy design, review, revision, and export. Use when coordinating multi-stage edge research workflows end-to-end.

tradermonty

data-ai

data-engineering

753

edge-strategy-reviewer

Critically review strategy drafts from edge-strategy-designer for edge plausibility, overfitting risk, sample size adequacy, and execution realism. Use when strategy_drafts/*.yaml files exist and need a quality gate before pipeline export. Outputs PASS/REVISE/REJECT verdicts with confidence scores.

tradermonty

data-ai

data-engineering

750

create-data-source

Create a new Earth2Studio data source wrapper (DataSource, ForecastSource, DataFrameSource, or ForecastFrameSource) from a remote data store. Use this skill whenever the user mentions adding a data source, weather data API, observation feed, forecast archive, or any new remote/cloud data integration to Earth2Studio. Also trigger when the user asks about implementing DataSource, ForecastSource, DataFrameSource, or ForecastFrameSource protocols, connecting to S3/GCS/Azure/FTP stores, or wrapping a new weather/climate dataset.

NVIDIA

data-ai

data-engineering

710

d1-drizzle-schema

Generate Drizzle ORM schemas for Cloudflare D1 databases with correct D1-specific patterns. Produces schema files, migration commands, type exports, and DATABASE_SCHEMA.md documentation. Handles D1 quirks: foreign keys always enforced, no native BOOLEAN/DATETIME types, 100 bound parameter limit, JSON stored as TEXT. Use when creating a new database, adding tables, or scaffolding a D1 data layer.

jezweb

data-ai

data-engineering

710

d1-migration

Cloudflare D1 migration workflow: generate with Drizzle, inspect SQL for gotchas, apply to local and remote, fix stuck migrations, handle partial failures. Use when running migrations, fixing migration errors, or setting up D1 schemas.

jezweb

data-ai

data-engineering

710

db-seed

Generate database seed scripts with realistic sample data. Reads Drizzle schemas or SQL migrations, respects foreign key ordering, produces idempotent TypeScript or SQL seed files. Handles D1 batch limits, unique constraints, and domain-appropriate data. Use when populating dev/demo/test databases. Triggers: 'seed database', 'seed data', 'sample data', 'populate database', 'db seed', 'test data', 'demo data', 'generate fixtures'.

jezweb

data-ai

data-engineering

707

databricks-core-workflow-a

Execute Databricks primary workflow: Delta Lake ETL pipelines. Use when building data ingestion pipelines, implementing medallion architecture, or creating Delta Lake transformations. Trigger with phrases like "databricks ETL", "delta lake pipeline", "medallion architecture", "databricks data pipeline", "bronze silver gold".

Dicklesworthstone

data-ai

data-engineering

707

databricks-data-handling

Implement Delta Lake data management patterns including GDPR, PII handling, and data lifecycle. Use when implementing data retention, handling GDPR requests, or managing data lifecycle in Delta Lake. Trigger with phrases like "databricks GDPR", "databricks PII", "databricks data retention", "databricks data lifecycle", "delete user data".

Dicklesworthstone

data-ai

data-engineering

707

openevidence-migration-deep-dive

Execute complex OpenEvidence migrations including EHR integration, data migration, and system transitions. Use when migrating from legacy clinical decision support systems, integrating with new EHRs, or performing major platform transitions. Trigger with phrases like "openevidence migration", "ehr integration", "migrate to openevidence", "clinical ai migration", "legacy cds migration".

Dicklesworthstone

data-ai

data-engineering

687

bio-orchestrator

Meta-agent that routes bioinformatics requests to specialised sub-skills. Handles file type detection, analysis planning, report generation, and reproducibility export.

ClawBio

data-ai

data-engineering

687

proteomics-de

Differential expression analysis for label-free quantitative (LFQ) intensity data from standard MaxQuant and DIA-NN output. The workflow includes preprocessing, imputation, and statistical testing.

ClawBio

data-ai

data-engineering

687

seq-wrangler

Sequence QC, alignment, and BAM processing. Wraps FastQC, BWA/Bowtie2, SAMtools for automated read-to-BAM pipelines.

ClawBio

data-ai

data-engineering

660

eino-compose

Eino orchestration with Graph, Chain, and Workflow. Use when a user needs to build multi-step pipelines, compose components into executable graphs, handle streaming between nodes, use branching or parallel execution, manage state with checkpoints, or understand the Runnable abstraction. Covers Graph (directed graph with cycles), Chain (linear sequential), and Workflow (DAG with field mapping).

cloudwego

data-ai

data-engineering

645

team-orchestrator

Agent Teams orchestration engine: team composition, task distribution, dependency management, and result aggregation.

sangrokjung

data-ai

data-engineering

642

metadata-handling

Metadata Handling

kreuzberg-dev

data-ai

data-engineering

637

wren-connection-info

Reference guide for Wren Engine connection info — explains required fields for all 18 supported data sources (PostgreSQL, MySQL, BigQuery, Snowflake, ClickHouse, Trino, DuckDB, Databricks, Spark, Athena, Redshift, Oracle, SQL Server, Apache Doris, S3, GCS, MinIO, local files). Covers sensitive field handling, Docker host hints, and BigQuery credential encoding. Use when the user asks how to configure a data source connection or what fields to fill in.

Canner

data-ai

data-engineering

637

wren-dlt-connector

Connect SaaS data (HubSpot, Stripe, Salesforce, GitHub, Slack, etc.) to Wren Engine for SQL analysis. Guides the user through the full flow: install dlt, pick a SaaS source, set up credentials, run the data pipeline into DuckDB, then auto-generate a Wren semantic project from the loaded data. Use this skill whenever the user mentions: connecting SaaS data, importing data from an API, dlt pipelines, loading HubSpot/Stripe/Salesforce/GitHub/Slack data, querying SaaS data with SQL, or setting up a new data source from a REST API. Also trigger when the user already has a dlt-produced DuckDB file and wants to create a Wren project from it.

Canner

data-ai

data-engineering

634

dbt-transformation-patterns

Master dbt (data build tool) for analytics engineering with model organization, testing, documentation, and incremental strategies. Use when building data transformations, creating data models, or implementing analytics engineering best practices.

rmyndharis

data-ai
