category focus

Data Eng.

ETL pipelines and big data infrastructure.

1541 skills

data-engineering

azure-storage-file-datalake-py

Azure Data Lake Storage Gen2 SDK for Python. Use for hierarchical file systems, big data analytics, and file/directory operations. Triggers: "data lake", "DataLakeServiceClient", "FileSystemClient", "ADLS Gen2", "hierarchical namespace".

microsoft

data-ai

open

data-engineering

azure-storage-queue-py

Azure Queue Storage SDK for Python. Use for reliable message queuing, task distribution, and asynchronous processing. Triggers: "queue storage", "QueueServiceClient", "QueueClient", "message queue", "dequeue".

microsoft

data-ai

open

data-engineering

drizzle-neon

Add a PostgreSQL database with Drizzle ORM to a Scaffold-ETH 2 project. Use when the user wants to: add a database, use Drizzle ORM, integrate Neon PostgreSQL, store off-chain data, build a backend with database, or add persistent storage to their dApp.

scaffold-eth

data-ai

open

data-engineering

bio-orchestrator

Meta-agent that routes bioinformatics requests to specialised sub-skills. Handles file type detection, analysis planning, report generation, and reproducibility export.

FreedomIntelligence

data-ai

open

data-engineering

seq-wrangler

Sequence QC, alignment, and BAM processing. Wraps FastQC, BWA/Bowtie2, SAMtools for automated read-to-BAM pipelines.

FreedomIntelligence

data-ai

open

data-engineering

data-table-filters

Install and extend data-table-filters — a React data table system with faceted filters (checkbox, input, slider, timerange), sorting, infinite scroll, virtualization, and BYOS state management. Delivered as 11 shadcn registry blocks installable via `npx shadcn@latest add`. Use when: (1) installing data-table-filters from the shadcn registry, (2) adding extension blocks (command palette, AI filters, cell renderers, sheet panel, store adapters, schema system, Drizzle helpers, query layer), (3) configuring store adapters (nuqs/zustand/memory), (4) generating table schemas from a data model, (5) wiring up server-side filtering with Drizzle ORM, (6) connecting the React Query fetch layer, (7) auto-inferring schemas from raw JSON data with DataTableAuto / inferSchemaFromJSON, (8) adding AI-powered natural language filtering, (9) exposing tables as MCP endpoints for AI agents, (10) troubleshooting integration issues. Triggers on mentions of "data-table-filters", "data-table-filters.com", filterable data tables with

openstatusHQ

data-ai

open

data-engineering

1.9K

preprocessing-data-with-automated-pipelines

Automate data cleaning, transformation, and validation for ML tasks. Use when requesting "preprocess data", "clean data", "ETL pipeline", or "data transformation". Trigger with relevant phrases based on skill purpose.

jeremylongshore

data-ai

open

data-engineering

1.9K

data-quality-checker

Validate data quality checker operations. Auto-activating skill for Data Pipelines. Triggers on: data quality checker. Part of the Data Pipelines skill category. Use when working with data quality checker functionality. Trigger with phrases like "data quality checker", "data checker", "data".

jeremylongshore

data-ai

open

data-engineering

1.9K

data-partitioner

Process data partitioner operations. Auto-activating skill for Data Pipelines. Triggers on: data partitioner. Part of the Data Pipelines skill category. Use when working with data partitioner functionality. Trigger with phrases like "data partitioner", "data".

jeremylongshore

data-ai

open

data-engineering

1.9K

schema-optimization-orchestrator

Multi-phase schema optimization workflow orchestrator. Creates session directories, spawns phase agents sequentially, validates outputs, aggregates results. Trigger: "run schema optimization", "optimize schema workflow", "execute schema phases"

jeremylongshore

data-ai

open

data-engineering

1.9K

data-augmentation-pipeline

Process data augmentation pipeline operations. Auto-activating skill for ML Training. Triggers on: data augmentation pipeline. Part of the ML Training skill category. Use when working with data augmentation pipeline functionality. Trigger with phrases like "data augmentation pipeline", "data pipeline", "data".

jeremylongshore

data-ai

open

data-engineering

1.9K

deepgram-data-handling

Implement audio data handling best practices for Deepgram integrations. Use when managing audio file storage, implementing data retention, or ensuring GDPR/HIPAA compliance for transcription data. Trigger: "deepgram data", "audio storage", "transcription data", "deepgram GDPR", "deepgram HIPAA", "deepgram privacy", "PII redaction".

jeremylongshore

data-ai

open

data-engineering

1.9K

deepgram-migration-deep-dive

Deep dive into migrating to Deepgram from other transcription providers. Use when migrating from AWS Transcribe, Google Cloud STT, Azure Speech, OpenAI Whisper, AssemblyAI, or Rev.ai to Deepgram. Trigger: "deepgram migration", "switch to deepgram", "migrate transcription", "deepgram from AWS", "deepgram from Google", "replace whisper with deepgram".

jeremylongshore

data-ai

open

data-engineering

1.9K

navan-data-sync

Implement incremental sync strategies for Navan BOOKING and TRANSACTION data with ETL pipeline patterns. Use when setting up production data pipelines, debugging sync drift, or adding real-time event processing. Trigger with "navan data sync", "navan incremental sync", "navan ETL pipeline".

jeremylongshore

data-ai

open

data-engineering

1.9K

openevidence-migration-deep-dive

Migration Deep Dive for OpenEvidence. Trigger: "openevidence migration deep dive".

jeremylongshore

data-ai

open

data-engineering

1.9K

brightdata-deploy-integration

Deploy Bright Data integrations to Vercel, Fly.io, and Cloud Run platforms. Use when deploying Bright Data-powered applications to production, configuring platform-specific secrets, or setting up deployment pipelines. Trigger with phrases like "deploy brightdata", "brightdata Vercel", "brightdata production deploy", "brightdata Cloud Run", "brightdata Fly.io".

jeremylongshore

data-ai

open

data-engineering

1.9K

databricks-core-workflow-a

Execute Databricks primary workflow: Delta Lake ETL pipelines. Use when building data ingestion pipelines, implementing medallion architecture, or creating Delta Lake transformations. Trigger with phrases like "databricks ETL", "delta lake pipeline", "medallion architecture", "databricks data pipeline", "bronze silver gold".

jeremylongshore

data-ai

open

data-engineering

1.9K

databricks-data-handling

Implement Delta Lake data management patterns including GDPR, PII handling, and data lifecycle. Use when implementing data retention, handling GDPR requests, or managing data lifecycle in Delta Lake. Trigger with phrases like "databricks GDPR", "databricks PII", "databricks data retention", "databricks data lifecycle", "delete user data".

jeremylongshore

data-ai

open

data-engineering

1.9K

feishu-evolver-wrapper

Feishu-integrated wrapper for the capability-evolver. Manages the evolution loop lifecycle (start/stop/ensure), sends rich Feishu card reports, and provides dashboard visualization. Use when running evolver with Feishu reporting or when managing the evolution daemon.

LeoYeAI

data-ai

open

data-engineering

1.9K

senior-data-engineer

Data engineering skill for building scalable data pipelines, ETL/ELT systems, and data infrastructure. Expertise in Python, SQL, Spark, Airflow, dbt, Kafka, and modern data stack. Includes data modeling, pipeline orchestration, data quality, and DataOps. Use when designing data architectures, building data pipelines, optimizing data workflows, implementing data governance, or troubleshooting data issues.

LeoYeAI

data-ai

open

data-engineering

1.8K

sdd-onboard

Guided end-to-end walkthrough of the SDD workflow using the real codebase. Trigger: When the orchestrator launches you to onboard a user through the full SDD cycle.

Gentleman-Programming

data-ai

open

data-engineering

1.8K

team-arch-opt

Unified team skill for architecture optimization. Uses team-worker agent architecture with role directories for domain logic. Coordinator orchestrates pipeline, workers are team-worker agents. Triggers on "team arch-opt".

catlog22

data-ai

open

data-engineering

1.8K

team-perf-opt

Unified team skill for performance optimization. Coordinator orchestrates pipeline, workers are team-worker agents. Supports single/fan-out/independent parallel modes. Triggers on "team perf-opt".

catlog22

data-ai

open

data-engineering

1.8K

workflow-tdd-plan

Chain-loaded TDD planning in 7 phases, with Red-Green-Refactor task generation.

catlog22

data-ai

open
