
Data Engineering

ETL pipelines and big data infrastructure.

1541 skills
data-engineering
2K

azure-storage-file-datalake-py

Azure Data Lake Storage Gen2 SDK for Python. Use for hierarchical file systems, big data analytics, and file/directory operations. Triggers: "data lake", "DataLakeServiceClient", "FileSystemClient", "ADLS Gen2", "hierarchical namespace".

microsoft
data-ai
data-engineering
2K

azure-storage-queue-py

Azure Queue Storage SDK for Python. Use for reliable message queuing, task distribution, and asynchronous processing. Triggers: "queue storage", "QueueServiceClient", "QueueClient", "message queue", "dequeue".

microsoft
data-ai
data-engineering
2K

drizzle-neon

Add a PostgreSQL database with Drizzle ORM to a Scaffold-ETH 2 project. Use when the user wants to: add a database, use Drizzle ORM, integrate Neon PostgreSQL, store off-chain data, build a backend with database, or add persistent storage to their dApp.

scaffold-eth
data-ai
data-engineering
2K

bio-orchestrator

Meta-agent that routes bioinformatics requests to specialised sub-skills. Handles file type detection, analysis planning, report generation, and reproducibility export.

FreedomIntelligence
data-ai
data-engineering
2K

seq-wrangler

Sequence QC, alignment, and BAM processing. Wraps FastQC, BWA/Bowtie2, SAMtools for automated read-to-BAM pipelines.

FreedomIntelligence
data-ai
data-engineering
2K

data-table-filters

Install and extend data-table-filters — a React data table system with faceted filters (checkbox, input, slider, timerange), sorting, infinite scroll, virtualization, and BYOS state management. Delivered as 11 shadcn registry blocks installable via `npx shadcn@latest add`. Use when: (1) installing data-table-filters from the shadcn registry, (2) adding extension blocks (command palette, AI filters, cell renderers, sheet panel, store adapters, schema system, Drizzle helpers, query layer), (3) configuring store adapters (nuqs/zustand/memory), (4) generating table schemas from a data model, (5) wiring up server-side filtering with Drizzle ORM, (6) connecting the React Query fetch layer, (7) auto-inferring schemas from raw JSON data with DataTableAuto / inferSchemaFromJSON, (8) adding AI-powered natural language filtering, (9) exposing tables as MCP endpoints for AI agents, (10) troubleshooting integration issues. Triggers on mentions of "data-table-filters", "data-table-filters.com", filterable data tables with

openstatusHQ
data-ai
data-engineering
1.9K

preprocessing-data-with-automated-pipelines

Automate data cleaning, transformation, and validation for ML tasks. Use when the user requests "preprocess data", "clean data", "ETL pipeline", or "data transformation". Trigger with relevant phrases based on skill purpose.

jeremylongshore
data-ai
data-engineering
1.9K

data-quality-checker

Validate data quality checker operations. Auto-activating skill for Data Pipelines. Triggers on: data quality checker. Part of the Data Pipelines skill category. Use when working with data quality checker functionality. Trigger with phrases like "data quality checker", "data checker", "data".

jeremylongshore
data-ai
data-engineering
1.9K

data-partitioner

Process data partitioner operations. Auto-activating skill for Data Pipelines. Triggers on: data partitioner. Part of the Data Pipelines skill category. Use when working with data partitioner functionality. Trigger with phrases like "data partitioner", "data".

jeremylongshore
data-ai
data-engineering
1.9K

schema-optimization-orchestrator

Multi-phase schema optimization workflow orchestrator. Creates session directories, spawns phase agents sequentially, validates outputs, aggregates results. Trigger: "run schema optimization", "optimize schema workflow", "execute schema phases"

jeremylongshore
data-ai
data-engineering
1.9K

data-augmentation-pipeline

Process data augmentation pipeline operations. Auto-activating skill for ML Training. Triggers on: data augmentation pipeline. Part of the ML Training skill category. Use when working with data augmentation pipeline functionality. Trigger with phrases like "data augmentation pipeline", "data pipeline", "data".

jeremylongshore
data-ai
data-engineering
1.9K

deepgram-data-handling

Implement audio data handling best practices for Deepgram integrations. Use when managing audio file storage, implementing data retention, or ensuring GDPR/HIPAA compliance for transcription data. Trigger: "deepgram data", "audio storage", "transcription data", "deepgram GDPR", "deepgram HIPAA", "deepgram privacy", "PII redaction".

jeremylongshore
data-ai
data-engineering
1.9K

deepgram-migration-deep-dive

Deep dive into migrating to Deepgram from other transcription providers. Use when migrating from AWS Transcribe, Google Cloud STT, Azure Speech, OpenAI Whisper, AssemblyAI, or Rev.ai to Deepgram. Trigger: "deepgram migration", "switch to deepgram", "migrate transcription", "deepgram from AWS", "deepgram from Google", "replace whisper with deepgram".

jeremylongshore
data-ai
data-engineering
1.9K

navan-data-sync

Implement incremental sync strategies for Navan BOOKING and TRANSACTION data with ETL pipeline patterns. Use when setting up production data pipelines, debugging sync drift, or adding real-time event processing. Trigger with "navan data sync", "navan incremental sync", "navan ETL pipeline".

jeremylongshore
data-ai
data-engineering
1.9K

brightdata-deploy-integration

Deploy Bright Data integrations to Vercel, Fly.io, and Cloud Run platforms. Use when deploying Bright Data-powered applications to production, configuring platform-specific secrets, or setting up deployment pipelines. Trigger with phrases like "deploy brightdata", "brightdata Vercel", "brightdata production deploy", "brightdata Cloud Run", "brightdata Fly.io".

jeremylongshore
data-ai
data-engineering
1.9K

databricks-core-workflow-a

Execute Databricks primary workflow: Delta Lake ETL pipelines. Use when building data ingestion pipelines, implementing medallion architecture, or creating Delta Lake transformations. Trigger with phrases like "databricks ETL", "delta lake pipeline", "medallion architecture", "databricks data pipeline", "bronze silver gold".

jeremylongshore
data-ai
data-engineering
1.9K

databricks-data-handling

Implement Delta Lake data management patterns including GDPR, PII handling, and data lifecycle. Use when implementing data retention, handling GDPR requests, or managing data lifecycle in Delta Lake. Trigger with phrases like "databricks GDPR", "databricks PII", "databricks data retention", "databricks data lifecycle", "delete user data".

jeremylongshore
data-ai
data-engineering
1.9K

feishu-evolver-wrapper

Feishu-integrated wrapper for the capability-evolver. Manages the evolution loop lifecycle (start/stop/ensure), sends rich Feishu card reports, and provides dashboard visualization. Use when running evolver with Feishu reporting or when managing the evolution daemon.

LeoYeAI
data-ai
data-engineering
1.9K

senior-data-engineer

Data engineering skill for building scalable data pipelines, ETL/ELT systems, and data infrastructure. Expertise in Python, SQL, Spark, Airflow, dbt, Kafka, and modern data stack. Includes data modeling, pipeline orchestration, data quality, and DataOps. Use when designing data architectures, building data pipelines, optimizing data workflows, implementing data governance, or troubleshooting data issues.

LeoYeAI
data-ai
data-engineering
1.8K

sdd-onboard

Guided end-to-end walkthrough of the SDD workflow using the real codebase. Trigger: When the orchestrator launches you to onboard a user through the full SDD cycle.

Gentleman-Programming
data-ai
data-engineering
1.8K

team-arch-opt

Unified team skill for architecture optimization. Uses team-worker agent architecture with role directories for domain logic. Coordinator orchestrates pipeline, workers are team-worker agents. Triggers on "team arch-opt".

catlog22
data-ai
data-engineering
1.8K

team-perf-opt

Unified team skill for performance optimization. Coordinator orchestrates pipeline, workers are team-worker agents. Supports single/fan-out/independent parallel modes. Triggers on "team perf-opt".

catlog22
data-ai
data-engineering
1.8K

workflow-tdd-plan

Chain-loaded TDD planning in 7 phases, with Red-Green-Refactor task generation.

catlog22
data-ai