category focus

Data Engineering

ETL pipelines and big data infrastructure.

1541 skills

Sorted by stars

data-engineering

3.7K

continuity-ledger

Create or update continuity ledger for state preservation across clears

parcadei

data-ai

open

data-engineering

3.6K

csv-wrangling

Standard workflow order, tool selection matrix, and composition patterns for qsv CSV data wrangling

dathere

data-ai

open

data-engineering

3.6K

data-quality

Quality dimensions quick reference and remediation decision tree for tabular data assessment

dathere

data-ai

open

data-engineering

3.6K

reproducible-analysis

Machine-readable journal format for reproducible data analysis operations

dathere

data-ai

open

data-engineering

3.4K

openspec-sync-specs

Sync delta specs from a change to main specs. Use when the user wants to update main specs with changes from a delta spec, without archiving the change.

MetaCubeX

data-ai

open

data-engineering

3.2K

Orchestrates the complete talk preparation pipeline from raw material to revision sheets, running 6 stages in sequence with human-in-the-loop checkpoints for REX or Concept mode talks. Use when starting a new talk pipeline, resuming a pipeline from a specific stage, or running the full end-to-end preparation workflow.

FlorianBruniaux

data-ai

open

data-engineering

muapi-seedance-2

Expert Cinema Director skill for Seedance 2.0 (ByteDance) — high-fidelity video generation using technical camera grammar and multimodal references. Supports text-to-video, image-to-video, video extension, beat-matching, dialogue, and e-commerce patterns.

SamurAIGPT

data-ai

open

data-engineering

2.7K

aiox-data-engineer

Database Architect & Operations Engineer (Dara). Use for database design, schema architecture, Supabase configuration, RLS policies, migrations, query optimization, data modelin...

SynkraAI

data-ai

open

data-engineering

2.7K

chdb-datastore

Drop-in pandas replacement with ClickHouse performance. Use `import chdb.datastore as pd` (or `from datastore import DataStore`) and write standard pandas code — same API, 10-100x faster on large datasets. Supports 16+ data sources (MySQL, PostgreSQL, S3, MongoDB, ClickHouse, Iceberg, Delta Lake, etc.) and 10+ file formats (Parquet, CSV, JSON, Arrow, ORC, etc.) with cross-source joins. Use this skill when the user wants to analyze data with pandas-style syntax, speed up slow pandas code, query remote databases or cloud storage as DataFrames, or join data across different sources — even if they don't explicitly mention chdb or DataStore. Do NOT use for raw SQL queries, ClickHouse server administration, or non-Python languages.

chdb-io

data-ai

open

data-engineering

2.7K

dataset-annotation

AI-assisted dataset annotation with COCO export — bbox, SAM2, DINOv3 methods

SharpAI

data-ai

open

data-engineering

2.7K

annotation-data

Dataset annotation management — COCO labels, sequences, export, and Kaggle upload

SharpAI

data-ai

open

data-engineering

2.5K

slackdump

Collect Slack conversation data from the Slackdump Archive format.

rusq

data-ai

open

data-engineering

2.5K

jazz-schema-design

Design and implement collaborative data schemas using the Jazz framework. Use this skill when building or working with Jazz apps to define data structures using CoValues. This skill focuses exclusively on schema definition and data modeling logic.

garden-co

data-ai

open

data-engineering

2.5K

yaml-pipeline-transfer

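YAML pipeline conversion guide, covering bidirectional conversion between YAML and Model, PAC (Pipeline as Code) implementation, template references, and trigger configuration. Use when the user needs to parse YAML pipelines, implement the PAC pattern, handle pipeline templates, or validate YAML syntax.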

TencentBlueKing

data-ai

open

data-engineering

2.5K

23-database-sharding

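Database sharding guide, covering sharding strategy design, shard key selection, cross-shard queries, data migration, and shard routing rules. Use when the user is designing database sharding, selecting a shard key, handling cross-shard queries, or migrating sharded data.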

TencentBlueKing

data-ai

open

data-engineering

2.5K

22-yaml-pipeline-transfer

TencentBlueKing

data-ai

open

data-engineering

2.4K

hamilton-mcp

Interactive Hamilton DAG development via MCP tools. Validate, visualize, scaffold, and execute Hamilton pipelines without leaving the conversation. Use when building or debugging Hamilton dataflows interactively.

apache

data-ai

open

data-engineering

2.4K

hamilton-scale

Performance and parallelization patterns for Hamilton including async I/O, Spark, Ray, Dask, caching, and multithreading. Use for scaling Hamilton workflows.

apache

data-ai

open

data-engineering

2.4K

flask

Best practices for Flask web development including routing, blueprints, and testing.

microsoft

data-ai

open

data-engineering

2.4K

oracle

Use the @steipete/oracle CLI to bundle a prompt plus the right files and get a second-model review (API or browser) for debugging, refactors, design checks, or cross-validation.

steipete

data-ai

open

data-engineering

2.2K

review-pr

Review a pull request for ArcticDB

man-group

data-ai

open

data-engineering

2.2K

etherscan-api

Query Etherscan V2 API for gas prices, account transactions, balances, token transfers, contract source code, and compilation metadata via Nethereum.DataServices

Nethereum

data-ai

open

data-engineering

2.2K

realtime-streaming

Stream real-time blockchain data with Nethereum. Use when the user asks about WebSocket subscriptions, new block headers, pending transactions, event log streaming, Rx observables, DEX monitoring, or live token transfer tracking.

Nethereum

data-ai

open

data-engineering

2.1K

claw-compactor

Claw Compactor — 6-layer token compression skill for OpenClaw agents. Cuts workspace token spend by 50–97% using deterministic rule-engines plus Engram: a real-time, LLM-driven Observational Memory system. Run at session start for automatic savings reporting.

open-compress

data-ai

open
