
Data Engineering

ETL pipelines and big data infrastructure.

1541 skills
Sorted by stars
data-engineering
3.7K stars

continuity-ledger

Create or update a continuity ledger for state preservation across clears

parcadei
data-ai
data-engineering
3.6K stars

csv-wrangling

Standard workflow order, tool selection matrix, and composition patterns for qsv CSV data wrangling

dathere
data-ai
data-engineering
3.6K stars

data-quality

Quality dimensions quick reference and remediation decision tree for tabular data assessment

dathere
data-ai
data-engineering
3.6K stars

reproducible-analysis

Machine-readable journal format for reproducible data analysis operations

dathere
data-ai
data-engineering
3.4K stars

openspec-sync-specs

Sync delta specs from a change to main specs. Use when the user wants to update main specs with changes from a delta spec, without archiving the change.

MetaCubeX
data-ai
data-engineering
3.2K stars

talk-pipeline

Orchestrates the complete talk preparation pipeline from raw material to revision sheets, running 6 stages in sequence with human-in-the-loop checkpoints for REX or Concept mode talks. Use when starting a new talk pipeline, resuming a pipeline from a specific stage, or running the full end-to-end preparation workflow.

FlorianBruniaux
data-ai
data-engineering
3K stars

muapi-seedance-2

Expert Cinema Director skill for Seedance 2.0 (ByteDance): high-fidelity video generation using technical camera grammar and multimodal references. Supports text-to-video, image-to-video, video extension, beat-matching, dialogue, and e-commerce patterns.

SamurAIGPT
data-ai
data-engineering
2.7K stars

aiox-data-engineer

Database Architect & Operations Engineer (Dara). Use for database design, schema architecture, Supabase configuration, RLS policies, migrations, query optimization, data modelin...

SynkraAI
data-ai
data-engineering
2.7K stars

chdb-datastore

Drop-in pandas replacement with ClickHouse performance. Use `import chdb.datastore as pd` (or `from datastore import DataStore`) and write standard pandas code (same API, 10-100x faster on large datasets). Supports 16+ data sources (MySQL, PostgreSQL, S3, MongoDB, ClickHouse, Iceberg, Delta Lake, etc.) and 10+ file formats (Parquet, CSV, JSON, Arrow, ORC, etc.) with cross-source joins. Use this skill when the user wants to analyze data with pandas-style syntax, speed up slow pandas code, query remote databases or cloud storage as DataFrames, or join data across different sources, even if they don't explicitly mention chdb or DataStore. Do NOT use for raw SQL queries, ClickHouse server administration, or non-Python languages.

chdb-io
data-ai
data-engineering
2.7K stars

dataset-annotation

AI-assisted dataset annotation with COCO export (bbox, SAM2, and DINOv3 methods)

SharpAI
data-ai
data-engineering
2.7K stars

annotation-data

Dataset annotation management: COCO labels, sequences, export, and Kaggle upload

SharpAI
data-ai
data-engineering
2.5K stars

slackdump

Collect Slack conversation data from the Slackdump archive format.

rusq
data-ai
data-engineering
2.5K stars

jazz-schema-design

Design and implement collaborative data schemas using the Jazz framework. Use this skill when building or working with Jazz apps to define data structures using CoValues. This skill focuses exclusively on schema definition and data modeling logic.

garden-co
data-ai
data-engineering
2.5K stars

yaml-pipeline-transfer

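Guide to YAML pipeline conversion, covering bidirectional conversion between YAML and Model, PAC (Pipeline as Code) implementation, template references, and trigger configuration. Use when the user needs to parse YAML pipelines, implement the PAC pattern, handle pipeline templates, or validate YAML syntax.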

TencentBlueKing
data-ai
data-engineering
2.5K stars

23-database-sharding

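Guide to database sharding, covering sharding strategy design, shard key selection, cross-shard queries, data migration, and shard routing rules. Use when the user is designing database sharding, choosing a shard key, handling cross-shard queries, or migrating sharded data.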

TencentBlueKing
data-ai
data-engineering
2.5K stars

22-yaml-pipeline-transfer

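Guide to YAML pipeline conversion, covering bidirectional conversion between YAML and Model, PAC (Pipeline as Code) implementation, template references, and trigger configuration. Use when the user needs to parse YAML pipelines, implement the PAC pattern, handle pipeline templates, or validate YAML syntax.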

TencentBlueKing
data-ai
data-engineering
2.4K stars

hamilton-mcp

Interactive Hamilton DAG development via MCP tools. Validate, visualize, scaffold, and execute Hamilton pipelines without leaving the conversation. Use when building or debugging Hamilton dataflows interactively.

apache
data-ai
data-engineering
2.4K stars

hamilton-scale

Performance and parallelization patterns for Hamilton including async I/O, Spark, Ray, Dask, caching, and multithreading. Use for scaling Hamilton workflows.

apache
data-ai
data-engineering
2.4K stars

flask

Best practices for Flask web development including routing, blueprints, and testing.

microsoft
data-ai
data-engineering
2.4K stars

oracle

Use the @steipete/oracle CLI to bundle a prompt plus the right files and get a second-model review (API or browser) for debugging, refactors, design checks, or cross-validation.

steipete
data-ai
data-engineering
2.2K stars

review-pr

Review a pull request for ArcticDB

man-group
data-ai
data-engineering
2.2K stars

etherscan-api

Query Etherscan V2 API for gas prices, account transactions, balances, token transfers, contract source code, and compilation metadata via Nethereum.DataServices

Nethereum
data-ai
data-engineering
2.2K stars

realtime-streaming

Stream real-time blockchain data with Nethereum. Use when the user asks about WebSocket subscriptions, new block headers, pending transactions, event log streaming, Rx observables, DEX monitoring, or live token transfer tracking.

Nethereum
data-ai
data-engineering
2.1K stars

claw-compactor

Claw Compactor: a 6-layer token compression skill for OpenClaw agents. Cuts workspace token spend by 50–97% using deterministic rule engines plus Engram, a real-time, LLM-driven Observational Memory system. Run at session start for automatic savings reporting.

open-compress
data-ai
Page 10 of 65