Data Engineering

ETL pipelines and big data infrastructure.

1541 skills
data-engineering
397

schema-normalizer

Normalize cross-skill JSONL interfaces (ids + titles + citation key formats) so downstream skills do not rely on best-effort joins. **Trigger**: schema normalize, jsonl contract, interface drift, join drift, 字段不一致, schema 规范化. **Use when**: you have generated C2-C4 JSONL artifacts (outline/briefs/bindings/packs/anchors) and want deterministic, stable fields before self-loops/writing. **Skip if**: you are not using the survey pipelines, or the workspace already has a fresh PASS `output/SCHEMA_NORMALIZATION_REPORT.md` for the current artifacts. **Network**: none. **Guardrail**: NO PROSE; deterministic transforms only; do not invent evidence/claims; only fill missing ids/titles from `outline/outline.yml`.

WILLOSCAR
data-ai
data-engineering
397

major-task

Work through heavyweight framework or library tasks with planning-first research, selective deep analysis, and a rigorous handoff.

udecode
data-ai
data-engineering
396

clear-caches

Clear extension caches, analysis data, and drafts from the database. Use when testing sender lookups, analysis, or draft generation from a clean state.

ankitvgupta
data-ai
data-engineering
395

kraken-ws-streaming

Real-time data streaming via WebSocket for spot and futures.

krakenfx
data-ai
data-engineering
393

upstream-processing-skills-index

Skills for upstream data processing in single-cell and spatial omics, covering raw data generation, barcode processing, alignment, spatial registration, and technology-specific preprocessing pipelines.

aristoteleo
data-ai
data-engineering
377

aliyun-gbi-analytics

Use when managing Alibaba Cloud DataAnalysisGBI via OpenAPI/SDK, including when the user needs DataAnalysisGBI resource lifecycle operations, configuration changes, status inspection, or troubleshooting for analytics service workflows.

cinience
data-ai
data-engineering
377

aliyun-dlf-manage-next

Use when managing Alibaba Cloud Data Lake Formation (DlfNext) via OpenAPI/SDK, including when the user needs DLF Next catalog/governance resource operations such as listing resources, create/update flows, status checks, and troubleshooting metadata workflow issues.

cinience
data-ai
data-engineering
377

aliyun-dlf-manage

Use when managing Alibaba Cloud Data Lake Formation (DataLake) via OpenAPI/SDK, including when the user asks for DataLake catalog resource operations, configuration updates, status queries, or troubleshooting DataLake API workflows.

cinience
data-ai
data-engineering
377

aliyun-adb-mysql

Use when managing Alibaba Cloud AnalyticDB for MySQL (ADB) via OpenAPI/SDK, including when the user needs AnalyticDB resource lifecycle and configuration operations, status checks, or troubleshooting ADB API and cluster workflow issues.

cinience
data-ai
data-engineering
377

aliyun-pts-manage

Use when managing Alibaba Cloud Performance Testing Service (PTS) via OpenAPI/SDK, including scene lifecycle operations, test start/stop control, report retrieval, and metadata-driven API discovery before production changes.

cinience
data-ai
data-engineering
376

migrating-dbt-core-to-fusion

Use when a user needs help triaging dbt-core to Fusion migration errors. Runs dbt-autofix first, then classifies remaining errors into actionable categories (auto-fixable, guided fixes, needs input, blocked).

dbt-labs
data-ai
data-engineering
376

migrating-dbt-project-across-platforms

Use when migrating a dbt project from one data platform or data warehouse to another (e.g., Snowflake to Databricks, Databricks to Snowflake) using dbt Fusion's real-time compilation to identify and fix SQL dialect differences.

dbt-labs
data-ai
data-engineering
376

building-dbt-semantic-layer

Use when creating or modifying dbt Semantic Layer components — semantic models, metrics, dimensions, entities, measures, or time spines. Covers MetricFlow configuration, metric types (simple, derived, cumulative, ratio, conversion), and validation for both latest and legacy YAML specs.

dbt-labs
data-ai
data-engineering
376

using-dbt-for-analytics-engineering

Builds and modifies dbt models, writes SQL transformations using ref() and source(), creates tests, and validates results with dbt show. Use when doing any dbt work - building or modifying models, debugging errors, exploring unfamiliar data sources, writing tests, or evaluating impact of changes.

dbt-labs
data-ai
data-engineering
376

clickhouse-io

ClickHouse database patterns, query optimization, analytics, and data engineering best practices for high-performance analytical workloads.

xu-xiang
data-ai
data-engineering
376

deployment-patterns

Deployment workflows, CI/CD pipeline patterns, Docker containerization, health checks, rollback strategies, and production readiness checklists for web applications.

xu-xiang
data-ai
data-engineering
372

ln-1000-pipeline-orchestrator

Drives a Story through the full pipeline (tasks, validation, execution, quality). Use when executing a Story end-to-end from a kanban board.

levnikolaevich
data-ai
data-engineering
372

ln-723-seed-data-generator

Generates seed data from ORM schemas or entity definitions to any target format. Use when populating databases for development.

levnikolaevich
data-ai
data-engineering
371

molecular-docking-pipeline

Molecular Docking Pipeline - Complete docking workflow: retrieve protein structure, predict binding pockets, prepare receptor, and dock ligand. Use this skill for structural biology tasks involving retrieving protein data by PDB code, running fpocket, converting PDB to PDBQT for docking, and quick molecule docking. Combines 4 tools from 2 SCP server(s).

SpectrAI-Initiative
data-ai
data-engineering
366

biorxiv-database

Efficient database search tool for bioRxiv preprint server.

LigphiDonk
data-ai
data-engineering
351

db-vacuum

VACUUM the SQLite database to reclaim disk space and consolidate the WAL.

babarot
data-ai
data-engineering
351

dhi-python

Ultra-fast data validation library for Python (520x faster than Pydantic). Use when building validated data models, API request/response schemas, or configuration objects. Provides Pydantic v2-compatible BaseModel API with Zig-powered native validation.

justrach
data-ai
data-engineering
349

312-frameworks-spring-data-jdbc

Use when you need to use Spring Data JDBC with Java records, including entity design with records, the repository pattern, immutable updates, aggregate relationships, custom queries, transaction management, and avoiding N+1 problems. Part of the skills-for-java project.

jabrena
data-ai