skills.homes capability registry


Category: Data Engineering

ETL pipelines and big data infrastructure.

1541 skills

Sorted by stars.

data-engineering

397

schema-normalizer

Normalize cross-skill JSONL interfaces (ids + titles + citation key formats) so downstream skills do not rely on best-effort joins. **Trigger**: schema normalize, jsonl contract, interface drift, join drift, 字段不一致 (field inconsistency), schema 规范化 (schema normalization). **Use when**: you have generated C2-C4 JSONL artifacts (outline/briefs/bindings/packs/anchors) and want deterministic, stable fields before self-loops/writing. **Skip if**: you are not using the survey pipelines, or the workspace already has a fresh PASS `output/SCHEMA_NORMALIZATION_REPORT.md` for the current artifacts. **Network**: none. **Guardrail**: NO PROSE; deterministic transforms only; do not invent evidence/claims; only fill missing ids/titles from `outline/outline.yml`.

WILLOSCAR

data-ai

data-engineering

397

major-task

Handle heavyweight framework or library tasks with planning-first research, selective deep analysis, and a rigorous handoff.

udecode

data-ai

data-engineering

396

clear-caches

Clear extension caches, analysis data, and drafts from the database. Use when testing sender lookups, analysis, or draft generation from a clean state.

ankitvgupta

data-ai

data-engineering

395

kraken-ws-streaming

Real-time data streaming via WebSocket for spot and futures.

krakenfx

data-ai

data-engineering

393

upstream-processing-skills-index

Skills for upstream data processing in single-cell and spatial omics, covering raw data generation, barcode processing, alignment, spatial registration, and technology-specific preprocessing pipelines.

aristoteleo

data-ai

data-engineering

377

aliyun-gbi-analytics

Use when managing Alibaba Cloud DataAnalysisGBI via OpenAPI/SDK, including when the user needs DataAnalysisGBI resource lifecycle operations, configuration changes, status inspection, or troubleshooting of analytics service workflows.

cinience

data-ai

data-engineering

377

aliyun-dlf-manage-next

Use when managing Alibaba Cloud Data Lake Formation (DlfNext) via OpenAPI/SDK, including when the user needs DLF Next catalog/governance resource operations such as listing resources, create/update flows, status checks, and troubleshooting metadata workflow issues.

cinience

data-ai

data-engineering

377

aliyun-dlf-manage

Use when managing Alibaba Cloud Data Lake Formation (DataLake) via OpenAPI/SDK, including when the user asks for DataLake catalog resource operations, configuration updates, status queries, or troubleshooting of DataLake API workflows.

cinience

data-ai

data-engineering

377

aliyun-adb-mysql

Use when managing Alibaba Cloud AnalyticDB for MySQL (ADB) via OpenAPI/SDK, including when the user needs AnalyticDB resource lifecycle and configuration operations, status checks, or troubleshooting of ADB API and cluster workflow issues.

cinience

data-ai

data-engineering

377

aliyun-pts-manage

Use when managing Alibaba Cloud Performance Testing Service (PTS) via OpenAPI/SDK, including scene lifecycle operations, test start/stop control, report retrieval, and metadata-driven API discovery before production changes.

cinience

data-ai

data-engineering

376

migrating-dbt-core-to-fusion

Use when a user needs help triaging dbt-core to Fusion migration errors. Runs dbt-autofix first, then classifies remaining errors into actionable categories (auto-fixable, guided fixes, needs input, blocked).

dbt-labs

data-ai

data-engineering

376

migrating-dbt-project-across-platforms

Use when migrating a dbt project from one data platform or data warehouse to another (e.g., Snowflake to Databricks, Databricks to Snowflake) using dbt Fusion's real-time compilation to identify and fix SQL dialect differences.

dbt-labs

data-ai

data-engineering

376

building-dbt-semantic-layer

Use when creating or modifying dbt Semantic Layer components — semantic models, metrics, dimensions, entities, measures, or time spines. Covers MetricFlow configuration, metric types (simple, derived, cumulative, ratio, conversion), and validation for both latest and legacy YAML specs.

dbt-labs

data-ai

data-engineering

376

using-dbt-for-analytics-engineering

Builds and modifies dbt models, writes SQL transformations using ref() and source(), creates tests, and validates results with dbt show. Use when doing any dbt work: building or modifying models, debugging errors, exploring unfamiliar data sources, writing tests, or evaluating the impact of changes.

dbt-labs

data-ai

data-engineering

376

clickhouse-io

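ClickHouse database schema patterns, query optimization, analytics, and data engineering best practices for high-performance analytical workloads.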

xu-xiang

data-ai

data-engineering

376

deployment-patterns

Deployment workflows, CI/CD pipeline patterns, Docker containerization, health checks, rollback strategies, and production readiness checklists for web applications.

xu-xiang

data-ai

data-engineering

372

ln-1000-pipeline-orchestrator

Drives a Story through the full pipeline (tasks, validation, execution, quality). Use when executing a Story end-to-end from the kanban board.

levnikolaevich

data-ai

data-engineering

372

ln-723-seed-data-generator

Generates seed data from ORM schemas or entity definitions to any target format. Use when populating databases for development.

levnikolaevich

data-ai

data-engineering

371

molecular-docking-pipeline

Complete molecular docking workflow: retrieve a protein structure, predict binding pockets, prepare the receptor, and dock a ligand. Use this skill for structural biology tasks involving retrieving protein data by PDB code, running fpocket, converting PDB to PDBQT, and quick molecule docking. Combines 4 tools from 2 SCP servers.

SpectrAI-Initiative

data-ai

data-engineering

366

biorxiv-database

Efficient database search tool for the bioRxiv preprint server.

LigphiDonk

data-ai

data-engineering

354

get-td-quote

Get TDVM Quote Information

intel

data-ai

data-engineering

351

db-vacuum

VACUUM the SQLite database to reclaim disk space and consolidate the WAL.

babarot

data-ai

data-engineering

351

dhi-python

Ultra-fast data validation library for Python (520x faster than Pydantic). Use when building validated data models, API request/response schemas, or configuration objects. Provides Pydantic v2-compatible BaseModel API with Zig-powered native validation.

justrach

data-ai

data-engineering

349

312-frameworks-spring-data-jdbc

Use when you need Spring Data JDBC with Java records, including entity design with records, the repository pattern, immutable updates, aggregate relationships, custom queries, transaction management, and avoiding N+1 problems. Part of the skills-for-java project.

jabrena

data-ai
