Data Eng.

ETL pipelines and big data infrastructure.

1541 skills

data-engineering

5

database-migration

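Performs database migrations across ORMs and platforms using zero-downtime strategies, data transformation, and rollback procedures. Use when migrating databases, changing schemas, performing data transformations, or implementing zero-downtime deployment strategies.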

amurata

data-ai

data-engineering

5

identity

Query identity change logs to explore profile creation and merging

treasure-data

data-ai

data-engineering

5

frappe-data-migration-generator

Generate data migration scripts for Frappe. Use when migrating data from legacy systems, transforming data structures, or importing large datasets.

Venkateshvenki404224

data-ai

data-engineering

5

databricks-python-sdk

Databricks development guidance including Python SDK, Databricks Connect, CLI, and REST API. Use when working with databricks-sdk, databricks-connect, or Databricks APIs.

databricks-solutions

data-ai

data-engineering

5

parquet-files

Describes how to create Parquet files in C#, including updating existing files and multi-threaded creation.

lawless-m

data-ai

data-engineering

5

connector-config

Writes connector_config for segment/journey activations using `tdx connection schema <type>` to discover available fields. Use when configuring activations - always run schema command first to see connector-specific fields.

treasure-data

data-ai

data-engineering

5

data-reconciliation-exceptions

Reconciles data sources using stable identifiers (Pay Number, driving licence, driver card, and driver qualification card numbers), producing exception reports and “no silent failure” checks. Use when you need weekly matching with explicit reasons for non-joins and mismatches.

clawdbot

data-ai

data-engineering

5

time-filtering

Advanced td_interval patterns including offset dates (-1d/2025-10-01, -7d/-1d, 0M/now), td_interval_range for debugging, td_time_string for display formatting, and partition pruning optimization.

treasure-data

data-ai

data-engineering

4

polars

High-performance DataFrame library for fast data processing with lazy evaluation, parallel execution, and memory efficiency

vamseeachanta

data-ai

data-engineering

4

data-cleaning-pipeline-generator

Generates data cleaning pipelines for pandas/polars with handling for missing values, duplicates, outliers, type conversions, and data validation. Use when user asks to "clean data", "generate data pipeline", "handle missing values", or "remove duplicates from dataset".

Dexploarer

data-ai

data-engineering

4

docs-manager-skill

Orchestrates complete single-document workflows with automatic validation and indexing in a write→validate→index pipeline

fractary

data-ai

data-engineering

4

faber-coordinator

Coordinate FABER-DB operations within FABER workflow phases

fractary

data-ai

data-engineering

4

indexeddb-operations

Guide for working with IndexedDB database operations in the DEVS platform. Use this when asked to add database tables, modify schemas, or work with data persistence.

codename-co

data-ai

data-engineering

4

airflow

Python DAG workflow orchestration using Apache Airflow for data pipelines, ETL processes, and scheduled task automation

vamseeachanta

data-ai

data-engineering

4

log-director-skill

Orchestrates multi-log workflows with parallel execution for batch operations across many logs

fractary

data-ai

data-engineering

4

data-transformation

A skill that organizes the design, implementation, and validation of data transformation pipelines. Provides a practical workflow covering schema mapping, ETL design, and quality checks. Anchors: • Designing Data-Intensive Applications / Application: data modeling / Purpose: ensuring transformation consistency • Designing Data-Intensive Applications / Application: schema design / Purpose: clarifying mappings • Designing Data-Intensive Applications / Application: pipeline design / Purpose: ensuring scalability and observability Trigger: Use when designing data transformation pipelines, defining schema mappings, implementing ETL processes, or optimizing data flows. data transformation, schema mapping, etl design, pipeline optimization, data modeling

daishiman

data-ai

data-engineering

4

database-seeding

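A skill for safely designing, implementing, and validating the initial loading of data into databases. Covers separation of development/test/production data, seeding strategies, and ensuring reproducibility. Anchors: • Designing Data-Intensive Applications / Application: data integrity / Purpose: guaranteeing referential integrity • Database Reliability Engineering / Application: operational design / Purpose: safety of production loading • Data Quality Principles / Application: data quality / Purpose: ensuring reproducibility and verifiability Trigger: Use when planning database seeding, generating test fixtures, separating datasets by environment, or validating seed data quality. database seeding, test data, fixtures, seed strategy, environment separation, data validation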

daishiman

data-ai

data-engineering

3

archiving-databases

Use when you need to archive historical database records to reduce primary database size. This skill automates moving old data to archive tables or cold storage (S3, Azure Blob, GCS). Trigger with phrases like "archive old database records", "implement data retention policy", "move historical data to cold storage", or "reduce database size with archival".

BbgnsurfTech

data-ai

data-engineering

3

policyengine-simulation-mechanics

Advanced simulation patterns with policyengine.py - ensure(), output_dataset.data, and map_to_entity()

PolicyEngine

data-ai

data-engineering

3

implementing-database-audit-logging

Use when you need to track database changes for compliance and security monitoring. This skill implements audit logging using triggers, application-level logging, CDC, or native logs. Trigger with phrases like "implement database audit logging", "add audit trails", "track database changes", or "monitor database activity for compliance".

BbgnsurfTech

data-ai

data-engineering

3

feature-flag-create-or-remove

Use when creating a new feature flag in the database or removing an existing one.

jakewaldrip

data-ai

data-engineering

3

effect-streams-pipelines

Stream creation, transformation, sinks, batching, and resilience. Use when building data pipelines with concurrency and backpressure.

mepuka

data-ai

data-engineering

3

effect-collections-datastructs

Value-based data structures (Data.struct, tuple, array) and high-performance collections (Chunk, HashSet). Use for safe comparisons and pipelines.

mepuka

data-ai

data-engineering

3

memory-hygiene

Maintains memory cleanliness with deduplication, validation, and expiration

benreceveur

data-ai
