Data Eng.

ETL pipelines and big data infrastructure.

1541 skills
data-engineering
5

database-migration

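Execute database migrations across ORMs and platforms using zero-downtime strategies, data transformation, and rollback procedures. Use when migrating databases, changing schemas, performing data transformations, or implementing zero-downtime deployment strategies.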

amurata
data-ai
data-engineering
5

identity

Query identity change logs to explore profile creation and merging

treasure-data
data-ai
data-engineering
5

frappe-data-migration-generator

Generate data migration scripts for Frappe. Use when migrating data from legacy systems, transforming data structures, or importing large datasets.

Venkateshvenki404224
data-ai
data-engineering
5

databricks-python-sdk

Databricks development guidance including Python SDK, Databricks Connect, CLI, and REST API. Use when working with databricks-sdk, databricks-connect, or Databricks APIs.

databricks-solutions
data-ai
data-engineering
5

parquet-files

Describes how to create Parquet files in C#, including updating and multithreaded creation.

lawless-m
data-ai
data-engineering
5

connector-config

Writes connector_config for segment/journey activations using `tdx connection schema <type>` to discover available fields. Use when configuring activations; always run the schema command first to see connector-specific fields.

treasure-data
data-ai
data-engineering
5

data-reconciliation-exceptions

Reconciles data sources using stable identifiers (Pay Number, driving licence, driver card, and driver qualification card numbers), producing exception reports and “no silent failure” checks. Use when you need weekly matching with explicit reasons for non-joins and mismatches.

clawdbot
data-ai
data-engineering
5

time-filtering

Advanced td_interval patterns including offset dates (-1d/2025-10-01, -7d/-1d, 0M/now), td_interval_range for debugging, td_time_string for display formatting, and partition pruning optimization.

treasure-data
data-ai
data-engineering
4

polars

High-performance DataFrame library for fast data processing with lazy evaluation, parallel execution, and memory efficiency

vamseeachanta
data-ai
data-engineering
4

data-cleaning-pipeline-generator

Generates data cleaning pipelines for pandas/polars with handling for missing values, duplicates, outliers, type conversions, and data validation. Use when user asks to "clean data", "generate data pipeline", "handle missing values", or "remove duplicates from dataset".

Dexploarer
data-ai
data-engineering
4

docs-manager-skill

Orchestrates complete single-document workflows with automatic validation and indexing in a write→validate→index pipeline

fractary
data-ai
data-engineering
4

faber-coordinator

Coordinate FABER-DB operations within FABER workflow phases

fractary
data-ai
data-engineering
4

indexeddb-operations

Guide for working with IndexedDB database operations in the DEVS platform. Use this when asked to add database tables, modify schemas, or work with data persistence.

codename-co
data-ai
data-engineering
4

airflow

Python DAG workflow orchestration using Apache Airflow for data pipelines, ETL processes, and scheduled task automation

vamseeachanta
data-ai
data-engineering
4

log-director-skill

Orchestrates multi-log workflows with parallel execution for batch operations across many logs

fractary
data-ai
data-engineering
4

data-transformation

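A skill that organizes the design, implementation, and validation of data transformation pipelines. It provides a practical workflow covering schema mapping, ETL design, and quality checks. Anchors: • Designing Data-Intensive Applications / Applied to: data modeling / Purpose: ensuring transformation consistency • Designing Data-Intensive Applications / Applied to: schema design / Purpose: clarifying mappings • Designing Data-Intensive Applications / Applied to: pipeline design / Purpose: ensuring scalability and observability Trigger: Use when designing data transformation pipelines, defining schema mappings, implementing ETL processes, or optimizing data flows. Keywords: data transformation, schema mapping, etl design, pipeline optimization, data modeling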

daishiman
data-ai
data-engineering
4

database-seeding

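A skill for safely designing, implementing, and validating the initial loading of data into a database. It covers data separation across development, test, and production environments, seed strategies, and reproducibility. Anchors: • Designing Data-Intensive Applications / Applied to: data integrity / Purpose: guaranteeing referential integrity • Database Reliability Engineering / Applied to: operational design / Purpose: safety of production loads • Data Quality Principles / Applied to: data quality / Purpose: ensuring reproducibility and verifiability Trigger: Use when planning database seeding, generating test fixtures, separating datasets by environment, or validating seed data quality. Keywords: database seeding, test data, fixtures, seed strategy, environment separation, data validation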

daishiman
data-ai
data-engineering
3

archiving-databases

Use when you need to archive historical database records to reduce primary database size. This skill automates moving old data to archive tables or cold storage (S3, Azure Blob, GCS). Trigger with phrases like "archive old database records", "implement data retention policy", "move historical data to cold storage", or "reduce database size with archival".

BbgnsurfTech
data-ai
data-engineering
3

implementing-database-audit-logging

Use when you need to track database changes for compliance and security monitoring. This skill implements audit logging using triggers, application-level logging, CDC, or native logs. Trigger with phrases like "implement database audit logging", "add audit trails", "track database changes", or "monitor database activity for compliance".

BbgnsurfTech
data-ai
data-engineering
3

effect-streams-pipelines

Stream creation, transformation, sinks, batching, and resilience. Use when building data pipelines with concurrency and backpressure.

mepuka
data-ai
data-engineering
3

effect-collections-datastructs

Value-based data structures (Data.struct, tuple, array) and high-performance collections (Chunk, HashSet). Use for safe comparisons and pipelines.

mepuka
data-ai
data-engineering
3

memory-hygiene

Maintains memory cleanliness with deduplication, validation, and expiration

benreceveur
data-ai