category focus

Data Eng.

ETL pipelines and big data infrastructure.

1541 skills

data-engineering

build-only-validate-capability-flow

Validate a capability flow specification against schema constraints. Use after designing a process to ensure it conforms to framework rules. Triggers on "validate capability flow", "check process spec", "verify schema compliance".

pidster

data-ai

open

data-engineering

data-quality

Implement data validation rules, quality metrics, and data cleansing strategies

dasien

data-ai

open

data-engineering

ETL pipelines, Apache Spark, data warehousing, and big data processing. Use for building data pipelines, processing large datasets, or data infrastructure.

pluginagentmarketplace

data-ai

open

data-engineering

validating-database-integrity

Use when you need to ensure database integrity through comprehensive data validation. This skill validates data types, ranges, formats, referential integrity, and business rules. Trigger with phrases like "validate database data", "implement data validation rules", "enforce data integrity constraints", or "validate data formats".

BbgnsurfTech

data-ai

open

data-engineering

data-modeling

Dimensional modeling, normalization, and schema design for analytics.

timequity

data-ai

open

data-engineering

td-data-profiling

Comprehensive data profiling and quality assessment using Teradata ClearScape Analytics descriptive statistics functions

teradata-labs

data-ai

open

data-engineering

cleaning-data

Systematic data quality remediation - detect duplicates/outliers/inconsistencies, design cleaning strategy, execute transformations, verify results (component skill for DataPeeker analysis sessions)

tilmon-engineering

data-ai

open

data-engineering

duckdb-quadruple-interleave

Chaotic interleaving across local DuckDB databases modeled as coupled quadruple pendula. Random walks both BETWEEN databases and WITHIN tables for context injection.

plurigrid

data-ai

open

data-engineering

airflow

Airflow DAG patterns, KubernetesPodOperator, and debugging. Triggers on "dag", "airflow", "task", "operator", "KPO", "scheduler", "XCom".

pypeaday

data-ai

open

data-engineering

agentdb-state-manager

Persistent state management using AgentDB (DuckDB) for workflow analytics and checkpoints. Provides a read-only analytics cache synchronized from TODO_*.md files, enabling complex dependency graph queries, historical workflow metrics, context checkpoint storage/recovery, and state transition analysis. Use when: data gathering and analysis for workflow state tracking. Triggers: "analyze workflow", "query state", "checkpoint", "workflow metrics".

stharrold

data-ai

open

data-engineering

dst-check-freshness

Check data freshness and age for DST tables in DuckDB. Use when determining if data needs refreshing or validating data currency before analysis.

mikkelkrogsholm

data-ai

open

data-engineering

duck-agent

DuckDB file discovery agent with verified absolute paths

plurigrid

data-ai

open

data-engineering

oracle

Use the @steipete/oracle CLI to bundle a prompt plus the right files and get a second-model review (API or browser) for debugging, refactors, design checks, or cross-validation.

LarsEckart

data-ai

open

data-engineering

pulse-mcp-stream

Layer 1 Real-Time Social Stream Monitoring via MCP with DuckDB persistence

plurigrid

data-ai

open

data-engineering

marketplace

adapter-assistant

Complete adapter lifecycle assistant for LimaCharlie. Supports External Adapters (cloud-managed), Cloud Sensors (SaaS/cloud integrations), and On-prem USP adapters. Dynamically researches adapter types from local docs and GitHub usp-adapters repo. Creates, validates, deploys, and troubleshoots adapter configurations. Handles parsing rules (Grok, regex), field mappings, credential setup, and multi-adapter configs. Use when setting up new data sources (Okta, S3, Azure Event Hub, syslog, webhook, etc.), troubleshooting ingestion issues, or managing adapter deployments.

refractionPOINT

data-ai

open

data-engineering

scalardb-sizing-estimator

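A skill for estimating the architecture, sizing, and configuration of ScalarDB Cluster and ScalarDB Analytics. From performance requirements, availability requirements, and the cloud environment, it estimates the overall configuration, including the number of ScalarDB Cluster pods, the Kubernetes configuration, the backend DB, the API Gateway, and monitoring systems. When ScalarDB Analytics is used, EMR/Databricks sizing is also included. When to use: "I want to estimate ScalarDB sizing", "I want to build a ScalarDB environment", "I want to decide on a ScalarDB Cluster configuration", "I want to calculate ScalarDB costs", "ScalarDB configuration for development/test/staging/production environments", production environment design including CI/CD, Blue/Green, and Canary Deployment, "I want to use ScalarDB Analytics", "I want to build an analytical query environment", "I want to estimate EMR/Databricks sizing". Output: estimate results in Markdown format plus a report in HTML format. Costs: stated in both USD and JPY (with the exchange rate specified).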

wfukatsu

data-ai

open

data-engineering

airflow

Airflow DAG patterns, KubernetesPodOperator, and debugging. Use on 'dag', 'airflow', 'task', 'operator', 'KPO', 'scheduler', 'XCom'.

pypeaday

data-ai

open

data-engineering

dst-data

Fetch actual data from Danmarks Statistik API and store in DuckDB. Use when user wants to download and store specific DST table data for analysis.

mikkelkrogsholm

data-ai

open

data-engineering

spark-basics

PySpark fundamentals for distributed data processing.

timequity

data-ai

open

data-engineering

anonymise

Anonymise CSV files by removing personal identifying information and adding datetime stamps. Use when user wants to process a new CSV file or strip PII from data.

sofer

data-ai

open

data-engineering

add-dlt-data-source

Scaffold new DLT pipeline for data ingestion to MotherDuck

nf-core

data-ai

open

data-engineering

cobol-migration-analyzer

Analyzes legacy COBOL programs and JCL jobs to assist with migration to modern Java applications. Extracts business logic, identifies dependencies, generates migration reports, and creates Java implementation strategies. Use when working with mainframe migration, COBOL analysis, legacy system modernization, JCL workflows, or when users mention COBOL to Java conversion, analyzing .cbl/.CBL/.cob files, working with copybooks, or planning Java service implementations from COBOL programs.

DauQuangThanh

data-ai

open

data-engineering

build-graph

GraphDB build agent: builds a RyuGraph database from the ubiquitous language and code analysis results. Invoke with /build-graph [target path].

wfukatsu

data-ai

open

data-engineering

entropy-sequencer

Layer 5 Interaction Interleaving for Maximum Information Gain with DuckDB

plurigrid

data-ai

open
