
Data Eng.

ETL pipelines and big data infrastructure.

1541 skills
data-engineering
0

backend--queries

Apply the Agent OS standard for backend queries.

tlabs-xyz
data-ai
data-engineering
0

ef-core-patterns

Use this skill when implementing data access, repositories, or query logic with Entity Framework Core.

michaellperry
data-ai
data-engineering
0

csv-cleaner

Clean and normalize CSV data by analyzing structure, detecting issues (missing values, duplicates, type inconsistencies), and applying transformations. Use when users need to prepare messy CSV files for analysis or import.

elertan
data-ai
data-engineering
0

refactorpandas

Refactor Pandas code to improve maintainability, readability, and performance. Identifies and fixes loops/.iterrows() that should be vectorized, overuse of .apply() where vectorized alternatives exist, chained indexing patterns, inplace=True usage, inefficient dtypes, missing method chaining opportunities, complex filters, merge operations without validation, and SettingWithCopyWarning patterns. Applies Pandas 2.0+ features including PyArrow backend, Copy-on-Write, vectorized operations, method chaining, .query()/.eval(), optimized dtypes, and pipeline patterns.

SnakeO
data-ai
data-engineering
0

hive-endpoint

How to create API endpoints in the Hive framework.

paralect
data-ai
data-engineering
0

output-error-missing-schemas

Fix missing schema definitions in Output SDK steps. Use when seeing type errors, undefined properties at step boundaries, validation failures, or when step inputs/outputs aren't being properly typed.

growthxai
data-ai
data-engineering
0

data-import-parsers

Implement or refactor data import/parsing so it streams files sequentially (memory-safe), validates/coerces types explicitly, skips irreparable records while logging them to an error CSV (full original columns + timestamp/file/line/error), emits rows_ok/rows_skipped/parse_errors metrics, and guarantees idempotent DB writes.

janjaszczak
data-ai
data-engineering
0

data-integrity-guardian

Maintain database quality, ensure data alignment, verify mission compliance, prepare for JusticeHub syndication.

Acurioustractor
data-ai
data-engineering
0

dataclass-optimization

Python dataclass best practices: slots, frozen, validation. Trigger when optimizing dataclasses or creating config classes.

smith6jt-cop
data-ai
data-engineering
0

skogai-jq

Use when performing JSON transformations, manipulating nested JSON structures, filtering arrays, extracting values, validating JSON schemas, or composing multi-step JSON operations. This skill provides 60+ schema-driven jq transformations optimized for AI agent discoverability.

SkogAI
data-ai
data-engineering
0

pm-03-data-quality

Assess data quality for the normalised log and recommend remediation thresholds and privacy handling.

Wattysaid
data-ai
data-engineering
0

ontology-phase-4-generate

Phase 4 of Ontology Builder Pipeline. Generates final Ontology artifacts (entity definitions, workflow catalog, concept guides) from DRD. Use after Phase 3 DRD is complete.

a4b-corporation
data-ai
data-engineering
0

decision-log-audit

Audits Decision JSONL logs for schema compliance, required metadata, and invariants across recorded decisions.

Nepopams
data-ai
data-engineering
0

airflow-dag

Apache Airflow DAG development with TaskFlow API, Google Cloud operators (BigQuery, GCS), dbt integration, and dynamic DAG generation. Use when creating or modifying Airflow DAGs, implementing data pipeline orchestration, setting up cross-DAG dependencies with ExternalTaskSensor, adding deferrable operators, or configuring error handling and retries.

ilorozco11
data-ai
data-engineering
0

synthetic-data-generator

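A skill for generating synthetic test data. Generates realistic user data, transactions, logs, API responses, and more. Supports GDPR compliance, Faker integration, schema-based generation, time-series data, and anomalous-data generation.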

ntaksh42
data-ai
data-engineering
0

data-source-connect

Connect your own data source to replace the demo unicorns data. Use when the user wants to use their own database URL or CSV file instead of the sample data. Triggers on requests to connect database, import CSV, change data source, use own data, or switch from demo data.

rebyteai-template
data-ai
data-engineering
0

altinity-expert-clickhouse-logs

Analyze ClickHouse system log table health including TTL configuration, disk usage, freshness, and cleanup. Use for system log issues and TTL configuration.

Altinity
data-ai
data-engineering
0

data-engineering

Load when working on data pipelines, datasets, reproducibility, or data infrastructure topics. Contains best practices for data engineering, ETL/ELT patterns, and ensuring reproducible data workflows.

chekos
data-ai
data-engineering
0

snowflake-query

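Takes a user's natural-language instruction, generates and runs a SQL query against Snowflake, and outputs the results as CSV. Triggers automatically on requests such as "Get ... from Snowflake" or "Show the data for ...".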

0tarof
data-ai
data-engineering
0

data-engineer

Build scalable data pipelines, ETL/ELT processes, and data infrastructure. Use when: (1) designing data architectures or lakehouse patterns, (2) building Spark/Kafka/Flink/Beam pipelines, (3) optimizing Snowflake/BigQuery/Redshift queries, (4) implementing Airflow/Prefect/Dagster orchestration, (5) setting up data quality frameworks, (6) cost-optimizing data platforms.

robertlupo1997
data-ai
data-engineering
0

ontology-phase-1-ingest

Phase 1 of the Ontology Builder Pipeline. Ingests and catalogs all input materials from the _input/ folder. Use when starting the ontology-building process or when processing new input documents for domain analysis.

a4b-corporation
data-ai
Page 60 / 65