Data Engineering

ETL pipelines and big data infrastructure.

1541 skills

Sorted by stars, showing all entries.

data-engineering

195

courtlistener-api

Legal case law database with PACER data and judge profiles

wentorai

data-ai

data-engineering

195

dataverse-api

Deposit and discover research datasets via Harvard Dataverse API

wentorai

data-ai

data-engineering

195

orkg-api

Query the Open Research Knowledge Graph for structured research data

wentorai

data-ai

data-engineering

195

data-collection-automation

Automate survey deployment, data collection, and pipeline management

wentorai

data-ai

data-engineering

194

Internal — for Boundless team members only. Query Boundless broker telemetry tables on Redshift for prod/staging operational data. Use when the user asks about broker health, request evaluations, request completions, proving times, skip rates, telemetry data, or wants to run SQL against the telemetry database on live networks. Do NOT use for debugging local code changes, reviewing PRs, or investigating issues in the codebase itself.

boundless-xyz

data-ai

data-engineering

188

azure-kusto

Query and analyze data in Azure Data Explorer (Kusto/ADX) using KQL for log analytics, telemetry, and time series analysis. WHEN: KQL queries, Kusto database queries, Azure Data Explorer, ADX clusters, log analytics, time series data, IoT telemetry, anomaly detection.

jonathan-vella

data-ai

data-engineering

188

workflow-engine

Machine-readable workflow DAG for the multi-step agent pipeline. Defines node types, edge conditions, gates, and fan-out patterns. USE FOR: Orchestrator step routing, resume-from-graph, workflow validation. DO NOT USE FOR: Azure infrastructure, code generation, troubleshooting.

jonathan-vella

data-ai

data-engineering

186

lambda-migration

Guides Phase 2 Lambda Container Migration steps. Pass a specific step number (2-1 through 2-5) to get the goals, deliverables, validation criteria, and detailed design for that step.

serithemage

data-ai

data-engineering

186

table

Use when you need to display structured data in rows and columns.

thedaviddias

data-ai

data-engineering

186

tree-view

Use when you need to display hierarchical data structures.

thedaviddias

data-ai

data-engineering

185

acuantia-dataform

Use when working on Acuantia's BigQuery Dataform pipeline (acuantia-gcp-dataform project). Adds Acuantia-specific patterns on top of dataform-engineering-fundamentals: ODS two-arg ref() syntax, looker_ filename prefix, Looker integration (looker_prod/looker_dev), acuantia dataset conventions, and coordination with the callrail_data_export, dialpad_data_integration, and looker projects.

majiayu000

data-ai

data-engineering

185

aether-temporal-collective

Distributed evolutionary memory system using Merkle-DAG branching timelines, holographic erasure coding, and stake-weighted consensus to maintain coherent collective history across thousands of agents despite forking narratives and temporal relativity.

majiayu000

data-ai

data-engineering

185

ahu-conductor

Air Handler Design Pipeline Orchestrator

majiayu000

data-ai

data-engineering

185

airflow-etl

Generate Apache Airflow ETL pipelines for government websites and document sources. Explores websites to find downloadable documents, verifies commercial use licenses, and creates complete Airflow DAG assets with daily scheduling. Use when user wants to create ETL pipelines, scrape government documents, or automate document collection workflows.

majiayu000

data-ai

data-engineering

185

airtable-model

PM Airtable data model reference. Use when creating tables, querying structure, or understanding relationships between Domain, Subdomain, Capability, Entity, Requirement, and BacklogItem tables.

majiayu000

data-ai

data-engineering

185

apache-spark-data-processing

Complete guide for Apache Spark data processing including RDDs, DataFrames, Spark SQL, streaming, MLlib, and production deployment

majiayu000

data-ai

data-engineering

185

aps-doc-ingestion

Expert documentation generation for ingestion layers. Automatically detects connector types (REST API, Database, File, Streaming), documents authentication patterns, rate limiting strategies, and incremental load patterns. Use when documenting data source ingestion workflows.

majiayu000

data-ai

data-engineering

185

aps-doc-staging

Expert documentation generation for staging transformation layers. Auto-detects SQL engine (Presto/Trino vs Hive), documents transformation rules, PII handling, deduplication strategies, and data quality rules. Use when documenting staging transformations.

majiayu000

data-ai

data-engineering

185

atft-pipeline

Manage J-Quants ingestion, feature graph generation, and cache hygiene for the ATFT-GAT-FAN dataset pipeline.

majiayu000

data-ai

data-engineering

185

backend-models-standards

Define database models with clear naming, appropriate data types, constraints, relationships, and validation at multiple layers. Use this skill when creating or modifying database model files, ORM classes, schema definitions, or data model relationships. Apply when working with model files (e.g., models.py, models/, ActiveRecord classes, Prisma schema, Sequelize models), defining table structures, setting up foreign keys and relationships, configuring cascade behaviors, implementing model validations, adding timestamps, or working with database constraints (NOT NULL, UNIQUE, foreign keys). Use for any task involving data integrity enforcement, relationship definitions, or model-level data validation.

majiayu000

data-ai

data-engineering

185

batch-execution-validator

Validate production batch execution - trigger daily runs and analyze traces for architecture completeness and result quality

majiayu000

data-ai

data-engineering

185

batch-processing-jobs

Implement robust batch processing systems with job queues, schedulers, background tasks, and distributed workers. Use when processing large datasets, scheduled tasks, async operations, or resource-intensive computations.

majiayu000

data-ai

data-engineering

185

big-data

Apache Spark, Hadoop, distributed computing, and large-scale data processing for petabyte-scale workloads

majiayu000

data-ai

data-engineering

185

bigquery-expert

BigQuery Expert Engineer Skill - Comprehensive guide for GoogleSQL queries, data management, performance optimization, and cost management

majiayu000

data-ai

Page 36 / 65