
Data Engineering

ETL pipelines and big data infrastructure.

1541 skills, sorted by stars
data-engineering · 195 stars

courtlistener-api

Legal case law database with PACER data and judge profiles

wentorai
data-ai

data-engineering · 195 stars

dataverse-api

Deposit and discover research datasets via Harvard Dataverse API

wentorai
data-ai

data-engineering · 195 stars

orkg-api

Query the Open Research Knowledge Graph for structured research data

wentorai
data-ai

data-engineering · 194 stars

ops-telemetry-query

Internal: for Boundless team members only. Query Boundless broker telemetry tables on Redshift for prod/staging operational data. Use when the user asks about broker health, request evaluations, request completions, proving times, skip rates, or telemetry data, or wants to run SQL against the telemetry database on live networks. Do NOT use for debugging local code changes, reviewing PRs, or investigating issues in the codebase itself.

boundless-xyz
data-ai

data-engineering · 188 stars

azure-kusto

Query and analyze data in Azure Data Explorer (Kusto/ADX) using KQL for log analytics, telemetry, and time series analysis. WHEN: KQL queries, Kusto database queries, Azure Data Explorer, ADX clusters, log analytics, time series data, IoT telemetry, anomaly detection.

jonathan-vella
data-ai

data-engineering · 188 stars

workflow-engine

Machine-readable workflow DAG for the multi-step agent pipeline. Defines node types, edge conditions, gates, and fan-out patterns. USE FOR: Orchestrator step routing, resume-from-graph, workflow validation. DO NOT USE FOR: Azure infrastructure, code generation, troubleshooting.

jonathan-vella
data-ai

data-engineering · 186 stars

lambda-migration

Guides Phase 2 Lambda Container Migration steps. Pass a specific step number (2-1 through 2-5) to get the goals, deliverables, validation criteria, and detailed design for that step.

serithemage
data-ai

data-engineering · 186 stars

table

Use when you need to display structured data in rows and columns.

thedaviddias
data-ai

data-engineering · 186 stars

tree-view

Use when you need to display hierarchical data structures.

thedaviddias
data-ai

data-engineering · 185 stars

acuantia-dataform

Use when working on Acuantia's BigQuery Dataform pipeline (acuantia-gcp-dataform project). Adds Acuantia-specific patterns on top of dataform-engineering-fundamentals: ODS two-arg ref() syntax, the looker_ filename prefix, Looker integration (looker_prod/looker_dev), acuantia dataset conventions, and coordination with the callrail_data_export, dialpad_data_integration, and looker projects.

majiayu000
data-ai

data-engineering · 185 stars

aether-temporal-collective

Distributed evolutionary memory system using Merkle-DAG branching timelines, holographic erasure coding, and stake-weighted consensus to maintain coherent collective history across thousands of agents despite forking narratives and temporal relativity.

majiayu000
data-ai

data-engineering · 185 stars

ahu-conductor

Air Handler Design Pipeline Orchestrator

majiayu000
data-ai

data-engineering · 185 stars

airflow-etl

Generate Apache Airflow ETL pipelines for government websites and document sources. Explores websites to find downloadable documents, verifies commercial use licenses, and creates complete Airflow DAG assets with daily scheduling. Use when user wants to create ETL pipelines, scrape government documents, or automate document collection workflows.

majiayu000
data-ai

data-engineering · 185 stars

airtable-model

PM Airtable data model reference. Use when creating tables, querying structure, or understanding relationships between Domain, Subdomain, Capability, Entity, Requirement, and BacklogItem tables.

majiayu000
data-ai

data-engineering · 185 stars

apache-spark-data-processing

Complete guide for Apache Spark data processing including RDDs, DataFrames, Spark SQL, streaming, MLlib, and production deployment

majiayu000
data-ai

data-engineering · 185 stars

aps-doc-ingestion

Expert documentation generation for ingestion layers. Automatically detects connector types (REST API, Database, File, Streaming), documents authentication patterns, rate limiting strategies, and incremental load patterns. Use when documenting data source ingestion workflows.

majiayu000
data-ai

data-engineering · 185 stars

aps-doc-staging

Expert documentation generation for staging transformation layers. Auto-detects SQL engine (Presto/Trino vs Hive), documents transformation rules, PII handling, deduplication strategies, and data quality rules. Use when documenting staging transformations.

majiayu000
data-ai

data-engineering · 185 stars

atft-pipeline

Manage J-Quants ingestion, feature graph generation, and cache hygiene for the ATFT-GAT-FAN dataset pipeline.

majiayu000
data-ai

data-engineering · 185 stars

backend-models-standards

Define database models with clear naming, appropriate data types, constraints, relationships, and validation at multiple layers. Use this skill when creating or modifying database model files, ORM classes, schema definitions, or data model relationships. Apply when working with model files (e.g., models.py, models/, ActiveRecord classes, Prisma schema, Sequelize models), defining table structures, setting up foreign keys and relationships, configuring cascade behaviors, implementing model validations, adding timestamps, or working with database constraints (NOT NULL, UNIQUE, foreign keys). Use for any task involving data integrity enforcement, relationship definitions, or model-level data validation.

majiayu000
data-ai

data-engineering · 185 stars

batch-execution-validator

Validate production batch execution: trigger daily runs and analyze traces for architecture completeness and result quality

majiayu000
data-ai

data-engineering · 185 stars

batch-processing-jobs

Implement robust batch processing systems with job queues, schedulers, background tasks, and distributed workers. Use when processing large datasets, scheduled tasks, async operations, or resource-intensive computations.

majiayu000
data-ai

data-engineering · 185 stars

big-data

Apache Spark, Hadoop, distributed computing, and large-scale data processing for petabyte-scale workloads

majiayu000
data-ai

data-engineering · 185 stars

bigquery-expert

BigQuery Expert Engineer skill: a comprehensive guide to GoogleSQL queries, data management, performance optimization, and cost management

majiayu000
data-ai

Page 36 of 65