Data Engineering

ETL pipelines and big data infrastructure.

1541 skills

data-engineering

117

data-assets

Create and modify Primary Data Assets with property management

kevinpbuckley

data-ai

data-engineering

117

data-tables

Create and modify Data Tables with row management

kevinpbuckley

data-ai

data-engineering

115

Use this skill when building real-time data pipelines, stream processing jobs, or change data capture systems. Triggers on tasks involving Apache Kafka (producers, consumers, topics, partitions, consumer groups, Connect, Streams), Apache Flink (DataStream API, windowing, checkpointing, stateful processing), event sourcing implementations, CDC with Debezium, stream processing patterns (windowing, watermarks, exactly-once semantics), and any pipeline that processes unbounded data in motion rather than data at rest.

AbsolutelySkilled

data-ai

data-engineering

115

spatial-raw-processing

Process barcoded spatial transcriptomics FASTQ pairs with st_pipeline, preserve upstream artifacts, convert the counts matrix into a standardized raw_counts.h5ad, and hand off cleanly to spatial-preprocess.

TianGzlab

data-ai

data-engineering

115

data-warehousing

Use this skill when designing data warehouses, building star or snowflake schemas, implementing slowly changing dimensions (SCDs), writing analytical SQL for Snowflake or BigQuery, creating fact and dimension tables, or planning ETL/ELT pipelines for analytics. Triggers on dimensional modeling, surrogate keys, conformed dimensions, warehouse architecture, data vault, partitioning strategies, materialized views, and any task requiring OLAP schema design or warehouse query optimization.

AbsolutelySkilled

data-ai

data-engineering

115

data-quality

Use this skill when implementing data validation, data quality monitoring, data lineage tracking, data contracts, or Great Expectations test suites. Triggers on schema validation, data profiling, freshness checks, row-count anomalies, column drift, expectation suites, contract testing between producers and consumers, lineage graphs, data observability, and any task requiring data integrity enforcement across pipelines.

AbsolutelySkilled

data-ai

data-engineering

115

data-pipelines

Use this skill when building data pipelines, ETL/ELT workflows, or data transformation layers. Triggers on Airflow DAG design, dbt model creation, Spark job optimization, streaming vs batch architecture decisions, data ingestion, data quality checks, pipeline orchestration, incremental loads, CDC (change data capture), schema evolution, and data warehouse modeling. Acts as a senior data engineer advisor for building reliable, scalable data infrastructure.

AbsolutelySkilled

data-ai

data-engineering

115

analytics-engineering

Use this skill when building dbt models, designing semantic layers, defining metrics, creating self-serve analytics, or structuring a data warehouse for analyst consumption. Triggers on dbt project setup, model layering (staging, intermediate, marts), ref() and source() usage, YAML schema definitions, metrics definitions, semantic layer configuration, dimensional modeling, slowly changing dimensions, data testing, and any task requiring analytics engineering best practices.

AbsolutelySkilled

data-ai

data-engineering

115

research-vault

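Persist research results to an Obsidian vault and maintain a paper-pool index. Supports daily research logs, paper cards, and review archiving, as well as cross-project paper deduplication and fast retrieval.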

gy-hou

data-ai

data-engineering

115

tracking-live-gtm

Use when the user wants to inspect the real live GTM runtime before schema generation or compare multiple live GTM containers.

jtrackingai

data-ai

data-engineering

114

data-analysis

Analyze CSV and tabular data, create summaries, and generate insights

chrispangg

data-ai

data-engineering

114

exploring-data

Exploratory data analysis using ydata-profiling. Use when users upload .csv/.xlsx/.json/.parquet files or request "explore data", "analyze dataset", "EDA", "profile data". Generates interactive HTML or JSON reports with statistics, visualizations, correlations, and quality alerts.

oaustegard

data-ai

data-engineering

113

deploy

Deploy agent to Databricks Apps using DAB (Databricks Asset Bundles). Use when: (1) User says 'deploy', 'push to databricks', or 'bundle deploy', (2) 'App already exists' error occurs, (3) Need to bind/unbind existing apps, (4) Debugging deployed apps, (5) Querying deployed app endpoints.

databricks

data-ai

data-engineering

113

bmad-tea

Enterprise Test Architecture (TEA) framework for quality engineering. Includes workflows for testing education (TEA Academy), risk-based test design, framework scaffolding, ATDD (Red-phase), CI/CD pipeline configuration, NFR (Non-functional) assessment, and quality auditing (0-100 scoring). Use for establishing or executing comprehensive testing strategies.

NeverSight

data-ai

data-engineering

113

mobile-framework-expo

Expo managed workflow

NeverSight

data-ai

data-engineering

113

api-cms-sanity

Structured content platform — GROQ queries, schema definitions, @sanity/client, Portable Text, image handling, real-time listeners, mutations, TypeGen

NeverSight

data-ai

data-engineering

113

api-vector-db-pinecone

Pinecone serverless vector database -- index management, vector operations, metadata filtering, namespaces, hybrid search, inference API

NeverSight

data-ai

data-engineering

113

api-vector-db-qdrant

Qdrant vector database -- collection management, point operations, payload filtering, named vectors, quantization, recommendations, snapshots

NeverSight

data-ai

data-engineering

113

firebase-data-connect

Firebase Data Connect integration for GraphQL-based data access with PostgreSQL. Use when building GraphQL schemas, queries, mutations, or integrating Firebase Data Connect with Angular applications. Supports type-safe generated SDKs, real-time subscriptions, and server-side data validation.

NeverSight

data-ai

data-engineering

113

mcp-repomix

Use the Repomix MCP server to package a codebase into consolidated files for AI analysis, search file contents, and understand project structure. Essential for comprehensive codebase analysis and context gathering.

NeverSight

data-ai

data-engineering

113

compute-management

Use when launching OCI compute instances, troubleshooting out-of-capacity or boot failures, optimizing compute costs, or handling instance lifecycle. Covers shape selection, capacity planning, service limits, and production incident resolution.

NeverSight

data-ai

data-engineering

113

database-management

Use when creating Autonomous Databases, troubleshooting connection failures, managing PDBs, or optimizing database costs. Covers connection string confusion, password validation errors, stop/start cost traps, clone type selection, and backup retention gotchas.

NeverSight

data-ai

data-engineering

113

deploy

databricks

data-ai

data-engineering

113

oracle-dba

Use when managing Oracle Autonomous Database on OCI, troubleshooting performance issues, optimizing costs, or implementing HA/DR. Covers ADB-specific gotchas, cost traps, SQL_ID debugging workflows, auto-scaling behavior, and version differences (19c/21c/23ai/26ai).

NeverSight

data-ai
