
Data Eng.

ETL pipelines and big data infrastructure.

1541 skills across all categories
Sorted by stars.
data-engineering
117

data-assets

Create and modify Primary Data Assets with property management

kevinpbuckley
data-ai
data-engineering
117

data-tables

Create and modify Data Tables with row management

kevinpbuckley
data-ai
data-engineering
115

real-time-streaming

Use this skill when building real-time data pipelines, stream processing jobs, or change data capture systems. Triggers on tasks involving Apache Kafka (producers, consumers, topics, partitions, consumer groups, Connect, Streams), Apache Flink (DataStream API, windowing, checkpointing, stateful processing), event sourcing implementations, CDC with Debezium, stream processing patterns (windowing, watermarks, exactly-once semantics), and any pipeline that processes unbounded data in motion rather than data at rest.

AbsolutelySkilled
data-ai
data-engineering
115

spatial-raw-processing

Process barcoded spatial transcriptomics FASTQ pairs with st_pipeline, preserve upstream artifacts, convert the counts matrix into a standardized raw_counts.h5ad, and hand off cleanly to spatial-preprocess.

TianGzlab
data-ai
data-engineering
115

data-warehousing

Use this skill when designing data warehouses, building star or snowflake schemas, implementing slowly changing dimensions (SCDs), writing analytical SQL for Snowflake or BigQuery, creating fact and dimension tables, or planning ETL/ELT pipelines for analytics. Triggers on dimensional modeling, surrogate keys, conformed dimensions, warehouse architecture, data vault, partitioning strategies, materialized views, and any task requiring OLAP schema design or warehouse query optimization.

AbsolutelySkilled
data-ai
data-engineering
115

data-quality

Use this skill when implementing data validation, data quality monitoring, data lineage tracking, data contracts, or Great Expectations test suites. Triggers on schema validation, data profiling, freshness checks, row-count anomalies, column drift, expectation suites, contract testing between producers and consumers, lineage graphs, data observability, and any task requiring data integrity enforcement across pipelines.

AbsolutelySkilled
data-ai
data-engineering
115

data-pipelines

Use this skill when building data pipelines, ETL/ELT workflows, or data transformation layers. Triggers on Airflow DAG design, dbt model creation, Spark job optimization, streaming vs batch architecture decisions, data ingestion, data quality checks, pipeline orchestration, incremental loads, CDC (change data capture), schema evolution, and data warehouse modeling. Acts as a senior data engineer advisor for building reliable, scalable data infrastructure.

AbsolutelySkilled
data-ai
data-engineering
115

analytics-engineering

Use this skill when building dbt models, designing semantic layers, defining metrics, creating self-serve analytics, or structuring a data warehouse for analyst consumption. Triggers on dbt project setup, model layering (staging, intermediate, marts), ref() and source() usage, YAML schema definitions, metrics definitions, semantic layer configuration, dimensional modeling, slowly changing dimensions, data testing, and any task requiring analytics engineering best practices.

AbsolutelySkilled
data-ai
data-engineering
115

research-vault

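Persists research output to an Obsidian vault and maintains a paper-pool index. Supports daily research logs, paper cards, and literature-review archiving, as well as cross-project paper deduplication and fast retrieval.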

gy-hou
data-ai
data-engineering
115

tracking-live-gtm

Use when the user wants to inspect the actual live GTM (Google Tag Manager) runtime before schema generation, or to compare multiple live GTM containers.

jtrackingai
data-ai
data-engineering
114

data-analysis

Analyze CSV and tabular data, create summaries, and generate insights

chrispangg
data-ai
data-engineering
114

exploring-data

Exploratory data analysis using ydata-profiling. Use when users upload .csv/.xlsx/.json/.parquet files or request "explore data", "analyze dataset", "EDA", "profile data". Generates interactive HTML or JSON reports with statistics, visualizations, correlations, and quality alerts.

oaustegard
data-ai
data-engineering
113

deploy

Deploy agent to Databricks Apps using DAB (Databricks Asset Bundles). Use when: (1) User says 'deploy', 'push to databricks', or 'bundle deploy', (2) 'App already exists' error occurs, (3) Need to bind/unbind existing apps, (4) Debugging deployed apps, (5) Querying deployed app endpoints.

databricks
data-ai
data-engineering
113

bmad-tea

Enterprise Test Architecture (TEA) framework for quality engineering. Includes workflows for testing education (TEA Academy), risk-based test design, framework scaffolding, ATDD (red phase), CI/CD pipeline configuration, NFR (non-functional requirements) assessment, and quality auditing (0-100 scoring). Use it to establish or execute comprehensive testing strategies.

NeverSight
data-ai
data-engineering
113

api-cms-sanity

Structured content platform — GROQ queries, schema definitions, @sanity/client, Portable Text, image handling, real-time listeners, mutations, TypeGen

NeverSight
data-ai
data-engineering
113

api-vector-db-pinecone

Pinecone serverless vector database -- index management, vector operations, metadata filtering, namespaces, hybrid search, inference API

NeverSight
data-ai
data-engineering
113

api-vector-db-qdrant

Qdrant vector database -- collection management, point operations, payload filtering, named vectors, quantization, recommendations, snapshots

NeverSight
data-ai
data-engineering
113

firebase-data-connect

Firebase Data Connect integration for GraphQL-based data access with PostgreSQL. Use when building GraphQL schemas, queries, mutations, or integrating Firebase Data Connect with Angular applications. Supports type-safe generated SDKs, real-time subscriptions, and server-side data validation.

NeverSight
data-ai
data-engineering
113

mcp-repomix

Use the Repomix MCP server to package a codebase into consolidated files for AI analysis, search file contents, and understand project structure. Essential for comprehensive codebase analysis and context gathering.

NeverSight
data-ai
data-engineering
113

compute-management

Use when launching OCI compute instances, troubleshooting out-of-capacity or boot failures, optimizing compute costs, or handling instance lifecycle. Covers shape selection, capacity planning, service limits, and production incident resolution.

NeverSight
data-ai
data-engineering
113

database-management

Use when creating Autonomous Databases, troubleshooting connection failures, managing PDBs, or optimizing database costs. Covers connection string confusion, password validation errors, stop/start cost traps, clone type selection, and backup retention gotchas.

NeverSight
data-ai
data-engineering
113

oracle-dba

Use when managing Oracle Autonomous Database on OCI, troubleshooting performance issues, optimizing costs, or implementing HA/DR. Covers ADB-specific gotchas, cost traps, SQL_ID debugging workflows, auto-scaling behavior, and version differences (19c/21c/23ai/26ai).

NeverSight
data-ai