category focus

Data Eng.

ETL pipelines and big data infrastructure.

1541 skills

data-engineering

32.1K

dbos-python

Guide for building reliable, fault-tolerant Python applications with DBOS durable workflows. Use when adding DBOS to existing Python code, creating workflows and steps, or using queues for concurrency control.

sickn33

data-ai

data-engineering

32.1K

dbos-typescript

Guide for building reliable, fault-tolerant TypeScript applications with DBOS durable workflows. Use when adding DBOS to existing TypeScript code, creating workflows and steps, or using queues for concurrency control.

sickn33

data-ai

data-engineering

32.1K

distributed-debugging-debug-trace

You are a debugging expert specializing in setting up comprehensive debugging environments, distributed tracing, and diagnostic tools. Configure debugging workflows, implement tracing solutions, and establish troubleshooting practices for development and production environments.

sickn33

data-ai

data-engineering

32.1K

docker-expert

You are an advanced Docker containerization expert with comprehensive, practical knowledge of container optimization, security hardening, multi-stage builds, orchestration patterns, and production deployment strategies based on current industry best practices.

sickn33

data-ai

data-engineering

32.1K

graphql-architect

Master modern GraphQL with federation, performance optimization, and enterprise security. Build scalable schemas, implement advanced caching, and design real-time systems.

sickn33

data-ai

data-engineering

32.1K

jq

Expert jq usage for JSON querying, filtering, transformation, and pipeline integration. Practical patterns for real shell workflows.

sickn33

data-ai

data-engineering

32.1K

llm-ops

LLM Operations: RAG, embeddings, vector databases, fine-tuning, advanced prompt engineering, LLM costs, quality evals, and AI architectures for production.

sickn33

data-ai

data-engineering

32.1K

monte-carlo-push-ingestion

Expert guide for pushing metadata, lineage, and query logs to Monte Carlo from any data warehouse.

sickn33

data-ai

data-engineering

32.1K

nosql-expert

Expert guidance for distributed NoSQL databases (Cassandra, DynamoDB). Focuses on mental models, query-first modeling, single-table design, and avoiding hot partitions in high-scale systems.

sickn33

data-ai

data-engineering

32.1K

odoo-docker-deployment

Production-ready Docker and docker-compose setup for Odoo with PostgreSQL, persistent volumes, environment-based configuration, and Nginx reverse proxy.

sickn33

data-ai

data-engineering

32.1K

polars

Fast in-memory DataFrame library for datasets that fit in RAM. Use when pandas is too slow but data still fits in memory. Lazy evaluation, parallel execution, Apache Arrow backend. Best for 1-100GB datasets, ETL pipelines, faster pandas replacement. For larger-than-RAM data use dask or vaex.

sickn33

data-ai

data-engineering

32.1K

production-code-audit

Autonomously deep-scan the entire codebase line by line, understand its architecture and patterns, then systematically transform it to production-grade, corporate-level professional quality with optimizations.

sickn33

data-ai

data-engineering

32.1K

seo-schema

Detect, validate, and generate Schema.org structured data. JSON-LD format preferred. Use when user says "schema", "structured data", "rich results", "JSON-LD", or "markup".

sickn33

data-ai

data-engineering

32.1K

spark-optimization

Optimize Apache Spark jobs with partitioning, caching, shuffle optimization, and memory tuning. Use when improving Spark performance, debugging slow jobs, or scaling data processing pipelines.

sickn33

data-ai

data-engineering

32.1K

sql-pro

Master modern SQL with cloud-native databases, OLTP/OLAP optimization, and advanced query techniques. Expert in performance tuning, data modeling, and hybrid analytical systems.

sickn33

data-ai

data-engineering

32.1K

tdd-orchestrator

Master TDD orchestrator specializing in red-green-refactor discipline, multi-agent workflow coordination, and comprehensive test-driven development practices.

sickn33

data-ai

data-engineering

32.1K

tdd-workflows-tdd-cycle

Use when working through the TDD cycle in TDD workflows.

sickn33

data-ai

data-engineering

32.1K

tdd-workflows-tdd-refactor

Use when working through the refactor step in TDD workflows.

sickn33

data-ai

data-engineering

32.1K

temporal-python-pro

Master Temporal workflow orchestration with Python SDK. Implements durable workflows, saga patterns, and distributed transactions. Covers async/await, testing strategies, and production deployment.

sickn33

data-ai

data-engineering

32K

system-table-change

Use when adding, removing, or modifying columns/indexes on system tables. Provides a checklist covering schema definitions, migrations, version gates, golden files, and test hashes.

cockroachdb

data-ai

data-engineering

31.2K

agentdb-advanced-features

Master advanced AgentDB features including QUIC synchronization, multi-database management, custom distance metrics, hybrid search, and distributed systems integration. Use when building distributed AI systems, multi-agent coordination, or advanced vector search applications.

ruvnet

data-ai

data-engineering

31.2K

agentdb-performance-optimization

Optimize AgentDB performance with quantization (4-32x memory reduction), HNSW indexing (150x faster search), caching, and batch operations. Use when optimizing memory usage, improving search speed, or scaling to millions of vectors.

ruvnet

data-ai

data-engineering

31.2K

memory-management

AgentDB memory system with HNSW vector search. Provides 150x-12,500x faster pattern retrieval, persistent storage, and semantic search capabilities for learning and knowledge management. Use when: storing successful patterns, searching for similar solutions, looking up past work semantically, learning from previous tasks, sharing knowledge between agents, or building a knowledge base. Skip when: no learning is needed, the task is an ephemeral one-off, external data sources are available, or the exploration is read-only.

ruvnet

data-ai

data-engineering

31.2K

stream-chain

Stream-JSON chaining for multi-agent pipelines, data transformation, and sequential workflows

ruvnet

data-ai
