Data Eng.

ETL pipelines and big data infrastructure.

1541 skills, sorted by stars
data-engineering
32.1K

dbos-python

Guide for building reliable, fault-tolerant Python applications with DBOS durable workflows. Use when adding DBOS to existing Python code, creating workflows and steps, or using queues for concurrency control.

sickn33
data-ai
data-engineering
32.1K

dbos-typescript

Guide for building reliable, fault-tolerant TypeScript applications with DBOS durable workflows. Use when adding DBOS to existing TypeScript code, creating workflows and steps, or using queues for concurrency control.

sickn33
data-ai
data-engineering
32.1K

distributed-debugging-debug-trace

You are a debugging expert specializing in setting up comprehensive debugging environments, distributed tracing, and diagnostic tools. Configure debugging workflows, implement tracing solutions, and establish troubleshooting practices for development and production environments.

sickn33
data-ai
data-engineering
32.1K

docker-expert

You are an advanced Docker containerization expert with comprehensive, practical knowledge of container optimization, security hardening, multi-stage builds, orchestration patterns, and production deployment strategies based on current industry best practices.

sickn33
data-ai
data-engineering
32.1K

graphql-architect

Master modern GraphQL with federation, performance optimization, and enterprise security. Build scalable schemas, implement advanced caching, and design real-time systems.

sickn33
data-ai
data-engineering
32.1K

jq

Expert jq usage for JSON querying, filtering, transformation, and pipeline integration. Practical patterns for real shell workflows.

sickn33
data-ai
data-engineering
32.1K

llm-ops

LLM Operations: RAG, embeddings, vector databases, fine-tuning, advanced prompt engineering, LLM costs, quality evals, and AI architectures for production.

sickn33
data-ai
data-engineering
32.1K

monte-carlo-push-ingestion

Expert guide for pushing metadata, lineage, and query logs to Monte Carlo from any data warehouse.

sickn33
data-ai
data-engineering
32.1K

nosql-expert

Expert guidance for distributed NoSQL databases (Cassandra, DynamoDB). Focuses on mental models, query-first modeling, single-table design, and avoiding hot partitions in high-scale systems.

sickn33
data-ai
data-engineering
32.1K

odoo-docker-deployment

Production-ready Docker and docker-compose setup for Odoo with PostgreSQL, persistent volumes, environment-based configuration, and Nginx reverse proxy.

sickn33
data-ai
data-engineering
32.1K

polars

Fast in-memory DataFrame library for datasets that fit in RAM. Use when pandas is too slow but data still fits in memory. Lazy evaluation, parallel execution, Apache Arrow backend. Best for 1-100GB datasets, ETL pipelines, faster pandas replacement. For larger-than-RAM data use dask or vaex.

sickn33
data-ai
data-engineering
32.1K

production-code-audit

Autonomously deep-scan an entire codebase line by line, understand its architecture and patterns, then systematically transform it to production-grade, professional quality with optimizations.

sickn33
data-ai
data-engineering
32.1K

seo-schema

Detect, validate, and generate Schema.org structured data. JSON-LD format preferred. Use when user says "schema", "structured data", "rich results", "JSON-LD", or "markup".

sickn33
data-ai
data-engineering
32.1K

spark-optimization

Optimize Apache Spark jobs with partitioning, caching, shuffle optimization, and memory tuning. Use when improving Spark performance, debugging slow jobs, or scaling data processing pipelines.

sickn33
data-ai
data-engineering
32.1K

sql-pro

Master modern SQL with cloud-native databases, OLTP/OLAP optimization, and advanced query techniques. Expert in performance tuning, data modeling, and hybrid analytical systems.

sickn33
data-ai
data-engineering
32.1K

tdd-orchestrator

Master TDD orchestrator specializing in red-green-refactor discipline, multi-agent workflow coordination, and comprehensive test-driven development practices.

sickn33
data-ai
data-engineering
32.1K

temporal-python-pro

Master Temporal workflow orchestration with Python SDK. Implements durable workflows, saga patterns, and distributed transactions. Covers async/await, testing strategies, and production deployment.

sickn33
data-ai
data-engineering
32K

system-table-change

Use when adding, removing, or modifying columns/indexes on system tables. Provides a checklist covering schema definitions, migrations, version gates, golden files, and test hashes.

cockroachdb
data-ai
data-engineering
31.2K

agentdb-advanced-features

Master advanced AgentDB features including QUIC synchronization, multi-database management, custom distance metrics, hybrid search, and distributed systems integration. Use when building distributed AI systems, multi-agent coordination, or advanced vector search applications.

ruvnet
data-ai
data-engineering
31.2K

agentdb-performance-optimization

Optimize AgentDB performance with quantization (4-32x memory reduction), HNSW indexing (150x faster search), caching, and batch operations. Use when optimizing memory usage, improving search speed, or scaling to millions of vectors.

ruvnet
data-ai
data-engineering
31.2K

memory-management

AgentDB memory system with HNSW vector search. Provides 150x-12,500x faster pattern retrieval, persistent storage, and semantic search capabilities for learning and knowledge management. Use when: need to store successful patterns, searching for similar solutions, semantic lookup of past work, learning from previous tasks, sharing knowledge between agents, building knowledge base. Skip when: no learning needed, ephemeral one-off tasks, external data sources available, read-only exploration.

ruvnet
data-ai
data-engineering
31.2K

stream-chain

Stream-JSON chaining for multi-agent pipelines, data transformation, and sequential workflows.

ruvnet
data-ai