csv-wave-pipeline
Pipeline from requirement planning to wave-based CSV execution. Decomposes a requirement into dependency-sorted CSV tasks, computes execution waves, and runs them wave by wave via spawn_agents_on_csv with cross-wave context propagation.
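The wave computation mentioned in this entry can be sketched as a layered topological sort: a task lands in the first wave in which all of its dependencies have already finished. This is a minimal illustration, not the skill's actual implementation. The column names `id` and `deps`, the `;` separator, and the sample tasks are assumptions made for the example, and the sketch stops before dispatching anything to spawn_agents_on_csv.

```python
import csv
import io


def compute_waves(csv_text):
    """Group tasks into waves; each wave depends only on earlier waves."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Hypothetical layout: 'id' column plus a ';'-separated 'deps' column.
    deps = {r["id"]: [d for d in (r["deps"] or "").split(";") if d] for r in rows}

    # Reject references to tasks that are not in the file.
    for task, task_deps in deps.items():
        for dep in task_deps:
            if dep not in deps:
                raise ValueError(f"task {task} depends on unknown task {dep}")

    remaining = dict(deps)
    done = set()
    waves = []
    while remaining:
        # A task is ready once every dependency is in a completed wave.
        ready = sorted(t for t, ds in remaining.items() if all(d in done for d in ds))
        if not ready:
            raise ValueError(f"dependency cycle among: {sorted(remaining)}")
        waves.append(ready)
        done.update(ready)
        for task in ready:
            del remaining[task]
    return waves


# Illustrative task list (not taken from the skill itself).
TASKS_CSV = """id,title,deps
T1,Design schema,
T2,Write migration,T1
T3,Write API,T1
T4,Integration tests,T2;T3
"""

if __name__ == "__main__":
    for number, wave in enumerate(compute_waves(TASKS_CSV), start=1):
        print(f"wave {number}: {', '.join(wave)}")
```

Tasks inside one wave have no dependencies on each other, so they can run in parallel. Results from a finished wave are what later waves would receive as propagated context.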
Unified team skill for architecture optimization. Uses team-worker agent architecture with role directories for domain logic. Coordinator orchestrates pipeline, workers are team-worker agents. Triggers on "team arch-opt".
Unified team skill for issue resolution. Uses team-worker agent architecture with role directories for domain logic. Coordinator orchestrates pipeline, workers are team-worker agents. Triggers on "team issue".
Unified team skill for tech debt identification and remediation. Scans codebase for tech debt, assesses severity, plans and executes fixes with validation. Uses team-worker agent architecture with roles/ for domain logic. Coordinator orchestrates pipeline, workers are team-worker agents. Triggers on "team tech debt".
Deep collaborative analysis team skill. All roles route via this SKILL.md. Beat model is coordinator-only (monitor.md). Structure is roles/ + specs/. Triggers on "team ultra-analyze", "team analyze".
Creates a Flyway Java-based migration for schema changes. Handles table creation, column additions, tenant isolation, and ES reindex. Use when asked to modify the database schema.
Use when the user wants to create a dataset, generate synthetic data, or build a data generation pipeline.
Production readiness checklist for durable streams. Switch from dev server to Caddy binary, configure CDN caching with offset-based URLs, Cache-Control and ETag headers, Stream-Cursor for cache collision prevention, TTL and Stream-Expires-At for stream lifecycle, HTTPS requirement, request collapsing for fan-out, CORS configuration. Load before deploying durable streams to production.
Writing data to durable streams. DurableStream.create() with contentType, DurableStream.append() for simple writes, IdempotentProducer for high-throughput exactly-once delivery with autoClaim, fire-and-forget append(), flush(), close(), StaleEpochError handling, JSON mode vs byte stream mode, stream closure. Load when writing, producing, or appending data to a durable stream.
Defining typed state schemas for @durable-streams/state. createStateSchema() with CollectionDefinition (schema, type, primaryKey), Standard Schema validators (Zod, Valibot, ArkType), event helpers insert/update/delete/upsert, ChangeEvent and ControlEvent types, State Protocol operations, transaction IDs (txid) for write confirmation. Load when defining entity types, choosing a schema validator, or creating typed change events.
Stream-backed reactive database with @durable-streams/state. createStreamDB() with schema and stream options, db.preload() lazy initialization, db.collections for TanStack DB collections, optimistic actions with onMutate and mutationFn, db.utils.awaitTxId() for transaction confirmation, control events (snapshot-start, snapshot-end, reset), db.close() cleanup, re-exported TanStack DB operators (eq, gt, and, or, count, sum, avg, min, max).
Yjs CRDT sync over durable streams with @durable-streams/y-durable-streams. DurableStreamsProvider setup, document stream and awareness stream config, transport modes (SSE vs long-poll), provider lifecycle (connect, disconnect, destroy), synced/status/error events, lib0 VarUint8Array framing, awareness heartbeat. Requires yjs, y-protocols, lib0 peer dependencies. Load when integrating Yjs collaborative editing with durable streams.
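The "lib0 VarUint8Array framing" named in this entry is a length-prefixed layout: a variable-length unsigned integer (7 payload bits per byte, high bit set while more bytes follow) gives the byte count, and the raw bytes come next. The sketch below re-implements that layout in plain Python for illustration only. It is not the lib0 API, the function names are hypothetical, and how the provider actually batches updates onto a stream is an assumption.

```python
def write_var_uint(n):
    """Encode a non-negative int, 7 bits per byte, least significant group first."""
    if n < 0:
        raise ValueError("var uint must be non-negative")
    out = bytearray()
    while n > 0x7F:
        out.append(0x80 | (n & 0x7F))  # high bit marks "more bytes follow"
        n >>= 7
    out.append(n)
    return bytes(out)


def read_var_uint(buf, pos):
    """Decode a var uint starting at pos; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated var uint")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:
            return value, pos
        shift += 7


def frame(payload):
    """Prefix a payload with its length as a var uint."""
    return write_var_uint(len(payload)) + payload


def unframe_all(buf):
    """Split a buffer of concatenated frames back into payloads."""
    frames = []
    pos = 0
    while pos < len(buf):
        length, pos = read_var_uint(buf, pos)
        end = pos + length
        if end > len(buf):
            raise ValueError("truncated frame")
        frames.append(buf[pos:end])
        pos = end
    return frames


if __name__ == "__main__":
    stream = frame(b"update-1") + frame(b"update-2")
    print(unframe_all(stream))
```

Length prefixes let a reader split a byte stream into individual updates without a delimiter that could collide with binary update content.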
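Automated data exploration and visualization tool that provides a complete EDA solution, from data loading to professional report generation. Supports multiple chart types, intelligent data diagnostics, modeling evaluation, and HTML report generation. Suited to data analysis projects in healthcare, finance, e-commerce, and other domains.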
Data Quality Checker - Auto-activating skill for Data Pipelines. Triggers on: data quality checker. Part of the Data Pipelines skill category.
Fast in-memory DataFrame library for datasets that fit in RAM. Use when pandas is too slow but data still fits in memory. Lazy evaluation, parallel execution, Apache Arrow backend. Best for 1-100GB datasets, ETL pipelines, faster pandas replacement. For larger-than-RAM data use dask or vaex.
Use this skill for processing and analyzing large tabular datasets (billions of rows) that exceed available RAM. Vaex excels at out-of-core DataFrame operations, lazy evaluation, fast aggregations, efficient visualization of big data, and machine learning on large datasets. Apply when users need to work with large CSV/HDF5/Arrow/Parquet files, perform fast statistics on massive datasets, create visualizations of big data, or build ML pipelines that do not fit in memory.
Chunked N-D arrays for cloud storage. Compressed arrays, parallel I/O, S3/GCS integration, NumPy/Dask/Xarray compatible, for large-scale scientific computing pipelines.
Step-by-step guide for creating Temporal workflows in Dust. Use when adding background jobs, async processing, durable workflows, or task queues.
Packer orchestration: init/build/validate/inspect/output, machine image building, template management, source management
Create Databricks AI/BI dashboards. Use when creating, updating, or deploying Lakeview dashboards. CRITICAL: You MUST test ALL SQL queries via execute_sql BEFORE deploying. Follow guidelines strictly.
Manage Databricks workspace connections: check current workspace, switch profiles, list available workspaces, or authenticate to a new workspace. Use when the user mentions "switch workspace", "which workspace", "current profile", "databrickscfg", "connect to workspace", or "databricks auth".
Databricks SQL (DBSQL) advanced features and SQL warehouse capabilities. This skill MUST be invoked when the user mentions: "DBSQL", "Databricks SQL", "SQL warehouse", "SQL scripting", "stored procedure", "CALL procedure", "materialized view", "CREATE MATERIALIZED VIEW", "pipe syntax", "|>", "geospatial", "H3", "ST_", "spatial SQL", "collation", "COLLATE", "ai_query", "ai_classify", "ai_extract", "ai_gen", "AI function", "http_request", "remote_query", "read_files", "Lakehouse Federation", "recursive CTE", "WITH RECURSIVE", "multi-statement transaction", "temp table", "temporary view", "pipe operator". SHOULD also invoke when the user asks about SQL best practices, data modeling patterns, or advanced SQL features on Databricks.