Data & AI

Machine learning, LLMs, and data processing.

9743 skills

data-engineering

143

deal-review

Use to run structured opportunity inspections that align pipeline data with buyer reality.

gtmagents

data-ai

open

data-engineering

143

comp-mechanics

Use to assemble rate tables, accelerator logic, and plan governance templates.

gtmagents

data-ai

open

data-engineering

143

Golden dataset lifecycle patterns for curation, versioning, quality validation, and CI integration. Use when building evaluation datasets, managing dataset versions, validating quality scores, or integrating golden tests into pipelines.

yonatangross

data-ai

open

data-engineering

143

ct-epic-architect

Epic planning and task decomposition for breaking down large initiatives into atomic, executable tasks. Provides dependency analysis, wave-based parallel execution planning, hierarchy management, and research linking. Use when creating epics, decomposing initiatives into task trees, planning parallel workflows, or analyzing task dependencies. Triggers on epic creation, task decomposition requests, or planning phase work.

kryptobaseddev

data-ai

open

data-engineering

143

migrate-dataset

Migrate a legacy OWID dataset (no catalogPath) into the ETL pipeline. Use when user wants to migrate, backport, or convert a legacy dataset by ID, or mentions datasets without catalogPath.

owid

data-ai

open

data-engineering

143

create-multidim

Create multi-dimensional (multidim/MDIM) chart configurations in the OWID ETL pipeline. Use this skill when the user wants to create a new multidim, build a multi-dimensional chart, combine multiple charts into one with dimension toggles, or mentions 'multidim' or 'MDIM'.

owid

data-ai

open

data-engineering

143

ct-orchestrator

Pipeline-aware orchestration skill for managing complex workflows through subagent delegation. Use when the user asks to "orchestrate", "orchestrator mode", "run as orchestrator", "delegate to subagents", "coordinate agents", "spawn subagents", "multi-agent workflow", "context-protected workflow", "agent farm", "HITL orchestration", "pipeline management", or needs to manage complex workflows by delegating work to subagents while protecting the main context window. Enforces ORC-001 through ORC-009 constraints. Provider-neutral — works with any AI agent runtime.

kryptobaseddev

data-ai

open

data-engineering

143

ct-adr-recorder

Records Architecture Decision Records from accepted consensus verdicts. Use when promoting a consensus outcome to a formal ADR: drafts the document in the proposed-then-accepted HITL lifecycle, links to the originating consensus manifest, persists the decision to the canonical SQLite decisions table, and triggers downstream invalidation when an accepted ADR is later superseded. Triggers on phrases like 'write ADR', 'record architecture decision', 'formalize this decision', 'lock in the choice', 'create ADR-XXX', or when a consensus task reaches completed status and needs formalization.

kryptobaseddev

data-ai

open

data-engineering

142

kql-query-authoring

Use this skill when asked to write, create, or help with KQL (Kusto Query Language) queries for Microsoft Sentinel, Defender XDR, or Azure Data Explorer. Triggers on keywords like "write KQL", "create KQL query", "help with KQL", "query [table]", "KQL for [scenario]", or when a user requests queries for specific data analysis scenarios. This skill uses schema validation, Microsoft Learn documentation, and community examples to generate production-ready KQL queries.

SCStelz

data-ai

open

data-analysis

142

heatmap-visualization

Use this skill when asked to create heatmaps, visualize patterns over time, show activity grids, or display aggregated data in a matrix format. Triggers on keywords like "heatmap", "show heatmap", "visualize patterns", "activity grid", "time-based visualization", or when analyzing attack patterns, sign-in activity, or event distributions by time period.

SCStelz

data-ai

open

data-analysis

142

geomap-visualization

Use this skill when asked to create geographic maps, visualize attack origins on a world map, show location-based data, or display IP geolocation. Triggers on keywords like "geomap", "world map", "geographic", "attack map", "show on map", "visualize locations", "attack origins", or when analyzing data with latitude/longitude coordinates.

SCStelz

data-ai

open

data-engineering

141

archaeology

Transform narratives into a queryable decision graph

notactuallytreyanastasio

data-ai

open

data-analysis

141

learning-analytics-interpretation-guide

Interpret learning analytics data and translate dashboard findings into actionable teaching decisions. Use when reviewing LMS data, quiz patterns, or engagement metrics.

GarethManning

data-ai

open

data-engineering

140

csv-parser

Parse and analyze CSV files with data validation

maxvaega

data-ai

open

data-engineering

139

gke-reliability

Workflows for ensuring high availability and reliability of GKE workloads.

GoogleCloudPlatform

data-ai

open

machine-learning

138

tinker

Fine-tune LLMs using the Tinker API. Covers supervised fine-tuning, reinforcement learning, LoRA training, vision-language models, and both high-level Cookbook patterns and low-level API usage.

sundial-org

data-ai

open

llm-ai

138

codex

Run OpenAI's Codex CLI agent in non-interactive mode using `codex exec`. Use when delegating coding tasks to Codex, running Codex in scripts/automation, or when needing a second agent to work on a task in parallel.

sundial-org

data-ai

open

machine-learning

138

tinker-training-cost

Calculate training costs for Tinker fine-tuning jobs. Use when estimating costs for Tinker LLM training, counting tokens in datasets, or comparing Tinker model training prices. Tokenizes datasets using the correct model tokenizer and provides accurate cost estimates.

sundial-org

data-ai

open

machine-learning

138

training-data-curation

Guidelines for creating high-quality datasets for LLM post-training (SFT/DPO/RLHF). Use when preparing data for fine-tuning, evaluating data quality, or designing data collection strategies.

sundial-org

data-ai

open

data-engineering

138

create-boss

Distill a real boss into an AI skill, or generate a boss skill from a famous entrepreneur archetype such as Elon Musk, Steve Jobs, Jeff Bezos, or Jensen Huang. Use when the user wants boss analysis, managing-up guidance, persona extraction, or entrepreneur-style boss presets.

vogtsw

data-ai

open

data-engineering

138

tech-evaluator

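Evaluate technology stack options, using a weighted decision matrix and the ATAM methodology to produce Architecture Decision Records (ADRs).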

Haaaiawd

data-ai

open

data-engineering

137

stash-dynamodb

Integrate CipherStash encryption with Amazon DynamoDB using @cipherstash/stack/dynamodb. Covers the encryptedDynamoDB helper for encrypting items before PutItem and decrypting after GetItem, bulk encrypt/decrypt for BatchWrite and BatchGet, querying with encrypted partition and sort keys via HMAC attributes, nested object encryption, audit logging, and the DynamoDB attribute naming conventions (__source/__hmac). Use when adding encryption to a DynamoDB project, encrypting items before writes, decrypting items after reads, or querying encrypted DynamoDB attributes.

cipherstash

data-ai

open

data-engineering

137

stash-drizzle

Integrate CipherStash encryption with Drizzle ORM using @cipherstash/stack/drizzle. Covers the encryptedType column type, encrypted query operators (eq, like, ilike, gt/gte/lt/lte, between, inArray, asc/desc), schema extraction, batched and/or conditions, EQL migration generation, and the complete Drizzle integration workflow. Use when adding encryption to a Drizzle ORM project, defining encrypted Drizzle schemas, or querying encrypted columns with Drizzle.

cipherstash

data-ai

open

data-engineering

136

ddia-principles

Designing Data-Intensive Applications (DDIA) distilled reference guide by Martin Kleppmann. MUST be loaded when: designing database schemas, choosing storage engines, implementing replication or partitioning, handling distributed transactions, building batch/stream processing pipelines, choosing consistency models, implementing consensus, designing data flow architectures, evaluating trade-offs between availability and consistency, encoding/serialization decisions, data modeling (relational vs document vs graph), building fault-tolerant systems, or any system design and architecture discussion involving data-intensive applications. Trigger on: database design, replication, partitioning, sharding, transactions, isolation levels, consistency, consensus, CAP theorem, batch processing, stream processing, MapReduce, Kafka, event sourcing, CDC, OLTP, OLAP, B-tree, LSM-tree, data warehouse, schema evolution, encoding formats, distributed systems, fault tolerance, leader election, quorum.

luoling8192

data-ai

open
