domain cluster

Data & AI

Machine learning, LLMs, and data processing.

9743 skills

data-engineering

310

tracing-upstream-lineage

Trace upstream data lineage. Use when the user asks where data comes from, what feeds a table, upstream dependencies, data sources, or needs to understand data origins.

astronomer

data-ai

data-engineering

310

Trace downstream data lineage and impact analysis. Use when the user asks what depends on this data, what breaks if something changes, downstream dependencies, or needs to assess change risk before modifying a table or DAG.

astronomer

data-ai

data-engineering

310

analyzing-data

Queries the data warehouse and answers business questions about data. Handles questions requiring database/warehouse queries including "who uses X", "how many Y", "show me Z", "find customers", "what is the count", data lookups, metrics, trends, or SQL analysis.

astronomer

data-ai

data-engineering

310

airflow-hitl

Use when the user needs human-in-the-loop workflows in Airflow (approval/reject, form input, or human-driven branching). Covers ApprovalOperator, HITLOperator, HITLBranchOperator, HITLEntryOperator. Requires Airflow 3.1+. Does not cover AI/LLM calls (see airflow-ai).

astronomer

data-ai

data-engineering

310

airflow

Queries, manages, and troubleshoots Apache Airflow using the af CLI. Covers listing DAGs, triggering runs, reading task logs, diagnosing failures, debugging DAG import errors, checking connections, variables, pools, and monitoring health. Also routes to sub-skills for writing DAGs, debugging, deploying, and migrating Airflow 2 to 3. Use when user mentions "Airflow", "DAG", "DAG run", "task log", "import error", "parse error", "broken DAG", or asks to "trigger a pipeline", "debug import errors", "check Airflow health", "list connections", "retry a run", or any Airflow operation. Do NOT use for warehouse/SQL analytics on Airflow metadata tables — use analyzing-data instead.

astronomer

data-ai

data-engineering

310

annotating-task-lineage

Annotate Airflow tasks with data lineage using inlets and outlets. Use when the user wants to add lineage metadata to tasks, specify input/output datasets, or enable lineage tracking for operators without built-in OpenLineage extraction.

astronomer

data-ai

data-engineering

310

authoring-dags

Workflow and best practices for writing Apache Airflow DAGs. Use when the user wants to create a new DAG, write pipeline code, or asks about DAG patterns and conventions. For testing and debugging DAGs, see the testing-dags skill.

astronomer

data-ai

data-engineering

310

blueprint

Define reusable Airflow task group templates with Pydantic validation and compose DAGs from YAML. Use when creating blueprint templates, composing DAGs from YAML, validating configurations, or enabling no-code DAG authoring for non-engineers.

astronomer

data-ai

data-engineering

310

checking-freshness

Quick data freshness check. Use when the user asks if data is up to date, when a table was last updated, if data is stale, or needs to verify data currency before using it.

astronomer

data-ai

data-engineering

310

cosmos-dbt-core

Use when turning a dbt Core project into an Airflow DAG/TaskGroup using Astronomer Cosmos. Does not cover dbt Fusion. Before implementing, verify dbt engine, warehouse, Airflow version, execution environment, DAG vs TaskGroup, and manifest availability.

astronomer

data-ai

data-engineering

310

cosmos-dbt-fusion

Use when running a dbt Fusion project with Astronomer Cosmos. Covers Cosmos 1.11+ configuration for Fusion on Snowflake/Databricks with ExecutionMode.LOCAL. Before implementing, verify dbt engine is Fusion (not Core), warehouse is supported, and local execution is acceptable. Does not cover dbt Core.

astronomer

data-ai

data-engineering

310

creating-openlineage-extractors

Create custom OpenLineage extractors for Airflow operators. Use when the user needs lineage from unsupported or third-party operators, wants column-level lineage, or needs complex extraction logic beyond what inlets/outlets provide.

astronomer

data-ai

data-engineering

310

deploying-airflow

Deploy Airflow DAGs and projects. Use when the user wants to deploy code, push DAGs, set up CI/CD, deploy to production, or asks about deployment strategies for Airflow.

astronomer

data-ai

data-engineering

310

managing-astro-local-env

Manage a local Airflow environment with the Astro CLI (Docker and standalone modes). Use when the user wants to start, stop, or restart Airflow, view logs, query the Airflow API, troubleshoot, or fix environment issues. For project setup, see setting-up-astro-project.

astronomer

data-ai

data-engineering

310

migrating-airflow-2-to-3

Guide for migrating Apache Airflow 2.x projects to Airflow 3.x. Use when the user mentions Airflow 3 migration, upgrade, compatibility issues, breaking changes, or wants to modernize their Airflow codebase. If you detect Airflow 2.x code that needs migration, prompt the user and ask if they want you to help upgrade. Always load this skill as the first step for any migration-related request.

astronomer

data-ai

data-engineering

310

warehouse-init

Initialize warehouse schema discovery. Generates .astro/warehouse.md with all table metadata for instant lookups. Run once per project, refresh when schema changes. Use when user says "/astronomer-data:warehouse-init" or asks to set up data discovery.

astronomer

data-ai

machine-learning

309

sample-scaffolder

Takes a skill that has been submitted as a PR and scaffolds it into the sample format that the repository expects as its standard.

pnp

data-ai

data-engineering

307

dotfile-brainstorm

Use when you want to build a project but don't have a spec yet and need to brainstorm the idea into a design doc structured for pipeline DOT generation. Use when starting from a vague idea, project concept, or feature request that needs to become a headless autonomous build pipeline.

harperreed

data-ai

machine-learning

307

perf-check

Run a Maestro-style performance assessment for hotspots, regressions, and optimization planning

josstei

data-ai

data-analysis

305

reporting

Guidelines for formatting reports using HTML details/summary tags

githubnext

data-ai

data-analysis

305

proprietary-data-generator

Create original surveys, benchmarks, and aggregated data nobody else has. Automate data collection for content moats. Triggers on: "create original data", "proprietary data", "survey design", "benchmark study", "original research", "data-driven content", "create a survey", "industry benchmark", "aggregated data", "unique data", "first-party data", "data moat", "generate research data", "create a study", "original statistics", "data nobody else has", "competitive data advantage".

Affitor

data-ai

data-engineering

305

hipdnn-codegen

Generate hipDNN operation boilerplate from a FlatBuffer schema. Use when the user wants to add a new operation type to hipDNN, or generate descriptor/packer/unpacker code.

ROCm

data-ai

llm-ai

304

system-prompt-writer

This skill should be used when writing or improving system prompts for AI agents, providing expert guidance based on Anthropic's context engineering principles.

aws-samples

data-ai
