skills.homes capability registry

category focus

Data Eng.

ETL pipelines and big data infrastructure.

1541 skills · all categories

sorting

stars

current ordering strategy

data-engineering

529

chunking-strategy-guide

Methodology for systematically designing document chunking strategies for RAG pipelines. Use this skill for 'chunking strategy', 'document splitting', 'RAG chunking', 'embedding optimization', 'semantic chunking', 'text splitting', and other RAG data preprocessing tasks. Note: vector DB infrastructure construction and embedding model training are outside the scope of this skill.

revfactory

data-ai

data-engineering

529

bi-dashboard

Full pipeline where an agent team collaborates to generate data warehouse design, KPI definitions, visualizations, and automated reports for a BI dashboard. Use this skill for requests like 'build me a BI dashboard', 'dashboard design', 'define KPIs', 'executive reporting dashboard', 'data visualization design', 'report automation', 'data warehouse design', 'build a KPI tree', 'sales dashboard', 'performance metrics framework', and other BI dashboard construction tasks. Also supports visualization or report automation when existing data models or KPI lists are available. Note: direct manipulation of BI tools (Tableau/PowerBI/Looker), database instance creation, and real-time data pipeline operation are outside the scope of this skill.

revfactory

data-ai

data-engineering

529

adr-writer

A pipeline where an agent team systematically creates Architecture Decision Records (ADRs). Use this skill for requests such as 'write an ADR', 'architecture decision record', 'document a technical decision', 'organize architecture selection rationale', 'technology stack decision', 'alternative comparison analysis', 'tradeoff analysis', or 'architecture decision history'. Note: actual code migration execution, infrastructure provisioning, and performance test execution are outside the scope of this skill.

revfactory

data-ai

data-engineering

529

legacy-modernizer

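Full pipeline for converting a legacy codebase to a modern architecture. An agent team collaborates on technical debt analysis, refactoring strategy, code migration, and regression testing. Use this skill for requests like 'modernize this legacy code', 'set up a refactoring strategy', 'code migration', 'technical debt analysis', 'legacy system upgrade', 'framework migration', 'code modernization', 'refactoring plan', and other legacy code modernization tasks. Also supports strategy planning or migration when an existing analysis report is available. Note: actual production deployment, CI/CD pipeline execution, and infrastructure provisioning are outside the scope of this skill.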

revfactory

data-ai

data-engineering

529

dag-orchestration-patterns

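Guide to data pipeline orchestration: Airflow DAG design patterns, dependency management, retry strategies, idempotency guarantees, and backfill strategies. Use this skill for pipeline scheduling tasks involving 'Airflow DAG', 'DAG design', 'dependencies', 'retry strategy', 'idempotency', 'backfill', 'pipeline orchestration', 'Dagster', or 'Prefect'. Strengthens the DAG design capabilities of scheduler-engineer. Note: data quality rule definition and monitoring dashboards are outside the scope of this skill.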

revfactory

data-ai

data-engineering

529

data-pipeline

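Full pipeline where an agent team collaborates to design and implement the ingestion, transformation, loading, quality validation, and monitoring of a data pipeline. Use this skill for requests like 'design a data pipeline', 'build an ETL pipeline', 'automate data collection', 'data warehouse pipeline', 'ELT design', 'batch pipeline', 'streaming pipeline', 'create an Airflow DAG', 'design dbt models', 'data quality validation framework', and other data pipeline design and construction tasks. Also supports cases where only quality validation or monitoring of an existing pipeline is needed. Note: directly running real-time streaming engines (Flink/Spark Streaming), cloud infrastructure provisioning, and database administrator (DBA) work are outside the scope of this skill.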

revfactory

data-ai

data-engineering

529

data-quality-framework

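Guide to designing validation rules for each data quality dimension (accuracy, completeness, timeliness, consistency, etc.) and to using tools such as Great Expectations and dbt tests. Use this skill for data quality management tasks involving 'data quality', 'validation rules', 'Great Expectations', 'dbt test', 'data profiling', 'anomaly detection', or 'data contracts'. Strengthens the quality validation capabilities of data-quality-manager. Note: pipeline scheduling and overall architecture design are outside the scope of this skill.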

revfactory

data-ai

data-engineering

529

feature-engineering-cookbook

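Catalog of feature engineering techniques: numerical, categorical, time-series, and text transformations, feature selection, and feature store design. Use this skill for data preprocessing and feature design tasks involving 'feature engineering', 'variable transformation', 'encoding', 'scaling', 'feature selection', 'feature store', or 'feature importance'. Strengthens the feature engineering capabilities of data-engineer. Note: model design and training management are outside the scope of this skill.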

revfactory

data-ai

data-engineering

529

data-migration

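Full migration pipeline where an agent team collaborates on source analysis, schema mapping, transformation script generation, validation query design, and rollback planning for a data migration. Use this skill for 'data migration', 'DB transfer', 'data relocation', 'schema conversion', 'database migration plan', 'ETL scripts', 'data transition', 'DB migration validation', 'system cutover', and other data migration tasks. Note: real-time CDC streaming setup, cloud infrastructure provisioning, and application code migration are outside the scope of this skill.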

revfactory

data-ai

data-engineering

529

chunking-strategy-guide

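Methodology for systematically designing document chunking strategies for RAG pipelines. Use this skill for 'chunking strategy', 'document splitting', 'RAG chunking', 'embedding optimization', 'semantic chunking', 'text splitting', and other RAG data preprocessing tasks. Note: vector DB infrastructure construction and embedding model training are outside the scope of this skill.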

revfactory

data-ai

data-engineering

527

sqlite-inspector

Checks data consistency in MikoPBX SQLite databases after REST API operations. Use when validating API results, debugging data issues, checking foreign key relationships, or inspecting CDR records for testing.

mikopbx

data-ai

data-engineering

522

firebase-data-connect

Integrates Firebase Data Connect into Flutter apps. Use when setting up Data Connect, designing queries, handling errors, or applying security and performance best practices.

evanca

data-ai

data-engineering

522

riverpod

Uses Riverpod for state management in Flutter/Dart. Use when setting up providers, combining requests, managing state disposal, passing arguments, performing side effects, testing providers, or applying Riverpod best practices.

evanca

data-ai

data-engineering

519

dbt-analyze

Analyze downstream impact of dbt model changes using column-level lineage and the dependency graph. Use when evaluating the blast radius of a change before shipping. Powered by altimate-dbt.

AltimateAI

data-ai

data-engineering

519

dbt-develop

Create and modify dbt models — staging, intermediate, marts, incremental, medallion architecture. Use when building new SQL models, extending existing ones, scaffolding YAML configs, or reorganizing project structure. Powered by altimate-dbt.

AltimateAI

data-ai

data-engineering

519

dbt-docs

Document dbt models and columns in schema.yml with business context — model descriptions, column definitions, and doc blocks. Use when adding or improving documentation for discoverability. Powered by altimate-dbt.

AltimateAI

data-ai

data-engineering

519

dbt-test

Add schema tests, unit tests, and data quality checks to dbt models. Use when validating data integrity, adding test definitions to schema.yml, writing unit tests, or practicing test-driven development in dbt. Powered by altimate-dbt.

AltimateAI

data-ai

data-engineering

519

schema-migration

Analyze DDL migrations for data loss risks — type narrowing, missing defaults, dropped constraints, breaking column changes. Use before applying schema changes to production.

AltimateAI

data-ai

data-engineering

515

batch-research

Batch data collection skill that schedules researcher agents concurrently, in batches, to scrape all data sources.

miantiao-me

data-ai

data-engineering

512

aws-architecture-diagram

Generate validated AWS architecture diagrams as draw.io XML using official AWS4 icon libraries. Use this skill whenever the user wants to create, generate, or design AWS architecture diagrams, cloud infrastructure diagrams, or system design visuals. Also triggers for requests to visualize existing infrastructure from CloudFormation, CDK, or Terraform code. Supports two modes: analyze an existing codebase to auto-generate diagrams, or brainstorm interactively from scratch. Exports .drawio files with optional PNG/SVG/PDF export via draw.io desktop CLI.

awslabs

data-ai

data-engineering

501

dxos-echo

Guide for ECHO (DXOS object graph / local-first DB). Use when adding or changing queries, filters, schema types, Ref/DXN handling, Database service layers, EchoClient/space DB access, or React ECHO hooks.

dxos

data-ai

data-engineering

492

migration-helper

Analyze GORM model changes, estimate resulting schema (DDL) differences, and propose safe migration steps with verification guidance.

pilinux

data-ai

data-engineering

486

autopilot

Intake-to-delivery pipeline. Processes pending items from .planning/intake/: briefs new ideas, executes approved work through research → plan → build → verify. Drop a file in .planning/intake/ and invoke this skill.

SethGammon

data-ai

data-engineering

486

session-handoff

Synthesizes the current session into a structured HANDOFF block for context transfer between sessions. Captures what was built, decisions made, and unresolved items.

SethGammon

data-ai
