skills.homes capability registry

Data Eng.

ETL pipelines and big data infrastructure.

1541 skills

data-engineering

138

create-boss

Distill a real boss into an AI skill, or generate a boss skill from a famous entrepreneur archetype such as Elon Musk, Steve Jobs, Jeff Bezos, or Jensen Huang. Use when the user wants boss analysis, managing-up guidance, persona extraction, or entrepreneur-style boss presets.

vogtsw

data-ai

data-engineering

138

tech-evaluator

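Evaluate technology stack options, using a weighted decision matrix and the ATAM methodology to produce Architecture Decision Records (ADRs).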

Haaaiawd

data-ai

data-engineering

137

stash-dynamodb

Integrate CipherStash encryption with Amazon DynamoDB using @cipherstash/stack/dynamodb. Covers the encryptedDynamoDB helper for encrypting items before PutItem and decrypting after GetItem, bulk encrypt/decrypt for BatchWrite and BatchGet, querying with encrypted partition and sort keys via HMAC attributes, nested object encryption, audit logging, and the DynamoDB attribute naming conventions (__source/__hmac). Use when adding encryption to a DynamoDB project, encrypting items before writes, decrypting items after reads, or querying encrypted DynamoDB attributes.

cipherstash

data-ai

data-engineering

137

stash-drizzle

Integrate CipherStash encryption with Drizzle ORM using @cipherstash/stack/drizzle. Covers the encryptedType column type, encrypted query operators (eq, like, ilike, gt/gte/lt/lte, between, inArray, asc/desc), schema extraction, batched and/or conditions, EQL migration generation, and the complete Drizzle integration workflow. Use when adding encryption to a Drizzle ORM project, defining encrypted Drizzle schemas, or querying encrypted columns with Drizzle.

cipherstash

data-ai

data-engineering

136

ddia-principles

Designing Data-Intensive Applications (DDIA) distilled reference guide by Martin Kleppmann. MUST be loaded when: designing database schemas, choosing storage engines, implementing replication or partitioning, handling distributed transactions, building batch/stream processing pipelines, choosing consistency models, implementing consensus, designing data flow architectures, evaluating trade-offs between availability and consistency, encoding/serialization decisions, data modeling (relational vs document vs graph), building fault-tolerant systems, or any system design and architecture discussion involving data-intensive applications. Trigger on: database design, replication, partitioning, sharding, transactions, isolation levels, consistency, consensus, CAP theorem, batch processing, stream processing, MapReduce, Kafka, event sourcing, CDC, OLTP, OLAP, B-tree, LSM-tree, data warehouse, schema evolution, encoding formats, distributed systems, fault tolerance, leader election, quorum.

luoling8192

data-ai

data-engineering

132

java-streams-api

Use when working with the Java Streams API for functional-style data processing, or when processing collections with streams.

TheBushidoCollective

data-ai

data-engineering

132

ecto-changesets

Use when validating and casting data with Ecto changesets including field validation, constraints, nested changesets, and data transformation. Use for ensuring data integrity before database operations.

TheBushidoCollective

data-ai

data-engineering

132

graphql-performance

Use when optimizing GraphQL API performance with query complexity analysis, batching, caching strategies, depth limiting, monitoring, and database optimization.

TheBushidoCollective

data-ai

data-engineering

132

scala-collections

Use when working with Scala collections, including immutable/mutable variants, List, Vector, Set, and Map operations, collection transformations, lazy evaluation with views, parallel collections, and custom collection builders for efficient data processing.

TheBushidoCollective

data-ai

data-engineering

132

tensorflow-data-pipelines

Create efficient data pipelines with tf.data

TheBushidoCollective

data-ai

data-engineering

132

python-data-classes

Use for Python data modeling with dataclasses, attrs, and Pydantic, or when creating data structures and models.

TheBushidoCollective

data-ai

data-engineering

130

data-journalism

Data journalism workflows for analysis, visualization, and storytelling. Use when analyzing datasets, creating charts and maps, cleaning messy data, calculating statistics or building data-driven stories. Essential for reporters, newsrooms and researchers working with quantitative information.

jamditis

data-ai

data-engineering

130

python-pipeline

Python data processing pipelines with modular architecture. Use when building content processing workflows, implementing dispatcher patterns, integrating Google Sheets/Drive APIs, or creating batch processing systems. Covers patterns from rosen-scraper, image-analyzer, and social-scraper projects.

jamditis

data-ai

data-engineering

129

molecular-docking-pipeline

Molecular Docking Pipeline: a complete docking workflow that retrieves a protein structure, predicts binding pockets, prepares the receptor, and docks a ligand. Use this skill for structural biology tasks that involve retrieving protein data by PDB code, running fpocket, converting PDB to PDBQT, or quick molecule docking. Combines 4 tools from 2 SCP server(s).

InternScience

data-ai

data-engineering

129

spectre

Run Cadence Spectre simulations remotely via virtuoso-bridge: upload netlists, execute, parse PSF results. TRIGGER when the user wants to run a SPICE/Spectre simulation from a netlist file, do transient/AC/PSS/pnoise analysis outside Virtuoso GUI, parse PSF waveform data, run multiple simulations in parallel across one or more servers, check simulation job status, or mentions Spectre APS/AXS modes. Also triggers for sim-jobs, sim-cancel, or parallel/concurrent simulation requests. Use this for standalone netlist-driven simulation — for GUI-based ADE Maestro simulation, use the virtuoso skill instead.

Arcadia-1

data-ai

data-engineering

129

v5-migration

The olm:v5 migration script, UpdateTagsService, batch processing, migrated_at, ClassifyTagsService deprecated mappings, and data migration from v4 category tables.

OpenLitterMap

data-ai

data-engineering

128

tanstack-integration-best-practices

Best practices for integrating TanStack Query with TanStack Router and TanStack Start. Patterns for full-stack data flow, SSR, and caching coordination.

DeckardGer

data-ai

data-engineering

128

synthetic-data-generation

Generate synthetic data using sdg_hub with composable blocks and YAML flows. Use when the user wants to create training datasets, generate QA pairs, run data generation pipelines, build custom flows, produce synthetic data from documents, use agent frameworks for data generation, or distill MCP tool-use traces. Supports pre-built flows, custom Python scripts, and YAML flow authoring with 20+ blocks, agent connectors (Langflow, LangGraph), MCP tool-use, and 100+ LLM providers via LiteLLM.

Red-Hat-AI-Innovation-Team

data-ai

data-engineering

125

openspec-sync-specs

Sync delta specs from a change to main specs. Use when the user wants to update main specs with changes from a delta spec, without archiving the change.

aehrc

data-ai

data-engineering

124

pytdc

Therapeutics Data Commons (PyTDC) for AI-ready therapeutic ML datasets and benchmarks; use it when you need standardized dataset loading, meaningful splits (e.g., scaffold/cold-start), and consistent evaluation for ADME/Toxicity/DTI/DDI or molecular optimization.

aipoch

data-ai

data-engineering

124

pdb-database

Access the RCSB Protein Data Bank (PDB) to search, download, and programmatically retrieve 3D macromolecular structures and metadata; use when you need structure discovery (text/sequence/3D similarity) or automated structural data ingestion for structural biology and drug discovery workflows.

aipoch

data-ai

data-engineering

124

grant-gantt-chart-gen

Use grant gantt chart gen for evidence insight workflows that need structured execution, explicit assumptions, and clear output boundaries.

aipoch

data-ai

data-engineering

124

table-1-generator

Automated generation of baseline characteristics tables (Table 1) for clinical research papers.

aipoch

data-ai

data-engineering

124

digital-twin-discharge-drafter

Use when drafting patient discharge summaries, creating personalized discharge instructions, simulating post-discharge outcomes, reducing hospital readmissions, or optimizing care transitions. Generates AI-enhanced discharge documentation with digital twin predictions for improved patient safety.

aipoch

data-ai
