Data Eng.

ETL pipelines and big data infrastructure.

1541 skills
data-engineering
138

create-boss

Distill a real boss into an AI skill, or generate a boss skill from a famous entrepreneur archetype such as Elon Musk, Steve Jobs, Jeff Bezos, or Jensen Huang. Use when the user wants boss analysis, managing-up guidance, persona extraction, or entrepreneur-style boss presets.

vogtsw
data-ai
data-engineering
138

tech-evaluator

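Evaluate technology stack options and produce Architecture Decision Records (ADRs) using a weighted decision matrix and the ATAM methodology.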

Haaaiawd
data-ai
data-engineering
137

stash-dynamodb

Integrate CipherStash encryption with Amazon DynamoDB using @cipherstash/stack/dynamodb. Covers the encryptedDynamoDB helper for encrypting items before PutItem and decrypting after GetItem, bulk encrypt/decrypt for BatchWrite and BatchGet, querying with encrypted partition and sort keys via HMAC attributes, nested object encryption, audit logging, and the DynamoDB attribute naming conventions (__source/__hmac). Use when adding encryption to a DynamoDB project, encrypting items before writes, decrypting items after reads, or querying encrypted DynamoDB attributes.

cipherstash
data-ai
data-engineering
137

stash-drizzle

Integrate CipherStash encryption with Drizzle ORM using @cipherstash/stack/drizzle. Covers the encryptedType column type, encrypted query operators (eq, like, ilike, gt/gte/lt/lte, between, inArray, asc/desc), schema extraction, batched and/or conditions, EQL migration generation, and the complete Drizzle integration workflow. Use when adding encryption to a Drizzle ORM project, defining encrypted Drizzle schemas, or querying encrypted columns with Drizzle.

cipherstash
data-ai
data-engineering
136

ddia-principles

Distilled reference guide to Designing Data-Intensive Applications (DDIA) by Martin Kleppmann. MUST be loaded when: designing database schemas, choosing storage engines, implementing replication or partitioning, handling distributed transactions, building batch/stream processing pipelines, choosing consistency models, implementing consensus, designing data flow architectures, evaluating trade-offs between availability and consistency, encoding/serialization decisions, data modeling (relational vs document vs graph), building fault-tolerant systems, or any system design and architecture discussion involving data-intensive applications. Trigger on: database design, replication, partitioning, sharding, transactions, isolation levels, consistency, consensus, CAP theorem, batch processing, stream processing, MapReduce, Kafka, event sourcing, CDC, OLTP, OLAP, B-tree, LSM-tree, data warehouse, schema evolution, encoding formats, distributed systems, fault tolerance, leader election, quorum.

luoling8192
data-ai
data-engineering
132

java-streams-api

Use when working with the Java Streams API for functional-style data processing. Use when processing collections with streams.

TheBushidoCollective
data-ai
data-engineering
132

ecto-changesets

Use when validating and casting data with Ecto changesets including field validation, constraints, nested changesets, and data transformation. Use for ensuring data integrity before database operations.

TheBushidoCollective
data-ai
data-engineering
132

graphql-performance

Use when optimizing GraphQL API performance with query complexity analysis, batching, caching strategies, depth limiting, monitoring, and database optimization.

TheBushidoCollective
data-ai
data-engineering
132

scala-collections

Use when working with Scala collections, including immutable/mutable variants, List, Vector, Set, and Map operations, collection transformations, lazy evaluation with views, parallel collections, and custom collection builders for efficient data processing.

TheBushidoCollective
data-ai
data-engineering
132

python-data-classes

Use when modeling data in Python with dataclasses, attrs, and Pydantic. Use when creating data structures and models.

TheBushidoCollective
data-ai
data-engineering
130

data-journalism

Data journalism workflows for analysis, visualization, and storytelling. Use when analyzing datasets, creating charts and maps, cleaning messy data, calculating statistics or building data-driven stories. Essential for reporters, newsrooms and researchers working with quantitative information.

jamditis
data-ai
data-engineering
130

python-pipeline

Python data processing pipelines with modular architecture. Use when building content processing workflows, implementing dispatcher patterns, integrating Google Sheets/Drive APIs, or creating batch processing systems. Covers patterns from rosen-scraper, image-analyzer, and social-scraper projects.

jamditis
data-ai
data-engineering
129

molecular-docking-pipeline

Molecular Docking Pipeline - Complete docking workflow: retrieve protein structure, predict binding pockets, prepare receptor, and dock ligand. Use this skill for structural biology tasks that involve retrieving protein data by PDB code, running fpocket, converting PDB to PDBQT, and quick molecule docking. Combines 4 tools from 2 SCP server(s).

InternScience
data-ai
data-engineering
129

spectre

Run Cadence Spectre simulations remotely via virtuoso-bridge: upload netlists, execute, parse PSF results. TRIGGER when the user wants to run a SPICE/Spectre simulation from a netlist file, do transient/AC/PSS/pnoise analysis outside Virtuoso GUI, parse PSF waveform data, run multiple simulations in parallel across one or more servers, check simulation job status, or mentions Spectre APS/AXS modes. Also triggers for sim-jobs, sim-cancel, or parallel/concurrent simulation requests. Use this for standalone netlist-driven simulation — for GUI-based ADE Maestro simulation, use the virtuoso skill instead.

Arcadia-1
data-ai
data-engineering
129

v5-migration

The olm:v5 migration script, UpdateTagsService, batch processing, migrated_at, ClassifyTagsService deprecated mappings, and data migration from v4 category tables.

OpenLitterMap
data-ai
data-engineering
128

tanstack-integration-best-practices

Best practices for integrating TanStack Query with TanStack Router and TanStack Start. Patterns for full-stack data flow, SSR, and caching coordination.

DeckardGer
data-ai
data-engineering
128

synthetic-data-generation

Generate synthetic data using sdg_hub with composable blocks and YAML flows. Use when the user wants to create training datasets, generate QA pairs, run data generation pipelines, build custom flows, produce synthetic data from documents, use agent frameworks for data generation, or distill MCP tool-use traces. Supports pre-built flows, custom Python scripts, and YAML flow authoring with 20+ blocks, agent connectors (Langflow, LangGraph), MCP tool-use, and 100+ LLM providers via LiteLLM.

Red-Hat-AI-Innovation-Team
data-ai
data-engineering
125

openspec-sync-specs

Sync delta specs from a change to main specs. Use when the user wants to update main specs with changes from a delta spec, without archiving the change.

aehrc
data-ai
data-engineering
124

pytdc

Therapeutics Data Commons (PyTDC) for AI-ready therapeutic ML datasets and benchmarks; use it when you need standardized dataset loading, meaningful splits (e.g., scaffold/cold-start), and consistent evaluation for ADME/Toxicity/DTI/DDI or molecular optimization.

aipoch
data-ai
data-engineering
124

pdb-database

Access the RCSB Protein Data Bank (PDB) to search, download, and programmatically retrieve 3D macromolecular structures and metadata; use when you need structure discovery (text/sequence/3D similarity) or automated structural data ingestion for structural biology and drug discovery workflows.

aipoch
data-ai
data-engineering
124

grant-gantt-chart-gen

Use grant-gantt-chart-gen for evidence insight workflows that need structured execution, explicit assumptions, and clear output boundaries.

aipoch
data-ai
data-engineering
124

table-1-generator

Automated generation of baseline characteristics tables (Table 1) for clinical research papers.

aipoch
data-ai
data-engineering
124

digital-twin-discharge-drafter

Use when drafting patient discharge summaries, creating personalized discharge instructions, simulating post-discharge outcomes, reducing hospital readmissions, or optimizing care transitions. Generates AI-enhanced discharge documentation with digital twin predictions for improved patient safety.

aipoch
data-ai