skills.homes capability registry

category focus

Data Eng.

ETL pipelines and big data infrastructure.

1541 skills

data-engineering

1K

supabase-data-handling

Implement Supabase PII handling, data retention, and GDPR/CCPA compliance patterns. Use when handling sensitive data, implementing data redaction, configuring retention policies, or ensuring compliance with privacy regulations for Supabase integrations. Trigger with phrases like "supabase data", "supabase PII", "supabase GDPR", "supabase data retention", "supabase privacy", "supabase CCPA".

jeremylongshore

data-ai

data-engineering

1K

sql-transform-helper

SQL Transform Helper - Auto-activating skill for Data Pipelines. Triggers on: "sql transform helper". Part of the Data Pipelines skill category.

jeremylongshore

data-ai

data-engineering

987

typescript-bun-drizzle-quality

Build or review Bun fullstack TypeScript code with Drizzle-backed SQL. Use for backend or cross-layer changes touching API/domain logic, schema or query design, migrations, runtime/type debugging, and boundary validation between contracts, business rules, and persistence.

databuddy-analytics

data-ai

data-engineering

972

analyze-spec

Socratic deep-interview analysis of a spec file to ensure zero ambiguity before implementation.

a16z

data-ai

data-engineering

971

batch

Research and plan a large-scale change, then execute it in parallel across 5-30 isolated worktree agents that each open a PR. Use when the user wants to make a sweeping, mechanical change across many files (migrations, refactors, bulk renames) that can be decomposed into independent parallel units.

remorses

data-ai

data-engineering

953

intelligence-network-espionage

Use when building covert informant networks to gather intelligence on rival states. Covers agent placement, secure communication channels, and intelligence verification for strategic advantage.

baojie

data-ai

data-engineering

950

dnanexus-integration

DNAnexus cloud genomics platform. Build apps/applets, manage data (upload/download), use the dxpy Python SDK, run workflows, and work with FASTQ/BAM/VCF files for genomics pipeline development and execution.

wu-yc

data-ai

data-engineering

950

lamindb

This skill should be used when working with LaminDB, an open-source data framework for biology that makes data queryable, traceable, reproducible, and FAIR. Use when managing biological datasets (scRNA-seq, spatial, flow cytometry, etc.), tracking computational workflows, curating and validating data with biological ontologies, building data lakehouses, or ensuring data lineage and reproducibility in biological research. Covers data management, annotation, ontologies (genes, cell types, diseases, tissues), schema validation, integrations with workflow managers (Nextflow, Snakemake) and MLOps platforms (W&B, MLflow), and deployment strategies.

wu-yc

data-ai

data-engineering

950

dask

Distributed computing for larger-than-RAM pandas/NumPy workflows. Use when you need to scale existing pandas/NumPy code beyond memory or across clusters. Best for parallel file processing, distributed ML, integration with existing pandas code. For out-of-core analytics on single machine use vaex; for in-memory speed use polars.

wu-yc

data-ai

data-engineering

950

export-experiment-data-to-excel

Exports any structured experimental data (JSON, tables, time series) to well-formatted Excel (.xlsx) files. Auto-names sheets (Raw Data, Growth Curves, Cell Counts, etc.), adds unit headers and annotation rows, applies consistent styling, and produces lab-ready spreadsheets for sharing, archival, or downstream analysis in R, pandas, or Excel.

wu-yc

data-ai

data-engineering

950

polars

Fast in-memory DataFrame library for datasets that fit in RAM. Use when pandas is too slow but data still fits in memory. Lazy evaluation, parallel execution, Apache Arrow backend. Best for 1-100GB datasets, ETL pipelines, faster pandas replacement. For larger-than-RAM data use dask or vaex.

wu-yc

data-ai

data-engineering

950

vaex

Use this skill for processing and analyzing large tabular datasets (billions of rows) that exceed available RAM. Vaex excels at out-of-core DataFrame operations, lazy evaluation, fast aggregations, efficient visualization of big data, and machine learning on large datasets. Apply when users need to work with large CSV/HDF5/Arrow/Parquet files, perform fast statistics on massive datasets, create visualizations of big data, or build ML pipelines that do not fit in memory.

wu-yc

data-ai

data-engineering

950

zarr-python

Chunked N-D arrays for cloud storage. Compressed arrays, parallel I/O, S3/GCS integration, NumPy/Dask/Xarray compatible, for large-scale scientific computing pipelines.

wu-yc

data-ai

data-engineering

950

opentargets-database

Query the Open Targets Platform for target-disease associations, drug target discovery, tractability/safety data, genetics/omics evidence, and known drugs. Use for therapeutic target identification.

wu-yc

data-ai

data-engineering

946

jax-skills

High-performance numerical computing and machine learning workflows using JAX. Supports array operations, automatic differentiation, JIT compilation, RNN-style scans, map/reduce operations, and gradient computations. Ideal for scientific computing, ML models, and dynamic array transformations.

benchflow-ai

data-ai

data-engineering

946

erlang-otp-behaviors

Use when working with OTP behaviors, including gen_server for stateful processes, gen_statem for state machines, supervisors for fault tolerance, and gen_event for event handling, and when building robust, production-ready Erlang applications with proven patterns.

benchflow-ai

data-ai

data-engineering

946

erlang-distribution

Use when working with Erlang distributed systems, including node connectivity, distributed processes, global name registration, distributed supervision, and network partitions, and when building fault-tolerant multi-node applications on the BEAM VM.

benchflow-ai

data-ai

data-engineering

946

usgs-data-download

Download water level data from USGS using the dataretrieval package. Use when accessing real-time or historical streamflow data, downloading gage height or discharge measurements, or working with USGS station IDs.

benchflow-ai

data-ai

data-engineering

946

senior-data-engineer

World-class data engineering skill for building scalable data pipelines, ETL/ELT systems, real-time streaming, and data infrastructure. Expertise in Python, SQL, Spark, Airflow, dbt, Kafka, Flink, Kinesis, and modern data stack. Includes data modeling, pipeline orchestration, data quality, streaming quality monitoring, and DataOps. Use when designing data architectures, building batch or streaming data pipelines, optimizing data workflows, or implementing data governance.

benchflow-ai

data-ai

data-engineering

946

parallel-processing

Parallel processing with joblib for grid search and batch computations. Use when speeding up computationally intensive tasks across multiple CPU cores.

benchflow-ai

data-ai

data-engineering

946

workload-balancing

Optimize workload distribution across workers, processes, or nodes for efficient parallel execution. Use when asked to balance work distribution, improve parallel efficiency, reduce stragglers, implement load balancing, or optimize task scheduling. Covers static/dynamic partitioning, work stealing, and adaptive load balancing strategies.

benchflow-ai

data-ai

data-engineering

946

data-cleaning

Clean messy tabular datasets with deduplication, missing value imputation, outlier handling, and text processing. Use when dealing with dirty data that has duplicates, nulls, or inconsistent formatting.

benchflow-ai

data-ai

data-engineering

923

memory

Persist important outcomes from this step to long-term storage.

tsinghua-fib-lab

data-ai

data-engineering

917

dataset-manager

Use this skill to generate benchmark datasets (TPC-H, TPC-DS, etc.). Trigger when the user needs test data at a specific scale factor for benchmarking or testing. Supports parquet and duckdb output formats.

sirius-db

data-ai

Page 15 / 65