skills.homes capability registry

category focus

Data Eng.

ETL pipelines and big data infrastructure.

1541 skills

Sorted by stars.

data-engineering

18.1K

dask

Distributed computing for larger-than-RAM pandas/NumPy workflows. Use when you need to scale existing pandas/NumPy code beyond memory or across clusters. Best for parallel file processing, distributed ML, and integration with existing pandas code. For out-of-core analytics on a single machine, use vaex; for in-memory speed, use polars.

K-Dense-AI

data-ai

data-engineering

18.1K

dnanexus-integration

DNAnexus cloud genomics platform. Build apps/applets, manage data (upload/download), use the dxpy Python SDK, run workflows, and work with FASTQ/BAM/VCF files for genomics pipeline development and execution.

K-Dense-AI

data-ai

data-engineering

18.1K

lamindb

This skill should be used when working with LaminDB, an open-source data framework for biology that makes data queryable, traceable, reproducible, and FAIR. Use when managing biological datasets (scRNA-seq, spatial, flow cytometry, etc.), tracking computational workflows, curating and validating data with biological ontologies, building data lakehouses, or ensuring data lineage and reproducibility in biological research. Covers data management, annotation, ontologies (genes, cell types, diseases, tissues), schema validation, integrations with workflow managers (Nextflow, Snakemake) and MLOps platforms (W&B, MLflow), and deployment strategies.

K-Dense-AI

data-ai

data-engineering

18.1K

polars-bio

High-performance genomic interval operations and bioinformatics file I/O on Polars DataFrames. Overlap, nearest, merge, coverage, complement, subtract for BED/VCF/BAM/GFF intervals. Streaming, cloud-native, faster bioframe alternative.

K-Dense-AI

data-ai

data-engineering

18.1K

polars

Fast in-memory DataFrame library for datasets that fit in RAM. Use when pandas is too slow but data still fits in memory. Lazy evaluation, parallel execution, Apache Arrow backend. Best for 1-100GB datasets, ETL pipelines, faster pandas replacement. For larger-than-RAM data use dask or vaex.

K-Dense-AI

data-ai

data-engineering

18.1K

vaex

Use this skill for processing and analyzing large tabular datasets (billions of rows) that exceed available RAM. Vaex excels at out-of-core DataFrame operations, lazy evaluation, fast aggregations, efficient visualization of big data, and machine learning on large datasets. Apply when users need to work with large CSV/HDF5/Arrow/Parquet files, perform fast statistics on massive datasets, create visualizations of big data, or build ML pipelines that do not fit in memory.

K-Dense-AI

data-ai
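
The dask, polars, and vaex entries above each say when to prefer one tool over the others. As a rough sketch of that guidance only, the rule of thumb could be written as a small function. The function name and its two boolean parameters are hypothetical and exist purely for illustration; they are not part of any of the listed skills or libraries.

```python
def pick_dataframe_tool(fits_in_ram: bool, needs_cluster: bool) -> str:
    """Hypothetical helper encoding the selection hints from the listing.

    Returns the name of a listed skill; it does not import or call any library.
    """
    if needs_cluster:
        # The dask entry: scale pandas/NumPy code across clusters.
        return "dask"
    if fits_in_ram:
        # The polars entry: fast in-memory work when data fits in RAM.
        return "polars"
    # The dask entry points to vaex for out-of-core work on a single machine.
    # The polars entry lists dask as an alternative for larger-than-RAM data.
    return "vaex"


if __name__ == "__main__":
    print(pick_dataframe_tool(fits_in_ram=True, needs_cluster=False))
    print(pick_dataframe_tool(fits_in_ram=False, needs_cluster=False))
    print(pick_dataframe_tool(fits_in_ram=False, needs_cluster=True))
```

This is a simplification: real choices also depend on file formats, existing code, and team familiarity, none of which the listing quantifies.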

data-engineering

18.1K

zarr-python

Chunked N-D arrays for cloud storage. Compressed arrays, parallel I/O, S3/GCS integration, NumPy/Dask/Xarray compatible, for large-scale scientific computing pipelines.

K-Dense-AI

data-ai

data-engineering

17.6K

vaex

Use this skill for processing and analyzing large tabular datasets (billions of rows) that exceed available RAM. Vaex excels at out-of-core DataFrame operations, lazy evaluation, fast aggregations, efficient visualization of big data, and machine learning on large datasets. Apply when users need to work with large CSV/HDF5/Arrow/Parquet files, perform fast statistics on massive datasets, create visualizations of big data, or build ML pipelines that don't fit in memory.

davila7

data-ai

data-engineering

17.6K

polars

Fast DataFrame library (Apache Arrow). Select, filter, group_by, joins, lazy evaluation, CSV/Parquet I/O, expression API, for high-performance data analysis workflows.

davila7

data-ai

data-engineering

17.6K

senior-data-engineer

World-class data engineering skill for building scalable data pipelines, ETL/ELT systems, and data infrastructure. Expertise in Python, SQL, Spark, Airflow, dbt, Kafka, and modern data stack. Includes data modeling, pipeline orchestration, data quality, and DataOps. Use when designing data architectures, building data pipelines, optimizing data workflows, or implementing data governance.

davila7

data-ai

data-engineering

17.6K

zarr-python

Chunked N-D arrays for cloud storage. Compressed arrays, parallel I/O, S3/GCS integration, NumPy/Dask/Xarray compatible, for large-scale scientific computing pipelines.

davila7

data-ai

data-engineering

16.5K

data-pipeline

Data pipeline expert for ETL, Apache Spark, Airflow, dbt, and data quality

RightNow-AI

data-ai

data-engineering

16.5K

docker

Docker expert for containers, Compose, Dockerfiles, and debugging

RightNow-AI

data-ai

data-engineering

16.2K

write-script-bigquery

MUST use when writing BigQuery queries.

windmill-labs

data-ai

data-engineering

16.2K

write-script-snowflake

MUST use when writing Snowflake queries.

windmill-labs

data-ai

data-engineering

16.1K

major-task

Handle heavyweight framework or library tasks with planning-first research, selective deep analysis, and a rigorous handoff.

udecode

data-ai

data-engineering

14.6K

cosmos-provider

Implementation details for the EF Core Azure Cosmos DB provider. Use when changing Cosmos-specific code.

dotnet

data-ai

data-engineering

14.2K

abp-ef-core

ABP Entity Framework Core - DbContext, entity configuration, EfCoreRepository implementation, migrations (dotnet ef migrations add), data seeding. Use when working in EntityFrameworkCore projects, adding migrations, or implementing EF Core repositories.

abpframework

data-ai

data-engineering

10.9K

data-loading

Optimize data loading pipeline to prevent GPU starvation. Use when setting up DataLoader or data preprocessing.

aiming-lab

data-ai

data-engineering

10.4K

status

Show DAG state, agent progress, and branch status for an AgentHub session.

alirezarezvani

data-ai

data-engineering

10.4K

ci-cd-pipeline-builder

CI/CD Pipeline Builder

alirezarezvani

data-ai

data-engineering

10.4K

database-designer

Use when the user asks to design database schemas, plan data migrations, optimize queries, choose between SQL and NoSQL, or model data relationships.

alirezarezvani

data-ai

data-engineering

10.4K

senior-data-engineer

Data engineering skill for building scalable data pipelines, ETL/ELT systems, and data infrastructure. Expertise in Python, SQL, Spark, Airflow, dbt, Kafka, and modern data stack. Includes data modeling, pipeline orchestration, data quality, and DataOps. Use when designing data architectures, building data pipelines, optimizing data workflows, implementing data governance, or troubleshooting data issues.

alirezarezvani

data-ai

data-engineering

10.4K

snowflake-development

Use when writing Snowflake SQL, building data pipelines with Dynamic Tables or Streams/Tasks, using Cortex AI functions, creating Cortex Agents, writing Snowpark Python, configuring dbt for Snowflake, or troubleshooting Snowflake errors.

alirezarezvani

data-ai
