Data Engineering

ETL pipelines and big data infrastructure.

1541 skills
Sorted by stars.
data-engineering
31

apache-spark-data-processing

Complete guide for Apache Spark data processing including RDDs, DataFrames, Spark SQL, streaming, MLlib, and production deployment

manutej
data-ai
data-engineering
31

sparql-university

Guidance for writing SPARQL queries against RDF/Turtle datasets, particularly for university or academic data. This skill should be used when tasks involve querying RDF data with SPARQL, filtering entities based on multiple criteria, aggregating results, or working with Turtle (.ttl) files.

letta-ai
data-ai
data-engineering
31

sparql-university

Guidance for writing and verifying SPARQL queries against RDF datasets, particularly university/academic ontologies. This skill should be used when tasks involve querying RDF data with SPARQL, working with academic datasets (students, professors, departments, courses), or performing complex graph pattern matching with filters and aggregations.

letta-ai
data-ai
data-engineering
31

kafka-stream-processing

Complete guide for Apache Kafka stream processing including producers, consumers, Kafka Streams, connectors, schema registry, and production deployment

manutej
data-ai
data-engineering
31

multi-source-data-merger

This skill provides guidance for merging data from multiple heterogeneous sources (JSON, CSV, Parquet, XML, etc.) into a unified dataset. Use this skill when tasks involve combining records from different file formats, applying field mappings, resolving conflicts based on priority rules, or generating merged outputs with conflict reports. Applicable to ETL pipelines, data consolidation, and record deduplication scenarios.

letta-ai
data-ai
data-engineering
31

reshard-c4-data

Guidance for data resharding tasks that involve reorganizing files across directory structures with constraints on file sizes and directory contents. This skill applies when redistributing datasets, splitting large files, or reorganizing data into shards while maintaining constraints like maximum files per directory or maximum file sizes. Use when tasks involve resharding, data partitioning, or directory-constrained file reorganization.

letta-ai
data-ai
data-engineering
31

dbt-data-transformation

Complete guide for dbt data transformation including models, tests, documentation, incremental builds, macros, packages, and production workflows

manutej
data-ai
data-engineering
29

manage-seeders

Manages Database Seeders with advanced support for JSON data sources, idempotency checks, and relationship mapping.

iurygdeoliveira
data-ai
data-engineering
26

tanstack-query-advanced

Advanced TanStack Query v5 patterns for infinite queries, optimistic updates, prefetching, gcTime, and queryOptions

yonatangross
data-ai
data-engineering
26

golden-dataset-management

Use when backing up, restoring, or validating golden datasets. Prevents data loss and ensures test data integrity for AI/ML evaluation systems.

yonatangross
data-ai
data-engineering
26

golden-dataset-validation

Use when validating golden dataset quality. Runs schema checks, duplicate detection, and coverage analysis to ensure dataset integrity for AI evaluation.

yonatangross
data-ai
data-engineering
26

gcs-data-catalog

Activates when querying Danish agricultural data from GCS. Use this skill for: data discovery, finding datasets, understanding schemas, querying parquet files, joining datasets on CVR/CHR/BFE identifiers. Keywords: data, catalog, datasets, GCS, parquet, schema, query, DuckDB, pyarrow

Klimabevaegelsen
data-ai
data-engineering
25

streamlit-development

Developing, testing, and deploying Streamlit data applications on Snowflake. Use this skill when you're building interactive data apps, setting up local development environments, testing with pytest or Playwright, or deploying apps to Snowflake using Streamlit in Snowflake.

sfc-gh-dflippo
data-ai
data-engineering
25

dbt-core

Managing dbt-core locally - installation, configuration, project setup, package management, troubleshooting, and development workflow. Use this skill for all aspects of local dbt-core development including non-interactive scripts for environment setup with conda or venv, and comprehensive configuration templates for profiles.yml and dbt_project.yml.

sfc-gh-dflippo
data-ai
data-engineering
25

snowflake-connections

Configuring Snowflake connections using connections.toml (for Snowflake CLI, Streamlit, Snowpark) or profiles.yml (for dbt) with multiple authentication methods (SSO, key pair, username/password, OAuth), managing multiple environments, and overriding settings with environment variables. Use this skill when setting up Snowflake CLI, Streamlit apps, dbt, or any tool requiring Snowflake authentication and connection management.

sfc-gh-dflippo
data-ai
data-engineering
24

exploratory-data-analysis

EDA toolkit. Analyze CSV/Excel/JSON/Parquet files, statistical summaries, distributions, correlations, outliers, missing data, visualizations, markdown reports, for data profiling and insights.

lifangda
data-ai
data-engineering
24

file-processing

Process and analyze CSV, JSON, and text files with data transformation, cleaning, analysis, and visualization capabilities

aws-samples
data-ai
data-engineering
24

polars

Fast DataFrame library (Apache Arrow). Select, filter, group_by, joins, lazy evaluation, CSV/Parquet I/O, expression API, for high-performance data analysis workflows.

lifangda
data-ai
data-engineering
24

data-lake-platform

Universal data lake and lakehouse patterns covering ingestion (dlt, Airbyte), transformation (SQLMesh, dbt), storage formats (Iceberg, Delta, Hudi, Parquet), query engines (ClickHouse, DuckDB, Doris, StarRocks), streaming (Kafka, Flink), orchestration (Dagster, Airflow, Prefect), and visualization (Metabase, Superset, Grafana). Self-hosted and cloud options.

vasilyu1983
data-ai
data-engineering
24

data-engineer

Expert data engineer specializing in building scalable data pipelines, ETL/ELT processes, and data infrastructure. Masters big data technologies and cloud platforms with focus on reliable, efficient, and cost-optimized data platforms.

zenobi-us
data-ai
data-engineering
24

ai-ml-data-science

End-to-end data science patterns (modern best practices): problem framing -> data -> EDA -> feature engineering (with feature stores) -> modelling -> evaluation -> reporting, plus SQL transformation (SQLMesh). Emphasizes MLOps integration, drift monitoring, and production-ready workflows.

vasilyu1983
data-ai
data-engineering
24

execplan

When writing complex features or significant refactors, or when the user explicitly asks, use an ExecPlan from design to implementation.

tiann
data-ai
data-engineering
24

zarr-python

Chunked N-D arrays for cloud storage. Compressed arrays, parallel I/O, S3/GCS integration, NumPy/Dask/Xarray compatible, for large-scale scientific computing pipelines.

lifangda
data-ai
data-engineering
23

managing-bd-tasks

Use for advanced bd operations - splitting tasks mid-flight, merging duplicates, changing dependencies, archiving epics, querying metrics, cross-epic dependencies

withzombies
data-ai