domain cluster

Data & AI

Machine learning, LLMs, and data processing.

9743 skills

data-engineering

synthetic-data-generation

Generate realistic synthetic data using Faker and Spark, with non-linear distributions, integrity constraints, and save to Databricks. Use when creating test data, demo datasets, or synthetic tables.

databricks-solutions

data-ai

open

data-analysis

survey-analyzer

Analyze survey responses with Likert scale analysis, cross-tabulations, sentiment scoring, and frequency distributions with visualizations.

dkyazzentwatwa

data-ai

open

data-analysis

territory-mapper

Use when asked to visualize sales territories, coverage areas, service regions, or geographic boundaries on interactive maps.

dkyazzentwatwa

data-ai

open

data-analysis

correlation-explorer

Find and visualize correlations between variables in datasets. Use for data exploration, feature selection, or identifying relationships between columns.

dkyazzentwatwa

data-ai

open

data-engineering

Displaying charts, dataframes, and metrics in Streamlit. Use when visualizing data, configuring dataframe columns, or adding sparklines to metrics. Covers native charts, Altair, and column configuration.

streamlit

data-ai

open

data-analysis

time-series-decomposer

Decompose time series into trend, seasonal, and residual components. Use for forecasting, pattern analysis, and seasonality detection.

dkyazzentwatwa

data-ai

open

data-engineering

ydata-eda-profiling

Generate and compare ydata-profiling EDA reports with sampling, consistent random seeds, and HTML outputs; often follows duckdb-parquet-lab-workflow when data is queried from Parquet.

crossxwill

data-ai

open

data-analysis

clustering-analyzer

Cluster data using K-Means, DBSCAN, and hierarchical clustering. Use for customer segmentation, pattern discovery, or data grouping.

dkyazzentwatwa

data-ai

open

data-analysis

statistical-power-calculator

Use when asked to calculate statistical power, determine sample size, or plan experiments for hypothesis testing.

dkyazzentwatwa

data-ai

open

data-engineering

parent-segment

Manages CDP parent segments using `tdx ps` commands with YAML configs. Covers master tables, attributes, behaviors, `tdx ps validate` for join validation, `tdx ps preview` for data preview, and schedule configuration (daily/hourly/cron). Use when creating customer master tables, validating join match rates, or troubleshooting parent segment workflows.

treasure-data

data-ai

open

data-engineering

change-impact-analyzer

Analyzes impact of proposed changes on existing systems (brownfield projects) with delta spec validation.
Trigger terms: change impact, impact analysis, brownfield, delta spec, change proposal, change management, existing system analysis, integration impact, breaking changes, dependency analysis, affected components, migration plan, risk assessment, brownfield change.
Provides comprehensive change analysis for existing systems:
- Affected component identification
- Breaking change detection
- Dependency graph updates
- Integration point impact
- Database migration analysis
- API compatibility checks
- Risk assessment and mitigation strategies
- Migration plan recommendations
Use when: proposing changes to existing systems, analyzing brownfield integration, or validating delta specifications.

nahisaho

data-ai

open

data-engineering

data-quality-auditor

Assess data quality with checks for missing values, duplicates, type issues, and inconsistencies. Use for data validation, ETL pipelines, or dataset documentation.

dkyazzentwatwa

data-ai

open

data-engineering

validate-segment

Validates CDP segment YAML configurations against the TD CDP API specification. Use when reviewing segment rules for correctness, checking operator types and values, or troubleshooting segment configuration errors before pushing to Treasure Data.

treasure-data

data-ai

open

data-engineering

kafka-stream-designer

Design Kafka topics, partitions, consumer groups, and producers with idempotency, retry strategies, dead letter queues, exactly-once semantics, and schema registry integration.

phatpham9

data-ai

open

data-engineering

asset-bundles

Create and configure Databricks Asset Bundles (DABs) with best practices for multi-environment deployments. Use when working with: (1) creating new DAB projects, (2) adding resources (dashboards, pipelines, jobs, alerts), (3) configuring multi-environment deployments, (4) setting up permissions, (5) deploying or running bundle resources.

databricks-solutions

data-ai

open

data-engineering

supabase-realtime

Comprehensive guide for implementing Supabase Realtime features with best practices, scalable patterns, and migration strategies. Use when building realtime features in Supabase applications including messaging, notifications, presence, live updates, collaborative features, or migrating from postgres_changes to broadcast. Covers client setup, database triggers with realtime.broadcast_changes, RLS authorization, naming conventions, and performance optimization.

Raudbjorn

data-ai

open

data-engineering

validate-journey

Validates CDP journey YAML configurations against tdx schema requirements. Use when reviewing journey structure, checking step types and parameters, verifying segment references, or troubleshooting journey configuration errors before pushing to Treasure Data.

treasure-data

data-ai

open

data-engineering

spark-declarative-pipelines

Creates, configures, and updates Databricks Lakeflow Spark Declarative Pipelines (SDP/LDP) using serverless compute. Handles streaming tables, materialized views, CDC, SCD Type 2, and Auto Loader ingestion patterns. Use when building data pipelines, working with Delta Live Tables, ingesting streaming data, implementing change data capture, or when the user mentions SDP, LDP, DLT, Lakeflow pipelines, streaming tables, or bronze/silver/gold medallion architectures.

databricks-solutions

data-ai

open

llm-ai

compaction-advisor

Provides context-aware compaction guidance with intelligent checkpointing. Monitors context during long tasks and suggests checkpoints before compaction interrupts your work.

vignesh07

data-ai

open

data-engineering

database-migration

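Perform database migrations across ORMs and platforms using zero-downtime strategies, data transformation, and rollback procedures. Use when migrating databases, changing schemas, performing data transformations, or implementing zero-downtime deployment strategies.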

amurata

data-ai

open

data-engineering

identity

Query identity change logs to explore profile creation and merging.

treasure-data

data-ai

open

data-engineering

frappe-data-migration-generator

Generate data migration scripts for Frappe. Use when migrating data from legacy systems, transforming data structures, or importing large datasets.

Venkateshvenki404224

data-ai

open

data-engineering

databricks-python-sdk

Databricks development guidance including Python SDK, Databricks Connect, CLI, and REST API. Use when working with databricks-sdk, databricks-connect, or Databricks APIs.

databricks-solutions

data-ai

open

data-engineering

parquet-files

Describes how to create Parquet files in C#, including updating existing files and multi-threaded creation.

lawless-m

data-ai

open
