domain cluster

Data & AI

Machine learning, LLMs, and data processing.

9743 skills
data-engineering
5

synthetic-data-generation

Generate realistic synthetic data using Faker and Spark, with non-linear distributions, integrity constraints, and save to Databricks. Use when creating test data, demo datasets, or synthetic tables.

databricks-solutions
data-ai
open
data-analysis
5

survey-analyzer

Analyze survey responses with Likert scale analysis, cross-tabulations, sentiment scoring, and frequency distributions with visualizations.

dkyazzentwatwa
data-ai
open
data-analysis
5

territory-mapper

Use when asked to visualize sales territories, coverage areas, service regions, or geographic boundaries on interactive maps.

dkyazzentwatwa
data-ai
open
data-analysis
5

correlation-explorer

Find and visualize correlations between variables in datasets. Use for data exploration, feature selection, or identifying relationships between columns.

dkyazzentwatwa
data-ai
open
data-engineering
5

displaying-streamlit-data

Displaying charts, dataframes, and metrics in Streamlit. Use when visualizing data, configuring dataframe columns, or adding sparklines to metrics. Covers native charts, Altair, and column configuration.

streamlit
data-ai
open
data-analysis
5

time-series-decomposer

Decompose time series into trend, seasonal, and residual components. Use for forecasting, pattern analysis, and seasonality detection.

dkyazzentwatwa
data-ai
open
data-engineering
5

ydata-eda-profiling

Generate and compare ydata-profiling EDA reports with sampling, consistent random seeds, and HTML outputs; often follows duckdb-parquet-lab-workflow when data is queried from Parquet.

crossxwill
data-ai
open
data-analysis
5

clustering-analyzer

Cluster data using K-Means, DBSCAN, or hierarchical clustering. Use for customer segmentation, pattern discovery, or data grouping.

dkyazzentwatwa
data-ai
open
data-analysis
5

statistical-power-calculator

Use when asked to calculate statistical power, determine sample size, or plan experiments for hypothesis testing.

dkyazzentwatwa
data-ai
open
data-engineering
5

parent-segment

Manages CDP parent segments using `tdx ps` commands with YAML configs. Covers master tables, attributes, behaviors, `tdx ps validate` for join validation, `tdx ps preview` for data preview, and schedule configuration (daily/hourly/cron). Use when creating customer master tables, validating join match rates, or troubleshooting parent segment workflows.

treasure-data
data-ai
open
data-engineering
5

change-impact-analyzer

Analyzes impact of proposed changes on existing systems (brownfield projects) with delta spec validation.
Trigger terms: change impact, impact analysis, brownfield, delta spec, change proposal, change management, existing system analysis, integration impact, breaking changes, dependency analysis, affected components, migration plan, risk assessment, brownfield change.
Provides comprehensive change analysis for existing systems:
- Affected component identification
- Breaking change detection
- Dependency graph updates
- Integration point impact
- Database migration analysis
- API compatibility checks
- Risk assessment and mitigation strategies
- Migration plan recommendations
Use when: proposing changes to existing systems, analyzing brownfield integration, or validating delta specifications.

nahisaho
data-ai
open
data-engineering
5

data-quality-auditor

Assess data quality with checks for missing values, duplicates, type issues, and inconsistencies. Use for data validation, ETL pipelines, or dataset documentation.

dkyazzentwatwa
data-ai
open
data-engineering
5

validate-segment

Validates CDP segment YAML configurations against the TD CDP API specification. Use when reviewing segment rules for correctness, checking operator types and values, or troubleshooting segment configuration errors before pushing to Treasure Data.

treasure-data
data-ai
open
data-engineering
5

kafka-stream-designer

Design Kafka topics, partitions, consumer groups, and idempotent producers, with retry strategies, dead letter queues, exactly-once semantics, and schema registry integration.

phatpham9
data-ai
open
data-engineering
5

asset-bundles

Create and configure Databricks Asset Bundles (DABs) with best practices for multi-environment deployments. Use when: (1) creating new DAB projects, (2) adding resources (dashboards, pipelines, jobs, alerts), (3) configuring multi-environment deployments, (4) setting up permissions, or (5) deploying or running bundle resources.

databricks-solutions
data-ai
open
data-engineering
5

supabase-realtime

Comprehensive guide for implementing Supabase Realtime features with best practices, scalable patterns, and migration strategies. Use when building realtime features in Supabase applications including messaging, notifications, presence, live updates, collaborative features, or migrating from postgres_changes to broadcast. Covers client setup, database triggers with realtime.broadcast_changes, RLS authorization, naming conventions, and performance optimization.

Raudbjorn
data-ai
open
data-engineering
5

validate-journey

Validates CDP journey YAML configurations against tdx schema requirements. Use when reviewing journey structure, checking step types and parameters, verifying segment references, or troubleshooting journey configuration errors before pushing to Treasure Data.

treasure-data
data-ai
open
data-engineering
5

spark-declarative-pipelines

Creates, configures, and updates Databricks Lakeflow Spark Declarative Pipelines (SDP/LDP) using serverless compute. Handles streaming tables, materialized views, CDC, SCD Type 2, and Auto Loader ingestion patterns. Use when building data pipelines, working with Delta Live Tables, ingesting streaming data, implementing change data capture, or when the user mentions SDP, LDP, DLT, Lakeflow pipelines, streaming tables, or bronze/silver/gold medallion architectures.

databricks-solutions
data-ai
open
llm-ai
5

compaction-advisor

Provides context-aware compaction guidance with intelligent checkpointing. Monitors context during long tasks and suggests checkpoints before compaction interrupts your work.

vignesh07
data-ai
open
data-engineering
5

database-migration

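Execute database migrations across ORMs and platforms using zero-downtime strategies, data transformation, and rollback procedures. Use when migrating databases, changing schemas, performing data transformations, or implementing zero-downtime deployment strategies.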

amurata
data-ai
open
data-engineering
5

identity

Query identity change logs to explore profile creation and merging.

treasure-data
data-ai
open
data-engineering
5

frappe-data-migration-generator

Generate data migration scripts for Frappe. Use when migrating data from legacy systems, transforming data structures, or importing large datasets.

Venkateshvenki404224
data-ai
open
data-engineering
5

databricks-python-sdk

Databricks development guidance including Python SDK, Databricks Connect, CLI, and REST API. Use when working with databricks-sdk, databricks-connect, or Databricks APIs.

databricks-solutions
data-ai
open
data-engineering
5

parquet-files

Describes how to create Parquet files in C#, including updating files and multi-threaded creation.

lawless-m
data-ai
open