synthetic-data-generation
Generate realistic synthetic data using Faker and Spark, with non-linear distributions, integrity constraints, and save to Databricks. Use when creating test data, demo datasets, or synthetic tables.
Analyze survey responses with Likert scale analysis, cross-tabulations, sentiment scoring, and frequency distributions with visualizations.
Use when asked to visualize sales territories, coverage areas, service regions, or geographic boundaries on interactive maps.
Find and visualize correlations between variables in datasets. Use for data exploration, feature selection, or identifying relationships between columns.
Displaying charts, dataframes, and metrics in Streamlit. Use when visualizing data, configuring dataframe columns, or adding sparklines to metrics. Covers native charts, Altair, and column configuration.
Decompose time series into trend, seasonal, and residual components. Use for forecasting, pattern analysis, and seasonality detection.
Generate and compare ydata-profiling EDA reports with sampling, consistent random seeds, and HTML outputs; often follows duckdb-parquet-lab-workflow when data is queried from Parquet.
Cluster data using K-Means, DBSCAN, or hierarchical clustering. Use for customer segmentation, pattern discovery, or data grouping.
Use when asked to calculate statistical power, determine sample size, or plan experiments for hypothesis testing.
Manages CDP parent segments using `tdx ps` commands with YAML configs. Covers master tables, attributes, behaviors, `tdx ps validate` for join validation, `tdx ps preview` for data preview, and schedule configuration (daily/hourly/cron). Use when creating customer master tables, validating join match rates, or troubleshooting parent segment workflows.
Analyzes impact of proposed changes on existing systems (brownfield projects) with delta spec validation. Trigger terms: change impact, impact analysis, brownfield, delta spec, change proposal, change management, existing system analysis, integration impact, breaking changes, dependency analysis, affected components, migration plan, risk assessment, brownfield change. Provides comprehensive change analysis for existing systems: affected component identification, breaking change detection, dependency graph updates, integration point impact, database migration analysis, API compatibility checks, risk assessment and mitigation strategies, and migration plan recommendations. Use when proposing changes to existing systems, analyzing brownfield integration, or validating delta specifications.
Assess data quality with checks for missing values, duplicates, type issues, and inconsistencies. Use for data validation, ETL pipelines, or dataset documentation.
Validates CDP segment YAML configurations against the TD CDP API specification. Use when reviewing segment rules for correctness, checking operator types and values, or troubleshooting segment configuration errors before pushing to Treasure Data.
Design Kafka topics, partitions, consumer groups, and idempotent producers, along with retry strategies, dead letter queues, exactly-once semantics, and schema registry integration.
Create and configure Databricks Asset Bundles (DABs) with best practices for multi-environment deployments. Use when working with: (1) Creating new DAB projects, (2) Adding resources (dashboards, pipelines, jobs, alerts), (3) Configuring multi-environment deployments, (4) Setting up permissions, (5) Deploying or running bundle resources
Comprehensive guide for implementing Supabase Realtime features with best practices, scalable patterns, and migration strategies. Use when building realtime features in Supabase applications including messaging, notifications, presence, live updates, collaborative features, or migrating from postgres_changes to broadcast. Covers client setup, database triggers with realtime.broadcast_changes, RLS authorization, naming conventions, and performance optimization.
Validates CDP journey YAML configurations against tdx schema requirements. Use when reviewing journey structure, checking step types and parameters, verifying segment references, or troubleshooting journey configuration errors before pushing to Treasure Data.
Creates, configures, and updates Databricks Lakeflow Spark Declarative Pipelines (SDP/LDP) using serverless compute. Handles streaming tables, materialized views, CDC, SCD Type 2, and Auto Loader ingestion patterns. Use when building data pipelines, working with Delta Live Tables, ingesting streaming data, implementing change data capture, or when the user mentions SDP, LDP, DLT, Lakeflow pipelines, streaming tables, or bronze/silver/gold medallion architectures.
Provides context-aware compaction guidance with intelligent checkpointing. Monitors context during long tasks and suggests checkpoints before compaction interrupts your work.
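Execute database migrations across ORMs and platforms using zero-downtime strategies, data transformation, and rollback procedures. Use when migrating databases, changing schemas, performing data transformations, or implementing zero-downtime deployment strategies.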
Generate data migration scripts for Frappe. Use when migrating data from legacy systems, transforming data structures, or importing large datasets.
Databricks development guidance including Python SDK, Databricks Connect, CLI, and REST API. Use when working with databricks-sdk, databricks-connect, or Databricks APIs.
Describes how to create Parquet files in C#, including updating existing files and multi-threaded creation.