data-assets
Create and modify Primary Data Assets with property management
Create and modify Primary Data Assets with property management
Create and modify Data Tables with row management
Use this skill when building real-time data pipelines, stream processing jobs, or change data capture systems. Triggers on tasks involving Apache Kafka (producers, consumers, topics, partitions, consumer groups, Connect, Streams), Apache Flink (DataStream API, windowing, checkpointing, stateful processing), event sourcing implementations, CDC with Debezium, stream processing patterns (windowing, watermarks, exactly-once semantics), and any pipeline that processes unbounded data in motion rather than data at rest.
Process barcoded spatial transcriptomics FASTQ pairs with st_pipeline, preserve upstream artifacts, convert the counts matrix into a standardized raw_counts.h5ad, and hand off cleanly to spatial-preprocess.
Use this skill when designing data warehouses, building star or snowflake schemas, implementing slowly changing dimensions (SCDs), writing analytical SQL for Snowflake or BigQuery, creating fact and dimension tables, or planning ETL/ELT pipelines for analytics. Triggers on dimensional modeling, surrogate keys, conformed dimensions, warehouse architecture, data vault, partitioning strategies, materialized views, and any task requiring OLAP schema design or warehouse query optimization.
Use this skill when implementing data validation, data quality monitoring, data lineage tracking, data contracts, or Great Expectations test suites. Triggers on schema validation, data profiling, freshness checks, row-count anomalies, column drift, expectation suites, contract testing between producers and consumers, lineage graphs, data observability, and any task requiring data integrity enforcement across pipelines.
Use this skill when building data pipelines, ETL/ELT workflows, or data transformation layers. Triggers on Airflow DAG design, dbt model creation, Spark job optimization, streaming vs batch architecture decisions, data ingestion, data quality checks, pipeline orchestration, incremental loads, CDC (change data capture), schema evolution, and data warehouse modeling. Acts as a senior data engineer advisor for building reliable, scalable data infrastructure.
Use this skill when building dbt models, designing semantic layers, defining metrics, creating self-serve analytics, or structuring a data warehouse for analyst consumption. Triggers on dbt project setup, model layering (staging, intermediate, marts), ref() and source() usage, YAML schema definitions, metrics definitions, semantic layer configuration, dimensional modeling, slowly changing dimensions, data testing, and any task requiring analytics engineering best practices.
将研究成果持久化到 Obsidian vault,维护论文池索引。支持每日研究日志、论文卡片、综述归档,以及跨项目论文去重和快速检索。
Use when the user wants to inspect the real live GTM runtime before schema generation or compare multiple live GTM containers.
Analyze CSV and tabular data, create summaries, and generate insights
Exploratory data analysis using ydata-profiling. Use when users upload .csv/.xlsx/.json/.parquet files or request "explore data", "analyze dataset", "EDA", "profile data". Generates interactive HTML or JSON reports with statistics, visualizations, correlations, and quality alerts.
Deploy agent to Databricks Apps using DAB (Databricks Asset Bundles). Use when: (1) User says 'deploy', 'push to databricks', or 'bundle deploy', (2) 'App already exists' error occurs, (3) Need to bind/unbind existing apps, (4) Debugging deployed apps, (5) Querying deployed app endpoints.
Enterprise Test Architecture (TEA) framework for quality engineering. Includes workflows for testing education (TEA Academy), risk-based test design, framework scaffolding, ATDD (Red-phase), CI/CD pipeline configuration, NFR (Non-functional) assessment, and quality auditing (0-100 scoring). Use for establishing or executing comprehensive testing strategies.
Structured content platform — GROQ queries, schema definitions, @sanity/client, Portable Text, image handling, real-time listeners, mutations, TypeGen
Pinecone serverless vector database -- index management, vector operations, metadata filtering, namespaces, hybrid search, inference API
Qdrant vector database -- collection management, point operations, payload filtering, named vectors, quantization, recommendations, snapshots
Firebase Data Connect integration for GraphQL-based data access with PostgreSQL. Use when building GraphQL schemas, queries, mutations, or integrating Firebase Data Connect with Angular applications. Supports type-safe generated SDKs, real-time subscriptions, and server-side data validation.
Use the Repomix MCP server to package codebase into consolidated files for AI analysis, search file contents, and understand project structure; essential for comprehensive codebase analysis and context gathering.
Use when launching OCI compute instances, troubleshooting out-of-capacity or boot failures, optimizing compute costs, or handling instance lifecycle. Covers shape selection, capacity planning, service limits, and production incident resolution.
Use when creating Autonomous Databases, troubleshooting connection failures, managing PDBs, or optimizing database costs. Covers connection string confusion, password validation errors, stop/start cost traps, clone type selection, and backup retention gotchas.
Deploy agent to Databricks Apps using DAB (Databricks Asset Bundles). Use when: (1) User says 'deploy', 'push to databricks', or 'bundle deploy', (2) 'App already exists' error occurs, (3) Need to bind/unbind existing apps, (4) Debugging deployed apps, (5) Querying deployed app endpoints.
Use when managing Oracle Autonomous Database on OCI, troubleshooting performance issues, optimizing costs, or implementing HA/DR. Covers ADB-specific gotchas, cost traps, SQL_ID debugging workflows, auto-scaling behavior, and version differences (19c/21c/23ai/26ai).