mova-skill-ingest-store-episode-basic-wrapper
Persists ds.episode_skill_ingest_run_v1 in the lab’s genetic file store.
Apply the Agent OS standard for backend queries.
Use this skill when implementing data access, repositories, or query logic with Entity Framework Core.
Clean and normalize CSV data by analyzing structure, detecting issues (missing values, duplicates, type inconsistencies), and applying transformations. Use when users need to prepare messy CSV files for analysis or import.
Refactor Pandas code to improve maintainability, readability, and performance. Identifies and fixes loops/.iterrows() that should be vectorized, overuse of .apply() where vectorized alternatives exist, chained indexing patterns, inplace=True usage, inefficient dtypes, missing method chaining opportunities, complex filters, merge operations without validation, and SettingWithCopyWarning patterns. Applies Pandas 2.0+ features including PyArrow backend, Copy-on-Write, vectorized operations, method chaining, .query()/.eval(), optimized dtypes, and pipeline patterns.
Fix missing schema definitions in Output SDK steps. Use when seeing type errors, undefined properties at step boundaries, validation failures, or when step inputs/outputs aren't being properly typed.
Implement or refactor data import/parsing so it streams files sequentially (memory-safe), validates/coerces types explicitly, skips irreparable records while logging them to an error CSV (full original columns + timestamp/file/line/error), emits rows_ok/rows_skipped/parse_errors metrics, and guarantees idempotent DB writes.
Maintain database quality, ensure data alignment, verify mission compliance, and prepare for JusticeHub syndication.
Python dataclass best practices: slots, frozen, validation. Trigger when optimizing dataclasses or creating config classes.
Use when performing JSON transformations, manipulating nested JSON structures, filtering arrays, extracting values, validating JSON schemas, or composing multi-step JSON operations. This skill provides 60+ schema-driven jq transformations optimized for AI agent discoverability.
Assess data quality for the normalised log and recommend remediation thresholds and privacy handling.
Phase 4 of Ontology Builder Pipeline. Generates final Ontology artifacts (entity definitions, workflow catalog, concept guides) from DRD. Use after Phase 3 DRD is complete.
Audits Decision JSONL logs for schema compliance, required metadata, and invariants across recorded decisions.
Apache Airflow DAG development with TaskFlow API, Google Cloud operators (BigQuery, GCS), dbt integration, and dynamic DAG generation. Use when creating or modifying Airflow DAGs, implementing data pipeline orchestration, setting up cross-DAG dependencies with ExternalTaskSensor, adding deferrable operators, or configuring error handling and retries.
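Synthetic data generation skill for testing. Generates realistic user data, transactions, logs, API responses, and more. Supports GDPR compliance, Faker integration, schema-based generation, time-series data, and anomalous data generation.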
Connect your own data source to replace the demo unicorns data. Use when the user wants to use their own database URL or CSV file instead of the sample data. Triggers on requests to connect database, import CSV, change data source, use own data, or switch from demo data.
Convert a single Nextflow process to a Galaxy tool XML.
Analyze ClickHouse system log table health including TTL configuration, disk usage, freshness, and cleanup. Use for system log issues and TTL configuration.
Load when working on data pipelines, datasets, reproducibility, or data infrastructure topics. Contains best practices for data engineering, ETL/ELT patterns, and ensuring reproducible data workflows.
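Takes the user's natural-language instructions, generates and executes SQL queries to run on Snowflake, and outputs the results in CSV format. Triggers automatically on requests such as "Get ... from Snowflake" or "Show the data for ...".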
FastAPI with PostgreSQL, async SQLAlchemy 2.0, Alembic, and Docker.
Build scalable data pipelines, ETL/ELT processes, and data infrastructure. Use when: (1) designing data architectures or lakehouse patterns, (2) building Spark/Kafka/Flink/Beam pipelines, (3) optimizing Snowflake/BigQuery/Redshift queries, (4) implementing Airflow/Prefect/Dagster orchestration, (5) setting up data quality frameworks, (6) cost-optimizing data platforms.
Phase 1 of Ontology Builder Pipeline. Ingests and catalogs all input materials from _input/ folder. Use when starting ontology building process or when processing new input documents for domain analysis.