Data & AI

Machine learning, LLMs, and data processing.

9743 skills

Sorted by stars; showing all entries.

data-analysis

156

investigation-plotter

Generate investigation-specific figures from the local artifact graph (post-run plotting agent).

lamm-mit

data-ai

data-analysis

156

fabric

Pattern-based analysis using Fabric's 242+ specialized prompts for summarizing papers and extracting insights

lamm-mit

data-ai

data-analysis

156

polygenic-risk-score

ToolUniverse workflow — Polygenic Risk Score

lamm-mit

data-ai

data-analysis

156

Low-level plotting library for full customization. Use when you need fine-grained control over every plot element, creating novel plot types, or integrating with specific scientific workflows. Export to PNG/PDF/SVG for publication. For quick statistical plots use seaborn; for interactive plots use plotly; for publication-ready multi-panel figures with journal styling, use scientific-visualization.

lamm-mit

data-ai

data-analysis

156

minerals-viz

Generate charts (PNG/SVG) for critical minerals data — production, trade, import reliance, and time series

lamm-mit

data-ai

data-analysis

156

matlab

MATLAB and GNU Octave numerical computing for matrix operations, data analysis, visualization, and scientific computing. Use when writing MATLAB/Octave scripts for linear algebra, signal processing, image processing, differential equations, optimization, statistics, or creating scientific visualizations. Also use when the user needs help with MATLAB syntax, functions, or wants to convert between MATLAB and Python code. Scripts can be executed with MATLAB or the open-source GNU Octave interpreter.

lamm-mit

data-ai

data-analysis

156

plotly

Interactive visualization library. Use when you need hover info, zoom, pan, or web-embeddable charts. Best for dashboards, exploratory analysis, and presentations. For static publication figures use matplotlib or scientific-visualization.

lamm-mit

data-ai

data-engineering

156

dask

Distributed computing for larger-than-RAM pandas/NumPy workflows. Use when you need to scale existing pandas/NumPy code beyond memory or across clusters. Best for parallel file processing, distributed ML, and integration with existing pandas code. For out-of-core analytics on a single machine use vaex; for in-memory speed use polars.

lamm-mit

data-ai

data-engineering

156

dnanexus-integration

Work with the DNAnexus cloud genomics platform: build apps/applets, upload and download data, use the dxpy Python SDK, run workflows, and handle FASTQ/BAM/VCF files, for genomics pipeline development and execution.

lamm-mit

data-ai

data-engineering

156

drug-repurposing

ToolUniverse workflow — Drug Repurposing

lamm-mit

data-ai

data-engineering

156

drug-target-validation

ToolUniverse workflow — Drug Target Validation

lamm-mit

data-ai

data-engineering

156

lamindb

This skill should be used when working with LaminDB, an open-source data framework for biology that makes data queryable, traceable, reproducible, and FAIR. Use when managing biological datasets (scRNA-seq, spatial, flow cytometry, etc.), tracking computational workflows, curating and validating data with biological ontologies, building data lakehouses, or ensuring data lineage and reproducibility in biological research. Covers data management, annotation, ontologies (genes, cell types, diseases, tissues), schema validation, integrations with workflow managers (Nextflow, Snakemake) and MLOps platforms (W&B, MLflow), and deployment strategies.

lamm-mit

data-ai

data-engineering

156

opentargets-database

Query the Open Targets Platform for target-disease associations, drug target discovery, tractability/safety data, genetics/omics evidence, and known drugs, for therapeutic target identification.

lamm-mit

data-ai

data-engineering

156

polars

Fast in-memory DataFrame library for datasets that fit in RAM. Use when pandas is too slow but data still fits in memory. Lazy evaluation, parallel execution, Apache Arrow backend. Best for 1-100GB datasets, ETL pipelines, faster pandas replacement. For larger-than-RAM data use dask or vaex.

lamm-mit

data-ai

data-engineering

156

rnaseq-deseq2

ToolUniverse workflow — Rnaseq Deseq2

lamm-mit

data-ai

data-engineering

156

vaex

Use this skill for processing and analyzing large tabular datasets (billions of rows) that exceed available RAM. Vaex excels at out-of-core DataFrame operations, lazy evaluation, fast aggregations, efficient visualization of big data, and machine learning on large datasets. Apply when users need to work with large CSV/HDF5/Arrow/Parquet files, perform fast statistics on massive datasets, create visualizations of big data, or build ML pipelines that do not fit in memory.

lamm-mit

data-ai

data-engineering

156

zarr-python

Chunked N-D arrays for cloud storage. Compressed arrays, parallel I/O, S3/GCS integration, NumPy/Dask/Xarray compatible, for large-scale scientific computing pipelines.

lamm-mit

data-ai

data-analysis

155

yida-report

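Yida native report skill for creating native report pages built into the Yida platform (vc-yida-report component library). It supports 16 ready-to-use chart, table, and filter components, and generates and publishes the report Schema via the openyida create-report command. Scope: creating Yida native reports (which serve as data sources); ordinary "report" or "statistics" requests use this skill by default. For more polished custom ECharts visualization dashboards, use the yida-chart skill (which relies on native reports created by this skill as its data source). Not for: creating custom ECharts visualization dashboards (use yida-chart) or querying form data directly (use yida-data-management).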

openyida

data-ai

data-analysis

155

pm-data-discovery

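A skill for first working out what data is available to look at, how to search the target market, and how to pull concrete data samples. It only handles data discovery and retrieval; it does not form strategy hypotheses.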

YichengYang-Ethan

data-ai

data-engineering

155

digikey

Search DigiKey for electronic components and download datasheets — primary source for prototype orders and the preferred API method for fetching datasheets. Find parts by keyword or MPN, check pricing/stock, download datasheets via API, analyze specifications. Sync and maintain a local datasheets directory — extract components from schematics, download missing datasheets, keep them up to date. Use when the user asks about electronic components, part specs, datasheets, pricing, stock, footprints, or needs to download a datasheet — even without mentioning "DigiKey". Also for "sync datasheets", "download datasheets for my board/project", or mentions a datasheets directory. DigiKey is the default distributor for prototyping. For BOM workflows, see the bom skill.

aklofas

data-ai

data-engineering

155

task-decomposer

Produces structured phased task boards from feature requests: dependency-mapped work items with parallelization flags, risk flags, edge case tables, and test strategy matrices. Triggers on: "decompose this feature", "task breakdown with dependencies", "phased implementation plan", "dependency map for", "break this into tasks with phases", "work breakdown structure". The differentiator is the structured output format (phased tables, parallelization flags, dependency chains) — use this skill when you need a formal task board, not ad-hoc decomposition the model handles natively. NOT for effort estimates, PERT calculations, or confidence intervals — use estimate-calibrator instead.

Mathews-Tom

data-ai

llm-ai

154

memory-contract

Unified Memory Contract for Flowbaby integration. Defines when and how to retrieve and store memory. Load at session start: memory is core to agent reasoning, not optional.