Data & AI

Machine learning, LLMs, and data processing.

9743 skills

data-analysis

generate-report

Save investigation findings to a markdown report file. Use after completing triage, enrichment, or investigation to create a permanent record. Generates timestamped files in the ./reports/ directory.

dandye

data-ai

llm-ai

Build Model Context Protocol (MCP) servers and tools that extend Claude's capabilities with custom functions, data sources, and integrations. Use when creating custom MCP servers, implementing tools for Claude, building integrations with external services, creating data source connectors, implementing custom functions, or extending Claude's capabilities with domain-specific tools.

korallis

data-ai

data-analysis

statistical-significance-annotation

Guide for annotating statistical significance (p-value asterisk notation) on comparison plots. Covers standard notation conventions (ns, *, **, ***, ****), when to annotate, matplotlib bracket+asterisk implementation, and integration with seaborn box/violin/bar plots. Use when generating publication-ready comparison figures that need significance markers to support statistical claims made in the analysis.

jaechang-hits

data-ai

data-analysis

dashboard

Comprehensive usage analytics and epistemic coverage dashboard across all sessions.

jongwony

data-ai

data-analysis

shap-model-explainability

Model interpretability using SHAP (SHapley Additive exPlanations) based on Shapley values from game theory. Covers explainer selection (Tree, Deep, Linear, Kernel, Gradient, Permutation), computing feature attributions, and visualization (waterfall, beeswarm, bar, scatter, force, heatmap). Use when explaining ML model predictions, computing feature importance, debugging model behavior, analyzing fairness/bias, or comparing models. Works with tree-based, deep learning, linear, and black-box models.

jaechang-hits

data-ai

data-analysis

statistical-analysis

Guided statistical analysis: test selection, assumption checking, effect sizes, power analysis, and APA reporting. Use when choosing appropriate tests for your data, verifying assumptions, calculating effect sizes, or formatting results for publication. Covers frequentist (t-test, ANOVA, chi-square, regression, correlation, survival, count models, agreement/reliability) and Bayesian alternatives. For implementing specific models use statsmodels or pymc-bayesian-modeling.

jaechang-hits

data-ai

data-analysis

matplotlib-scientific-plotting

Low-level Python plotting library for full customization of scientific figures. Use for publication-quality plots (line, scatter, bar, heatmap, contour, 3D), multi-panel subplot layouts, and fine-grained control over every visual element. Export to PNG/PDF/SVG. For quick statistical plots use seaborn; for interactive plots use plotly.

jaechang-hits

data-ai

data-analysis

nan-safe-correlation

Per-feature NaN-safe Spearman/Pearson correlation computation. Use when computing correlations across many features (genes, proteins, variants) with missing values. Covers why bulk matrix shortcuts fail with missing data, correct pairwise deletion, degenerate input filtering, and performance optimization for large datasets. For general statistical test selection use statistical-analysis; for model explainability use shap-model-explainability.

jaechang-hits

data-ai

data-analysis

plotly-interactive-plots

Interactive scientific visualization with Plotly. Two-layer API: plotly.express (px) for one-liner DataFrame plots and plotly.graph_objects (go) for full trace-level control. 40+ chart types with hover, zoom, pan, and animation. Exports to interactive HTML or static PNG/SVG/PDF via kaleido. Use for interactive web figures, volcano plots with gene hover info, dose-response dashboards, gene expression heatmaps, and 3D molecular visualizations. Use seaborn for statistical summaries with automatic aggregation; use matplotlib for fine-grained publication figures; use plotly for interactive or web-embedded output.

jaechang-hits

data-ai

data-analysis

plotly-interactive-visualization

Interactive visualization with Plotly. 40+ chart types (scatter, line, bar, heatmap, 3D, statistical, geographic) with hover, zoom, and pan. Use for exploratory analysis, dashboards, and presentations. Two APIs: Plotly Express (quick, DataFrame-oriented) and Graph Objects (fine-grained control). For static publication figures use matplotlib; for statistical grammar use seaborn.

jaechang-hits

data-ai

data-analysis

degenerate-input-filtering

Mandatory filtering of degenerate and uninformative data points before statistical tests. Covers single-sequence alignments, empty files, constant-value features, zero-variance inputs, and all-NaN columns. For NaN-aware correlation computation, see the nan-safe-correlation skill. For broader statistical testing guidance, see the statistical-analysis skill.

jaechang-hits

data-ai

data-analysis

hypothesis-generation

Structured hypothesis formulation from observations. Use when you have experimental observations or data and need to formulate testable hypotheses with predictions, propose mechanisms, and design experiments to test them. Follows scientific method framework. For open-ended ideation use scientific-brainstorming; for automated LLM-driven hypothesis testing on datasets use hypogenic.

jaechang-hits

data-ai

data-analysis

seaborn-statistical-plots

Statistical visualization library built on matplotlib with native pandas DataFrame support. Automatic aggregation, confidence intervals, and grouping for distribution plots (histplot, kdeplot), categorical comparisons (boxplot, violinplot, stripplot), relational plots (scatterplot, lineplot), regression plots (regplot, lmplot), matrix plots (heatmap, clustermap), and multi-variable grids (pairplot, jointplot, FacetGrid). Use seaborn for statistical summaries with minimal code; use matplotlib for fine-grained figure control; use plotly for interactive HTML output.

jaechang-hits

data-ai

data-analysis

seaborn-statistical-visualization

Statistical visualization built on matplotlib with pandas integration. Distribution plots (histplot, kdeplot, violinplot, boxplot), relational plots (scatterplot, lineplot), categorical comparisons, regression, correlation heatmaps. Automatic aggregation and CI. For interactive plots use plotly; for low-level control use matplotlib.

jaechang-hits

data-ai

data-analysis

networkx-graph-analysis

Graph and network analysis toolkit: create, manipulate, and analyze complex networks. Four graph types (undirected, directed, and a multi-edge variant of each), centrality measures, shortest paths, community detection, graph generators, I/O (GraphML, GML, edge list, pandas, NumPy), visualization with matplotlib. For large-scale graphs (100K+ nodes) use igraph or graph-tool; for graph neural networks use PyG.

jaechang-hits

data-ai

data-analysis

gwas-database

NHGRI-EBI GWAS Catalog REST API for SNP-trait associations from published genome-wide association studies. Query studies, associations, variants, traits, genes, and summary statistics. Build polygenic risk score candidates, analyze variant pleiotropy, download summary statistics for Manhattan plots. No authentication required.

jaechang-hits

data-ai

data-analysis

multiqc-qc-reports

Aggregates QC outputs from 150+ bioinformatics tools into a single interactive HTML report. Scans directories for FastQC, samtools, STAR, HISAT2, Trim Galore, featureCounts, Kallisto, Salmon, Picard, and GATK logs; merges statistics across samples with interactive plots. Essential for NGS pipeline QC review. Use FastQC directly instead for single-sample initial assessment; MultiQC is for multi-sample pipeline-wide reporting.

jaechang-hits

data-ai

data-analysis

matlab-scientific-computing

MATLAB/GNU Octave numerical computing for matrix operations, linear algebra, differential equations, signal processing, optimization, statistics, and scientific visualization. Code examples in MATLAB syntax (runs on both MATLAB and Octave). For Python-based scientific computing use numpy/scipy; for statistical modeling use statsmodels.

jaechang-hits

data-ai

data-analysis

sympy-symbolic-math

Symbolic mathematics in Python: exact algebra, calculus (derivatives, integrals, limits), equation solving, symbolic matrices, differential equations, code generation (lambdify, C/Fortran). Use when exact symbolic results are needed, not numerical approximations. For numerical computing use numpy/scipy; for statistical modeling use statsmodels.

jaechang-hits

data-ai

data-analysis

Conduct exploratory data analysis and statistical testing with test selection guidance. Use when exploring datasets, selecting statistical tests, performing power analysis, or preparing results for publication.

ChicagoHAI

data-ai

data-analysis

memory

Structured daily and weekly learning memory with dual graph snapshots.

MathClaw-ruc

data-ai

data-analysis

calculator

A simple calculator that can add, subtract, multiply, and divide numbers. Use when the user needs to perform basic arithmetic operations.

EXboys

data-ai

data-analysis

Analyze CSV/JSON data with statistics, filtering, and aggregation. Powered by pandas and numpy.

EXboys

data-ai

llm-ai

data-storytelling

Transform data into compelling narratives using visualization, context, and persuasive structure. Use when presenting analytics to stakeholders, creating data reports, or building executive presentations.

aiskillstore

data-ai
