Data & AI

Machine learning, LLMs, and data processing.

9743 skills

Sorted by stars

data-engineering

770

deduplication

Event deduplication with canonical selection, reputation scoring, and hash-based grouping for multi-source data aggregation. Handles both ID-based and content-based deduplication.

dadbodgeoff

data-ai

open

data-engineering

770

fuzzy-matching

Multi-stage fuzzy matching pipeline for entity reconciliation. PostgreSQL trigram pre-filter, salient overlap check, and multi-factor similarity scoring.

dadbodgeoff

data-ai

open

data-engineering

770

intelligent-cache

Multi-layer caching with type-specific TTLs, get-or-generate pattern, memory and database layers, and graceful invalidation without cache stampede.

dadbodgeoff

data-ai

open

data-engineering

770

snapshot-aggregation

Daily compression of time-series data with merge logic for multiple pipeline runs, structured aggregation for dashboards, and storage estimation for capacity planning.

dadbodgeoff

data-ai

open

data-engineering

770

validation-quarantine

Data validation with quality scoring and quarantine for suspicious records. Validates incoming data without blocking the pipeline, enabling manual review of edge cases.

dadbodgeoff

data-ai

open

data-analysis

760

Covers publication-ready matplotlib figures for academic papers, slides, and reports—bars, trends, scatter, heatmaps, and multi-panel layouts—with this repository’s house style, print/vector export conventions, and parity with figures4papers demos. Use when the user is finalizing or creating such figures in matplotlib. Do not use for interactive dashboards or web viz (Plotly, Altair, Bokeh), exploratory-only plots without a publication target, dominant 3D or geographic mapping, or Illustrator/Figma-first infographic workflows.

ChenLiu-1996

data-ai

open

data-analysis

759

eqtl-catalogue-skill

Submit compact eQTL Catalogue API requests for association retrieval and documented metadata endpoints. Use when a user wants concise public eQTL Catalogue summaries

openai

data-ai

open

data-analysis

759

clinicaltrials-skill

Submit compact ClinicalTrials.gov API v2 requests for study search, metadata, enums, search areas, and field statistics. Use when a user wants concise ClinicalTrials.gov summaries

openai

data-ai

open

data-analysis

759

opentargets-skill

Submit compact Open Targets Platform GraphQL requests for target, disease, drug, variant, study, and search data, including associated-disease datasource heatmap matrices. Use when a user wants concise Open Targets summaries or per-datasource evidence context

openai

data-ai

open

data-analysis

759

tpmi-phewas-skill

Fetch compact TPMI PheWAS summaries for single variants by accepting rsID, GRCh37, or GRCh38 input and resolving to the required GRCh38 query. Use when a user wants concise TPMI association results for one variant

openai

data-ai

open

data-analysis

759

gwas-catalog-skill

Submit compact GWAS Catalog REST API v2 requests for studies, associations, SNPs, EFO traits, genes, publications, loci, and metadata. Use when a user wants concise GWAS Catalog summaries

openai

data-ai

open

data-analysis

759

hmdb-skill

Submit compact HMDB search requests for metabolites, proteins, diseases, and pathways. Use when a user wants concise HMDB summaries

openai

data-ai

open

data-analysis

759

mgnify-skill

Submit compact MGnify API requests for microbiome studies, samples, and biome metadata. Use when a user wants concise MGnify summaries

openai

data-ai

open

data-analysis

759

metabolights-skill

Submit compact MetaboLights requests for study discovery and study-level metabolomics metadata. Use when a user wants concise MetaboLights summaries

openai

data-ai

open

data-engineering

759

bgee-skill

Submit compact Bgee SPARQL requests for healthy wild-type expression metadata and ontology-aware lookup patterns. Use when a user wants concise Bgee summaries; save raw results only on request.

openai

data-ai

open

data-engineering

759

vercel-queues

Vercel Queues guidance (public beta) — durable event streaming with topics, consumer groups, retries, and delayed delivery. $0.60/1M ops. Powers Workflow DevKit. Use when building async processing, fan-out patterns, or event-driven architectures.

openai

data-ai

open

machine-learning

759

huggingface-community-evals

Run evaluations for Hugging Face Hub models using inspect-ai and lighteval on local hardware. Use for backend selection, local GPU evals, and choosing between vLLM / Transformers / accelerate. Not for HF Jobs orchestration, model-card PRs, .eval_results publication, or community-evals automation.

openai

data-ai

open

machine-learning

759

huggingface-llm-trainer

This skill should be used when users want to train or fine-tune language models using TRL (Transformer Reinforcement Learning) on Hugging Face Jobs infrastructure. Covers SFT, DPO, GRPO and reward modeling training methods, plus GGUF conversion for local deployment. Includes guidance on the TRL Jobs package, UV scripts with PEP 723 format, dataset preparation and validation, hardware selection, cost estimation, Trackio monitoring, Hub authentication, and model persistence. Should be invoked for tasks involving cloud GPU training, GGUF conversion, or when users mention training on Hugging Face Jobs without local GPU setup.

openai

data-ai

open

machine-learning

759

next-forge

next-forge expert guidance — production-grade Turborepo monorepo SaaS starter by Vercel. Use when working in a next-forge project, scaffolding with `npx next-forge init`, or editing @repo/* workspace packages.

openai

data-ai

open

data-analysis

753

edge-signal-aggregator

Aggregate and rank signals from multiple edge-finding skills (edge-candidate-agent, theme-detector, sector-analyst, institutional-flow-tracker) into a prioritized conviction dashboard with weighted scoring, deduplication, and contradiction detection.

tradermonty

data-ai

open

data-analysis

753

downtrend-duration-analyzer

Analyze historical downtrend durations and generate interactive HTML histograms showing typical correction lengths by sector and market cap.

tradermonty

data-ai

open

data-engineering

753

edge-candidate-agent

Generate and prioritize US equity long-side edge research tickets from EOD observations, then export pipeline-ready candidate specs for trade-strategy-pipeline Phase I. Use when users ask to turn hypotheses/anomalies into reproducible research tickets, convert validated ideas into `strategy.yaml` + `metadata.json`, or preflight-check interface compatibility (`edge-finder-candidate/v1`) before running pipeline backtests.

tradermonty

data-ai

open

data-engineering

753

edge-pipeline-orchestrator

Orchestrate the full edge research pipeline from candidate detection through strategy design, review, revision, and export. Use when coordinating multi-stage edge research workflows end-to-end.

tradermonty

data-ai

open

data-engineering

753

edge-strategy-reviewer

Critically review strategy drafts from edge-strategy-designer for edge plausibility, overfitting risk, sample size adequacy, and execution realism. Use when strategy_drafts/*.yaml exists and needs quality gate before pipeline export. Outputs PASS/REVISE/REJECT verdicts with confidence scores.

tradermonty

data-ai

open
