
Data & AI

Machine learning, LLMs, and data processing.

9743 skills · all categories · sorted by stars
data-engineering
770

deduplication

Event deduplication with canonical selection, reputation scoring, and hash-based grouping for multi-source data aggregation. Handles both ID-based and content-based deduplication.

dadbodgeoff
data-ai
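
A minimal Python sketch of ID-based and content-based deduplication with canonical selection. It assumes a hypothetical `Event` record with `id`, `source`, `title`, and `timestamp` fields and a hypothetical `SOURCE_REPUTATION` table; the skill's real schema and scoring are not described in this listing. Each event gets one grouping key: its upstream ID when present, otherwise a SHA-256 hash of its normalized title.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional

# Hypothetical per-source reputation scores (higher means more trusted).
SOURCE_REPUTATION = {"wire": 0.9, "blog": 0.5, "forum": 0.2}


@dataclass(frozen=True)
class Event:
    id: Optional[str]  # upstream ID, if the source provides one
    source: str
    title: str
    timestamp: float


def content_key(event: Event) -> str:
    """Hash the lowercased, whitespace-normalized title."""
    normalized = " ".join(event.title.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def group_key(event: Event) -> str:
    # ID-based grouping when an ID exists, content-based otherwise.
    return f"id:{event.id}" if event.id else f"content:{content_key(event)}"


def canonical(group: list[Event]) -> Event:
    """Most reputable source wins; ties go to the earliest timestamp."""
    return max(group, key=lambda e: (SOURCE_REPUTATION.get(e.source, 0.0), -e.timestamp))


def deduplicate(events: list[Event]) -> list[Event]:
    groups: dict[str, list[Event]] = {}
    for event in events:
        groups.setdefault(group_key(event), []).append(event)
    return [canonical(group) for group in groups.values()]
```

Because each event gets exactly one key, an ID-bearing event and an ID-less copy of the same story land in different groups; a fuller version might check both keys.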
data-engineering
770

fuzzy-matching

Multi-stage fuzzy matching pipeline for entity reconciliation. PostgreSQL trigram pre-filter, salient overlap check, and multi-factor similarity scoring.

dadbodgeoff
data-ai
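
The three stages can be approximated in memory as below. This is an illustrative sketch, not the skill's code: in PostgreSQL the first stage would typically use the `pg_trgm` extension (`similarity()` or the `%` operator), while here `trigrams()` only mimics its word padding. `STOPWORDS`, the 0.3 pre-filter cutoff, the 0.5 match threshold, and the blend weights are all hypothetical.

```python
import re
from difflib import SequenceMatcher
from typing import Optional

# Hypothetical low-signal words ignored by the salient-overlap check.
STOPWORDS = {"inc", "corp", "co", "ltd", "llc", "the", "and", "of"}


def words(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def trigrams(text: str) -> set[str]:
    """Trigrams per word, padded like pg_trgm (two spaces before, one after)."""
    grams: set[str] = set()
    for word in words(text):
        padded = f"  {word} "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def trigram_similarity(a: str, b: str) -> float:
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def match_score(query: str, candidate: str) -> float:
    # Stage 1: cheap trigram pre-filter.
    trigram_sim = trigram_similarity(query, candidate)
    if trigram_sim < 0.3:
        return 0.0
    # Stage 2: require at least one shared salient (non-stopword) token.
    q_tokens = set(words(query)) - STOPWORDS
    c_tokens = set(words(candidate)) - STOPWORDS
    overlap = q_tokens & c_tokens
    if not overlap:
        return 0.0
    # Stage 3: weighted blend of trigram, token, and character-level similarity.
    token_jaccard = len(overlap) / len(q_tokens | c_tokens)
    char_ratio = SequenceMatcher(None, query.lower(), candidate.lower()).ratio()
    return 0.4 * trigram_sim + 0.3 * token_jaccard + 0.3 * char_ratio


def best_match(query: str, candidates: list[str], threshold: float = 0.5) -> Optional[str]:
    scored = [(match_score(query, c), c) for c in candidates]
    score, candidate = max(scored, default=(0.0, None))
    return candidate if score >= threshold else None
```

With these settings, "Acme Corp" matches "Acme Corporation", while "Apex Corp" passes the trigram pre-filter but is rejected because the two names share only the stopword "corp".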
data-engineering
770

intelligent-cache

Multi-layer caching with type-specific TTLs, get-or-generate pattern, memory and database layers, and graceful invalidation without cache stampede.

dadbodgeoff
data-ai
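
A minimal sketch of the get-or-generate pattern over two layers, assuming a process-local dict as the memory layer and SQLite (Python's `sqlite3`) standing in for the database layer. `TTL_BY_TYPE`, the 300-second fallback TTL, and the `LayeredCache` name are hypothetical, and cached values must be JSON-serializable. A per-key lock plus a second read inside the lock means that when many callers miss at once, only one of them runs `generate`.

```python
import json
import sqlite3
import threading
import time
from typing import Any, Callable

# Hypothetical TTLs (seconds) per cached data type.
TTL_BY_TYPE = {"summary": 3600, "price": 60}


class LayeredCache:
    """Memory layer in front of a SQLite layer, with per-key locks against stampedes."""

    def __init__(self, db_path: str = ":memory:") -> None:
        self._memory: dict[str, tuple[float, Any]] = {}
        self._db = sqlite3.connect(db_path, check_same_thread=False)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, expires REAL, value TEXT)"
        )
        self._db_lock = threading.Lock()
        self._locks: dict[str, threading.Lock] = {}
        self._locks_guard = threading.Lock()

    def _key_lock(self, key: str) -> threading.Lock:
        with self._locks_guard:
            return self._locks.setdefault(key, threading.Lock())

    def _read(self, key: str) -> tuple[bool, Any]:
        now = time.time()
        hit = self._memory.get(key)
        if hit and hit[0] > now:
            return True, hit[1]
        with self._db_lock:
            row = self._db.execute(
                "SELECT expires, value FROM cache WHERE key = ?", (key,)
            ).fetchone()
        if row and row[0] > now:
            value = json.loads(row[1])
            self._memory[key] = (row[0], value)  # promote to the memory layer
            return True, value
        return False, None

    def get_or_generate(self, key: str, kind: str, generate: Callable[[], Any]) -> Any:
        found, value = self._read(key)
        if found:
            return value
        # Only one caller per key regenerates; the others wait, then re-read.
        with self._key_lock(key):
            found, value = self._read(key)
            if found:
                return value
            value = generate()
            expires = time.time() + TTL_BY_TYPE.get(kind, 300)
            self._memory[key] = (expires, value)
            with self._db_lock:
                self._db.execute(
                    "INSERT OR REPLACE INTO cache VALUES (?, ?, ?)",
                    (key, expires, json.dumps(value)),
                )
                self._db.commit()
            return value

    def invalidate(self, key: str) -> None:
        self._memory.pop(key, None)
        with self._db_lock:
            self._db.execute("DELETE FROM cache WHERE key = ?", (key,))
            self._db.commit()
```

Deleting from both layers on invalidation is the simplest policy; serving the stale value while a single caller refreshes it is a common alternative.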
data-engineering
770

snapshot-aggregation

Daily compression of time-series data with merge logic for multiple pipeline runs, structured aggregation for dashboards, and storage estimation for capacity planning.

dadbodgeoff
data-ai
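
One way to express daily compression and run merging, sketched under assumptions: raw points are hypothetical `(timestamp, metric, value)` tuples, each day/metric pair is reduced to count, sum, min, and max (enough to derive a mean for dashboards), and the 64-byte row size in `estimate_storage_bytes` is a placeholder, not a measured figure.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class DailySnapshot:
    day: date
    metric: str
    count: int
    total: float
    minimum: float
    maximum: float

    @property
    def mean(self) -> float:
        return self.total / self.count

    def merge(self, other: "DailySnapshot") -> "DailySnapshot":
        """Combine two partial snapshots for the same day and metric."""
        if (self.day, self.metric) != (other.day, other.metric):
            raise ValueError("can only merge snapshots of the same day and metric")
        return DailySnapshot(
            self.day,
            self.metric,
            self.count + other.count,
            self.total + other.total,
            min(self.minimum, other.minimum),
            max(self.maximum, other.maximum),
        )


Key = tuple[date, str]


def merge_into(target: dict, snap: DailySnapshot) -> None:
    key = (snap.day, snap.metric)
    target[key] = target[key].merge(snap) if key in target else snap


def compress(points: list[tuple[datetime, str, float]]) -> dict[Key, DailySnapshot]:
    """Collapse one run's raw points into one snapshot per (day, metric)."""
    snapshots: dict[Key, DailySnapshot] = {}
    for ts, metric, value in points:
        merge_into(snapshots, DailySnapshot(ts.date(), metric, 1, value, value, value))
    return snapshots


def merge_runs(*runs: dict[Key, DailySnapshot]) -> dict[Key, DailySnapshot]:
    """Fold the compressed output of several pipeline runs together."""
    merged: dict[Key, DailySnapshot] = {}
    for run in runs:
        for snap in run.values():
            merge_into(merged, snap)
    return merged


def estimate_storage_bytes(metrics: int, days: int, bytes_per_snapshot: int = 64) -> int:
    """Capacity estimate: one row per metric per day (64 bytes/row is a placeholder)."""
    return metrics * days * bytes_per_snapshot
```

Summing counts in `merge_runs` assumes runs ingest disjoint points; if a run re-processes the same raw data, the merge would need replace or dedupe semantics instead.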
data-engineering
770

validation-quarantine

Data validation with quality scoring and quarantine for suspicious records. Validates incoming data without blocking the pipeline, enabling manual review of edge cases.

dadbodgeoff
data-ai
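
A minimal sketch of non-blocking validation: each record is scored by the fraction of checks it passes, and records below a threshold go to a quarantine list with their failed check names for manual review, rather than raising. The `CHECKS` rules, field names, and the default threshold of 1.0 (any failure quarantines) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

Record = dict[str, Any]

# Hypothetical rules; each returns True when the record passes.
CHECKS: dict[str, Callable[[Record], bool]] = {
    "has_id": lambda r: bool(r.get("id")),
    "price_positive": lambda r: isinstance(r.get("price"), (int, float)) and r["price"] > 0,
    "known_currency": lambda r: r.get("currency") in {"USD", "EUR"},
}


@dataclass
class BatchResult:
    accepted: list[Record] = field(default_factory=list)
    quarantined: list[dict[str, Any]] = field(default_factory=list)


def quality_score(record: Record) -> tuple[float, list[str]]:
    """Fraction of checks passed, plus the names of the failed ones."""
    failed = [name for name, check in CHECKS.items() if not check(record)]
    return 1 - len(failed) / len(CHECKS), failed


def validate_batch(records: list[Record], threshold: float = 1.0) -> BatchResult:
    result = BatchResult()
    for record in records:
        score, failed = quality_score(record)
        if score >= threshold:
            result.accepted.append(record)
        else:
            # Quarantine instead of raising, so one bad record never stalls the batch.
            result.quarantined.append({"record": record, "score": score, "failed": failed})
    return result
```

Lowering `threshold` lets partially valid records through while still quarantining the worst ones.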
data-analysis
760

scientific-figure-making

Covers publication-ready matplotlib figures for academic papers, slides, and reports—bars, trends, scatter, heatmaps, and multi-panel layouts—with this repository’s house style, print/vector export conventions, and parity with figures4papers demos. Use when the user is finalizing or creating such figures in matplotlib. Do not use for interactive dashboards or web viz (Plotly, Altair, Bokeh), exploratory-only plots without a publication target, dominant 3D or geographic mapping, or Illustrator/Figma-first infographic workflows.

ChenLiu-1996
data-ai
data-analysis
759

eqtl-catalogue-skill

Submit compact eQTL Catalogue API requests for association retrieval and documented metadata endpoints. Use when a user wants concise public eQTL Catalogue summaries.

openai
data-ai
data-analysis
759

clinicaltrials-skill

Submit compact ClinicalTrials.gov API v2 requests for study search, metadata, enums, search areas, and field statistics. Use when a user wants concise ClinicalTrials.gov summaries.

openai
data-ai
data-analysis
759

opentargets-skill

Submit compact Open Targets Platform GraphQL requests for target, disease, drug, variant, study, and search data, including associated-disease datasource heatmap matrices. Use when a user wants concise Open Targets summaries or per-datasource evidence context.

openai
data-ai
data-analysis
759

tpmi-phewas-skill

Fetch compact TPMI PheWAS summaries for single variants by accepting rsID, GRCh37, or GRCh38 input and resolving it to the required GRCh38 query. Use when a user wants concise TPMI association results for one variant.

openai
data-ai
data-analysis
759

gwas-catalog-skill

Submit compact GWAS Catalog REST API v2 requests for studies, associations, SNPs, EFO traits, genes, publications, loci, and metadata. Use when a user wants concise GWAS Catalog summaries.

openai
data-ai
data-analysis
759

hmdb-skill

Submit compact HMDB search requests for metabolites, proteins, diseases, and pathways. Use when a user wants concise HMDB summaries.

openai
data-ai
data-analysis
759

mgnify-skill

Submit compact MGnify API requests for microbiome studies, samples, and biome metadata. Use when a user wants concise MGnify summaries.

openai
data-ai
data-analysis
759

metabolights-skill

Submit compact MetaboLights requests for study discovery and study-level metabolomics metadata. Use when a user wants concise MetaboLights summaries.

openai
data-ai
data-engineering
759

bgee-skill

Submit compact Bgee SPARQL requests for healthy wild-type expression metadata and ontology-aware lookup patterns. Use when a user wants concise Bgee summaries; save raw results only on request.

openai
data-ai
data-engineering
759

vercel-queues

Vercel Queues guidance (public beta) — durable event streaming with topics, consumer groups, retries, and delayed delivery. $0.60/1M ops. Powers Workflow DevKit. Use when building async processing, fan-out patterns, or event-driven architectures.

openai
data-ai
machine-learning
759

huggingface-community-evals

Run evaluations for Hugging Face Hub models using inspect-ai and lighteval on local hardware. Use for backend selection, local GPU evals, and choosing between vLLM / Transformers / accelerate. Not for HF Jobs orchestration, model-card PRs, .eval_results publication, or community-evals automation.

openai
data-ai
machine-learning
759

huggingface-llm-trainer

This skill should be used when users want to train or fine-tune language models using TRL (Transformer Reinforcement Learning) on Hugging Face Jobs infrastructure. Covers SFT, DPO, GRPO and reward modeling training methods, plus GGUF conversion for local deployment. Includes guidance on the TRL Jobs package, UV scripts with PEP 723 format, dataset preparation and validation, hardware selection, cost estimation, Trackio monitoring, Hub authentication, and model persistence. Should be invoked for tasks involving cloud GPU training, GGUF conversion, or when users mention training on Hugging Face Jobs without local GPU setup.

openai
data-ai
machine-learning
759

next-forge

next-forge expert guidance — production-grade Turborepo monorepo SaaS starter by Vercel. Use when working in a next-forge project, scaffolding with `npx next-forge init`, or editing @repo/* workspace packages.

openai
data-ai
data-analysis
753

edge-signal-aggregator

Aggregate and rank signals from multiple edge-finding skills (edge-candidate-agent, theme-detector, sector-analyst, institutional-flow-tracker) into a prioritized conviction dashboard with weighted scoring, deduplication, and contradiction detection.

tradermonty
data-ai
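
A sketch of weighted scoring with deduplication and contradiction detection. The `Signal` shape, the `SOURCE_WEIGHTS` values, and the 0.5 default weight for unknown sources are assumptions; only the upstream skill names come from the description. Duplicates are collapsed to the strongest signal per source and ticker, and a ticker is flagged as contradictory when its remaining signals disagree on direction.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical weights for the upstream skills named in the description.
SOURCE_WEIGHTS = {
    "edge-candidate-agent": 1.0,
    "theme-detector": 0.6,
    "sector-analyst": 0.8,
    "institutional-flow-tracker": 0.9,
}


@dataclass(frozen=True)
class Signal:
    source: str
    ticker: str
    direction: int  # +1 bullish, -1 bearish
    strength: float  # 0.0 to 1.0


def aggregate(signals: list[Signal]) -> list[dict]:
    # Deduplicate: keep only the strongest signal per (source, ticker).
    strongest: dict[tuple[str, str], Signal] = {}
    for s in signals:
        key = (s.source, s.ticker)
        if key not in strongest or s.strength > strongest[key].strength:
            strongest[key] = s

    by_ticker: dict[str, list[Signal]] = defaultdict(list)
    for s in strongest.values():
        by_ticker[s.ticker].append(s)

    rows = []
    for ticker, group in by_ticker.items():
        # Weighted conviction: sign from direction, magnitude from weight * strength.
        score = sum(SOURCE_WEIGHTS.get(s.source, 0.5) * s.direction * s.strength for s in group)
        rows.append({
            "ticker": ticker,
            "score": round(score, 3),
            "sources": sorted(s.source for s in group),
            "contradiction": len({s.direction for s in group}) > 1,
        })
    # Highest absolute conviction first.
    return sorted(rows, key=lambda r: abs(r["score"]), reverse=True)
```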
data-analysis
753

downtrend-duration-analyzer

Analyze historical downtrend durations and generate interactive HTML histograms showing typical correction lengths by sector and market cap.

tradermonty
data-ai
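
A sketch of the core measurement, assuming a downtrend is a peak-to-trough decline of at least `min_decline` (10% by default, a hypothetical choice) and that duration is counted in bars from the peak to the lowest price before the next new high. Grouping by sector or market cap and rendering interactive HTML histograms are left out; `histogram()` only returns bucket counts.

```python
from collections import Counter


def downtrend_durations(prices: list[float], min_decline: float = 0.10) -> list[int]:
    """Bars from each peak to its lowest point before the next new high."""
    durations: list[int] = []
    if not prices:
        return durations
    peak_i = trough_i = 0
    for i in range(1, len(prices)):
        if prices[i] > prices[peak_i]:
            # A new high closes the episode; keep it only if the decline was deep enough.
            if (prices[peak_i] - prices[trough_i]) / prices[peak_i] >= min_decline:
                durations.append(trough_i - peak_i)
            peak_i = trough_i = i
        elif prices[i] < prices[trough_i]:
            trough_i = i
    # An episode still open at the end of the series is counted up to its current trough.
    if (prices[peak_i] - prices[trough_i]) / prices[peak_i] >= min_decline:
        durations.append(trough_i - peak_i)
    return durations


def histogram(durations: list[int], bin_width: int = 5) -> dict[str, int]:
    """Bucket durations into fixed-width bins labelled like '0-4', '5-9'."""
    counts = Counter(d // bin_width * bin_width for d in durations)
    return {f"{start}-{start + bin_width - 1}": counts[start] for start in sorted(counts)}
```

For example, the series `[100, 95, 88, 90, 101, 99, 102, 95, 90, 85, 80, 70, 75, 110]` yields durations `[2, 5]`: the shallow dip from 101 to 99 is ignored because it stays under the 10% cutoff.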
data-engineering
753

edge-candidate-agent

Generate and prioritize US equity long-side edge research tickets from EOD observations, then export pipeline-ready candidate specs for trade-strategy-pipeline Phase I. Use when users ask to turn hypotheses/anomalies into reproducible research tickets, convert validated ideas into `strategy.yaml` + `metadata.json`, or preflight-check interface compatibility (`edge-finder-candidate/v1`) before running pipeline backtests.

tradermonty
data-ai
data-engineering
753

edge-pipeline-orchestrator

Orchestrate the full edge research pipeline from candidate detection through strategy design, review, revision, and export. Use when coordinating multi-stage edge research workflows end-to-end.

tradermonty
data-ai
data-engineering
753

edge-strategy-reviewer

Critically review strategy drafts from edge-strategy-designer for edge plausibility, overfitting risk, sample size adequacy, and execution realism. Use when strategy_drafts/*.yaml exists and needs a quality gate before pipeline export. Outputs PASS/REVISE/REJECT verdicts with confidence scores.

tradermonty
data-ai