domain cluster

Data & AI

Machine learning, LLMs, and data processing.

9743 skills
machine-learning
950

scikit-learn

Machine learning in Python with scikit-learn. Use when working with supervised learning (classification, regression), unsupervised learning (clustering, dimensionality reduction), model evaluation, hyperparameter tuning, preprocessing, or building ML pipelines. Provides comprehensive reference documentation for algorithms, preprocessing techniques, pipelines, and best practices.

wu-yc
data-ai
open
machine-learning
950

scikit-survival

Comprehensive toolkit for survival analysis and time-to-event modeling in Python using scikit-survival. Use this skill when working with censored survival data, performing time-to-event analysis, fitting Cox models, Random Survival Forests, Gradient Boosting models, or Survival SVMs, evaluating survival predictions with concordance index or Brier score, handling competing risks, or implementing any survival analysis workflow with the scikit-survival library.

wu-yc
data-ai
open
machine-learning
950

shap

Model interpretability and explainability using SHAP (SHapley Additive exPlanations). Use this skill when explaining machine learning model predictions, computing feature importance, generating SHAP plots (waterfall, beeswarm, bar, scatter, force, heatmap), debugging models, analyzing model bias or fairness, comparing models, or implementing explainable AI. Works with tree-based models (XGBoost, LightGBM, Random Forest), deep learning (TensorFlow, PyTorch), linear models, and any black-box model.

wu-yc
data-ai
open
machine-learning
950

transformers

This skill should be used when working with pre-trained transformer models for natural language processing, computer vision, audio, or multimodal tasks. Use for text generation, classification, question answering, translation, summarization, image classification, object detection, speech recognition, and fine-tuning models on custom datasets.

wu-yc
data-ai
open
machine-learning
950

pyhealth

Comprehensive healthcare AI toolkit for developing, testing, and deploying machine learning models with clinical data. This skill should be used when working with electronic health records (EHR), clinical prediction tasks (mortality, readmission, drug recommendation), medical coding systems (ICD, NDC, ATC), physiological signals (EEG, ECG), healthcare datasets (MIMIC-III/IV, eICU, OMOP), or implementing deep learning models for healthcare applications (RETAIN, SafeDrug, Transformer, GNN).

wu-yc
data-ai
open
data-analysis
946

powerlifting

Calculating powerlifting scores to determine the performance of lifters across different weight classes.

benchflow-ai
data-ai
open
data-analysis
946

search-driving-distance

Estimate driving/taxi duration, distance, and rough cost between two cities using the bundled distance matrix CSV. Use this skill when comparing ground travel options or validating itinerary legs.

benchflow-ai
data-ai
open
data-engineering
946

jax-skills

High-performance numerical computing and machine learning workflows using JAX. Supports array operations, automatic differentiation, JIT compilation, RNN-style scans, map/reduce operations, and gradient computations. Ideal for scientific computing, ML models, and dynamic array transformations.

benchflow-ai
data-ai
open
machine-learning
946

mhc-algorithm

Implement mHC (Manifold-Constrained Hyper-Connections) for stabilizing deep network training. Use when implementing residual connection improvements with doubly stochastic matrices via Sinkhorn-Knopp algorithm. Based on DeepSeek's 2025 paper (arXiv:2512.24880).

benchflow-ai
data-ai
open
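For the mhc-algorithm entry above, the Sinkhorn-Knopp step it names can be illustrated with a minimal pure-Python sketch. This is not the skill's or the paper's implementation: the function name `sinkhorn_knopp`, the iteration count, and the sample matrix are assumptions for illustration, and the input is assumed to be a square matrix with strictly positive entries.

```python
def sinkhorn_knopp(matrix, n_iters=50):
    """Alternately normalize rows and columns of a positive square matrix.

    For strictly positive entries the result approaches a doubly
    stochastic matrix (every row and every column sums to 1).
    """
    m = [row[:] for row in matrix]  # work on a copy
    n = len(m)
    for _ in range(n_iters):
        # Scale each row so it sums to 1.
        for i in range(n):
            row_sum = sum(m[i])
            m[i] = [v / row_sum for v in m[i]]
        # Scale each column so it sums to 1.
        for j in range(n):
            col_sum = sum(m[i][j] for i in range(n))
            for i in range(n):
                m[i][j] /= col_sum
    return m


# Hypothetical sample input with strictly positive entries.
raw = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 10.0]]
ds = sinkhorn_knopp(raw)
row_sums = [sum(row) for row in ds]
col_sums = [sum(ds[i][j] for i in range(3)) for j in range(3)]
```

After enough iterations, `row_sums` and `col_sums` should all be close to 1, which is the doubly stochastic property the description refers to.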
machine-learning
946

first-order-model-fitting

Fit first-order dynamic models to experimental step response data and extract K (gain) and tau (time constant) parameters.

benchflow-ai
data-ai
open
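For the first-order-model-fitting entry above, one simple way to extract K and tau from step response data is the classic 63.2% method, sketched below. The listing does not say which method the skill uses, so this is only an assumed illustration: the function name `fit_first_order` is hypothetical, and the response is assumed to start at zero, rise monotonically, and have settled by the last sample.

```python
import math


def fit_first_order(times, outputs, step_size):
    """Estimate gain K and time constant tau from a settled step response.

    Assumes y(t) = K * step_size * (1 - exp(-t / tau)) with y(0) = 0.
    """
    y_final = outputs[-1]            # steady-state output (assumed settled)
    gain = y_final / step_size       # K = output change / input change
    target = (1.0 - math.exp(-1.0)) * y_final  # about 63.2% of final value
    # tau is the time at which the response first reaches the target;
    # interpolate linearly between the two bracketing samples.
    for k in range(len(times) - 1):
        y0, y1 = outputs[k], outputs[k + 1]
        if y0 <= target <= y1:
            t0, t1 = times[k], times[k + 1]
            tau = t0 + (target - y0) * (t1 - t0) / (y1 - y0)
            return gain, tau
    raise ValueError("response never crosses 63.2% of its final value")


# Synthetic data generated with K = 2, tau = 5, and a step of size 1.5.
times = [0.5 * i for i in range(101)]
outputs = [2.0 * 1.5 * (1.0 - math.exp(-t / 5.0)) for t in times]
gain_est, tau_est = fit_first_order(times, outputs, step_size=1.5)
```

On noisy measurements, a least-squares fit over the whole curve would usually be more robust than reading off a single crossing point.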
machine-learning
946

scipy-curve-fit

Use scipy.optimize.curve_fit for nonlinear least squares parameter estimation from experimental data.

benchflow-ai
data-ai
open
machine-learning
946

nanogpt-training

Train GPT-2 scale models (~124M parameters) efficiently on a single GPU. Covers GPT-124M architecture, tokenized dataset loading (e.g., HuggingFace Hub shards), modern optimizers (Muon, AdamW), mixed precision training, and training loop implementation.

benchflow-ai
data-ai
open
data-engineering
946

erlang-otp-behaviors

Use when working with OTP behaviors, including gen_server for stateful processes, gen_statem for state machines, supervisors for fault tolerance, and gen_event for event handling, or when building robust, production-ready Erlang applications with proven patterns.

benchflow-ai
data-ai
open
data-engineering
946

erlang-distribution

Use when working with Erlang distributed systems, including node connectivity, distributed processes, global name registration, distributed supervision, and network partitions, or when building fault-tolerant multi-node applications on the BEAM VM.

benchflow-ai
data-ai
open
data-analysis
946

search-attractions

Retrieve attractions by city from the bundled dataset. Use this skill when surfacing points of interest or building sightseeing suggestions for a destination.

benchflow-ai
data-ai
open
data-engineering
946

usgs-data-download

Download water level data from USGS using the dataretrieval package. Use when accessing real-time or historical streamflow data, downloading gage height or discharge measurements, or working with USGS station IDs.

benchflow-ai
data-ai
open
machine-learning
946

imc-tuning-rules

Calculate PI/PID controller gains using Internal Model Control (IMC) tuning rules for first-order systems.

benchflow-ai
data-ai
open
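For the imc-tuning-rules entry above, the textbook IMC result for a delay-free first-order process K / (tau*s + 1) is a PI controller with Kc = tau / (K * lambda) and integral time Ti = tau, where lambda is the desired closed-loop time constant. The sketch below encodes only that case. The function name `imc_pi_gains` and the sample numbers are assumptions, and the skill itself may cover further cases (such as dead time or PID) that the listing does not detail.

```python
def imc_pi_gains(gain, tau, closed_loop_tau):
    """IMC-based PI tuning for a first-order process without dead time.

    gain            -- process gain K
    tau             -- process time constant
    closed_loop_tau -- desired closed-loop time constant (lambda), > 0
    Returns (Kc, Ti): proportional gain and integral time.
    """
    if closed_loop_tau <= 0:
        raise ValueError("closed_loop_tau must be positive")
    kc = tau / (gain * closed_loop_tau)  # smaller lambda gives a more aggressive controller
    ti = tau                             # integral time cancels the process pole
    return kc, ti


# Hypothetical process: K = 2, tau = 5; target closed-loop time constant 2.5.
kc, ti = imc_pi_gains(gain=2.0, tau=5.0, closed_loop_tau=2.5)
ki = kc / ti  # integral gain in the parallel form Kc + Ki / s
```

Choosing lambda is the single tuning knob here: larger values trade response speed for robustness.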
machine-learning
946

excitation-signal-design

Design effective excitation signals (step tests) for system identification and parameter estimation in control systems.

benchflow-ai
data-ai
open
data-analysis
946

did-causal-analysis

Difference-in-Differences causal analysis to identify demographic drivers of behavioral changes with p-value significance testing. Use for event effects, A/B testing, or policy evaluation.

benchflow-ai
data-ai
open
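For the did-causal-analysis entry above, the core Difference-in-Differences estimate in the basic two-group, two-period setting is the treated group's change minus the control group's change. The sketch below shows that arithmetic only. The function name `did_estimate` and the data are made up for illustration, and the p-value testing mentioned in the description (typically done via a regression with an interaction term) is omitted.

```python
from statistics import mean


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Two-group, two-period Difference-in-Differences point estimate."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    # The control group's change stands in for what would have happened
    # to the treated group without treatment (parallel-trends assumption).
    return treated_change - control_change


# Made-up outcome values for illustration only.
effect = did_estimate(
    treated_pre=[10, 12, 11],
    treated_post=[15, 17, 16],
    control_pre=[9, 10, 11],
    control_post=[11, 12, 13],
)
```

Here the treated group rises by 5 and the control group by 2, so the estimated effect is 3.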
data-analysis
946

pca-decomposition

Reduce dimensionality of multivariate data using PCA with varimax rotation. Use when you have many correlated variables and need to identify underlying factors or reduce collinearity.

benchflow-ai
data-ai
open
data-analysis
946

trend-analysis

Detect long-term trends in time series data using parametric and non-parametric methods. Use when determining if a variable shows statistically significant increase or decrease over time.

benchflow-ai
data-ai
open
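For the trend-analysis entry above, a common non-parametric trend test is Mann-Kendall. The listing does not name the specific methods the skill uses, so the sketch below is an assumed example: the function name `mann_kendall` is hypothetical, and the variance formula used here has no tie correction, so it assumes the series contains no repeated values.

```python
import math


def mann_kendall(values):
    """Mann-Kendall trend test (normal approximation, no tie correction).

    Returns (S, z, p_value), where p_value is two-sided.
    """
    n = len(values)
    s = 0
    # S counts later-minus-earlier pairs that increase, minus those that decrease.
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)
    var_s = n * (n - 1) * (2 * n + 5) / 18
    # Continuity-corrected z score.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return s, z, p_value


# A strictly increasing series should show a significant upward trend.
s_stat, z_score, p_value = mann_kendall([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
```

A positive S with a small p-value indicates a statistically significant increase, and a negative S a decrease.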
data-analysis
946

contribution-analysis

Calculate the relative contribution of different factors to a response variable using R² decomposition. Use when you need to quantify how much each factor explains the variance of an outcome.

benchflow-ai
data-ai
open
data-analysis
946

meteorology-driver-classification

Classify environmental and meteorological variables into driver categories for attribution analysis. Use when you need to group multiple variables into meaningful factor categories.

benchflow-ai
data-ai
open
data-engineering
946

senior-data-engineer

World-class data engineering skill for building scalable data pipelines, ETL/ELT systems, real-time streaming, and data infrastructure. Expertise in Python, SQL, Spark, Airflow, dbt, Kafka, Flink, Kinesis, and modern data stack. Includes data modeling, pipeline orchestration, data quality, streaming quality monitoring, and DataOps. Use when designing data architectures, building batch or streaming data pipelines, optimizing data workflows, or implementing data governance.

benchflow-ai
data-ai
open