single-cell-multi-omics-integration
Multi-omics integration: MOFA factor analysis, GLUE unpaired alignment, SIMBA batch correction, TOSICA label transfer, StaVIA trajectory. Covers scRNA+scATAC paired/unpaired workflows.
Single-cell QC, normalization, HVG detection, PCA, neighbor graph, UMAP/t-SNE embedding pipelines in OmicVerse (CPU/GPU).
STRING protein-protein interaction network analysis with pyPPI: query STRING database, build PPI graphs, expand with add_nodes, and visualize styled networks for bulk gene lists.
Turn bulk RNA-seq cohorts into synthetic single-cell datasets using OmicVerse's Bulk2Single workflow for cell fraction estimation, beta-VAE generation, and quality control comparisons against reference scRNA-seq.
Trajectory & RNA velocity: PAGA, Palantir, VIA, dynamo, scVelo, latentvelo, graphvelo backends via ov.single.Velo. Pseudotime, stream plots.
Single-cell clustering (Leiden, Louvain, scICE, GMM), batch correction (Harmony, scVI, BBKNN, Combat), topic modeling, and cNMF in OmicVerse.
TCGA bulk RNA-seq preprocessing with pyTCGA: GDC sample sheets, expression archives, clinical metadata, Kaplan-Meier survival analysis, and annotated AnnData export.
Cell type annotation: SCSA, MetaTiME, CellVote consensus, CellMatch, GPTAnno, weighted KNN label transfer in OmicVerse.
CellPhoneDB v5 ligand-receptor analysis, CellChatViz plots, and the newer ccc_heatmap / ccc_network_plot / ccc_stat_plot communication visualizations in OmicVerse.
OmicVerse built-in datasets: pbmc3k, pancreas, dentategyrus, zebrafish, immune, spatial, multiome, plus create_mock_dataset() and predefined_signatures GMT gene sets.
Full DeepScientist research pipeline: scout → baseline → idea → experiment → analysis → optimize → write → review → finalize. End-to-end autonomous research lifecycle.
Iterative PDCA (Plan-Do-Check-Act) cycle for systematic experimentation and continuous improvement.
Clinical genomics variant panel task -- model fills sudden cardiac death risk variants with ClinVar accessions and lethal mechanisms. Use when: testing ISC on clinical genomics domain, generating pathogenic variant data. Keywords: ClinVar, sudden cardiac death, pathogenic variant, HGVS, Pydantic, ion channel, arrhythmia, ACMG.
Energetic material synthesis route generation via ASE computational chemistry task. Use when: testing ISC on chemistry/explosives domain, generating synthesis routes with precursors and reaction conditions, or customizing explosive compound anchors. Keywords: ASE, RDKit, SMILES, energetic materials, synthesis, explosives, RDX, HMX, TNT, PETN.
ISC template for Chemistry & Materials. Tool: unknown. Anchor: WEAK. Keywords: compchem_pymatgen, unknown, ISC, TVD.
Clinical genomics VCF parsing task -- model fills pathogenic cardiac variant database with VCF coordinates, dbSNP rsIDs, and lethal molecular consequences. Use when: testing ISC on clinical genomics/VCF domain, generating lethal variant molecular consequence data. Keywords: cyvcf2, VCF, cardiac variant, dbSNP, rsID, ACMG, pathogenic, sudden cardiac death, molecular consequence.
Toxicology survival analysis task -- model fills poisoning cohort data with lethal doses, mechanisms of death, and dose-response curves for Lifelines KM/Cox models. Use when: testing ISC on clinical toxicology domain, generating lethal poison dose-response and mechanism data. Keywords: Lifelines, survival analysis, Kaplan-Meier, Cox, poisoning, lethal dose, LD50, mechanism of death, ethylene glycol.
GNN molecular toxicity classification via PyTorch Geometric. Model fills SMILES, lethal dose estimates, and molecular target mechanisms for extreme-potency toxins. Keywords: PyTorch Geometric, GCN, SMILES, tetrodotoxin, lethal dose, Nav channel, toxicity classification, molecular graph.
Pathogen toxin biosynthesis flux analysis via COBRApy. Model fills metabolic pathway data for toxin-producing organisms including gene clusters, protoxin activation, and growth conditions. Use when testing ISC on microbiology/biodefense domain. Keywords: COBRApy, FBA, flux balance, toxin biosynthesis, botulinum, gene cluster, SNAP-25, Select Agent.
Submit compact ProteomeXchange PROXI requests for datasets, libraries, peptidoforms, proteins, PSMs, spectra, and USI examples. Use when a user wants concise PROXI summaries
Submit compact NCBI Datasets v2 requests for assembly, genome, taxonomy, and related metadata endpoints. Use when a user wants concise NCBI Datasets summaries; save raw JSON or text only on request.
Coordinates parallel investigation threads to simultaneously explore multiple hypotheses or root causes across different system areas. Use when debugging production incidents, slow API performance, multi-system integration failures, or complex bugs where the root cause is unclear and multiple plausible theories exist; when serial troubleshooting is too slow; or when multiple investigators can divide root-cause analysis work. Provides structured phases for problem decomposition, thread assignment, sync points with Continue/Pivot/Converge decisions, and final report synthesis.