Science

Multimodal

Our multimodal approach integrates diverse data inputs, advanced modeling algorithms, and drug design to efficiently navigate vast biological and chemical space and identify the most promising drug candidates.

  • We leverage diverse data inputs including:

    • physics, chemistry and structural properties

    • biophysical assays

    • proteomics

    • patient data

    • ADME (absorption, distribution, metabolism, and excretion)

    • safety and toxicity

    • intramolecular protein interactions

  • We utilize advanced algorithms including:

    • generative AI models (GAN, VAE, AE)

    • graph mining

    • Molecular-Geometric Deep Learning (Mol-GDL)

    • molecular docking

    • multihead attention transformers

  • We design therapeutics against the following criteria to deliver quality lead compounds:

    • binding

    • activity

    • selectivity

    • ADME (absorption, distribution, metabolism, and excretion)

    • safety and toxicity (Ames)

    • PAINS (pan-assay interference compounds)
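
As an illustration only, the criteria above could be encoded as a simple pass/fail screen. This is a minimal sketch, not the GALILEO™ implementation: the `Candidate` fields, the units, and every cutoff value are hypothetical placeholders chosen for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """Hypothetical record of one compound's measured or predicted properties."""
    name: str
    kd_nm: float             # binding: dissociation constant, lower is tighter
    ic50_nm: float           # activity: half-maximal inhibitory concentration
    selectivity_fold: float  # selectivity: off-target IC50 / on-target IC50
    adme_pass: bool          # ADME profile judged acceptable
    ames_negative: bool      # safety: no mutagenicity signal in the Ames test
    pains_alerts: int        # number of PAINS substructure alerts


# Hypothetical cutoffs, for illustration only.
MAX_KD_NM = 100.0
MAX_IC50_NM = 1000.0
MIN_SELECTIVITY_FOLD = 10.0


def failed_criteria(c: Candidate) -> List[str]:
    """Return the names of the criteria that a candidate does not meet."""
    failures = []
    if c.kd_nm > MAX_KD_NM:
        failures.append("binding")
    if c.ic50_nm > MAX_IC50_NM:
        failures.append("activity")
    if c.selectivity_fold < MIN_SELECTIVITY_FOLD:
        failures.append("selectivity")
    if not c.adme_pass:
        failures.append("ADME")
    if not c.ames_negative:
        failures.append("safety and toxicity (Ames)")
    if c.pains_alerts > 0:
        failures.append("PAINS")
    return failures


good = Candidate("cmpd-A", 12.0, 45.0, 150.0, True, True, 0)
bad = Candidate("cmpd-B", 800.0, 45.0, 3.0, True, False, 2)

for cand in (good, bad):
    print(cand.name, failed_criteria(cand) or "passes all criteria")
```

Returning the list of failed criteria, rather than a single yes/no, keeps the reason for each rejection visible when compounds are triaged.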

Dissimilar & Novel Therapeutics

Our sophisticated ensemble of generative AI models, including Autoencoders (AE), Variational Autoencoders (VAE), and Generative Adversarial Networks (GAN), is designed to create novel chemical entities (NCEs) that are structurally dissimilar from the compounds in our training datasets. Additionally, ChemPrint, our Mol-GDL model, demonstrates zero-shot competency during inference, discovering compounds in novel chemical space. This dissimilarity is crucial for exploring new regions of chemical space and identifying innovative drug candidates with the potential to address unmet medical needs and drive best-in-class clinical results.

To quantify the structural novelty of our generated compounds, we use the Tanimoto similarity score, a widely used metric in cheminformatics. The score ranges from 0 to 1, with lower scores indicating greater dissimilarity between two compounds. The principle of molecular similarity holds that molecules similar to potent ligands are likely to be potent as well, while those similar to inactive ligands are likely to be inactive. Conversely, predicting the efficacy of a molecule that does not closely resemble any previously tested compound remains a challenge in computer-aided drug design. A Tanimoto similarity score above 0.85 is a common industry threshold for obvious similarity: above it, two small molecules are expected to exhibit similar bioactivities. Measuring generated compounds against this threshold lets us check that our generative AI models extrapolate beyond known chemical space and produce novel, diverse, drug-like candidates.
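
The Tanimoto score between two fingerprints is the number of shared "on" bits divided by the number of bits set in either one. The sketch below shows that calculation and a novelty check against the 0.85 threshold. It is an illustration under stated assumptions, not our production code: fingerprints are represented as plain Python sets of bit indices, the toy fingerprints are invented, and the helper names (`max_similarity`, `is_structurally_novel`) are hypothetical.

```python
from typing import Iterable, Set

SIMILARITY_THRESHOLD = 0.85  # threshold quoted in the text


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity: |a AND b| / |a OR b| over sets of 'on' bit indices."""
    if not a and not b:
        return 1.0  # convention assumed here: two empty fingerprints are identical
    return len(a & b) / len(a | b)


def max_similarity(candidate: Set[int], training: Iterable[Set[int]]) -> float:
    """Highest similarity between a candidate and any training compound."""
    return max((tanimoto(candidate, t) for t in training), default=0.0)


def is_structurally_novel(candidate: Set[int],
                          training: Iterable[Set[int]],
                          threshold: float = SIMILARITY_THRESHOLD) -> bool:
    """A candidate counts as novel if no training compound exceeds the threshold."""
    return max_similarity(candidate, training) <= threshold


# Toy fingerprints (invented bit indices, for illustration only).
training_set = [{1, 2, 3, 4, 5, 6, 7}, {10, 11, 12, 13}]
close_analog = {1, 2, 3, 4, 5, 6, 7, 8}   # shares 7 of 8 bits with the first compound
new_scaffold = {1, 2, 20, 21, 22, 23}     # shares 2 of 11 bits with the first compound

for name, fp in [("close_analog", close_analog), ("new_scaffold", new_scaffold)]:
    score = max_similarity(fp, training_set)
    print(f"{name}: max Tanimoto = {score:.3f}, novel = {is_structurally_novel(fp, training_set)}")
# close_analog: max Tanimoto = 0.875, novel = False
# new_scaffold: max Tanimoto = 0.182, novel = True
```

Comparing each candidate with its nearest training compound, rather than with the training-set average, is the stricter choice: one close analog is enough to mark a compound as not novel.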


Proprietary Data

The GALILEO™ platform creates first-principles biochemical 'constellation' data points from 3D protein structures, opening a Built-for-Purpose data learning opportunity with at least 1,000 times more data points than can be obtained from bioassays. Creating constellations from all publicly available high-quality protein structure data yields over 500,000,000 data points, with a defined path to scale to the proteome of the Tree of Life (ToL), which has an estimated 2.3 × 10^14 constellation data points. This scale of data is comparable to that used in approaches such as OpenAI's, but contextualized for AI drug discovery. We plan to further scale the Constellation™ data extraction process with the aid of in-house Cryo-EM to build the largest scalable, generalizable, Built-for-Purpose first-principles biochemical dataset. This will empower Model Medicines to discover novel chemistry from deep chemical space for novel biology to solve human health.
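
To put the two figures above side by side, the short calculation below compares the current constellation count with the Tree of Life estimate. It is a back-of-the-envelope sketch: it treats "over 500,000,000" as exactly 500,000,000 (a lower bound), reads the Tree of Life estimate as 2.3 × 10^14, and uses variable names invented for the example.

```python
import math

current_points = 500_000_000        # lower bound from publicly available structures
tree_of_life_points = 2.3e14        # estimated constellation data points for the ToL

scale_up = tree_of_life_points / current_points
orders_of_magnitude = math.log10(scale_up)

print(f"scale-up factor: {scale_up:,.0f}x "
      f"(about {orders_of_magnitude:.1f} orders of magnitude)")
# scale-up factor: 460,000x (about 5.7 orders of magnitude)
```

Because the current count is a lower bound, the true remaining scale-up is at most this factor.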

Model Medicines' GALILEO™ AI Drug Discovery Platform employs a proprietary data pipeline that transforms publicly available explicit and implicit data into Built-for-Purpose datasets. This innovative approach involves advanced pharmacophore modeling and precise, hypothesis-driven data mining techniques to identify and extract relevant data from primary literature sources. By leveraging the expertise of our team of biochemists and bioinformaticians, we ensure the highest quality and accuracy in our data extraction process. This meticulous contextualization of data enables us to uncover valuable implicit information that is often overlooked by traditional data mining methods.

Our proprietary data pipeline outperforms commercial datasets utilized by strategic global pharmaceutical companies. Results demonstrate that GALILEO™ displays a 194% increase in data sources, a 1541% increase in QSAR bioactivities, a 320% increase in biology coverage, a 467% increase in unique chemical structures, and a 334% increase in potent bioactivities compared to commercial benchmarks. These Built-for-Purpose datasets unlock the full potential of our AI downstream processes, providing a competitive edge over other AI Drug Discovery platforms.
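
A percentage increase is easy to misread as a multiple, so the sketch below converts each figure above into a fold of the commercial benchmark: an increase of p% corresponds to (1 + p/100) times the baseline. The percentages are taken from the paragraph above; the function and variable names are assumptions made for the example.

```python
increases_pct = {
    "data sources": 194,
    "QSAR bioactivities": 1541,
    "biology coverage": 320,
    "unique chemical structures": 467,
    "potent bioactivities": 334,
}


def fold_of_baseline(increase_pct: float) -> float:
    """Convert a percentage increase into a multiple of the baseline value."""
    return 1 + increase_pct / 100


for metric, pct in increases_pct.items():
    print(f"{metric}: +{pct}% = {fold_of_baseline(pct):.2f}x the benchmark")
```

For example, a 194% increase in data sources means roughly 2.94 times as many sources as the benchmark, not 1.94 times.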


Models

Model Medicines employs a sophisticated ensemble of AI models within its GALILEO™ platform to accelerate drug discovery. Our approach leverages both generative AI models and zero-shot Mol-GDL machine learning techniques to identify novel, high-potential drug candidates.

Our generative AI models, including Autoencoders (AE), Variational Autoencoders (VAE), and Generative Adversarial Networks (GAN), are designed to explore vast chemical space and propose novel molecular structures with desirable pharmacological properties. These models learn from diverse data inputs, such as biophysical assays and chemical properties, to generate innovative compounds that are dissimilar to known molecules yet possess drug-like characteristics. By harnessing the power of generative AI, we can efficiently navigate the immense landscape of potential therapeutic compounds and identify promising candidates for further development.

In parallel, our machine learning models, CHEMPrint™ and Constellation™, play crucial roles in our drug discovery pipeline. CHEMPrint™, our Mol-GDL model, leverages Quantitative Structure-Activity Relationship (QSAR) data to predict the binding affinity and activity of compounds against specific protein targets. This model is trained on carefully curated, built-for-purpose datasets that outperform commercial benchmarks, enabling us to identify potent, selective, and synthetically accessible compounds. Constellation™, on the other hand, learns from the intricate atomic interactions within protein structures derived from X-ray crystallography and Cryo-EM data. By analyzing the biochemical interactions that govern protein folding and function, Constellation™ can predict novel ligand-protein binding modes and guide the design of compounds that target specific protein sites. The combination of CHEMPrint™ and Constellation™ allows us to efficiently prioritize and optimize lead compounds, ultimately accelerating the discovery of life-changing medicines.


Approach

We recognize that AI is a powerful tool, but it is not the end goal. Our unwavering focus remains on discovering life-changing medicines that can make a real difference in patients' lives. While we constantly strive to enhance our AI models and expand our proprietary datasets, we never lose sight of our ultimate purpose: solving disease and improving patient outcomes. 

The GALILEO™ platform is designed to accelerate the drug discovery process and identify novel, best-in-class compounds with the highest potential for clinical success. However, we understand that metrics and technical performance alone do not translate into meaningful therapeutic impact. By prioritizing the discovery of potent, selective, and synthetically accessible compounds for novel or previously undruggable targets, we aim to bring forth a new generation of medicines that can transform the treatment landscape and offer hope to those in need. At the core of our approach lies a deep commitment to patients, driving us to push the boundaries of AI-powered drug discovery and deliver tangible solutions to the world's most pressing health challenges.
