end-to-end AI drug discovery

GALILEO™ AI Platform


Model Medicines' GALILEO™ platform utilizes a multimodal, end-to-end AI architecture that simultaneously processes diverse biological and chemical data to navigate expansive molecular spaces, enabling the targeting of elusive "undruggable" points.

Multimodal

Our multimodal approach integrates diverse data inputs, advanced modeling algorithms, and drug design to efficiently navigate vast biological and chemical space and identify the most promising drug candidates.

01 Multimodal Data

02 Multimodal Modeling

03 Multimodal Drug Design

By incorporating multiple modalities (spanning data types, modeling techniques, and drug design), we can navigate chemical space effectively to discover novel, high-quality drug candidates.

Multimodal Data

The GALILEO™ platform derives first-principles biochemical "constellation" data points from 3D protein structures, yielding a Built-for-Purpose training resource with at least 1,000 times more data points than can be obtained from bioassays. Extracting constellations from all publicly available, high-quality protein structure data yields over 500,000,000 data points, with a defined path to scale to the proteome of the Tree of Life (ToL), which has an estimated 2.3 × 10^14 constellation data points. This is a scale of data comparable to that used by OpenAI, but in the context of AI drug discovery. We plan to scale the Constellation™ data extraction process further with the aid of in-house Cryo-EM, building the largest scalable, generalizable, Built-for-Purpose first-principles biochemical dataset. This will enable Model Medicines to discover novel chemistry from deep chemical space for novel biology and to improve human health.

Model Medicines' GALILEO™ AI Drug Discovery Platform employs a proprietary data pipeline that transforms publicly available explicit and implicit data into Built-for-Purpose datasets. This innovative approach involves advanced pharmacophore modeling and precise, hypothesis-driven data mining techniques to identify and extract relevant data from primary literature sources. By leveraging the expertise of our team of biochemists and bioinformaticians, we ensure the highest quality and accuracy in our data extraction process. This meticulous contextualization of data enables us to uncover valuable implicit information that is often overlooked by traditional data mining methods.

Our proprietary data pipeline outperforms commercial datasets utilized by strategic global pharmaceutical companies. Results demonstrate that GALILEO™ displays a 194% increase in data sources, a 1541% increase in QSAR bioactivities, a 320% increase in biology coverage, a 467% increase in unique chemical structures, and a 334% increase in potent bioactivities compared to commercial benchmarks. These Built-for-Purpose datasets unlock the full potential of our AI downstream processes, providing a competitive edge over other AI Drug Discovery platforms.
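As a reading aid, the percentage increases above can be converted into fold multipliers over the commercial benchmark. The sketch below assumes each figure is an increase relative to the benchmark baseline, so that the new value equals the baseline times (1 + percent / 100). The function name and dictionary keys are illustrative and not part of the GALILEO™ platform.

```python
def percent_increase_to_fold(percent_increase: float) -> float:
    """Convert an 'X% increase' into a multiplier relative to the baseline.

    Assumption: the increase is measured against the benchmark value,
    so new = baseline * (1 + percent_increase / 100).
    """
    return 1.0 + percent_increase / 100.0


# Percentage increases as reported in the text above.
reported_increases = {
    "data sources": 194,
    "QSAR bioactivities": 1541,
    "biology coverage": 320,
    "unique chemical structures": 467,
    "potent bioactivities": 334,
}

# Fold multiplier for each metric under the stated assumption.
fold_over_benchmark = {
    metric: percent_increase_to_fold(pct)
    for metric, pct in reported_increases.items()
}

for metric, fold in fold_over_benchmark.items():
    print(f"{metric}: {fold:.2f}x the benchmark")
```

Under this reading, a 194% increase in data sources corresponds to roughly 2.9 times the benchmark, and a 1541% increase in QSAR bioactivities to roughly 16.4 times.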

Multimodal Modeling

Model Medicines employs a sophisticated ensemble of AI models within its GALILEO™ platform to accelerate drug discovery. Our approach combines generative AI models with zero-shot molecular geometric deep learning (Mol-GDL) techniques to identify novel, high-potential drug candidates.

Our generative AI models, including Autoencoders (AE), Variational Autoencoders (VAE), and Generative Adversarial Networks (GAN), are designed to explore vast chemical space and propose novel molecular structures with desirable pharmacological properties. These models learn from diverse data inputs, such as biophysical assays and chemical properties, to generate innovative compounds that are dissimilar to known molecules yet possess drug-like characteristics. By harnessing the power of generative AI, we can efficiently navigate the immense landscape of potential therapeutic compounds and identify promising candidates for further development.

In parallel, our machine learning models, CHEMPrint™ and Constellation™, play crucial roles in our drug discovery pipeline. CHEMPrint™, our Mol-GDL model, leverages Quantitative Structure-Activity Relationship (QSAR) data to predict the binding affinity and activity of compounds against specific protein targets. This model is trained on carefully curated, built-for-purpose datasets that outperform commercial benchmarks, enabling us to identify potent, selective, and synthetically accessible compounds. Constellation™, on the other hand, learns from the intricate atomic interactions within protein structures derived from X-ray crystallography and Cryo-EM data. By analyzing the biochemical interactions that govern protein folding and function, Constellation™ can predict novel ligand-protein binding modes and guide the design of compounds that target specific protein sites. The combination of CHEMPrint™ and Constellation™ allows us to efficiently prioritize and optimize lead compounds, ultimately accelerating the discovery of life-changing medicines.

Multimodal Drug Design

Our sophisticated ensemble of generative AI models, including Autoencoders (AE), Variational Autoencoders (VAE), and Generative Adversarial Networks (GAN), is designed to create novel chemical entities (NCEs) that are structurally dissimilar from the compounds in our training datasets. Additionally, CHEMPrint™, our Mol-GDL model, demonstrates zero-shot competency during inference, discovering compounds in novel chemical space. Structural dissimilarity from the training data is crucial for exploring new regions of chemical space and identifying innovative drug candidates with the potential to address unmet medical needs and drive best-in-class clinical results.

To quantify the structural novelty of our generated compounds, we employ the Tanimoto similarity score, a widely used metric in cheminformatics. The Tanimoto score ranges from 0 to 1, with lower scores indicating greater dissimilarity between compounds. The principle of molecular similarity suggests that molecules similar to potent ligands are likely also to be potent, while those similar to inactive ligands are likely to be inactive as well. Conversely, predicting the efficacy of a molecule with no close resemblance to any previously tested compound remains a challenge in computer-aided drug design. A Tanimoto similarity score above 0.85 is a common industry-standard threshold for obvious similarity, beyond which two small molecules are expected to exhibit similar bioactivities. Measuring generated compounds against this threshold lets us confirm that our generative AI models extrapolate beyond known chemical space, producing truly novel and diverse drug-like candidates.
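For two fingerprints represented as sets of "on" bits A and B, the Tanimoto score is |A ∩ B| / (|A| + |B| − |A ∩ B|). The following is a minimal sketch of that calculation and of a novelty check against the 0.85 threshold. It is not the GALILEO™ implementation: the fingerprints are made-up bit sets, the function names are hypothetical, and treating a candidate as novel when its highest similarity to any training compound does not exceed the threshold is an assumption about how such a check might be applied.

```python
from typing import Iterable, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def max_similarity(candidate: Set[int], training: Iterable[Set[int]]) -> float:
    """Highest Tanimoto score between the candidate and any training compound."""
    return max((tanimoto(candidate, t) for t in training), default=0.0)


def is_novel(candidate: Set[int], training: Iterable[Set[int]],
             threshold: float = 0.85) -> bool:
    """Assumed rule: novel if no training compound scores above the threshold."""
    return max_similarity(candidate, training) <= threshold


# Hypothetical fingerprints (on-bit indices), for illustration only.
training_set = [set(range(1, 11))]           # bits 1..10
close_analog = set(range(1, 12))             # bits 1..11, shares 10 of 11 bits
distant_compound = set(range(1, 10)) | {42}  # shares 9 bits, union of 11 bits

print(round(max_similarity(close_analog, training_set), 3))
print(is_novel(close_analog, training_set))
print(round(max_similarity(distant_compound, training_set), 3))
print(is_novel(distant_compound, training_set))
```

In practice, fingerprints would come from a cheminformatics toolkit rather than hand-written sets, and the choice of fingerprint type affects the scores, so any threshold should be interpreted alongside the fingerprint used.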
