Making connections between different datasets is essential to developing impactful medicines and driving more personalized patient care.
Today’s research and development (R&D) pipelines are increasingly diverse, encompassing a broad range of modalities, including antibodies, proteins and peptides, cell and gene therapies, oligonucleotide therapeutics, small molecules, and antibody-drug conjugates.
Many companies have shifted from single-modality therapeutics to multimodal approaches. Scientists now pursue each target through the most effective modality available, aiming to reach hard-to-treat or previously undruggable targets, develop novel treatments, and counteract the growing business pressures of rising costs and high failure rates.
While this approach is promising, it presents major challenges. Multimodal R&D data are incredibly varied, interrelated, and frequently incompatible. These incompatibilities hinder workflows, stymie collaboration, waste time on systems maintenance and integration, and obscure the connections buried within the data.
Technology advances in both R&D and patient care have created a deluge of data, including instrument and experimental data, -omics data, clinical study results, patient data, patent data, and publication data.
Unfortunately, all these data are often trapped in different formats and systems, making them difficult to integrate and share—a challenge made more glaring in the face of initiatives such as the FAIR Guiding Principles for scientific data management and the National Institutes of Health (NIH) Data Management and Sharing Policy.
To digitally transform their R&D infrastructure, organizations need to modernize how they collect, collate, format, and model data, with the ultimate goal of amassing and correlating high-dimensional target, disease, and drug data that will help guide R&D efforts. The potential benefits include enhanced drug discovery, accelerated timelines, and improved trial design.
Using multimodal R&D, researchers can interrogate the target space from multiple angles to better understand the disease state and uncover and develop novel therapies. Well-curated multimodal R&D data can be used to inform predictive and generative models that reduce development cycles and shorten time to market, such as models that identify and validate targets, screen or suggest compounds, and identify biomarkers.
Multimodal data, including small-molecule descriptors, ADME-Tox data, transcriptomic data, text-based drug and disease representations, clinical trial protocols, publications, and patent data, can be holistically integrated to help optimize trial endpoint definitions, stratify patient subgroups, and estimate treatment effects.
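As a simplified illustration of what "holistic integration" can mean in practice, heterogeneous per-compound records can be linked on a shared identifier before analysis. The dataset names, identifiers, and fields below are hypothetical, not drawn from any specific system:

```python
# Hypothetical sketch: linking heterogeneous R&D records on a shared
# compound ID. All dataset names and fields here are illustrative only.

admet = {"CPD-001": {"logP": 2.1, "herg_ic50_um": 12.0}}
transcriptomics = {"CPD-001": {"top_regulated_gene": "TP53"}}
trial_notes = {"CPD-001": {"phase": "I", "endpoint": "ORR"}}

def integrate(compound_id, *sources):
    """Merge per-compound records from multiple data sources into one view."""
    merged = {"compound_id": compound_id}
    for source in sources:
        merged.update(source.get(compound_id, {}))
    return merged

record = integrate("CPD-001", admet, transcriptomics, trial_notes)
```

Real pipelines must also reconcile identifiers, units, and provenance across sources, but the core idea is the same: one linked view per entity rather than many disconnected silos.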
Successful digital transformation in a multimodal R&D environment is an art of connecting science, data, and decision making, and organizations face several key hurdles along the way.
As organizations adopt multimodal approaches to discover new therapies, they will need technology partners who can support diverse modalities of R&D that produce multiple unique data streams. And perhaps one of the most important foundations for multimodal R&D is a flexible data model.
Simplified and adaptable data models are essential for enhancing decision-making processes. They enable quicker and more accurate analysis, leading to better-informed decisions. Flexible data models provide the agility and responsiveness needed to rapidly adapt to new requirements and technological advancements—crucial for maintaining competitiveness.
They also play a significant role in improving collaboration and reducing the technical burden on data teams. When data are easily accessible and interoperable, cross-functional teams can work together more effectively. This breaks down data silos and fosters innovation. It also leads to more efficient use of resources and lowers the likelihood of errors.
As scientific R&D divisions grow, their data needs and structures evolve. Flexible data models can easily scale to accommodate increasing volumes and complexity of data without requiring extensive overhauls.
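One common way to achieve this flexibility, sketched here as an illustration rather than any vendor's actual schema, is an entity-attribute-value style record: core identity fields are fixed, while modality-specific attributes can be added without a schema migration:

```python
# Illustrative sketch (not a specific product's data model): a record with
# fixed identity fields plus arbitrary attributes, so new modalities and
# result types can be captured without restructuring the schema.

class FlexibleRecord:
    def __init__(self, entity_id, entity_type):
        self.entity_id = entity_id
        self.entity_type = entity_type
        self.attributes = {}  # attribute name -> value, added as needed

    def set(self, name, value):
        self.attributes[name] = value

    def get(self, name, default=None):
        return self.attributes.get(name, default)

# An antibody and a small molecule share one model despite different fields.
ab = FlexibleRecord("AB-17", "antibody")
ab.set("heavy_chain_seq", "EVQLVESGGGLVQ")

sm = FlexibleRecord("CPD-42", "small_molecule")
sm.set("smiles", "CCO")
```

The trade-off is that validation and typing move from the schema into application logic, which is why such models are usually paired with controlled vocabularies and attribute-level validation rules.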
Another key priority is to connect and integrate the instruments that produce the data. This capability allows seamless integration with a wide range of data origins, including instruments, electronic lab notebooks (ELNs), registry systems, files, and contract research organization (CRO) uploads. Scientific instruments generate vast amounts of data in many different formats.
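One illustrative integration pattern, assuming nothing about any particular instrument vendor, is a parser registry that routes each raw format to a normalizer emitting a common record shape. The format names and fields below are hypothetical:

```python
# Hypothetical sketch of normalizing diverse instrument outputs into a
# common record shape; format names and fields are illustrative only.
import csv
import io
import json

def parse_csv(raw):
    row = next(csv.DictReader(io.StringIO(raw)))
    return {"sample_id": row["sample"], "value": float(row["reading"])}

def parse_json(raw):
    data = json.loads(raw)
    return {"sample_id": data["id"], "value": float(data["measurement"])}

PARSERS = {"csv": parse_csv, "json": parse_json}  # extend per new format

def ingest(fmt, raw):
    """Route raw instrument output to the matching parser."""
    return PARSERS[fmt](raw)

rec1 = ingest("csv", "sample,reading\nS1,4.2")
rec2 = ingest("json", '{"id": "S2", "measurement": 3.14}')
```

Because downstream analytics only ever see the common record shape, adding a new instrument means writing one parser rather than reworking the whole pipeline.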
Once you’ve incorporated flexible data models and integrated instruments, you’re poised to leverage advanced analytics, including machine learning (ML) and artificial intelligence (AI). Advanced analytics tools empower users to find, analyze, share, and output data into workflows, specialty apps, analytics and modeling programs, and AI/ML algorithms, so that they can garner insights they would otherwise have missed. These insights help optimize R&D efforts from the earliest days of discovery.
The result is a cohesive method for ingesting, extracting, structuring, accessing, and integrating FAIR data, which empowers scientists and alleviates the burden on IT. This collective effort reduces the time and costs associated with R&D. Ultimately, the aim is to reduce the cost of drug discovery from billions to millions of dollars, enabling life-saving therapeutics to reach patients faster and more affordably.
About the Author
Christian Olsen, Associate VP, Industry Principal, Biologics at Dotmatics.