Why Multimodal Data is Growing in Pharma

July 26, 2024

Making connections between different datasets is essential to developing impactful medicines and driving more personalized patient care.

Today’s research and development (R&D) pipelines are increasingly diverse, encompassing a broad range of modalities, including antibodies, proteins and peptides, cell and gene therapies, oligonucleotide therapeutics, small molecules, and antibody-drug conjugates.

Many companies have shifted from single modes of therapeutics to multimodal approaches. Scientists now choose whichever modality is most effective for reaching hard-to-treat or previously undruggable targets, aiming to develop novel treatments and counteract the growing business pressures of rising costs and high failure rates.

While this approach is promising, it presents some major challenges. Multimodal R&D data are highly varied and interrelated, and they are often held in incompatible formats and systems. These incompatibilities hinder workflows, stymie collaboration, waste time on systems maintenance and integration, and obscure the connections buried within the data.

Behind the Data Evolution

Technology advances in both R&D and patient care have created a deluge of data, including instrument and experimental data, -omics data, clinical study results, patient data, patent data, and publication data. Making connections between these different datasets is essential to developing impactful medicines and driving more personalized patient care.

Unfortunately, these data are often trapped in different formats and systems, making them difficult to integrate and share. The challenge is made more glaring by initiatives such as the FAIR Guiding Principles for scientific data management and the National Institutes of Health Data Management and Sharing Policy.

Benefits of Multimodal R&D

To digitally transform their R&D infrastructure, organizations need to modernize how they collect, collate, format, and model data, with the ultimate goal of amassing and correlating high-dimensional target, disease, and drug data that will help guide R&D efforts. The potential benefits include enhanced drug discovery, accelerated timelines, and improved trial design.

Using multimodal R&D, researchers can interrogate the target space from multiple angles to better understand the disease state and to uncover and develop novel therapies. Well-curated multimodal R&D data can be used to inform predictive and generative models that shorten development cycles and time to market, such as those that identify and validate targets, screen or suggest compounds, and identify biomarkers.

Multimodal data, including small molecule descriptors, ADME-Toxicology data, transcriptomic data, text-based drug and disease representations, clinical trial protocols, publications, and patent data, can be holistically integrated to help optimize trial endpoint definitions, stratify patient subgroups, and estimate treatment effects.
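As a rough illustration of what integrating such data can mean in practice, the sketch below merges records from several sources into one view per compound. It assumes every source shares an identifier field; the `compound_id` key, the source names, and the values in the usage example are hypothetical placeholders, not real data or any particular platform's API.

```python
from collections import defaultdict


def integrate(sources: dict[str, list[dict]], key: str = "compound_id") -> dict[str, dict]:
    """Merge records from several sources into one dict per shared identifier.

    Each field is prefixed with its source name so that provenance is kept
    and same-named fields from different sources do not overwrite each other.
    """
    merged: dict[str, dict] = defaultdict(dict)
    for source_name, records in sources.items():
        for record in records:
            entity = merged[record[key]]
            for field_name, value in record.items():
                if field_name != key:
                    entity[f"{source_name}.{field_name}"] = value
    return dict(merged)


# Hypothetical placeholder records, for illustration only.
example_sources = {
    "adme": [{"compound_id": "C1", "clearance": "low"}],
    "transcriptomics": [
        {"compound_id": "C1", "signature": "S-17"},
        {"compound_id": "C2", "signature": "S-02"},
    ],
}
integrated = integrate(example_sources)
```

Here `integrated["C1"]` holds fields from both sources, while `integrated["C2"]` holds only the transcriptomic field, which makes gaps in coverage easy to spot.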

Challenges of Managing Multimodal Data

Successful digital transformation in a multimodal R&D environment is an art of connecting science, data, and decision making. The key hurdles organizations face include:

Data Volume and Complexity: Organizations must manage structured, semi-structured, and unstructured data, including sequence data, chemical structures, numeric data, text, images, and the corresponding metadata. All of it must be properly processed and stored; if data are not easily findable, accessible, or (re)usable, their value plummets.
Interoperability and Integration: Data flow in from a wide range of lab instruments, equipment, and systems, making collation a technical and administrative nightmare. These sources are generally not inherently compatible or easily integrated, and they typically produce data in different, often proprietary, formats that are difficult to model and correlate.
Data Quality and Governance: Ensuring data accuracy, consistency, and integrity can be difficult when cross-functional teams work with different specialty tools and workflows. Time-consuming and error-prone movement of data between systems can be avoided with a connected R&D cloud platform that centralizes and standardizes complex data at scale, readies them for downstream use, and provides tools for data management and governance.

As organizations adopt multimodal approaches to discover new therapies, they will need technology partners who can support diverse modalities of R&D that produce multiple unique data streams. Perhaps one of the most important foundations for multimodal R&D is a flexible data model.

Data Model Yoga

Simplified and adaptable data models are essential for enhancing decision-making processes. They enable quicker and more accurate analysis, leading to better-informed decisions. Flexible data models provide the agility and responsiveness needed to rapidly adapt to new requirements and technological advancements—crucial for maintaining competitiveness.

They also play a significant role in improving collaboration and reducing the technical burden on data teams. When data are easily accessible and interoperable, cross-functional teams can work together more effectively. This breaks down data silos and fosters innovation. It also leads to more efficient use of resources and lowers the likelihood of errors.

As scientific R&D divisions grow, their data needs and structures evolve. Flexible data models can easily scale to accommodate increasing volumes and complexity of data without requiring extensive overhauls.
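One way to picture a flexible data model is a small, fixed core plus open-ended attributes, so that a new modality can be added without restructuring what already exists. The following is a minimal sketch under that assumption; the `Entity` class, the `REQUIRED_ATTRIBUTES` table, and the attribute names are all hypothetical and would be replaced by an organization's own standards.

```python
from dataclasses import dataclass, field
from typing import Any

# Hypothetical per-modality requirements, for illustration only.
REQUIRED_ATTRIBUTES: dict[str, set[str]] = {
    "small_molecule": {"smiles"},
    "antibody": {"heavy_chain", "light_chain"},
}


@dataclass
class Entity:
    """A registered entity: fixed core fields plus open-ended attributes."""

    entity_id: str
    modality: str
    attributes: dict[str, Any] = field(default_factory=dict)

    def missing_attributes(self) -> set[str]:
        """Return required attributes not yet supplied for this modality."""
        required = REQUIRED_ATTRIBUTES.get(self.modality, set())
        return required - set(self.attributes)


# A modality with no registered requirements is still accepted as-is.
oligo = Entity("OL-1", "oligonucleotide", {"sequence": "ACGU"})
antibody = Entity("AB-1", "antibody", {"heavy_chain": "EVQ"})
```

Supporting a new modality in this sketch means adding one entry to `REQUIRED_ATTRIBUTES`; existing records and code stay unchanged, which is the kind of scaling without overhaul described above.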

Connection

Another key priority is to connect and integrate the instruments and systems that produce the data. Doing so allows seamless integration with a wide range of data sources, including instruments, electronic lab notebooks (ELNs), registry systems, files, and contract research organization (CRO) uploads. These sources generate vast amounts of data in different formats, including:

Structured: Organized in a defined manner, such as databases and spreadsheets.
Semi-structured: Partially organized data such as JSON, XML, and CSV files.
Unstructured: Data without a predefined structure, such as text documents, emails, PowerPoint slides, and social media posts.
Sequence: Biological sequences such as DNA, RNA, and protein sequences.
Numeric: Quantitative data that can be statistically analyzed.
Text: Written content that requires natural language processing for analysis.
Image: Visual data from microscopy, medical imaging, and other sources.
Metadata: Data that provides information about other data, enhancing its usability and context.
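To make the categories above concrete, the sketch below tags incoming files with a category and a source so they can be found later. It is only an illustration: the suffix-to-category table and the `catalog_entry` function are hypothetical, and a real pipeline would inspect file contents rather than rely on file extensions.

```python
from pathlib import PurePath

# Hypothetical mapping from file suffix to data category, for illustration only.
CATEGORY_BY_SUFFIX = {
    ".xlsx": "structured",
    ".json": "semi-structured",
    ".xml": "semi-structured",
    ".csv": "semi-structured",
    ".docx": "unstructured",
    ".pptx": "unstructured",
    ".fasta": "sequence",
    ".tif": "image",
}


def catalog_entry(filename: str, source: str) -> dict[str, str]:
    """Build a minimal metadata record for an incoming file."""
    suffix = PurePath(filename).suffix.lower()
    return {
        "filename": filename,
        "source": source,
        "category": CATEGORY_BY_SUFFIX.get(suffix, "unknown"),
    }


entry = catalog_entry("plate_07.CSV", source="plate reader")
```

Unrecognized suffixes are labeled `unknown` rather than rejected, so nothing is silently dropped and the mapping can grow as new instruments are connected.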

Readying for the Benefits of AI

Once you’ve incorporated flexible data models and integrated instruments, you’re poised to leverage advanced analytics, including machine learning (ML) and artificial intelligence (AI). Advanced analytics tools empower users to find, analyze, share, and output data into workflows, specialty apps, analytics and modeling programs, and AI/ML algorithms, so that they can garner insights they would otherwise have missed. These insights will help them optimize R&D efforts from the earliest days of discovery.

The result is a cohesive method for ingesting, extracting, structuring, accessing, and integrating FAIR data, which empowers scientists and alleviates the burden on IT. This collective effort reduces the time and costs associated with R&D. Ultimately, the aim is to lower the cost of the drug discovery process from billions to millions of dollars, enabling life-saving therapeutics to reach patients faster and more affordably.

About the Author

Christian Olsen, Associate VP, Industry Principal, Biologics at Dotmatics.