The emergence of natural language processing in pharmacovigilance
Getting critical treatments approved and delivered to market is no easy feat. Currently, only 15% of drugs make it from clinical trials to U.S. FDA approval, with nearly 75% of failures attributed to safety and efficacy concerns. Once drugs do reach the market, keeping them there requires in-depth pharmacovigilance (PV) practices.1
Adverse event (AE) reporting is a key area undergoing major changes as new sources of information become available. Patients are now sharing their experiences with drugs directly. These patient-reported outcomes arrive in unstructured formats, including social media posts, doctors' notes captured in electronic health records, and call center transcripts, as well as through more structured channels such as clinical trials.
Pharmaceutical companies are seeking opportunities for digital transformation to capture these additional unstructured, natural-language data sources. The goal is to marry these new, disparate sources with information from more traditional structured ones. All of these sources contain data that has commonly been ignored because of the complexity of accessing and analyzing it. The potential for tapping into rich safety insights is expansive but not without challenges, given that 80% of healthcare data currently resides in unstructured formats.2
While the industry has traditionally been reticent to adopt artificial intelligence (AI) technology in compliance, new regulations and expectations are shifting many viewpoints on the promise and power of automation—from a perceived risk to the status quo to a vital business tool.
Natural language processing (NLP)3 is one such tool that biopharma companies are adopting, particularly as it pertains to PV and AE identification.
The hurdles of gathering data from patient-reported outcomes
Taking in and processing patient-reported outcomes comes with two major challenges: first, the sheer volume of data to be processed can seem monumental; second, performing quality checks on that data can be equally time-intensive.
The availability of health data has exploded in recent years: by 878% since 2016,4 according to statistics compiled by Dell EMC in 2019. This growing volume and complexity, combined with regulators' growing expectation that PV activities be performed in near real time to boost patient safety, has made analysis and reporting of data nearly impossible for humans to perform without NLP or other technological assistance. Put simply, there are not enough trained professionals available to process, sort, and share all this data at the speed modern drug development demands.
Further, all data, once processed, must be quality-checked to verify that nothing has been missed and that no inadequate or irrelevant information makes its way into a report. Quality assurance (QA) practices have historically been significantly time-intensive for pharma professionals. Traditionally, QA professionals must read every individual document and manually extract critical insights. Even with productivity hacks like keyword searches, it can take hours to contextualize patient-reported outcomes—often requiring knowledge and memory of complex rules across various therapeutic areas to accurately spot an adverse event.
Operating in this way leaves little room for driving broader innovation and value. Sixty percent or more of the time and resources spent on PV activities are currently dedicated to data extraction and quality checking. This estimate does not account for time spent on staff training, let alone troubleshooting to ensure consistent data interpretation across staff members and handling evolving reporting requirements.
Enter natural language processing
NLP is a set of techniques by which unstructured, text-based documents are mined and converted into structured information that a computer can then analyze. It is not a new technology for the pharmaceutical industry: strong use cases in recent years have pointed to the value of NLP, particularly in clinical development, such as patient recruitment for clinical trials (through the evaluation of data available from electronic medical records and previous clinical studies). In one example, working to identify patient cohorts for a clinical study, Drexel University increased the number of relevant patients identified by 67% while reducing manual review by 80% and cutting the overall time spent from five months to less than a week.
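To make the unstructured-to-structured conversion concrete, here is a minimal, hypothetical sketch of a rule-based extractor that turns a free-text patient note into a structured record. The symptom vocabulary and the "taking &lt;drug&gt;" pattern are invented for illustration; real pipelines rely on trained models and curated terminologies rather than hand-written rules like these.

```python
import re

# Invented, toy vocabulary; a real system would use a curated terminology.
SYMPTOM_TERMS = {"headache", "nausea", "rash", "dizziness"}

def extract_record(note: str) -> dict:
    """Pull drug and symptom mentions out of free text into a structured record."""
    words = re.findall(r"[a-z]+", note.lower())
    symptoms = sorted(SYMPTOM_TERMS.intersection(words))
    # Invented convention: capture a drug name following the word "taking".
    drug_match = re.search(r"taking (\w+)", note, re.IGNORECASE)
    return {
        "drug": drug_match.group(1) if drug_match else None,
        "symptoms": symptoms,
    }

record = extract_record("Patient started taking DrugX and reports nausea and a mild rash.")
# record == {"drug": "DrugX", "symptoms": ["nausea", "rash"]}
```

Even this toy version shows the payoff: once the note is a structured record, it can be counted, filtered, and compared by a computer instead of read line by line by a person.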
In these instances, the evaluation criteria for finding trial prospects remain relatively static. AE monitoring is a different story. Evaluating the accuracy, severity, and relevance of one consumer's mention of an unexpected side effect from a commercially available drug, for example, is much harder to qualify and contextualize than feedback from a patient engaged in a clinical trial. Therefore, when working from patient-reported outcomes, the context and chronology of the input become critical differentiators in determining whether a specific drug caused an AE that must be reported to authorities, or whether the reported outcome was an outlying, unique case in which the AE was misreported, misunderstood, or influenced by other external factors.
The intricacies of natural language, such as slang and exaggeration, must be taken into consideration, especially when looking to social media and other online repositories for this information. For instance, if a patient reports a bad night's sleep, that phrase would not necessarily be interpreted and properly codified as sleeplessness, and thus reported as an AE.
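A hypothetical sketch of this normalization step follows. The lay-phrase-to-standard-term mapping below is invented for illustration; production systems draw on curated synonym lists and trained models rather than a tiny hard-coded dictionary.

```python
# Invented mapping from lay/slang phrasing to standard terms (illustration only).
LAY_TO_STANDARD = {
    "bad night's sleep": "Insomnia",
    "couldn't sleep a wink": "Insomnia",
    "threw up": "Vomiting",
    "felt like my head was splitting": "Headache",
}

def normalize(text: str) -> list:
    """Return standard terms for any lay phrases found in the text."""
    lowered = text.lower()
    return [term for phrase, term in LAY_TO_STANDARD.items() if phrase in lowered]

print(normalize("Had a bad night's sleep after the new tablets."))  # ['Insomnia']
```

The design point is that normalization happens before any coding or reporting decision: only once lay language is mapped to a standard term can downstream logic judge whether the mention is a reportable AE.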
Further, near real-time processing is critical to making the best use of NLP for AE monitoring. The technology must process data and flag relevant AE information at machine speed while "thinking" with enough sophistication to genuinely assist humans. It has taken years of refining and honing algorithms to reach today's level of accuracy and consistency in reporting.
Today, NLP can solve the problems of context and timeliness. Its true power lies in its ability to combine and compare decades of static legacy data (such as previously published medical literature) with constantly updating patient data, which it captures and processes in near real time. By cross-referencing text against billions of pieces of legacy medical data and the Medical Dictionary for Regulatory Activities (MedDRA)5, NLP can standardize and report potential AEs with a high degree of accuracy. In one example, CSL Behring, the rare disease drug developer, doubled its accurate auto-coding of AEs, from 30% with one-to-one verbatim text matching alone to more than 60% with NLP technology.
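The gap between verbatim matching and NLP-assisted coding can be illustrated with a small sketch. The term list below merely stands in for a licensed dictionary such as MedDRA (whose real content cannot be reproduced here), and closest-string matching via Python's standard-library `difflib` is used as a simple stand-in for the richer matching a real NLP system performs.

```python
import difflib

# Stand-in for a licensed terminology such as MedDRA (illustration only).
PREFERRED_TERMS = ["Nausea", "Vomiting", "Headache", "Insomnia", "Dizziness"]

def code_verbatim(reported: str):
    """One-to-one verbatim matching: exact (case-insensitive) hits only."""
    for term in PREFERRED_TERMS:
        if reported.lower() == term.lower():
            return term
    return None

def code_fuzzy(reported: str):
    """Tolerate misspellings and variants via closest-string matching."""
    matches = difflib.get_close_matches(
        reported.capitalize(), PREFERRED_TERMS, n=1, cutoff=0.6
    )
    return matches[0] if matches else None

print(code_verbatim("nausae"))  # None: the misspelling defeats exact matching
print(code_fuzzy("nausae"))    # Nausea
```

Every report the exact matcher drops but the tolerant matcher codes correctly raises the auto-coding rate, which is the kind of improvement the CSL Behring figures describe, albeit achieved with far more sophisticated methods than string similarity.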
The future of NLP for compliance and beyond
NLP technology for AE monitoring is still in its nascent stages. However, NLP’s inherent value is starting to resonate with the pharmaceutical industry.
NLP becomes still more important as we consider the additional data sources likely to become available in the years to come. For example, biometric data from wearable devices is already starting to be captured, and these data sets are anticipated to grow significantly as more people adopt wearables and as more applications combining wearable data with patient input become available. With more sources to monitor, manual processing will become ever harder to perform accurately, making NLP even more important for continuously solving data processing and quality-related challenges.
Looking further ahead, other related emerging technologies in the industry, like AI and machine learning, will start to be more fully integrated with NLP. With that, we may start to see how AI tools can help to determine the level of scrutiny a given AE should receive and how that factors into a product’s risk profile.
Looking beyond compliance, a newfound understanding of natural language data sources, enabled by NLP, will offer biopharma companies a valuable competitive edge in identifying new opportunities for clinical development. Through the same channels that may lead to uncovering AEs, companies may discover entirely new indications for which their products prove effective. This value will undoubtedly be even more important for organizations looking to spearhead innovative new treatments such as personalized medicine and cell and gene therapy.
Ultimately, NLP will become an essential tool for PV departments to help them continuously and consistently meet the rising pressures of compliance, while effectively managing cost and complexity. Freeing up these resources from manual reporting challenges will enable companies to reinvest in activities that drive true business value, with the ability to analyze the depth and breadth of previously untapped data to fuel future R&D activities.
NLP has the potential to help manufacturers drive toward fulfilling the industry's most important value: improving the safety and quality of treatments available today, while advancing the promise of untapped opportunities on the horizon.
About the authors
Updesh Dosanjh, Practice Lead, IQVIA Vigilance Platform, IQVIA; Jane Reed, Head of Life Science Strategy, Linguamatics, an IQVIA company
References