The state of multimodal data in biotech and pharma 2025

Originally published: Aug 26, 2025

Table Of Contents:

Takeaway 1: Multimodal data is now essential to life sciences R&D strategy

Takeaway 2: Primary challenges of using multimodal data in life sciences research are complexity and governance/compliance

Takeaway 3: Three multimodal data management platform must-haves

How TileDB Carrara helps life sciences researchers master multimodal data 

Life sciences data has always been inherently multimodal. Consider how a single cell can yield data that crosses modalities ranging from transcriptomics to spatial imaging to metabolomics. Zooming out to the level of human health, we can add data types from population genomics, clinical trial results, remote monitoring sensors and EHRs. 

Complex? Absolutely. Full of exciting potential? Without a doubt.

These are not the isolated conclusions of one product marketer, but a clear consensus from the life sciences industry. We at TileDB learned this from an in-depth survey on multimodal data and its research applications. The 142 respondents were biopharma leaders across R&D IT, Data and AI and project teams who attended BiotechX Fall 2024, the Festival of Genomics and Biodata in January 2025 and the AWS Healthcare and Life Sciences Symposium in May 2025. We also conducted one-on-one interviews with 10 research leads at top biopharma firms to learn how they use multimodal data in their R&D.

In this blog post, we will review high-level findings from these surveys and interviews as well as explore how TileDB Carrara is addressing the issues and opportunities of multimodal data in life sciences. 

Takeaway 1: Multimodal data is now essential to life sciences R&D strategy

84.5% of our respondents consider the use of multimodal data in life sciences R&D strategy both important and urgently needed. This shows overwhelming agreement that multimodal data analysis has become a cornerstone of biotech and pharma research. Our interviews with research experts reinforced this finding and offered useful insights into why multimodal data is now so important.

“In the therapeutic areas where pharma is most focused, like immunology and oncology, I would confidently say that 80% of these data are multimodal. Because the problems we are trying to address are such that you need to generate this multimodal data to find biomarkers and to find your niche of patients compared to your competitors,” said a Director of Data and Computational Sciences at Sanofi. “The real value of leveraging this multimodal data is in target discovery.”

A Director of Data and AI at AstraZeneca voiced similar thoughts, adding that their organization relies on multimodal AI to understand mechanism of action (MoA) and design better clinical trials. “I do AI modeling multimodal data integration for cancer research,” he said. “The idea is to inform a better understanding of the mechanism of action for the compounds in our pipeline. And to figure out what would be the best target patient population, how we can help with the patient stratification for future clinical trial designs.”

Beyond relying on multimodal data analysis for research purposes, life sciences experts are also excited about the clinical potential of multimodal data. “Right now, multimodal data is being used as exploratory research and discoveries,” said a Kite Pharma Senior Scientist. “But the objective would be to have a point of care instrument by the bedside of a patient where you can right away get blood or the tumor biopsy, put in the instrument, and then know how and what to treat the patient with.”

Takeaway 2: Primary challenges of using multimodal data in life sciences research are complexity and governance/compliance

Despite the potential they see in multimodal data, our survey respondents agreed on two significant challenges: complexity and governance. 90.5% said it’s moderately to very hard to store and catalog different data modalities side by side, and 88% are dissatisfied with how current solutions handle governance and compliance for their data. These challenges in creating a useful single data fabric from many structured and unstructured modalities also appeared in our one-on-one interviews.

Some experts we interviewed voiced concerns that their current technology was not able to effectively navigate these challenges. “We have quite a few technologies in house. The problem is we cannot put all data together in one database because we don't have such a platform. So it's somewhat disjointed how we do our analysis,” said a Senior Director at AbbVie.

Takeaway 3: Three multimodal data management platform must-haves

As life sciences research leaders look for data management platforms for their multimodal data, our survey identified three must-have technology requirements: 

  • 52.1% of respondents need their data management platform to work with data of all kinds, including large-scale omics, assays, clinical trials, literature and biomedical images.

  • 31.7% require their data management platform to have built-in capabilities for data science and machine learning.

  • 30.3% need their data management platform to be able to connect all data assets for a project into a single space: data such as omics data, assays, literature, ancillary measurements and reports they exchange with each other.

Our interviews echoed many of these needs while adding specific wants around computing efficiency and flexible usability. “Data volume is a big challenge, and data complexity is the cherry on top of that. So once you start processing this data, you need to have a lot of computational power,” said a Senior Principal of Research Informatics at Novartis. “To play with large amounts of data like high dimensional multimodal data sets requires very good infrastructure at the back end and also requires a lot of advanced computational tools.” 

A Novo Nordisk Principal Scientist described their ideal data platform as highly usable even for researchers who were not fluent in cloud management: “So in a word the problem is scale. Once you've chosen multi omics, you just make this problem worse by adding more dimensions of data. So one thing that would be really nice is a data structure that is curable in a way where the person making the query does not necessarily have to understand compute at terabyte scale.” 

An AbbVie Senior Director envisioned a data platform built to harmonize data at scale: “In the ideal world, we would have a data-type-agnostic platform that would be able to ingest data sets from different sources. We would be able to manage this in a proprietary way. We could buy data from the same company, we could buy data from different companies and we could also generate our own data then upload it and view and analyze our data in the context of these larger external data sets.”

How TileDB Carrara helps life sciences researchers master multimodal data 

Listening to life sciences research leads in surveys and interviews helped lay the conceptual foundation for TileDB’s new platform, TileDB Carrara. We are building Carrara as a true multimodal lakehouse that organizes, structures and analyzes multimodal life sciences data and eases collaboration on it. It manages the data lake where multimodal data flows in, then serves as a flexible warehouse to facilitate research teams’ target discovery and collaboration. This will not only help biotech and pharma organizations master the complexity of multimodal data, but also simplify the governance and compliance challenges of managing it at scale.


As you prepare your organization for the multimodal future, take a look at our guide to multimodal data in life sciences to explore its applications, challenges and possibilities.
