Beyond Multimodal to Omnimodal Intelligence to Drive Outcomes in Healthcare & Life Sciences

Originally published: Nov 25, 2025

To learn more about the joint architecture between Databricks and TileDB, watch the full webinar.

TileDB + Databricks: Unifying multimodal data for precision health

Healthcare and life sciences teams are generating more multimodal data than ever: genomics, imaging, clinical records, telemetry, exposures, and population-scale datasets. But these modalities rarely live in one place or share the same format. The result is a fragmented ecosystem that increases costs and slows down everything from early diagnostics to precision medicine to newborn screening and disease prediction.

In this month’s Tech Talk, Mark Lee from Databricks and Seth Shelnutt from TileDB walked through how the two platforms now work together to break down those silos and support more accurate and scalable precision health insights.

The challenge: Multimodal data without a unifying system

Mark opened the webinar with the core issue: Building a true longitudinal view of a patient requires far more than traditional charts and lab results. Modern pipelines need biomarkers, genomic information, IoT vitals, ADT feeds, claims, clinical notes, and even survey and digital-interaction data.

The problem is not a lack of data. It is that all of this information lives across disconnected systems and formats, which often makes scientific data slow and expensive to process. What is more, AI agents and LLMs cannot operate reliably on top of siloed environments. Teams commonly compensate with one-off scripts or shadow IT pipelines that are burdensome to maintain and can increase the risk of errors.

This makes high-impact use cases harder than they should be, including:

  • Pharmacogenomics and biomarker discovery

  • Early diagnostics

  • Newborn genomic screening

  • Disease prediction and population-level monitoring

A joint approach from multimodal to omnimodal

Seth described the shift from unimodal to multimodal to omnimodal, where an organization must govern and analyze many kinds of data through one consistent platform. Most infrastructures were not built for this, which is why TileDB and Databricks designed a joint architecture intended to handle the full spectrum of data types.

Databricks delivers the lakehouse foundation, including governance through Unity Catalog, scalable compute with Photon and Spark, and built-in AI capabilities with Genie and agent-driven workflows. TileDB contributes an omnimodal intelligence platform that structures, organizes, governs, and analyzes complex scientific data ranging from multi-omics to imaging to large-scale single-cell datasets.

The two catalogs stay in sync, so compute can run wherever it makes the most sense. The result is a system that lets teams operate and collaborate without constantly moving, duplicating, or reformatting data.

Demo 1: TCGA tertiary analysis inside Databricks

The first demonstration focused on analyzing TCGA datasets shared from TileDB into Databricks using Delta Sharing. Once mounted, the SOMA datasets were added to a Genie workspace, where Mark asked natural-language questions such as how age varies across tumor grades or diagnoses. Genie then generated the SQL and visuals automatically, showing how multimodal datasets can be explored interactively without manual preparation.
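The SQL that Genie produced in the demo is not reproduced in this recap. As a rough illustration only, a question like "how does age vary across tumor grades?" usually comes down to a grouped aggregation. The sketch below runs one against a tiny in-memory SQLite table. The table name `clinical`, its columns, and every row are made up for illustration and are not the TCGA schema or data.

```python
import sqlite3

# Toy rows standing in for a clinical table. All values are invented.
ROWS = [
    ("P1", "G1", 52),
    ("P2", "G1", 60),
    ("P3", "G2", 47),
    ("P4", "G2", 63),
    ("P5", "G3", 71),
]

# Hypothetical query: summarize age at diagnosis per tumor grade.
QUERY = """
SELECT tumor_grade,
       COUNT(*)              AS n_patients,
       AVG(age_at_diagnosis) AS mean_age,
       MIN(age_at_diagnosis) AS min_age,
       MAX(age_at_diagnosis) AS max_age
FROM clinical
GROUP BY tumor_grade
ORDER BY tumor_grade
"""


def age_by_grade(rows):
    """Load rows into an in-memory table and return one summary row per grade."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE clinical ("
            "patient_id TEXT, tumor_grade TEXT, age_at_diagnosis INTEGER)"
        )
        conn.executemany("INSERT INTO clinical VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


summary = age_by_grade(ROWS)
for grade, n, mean_age, min_age, max_age in summary:
    print(f"{grade}: n={n}, mean age={mean_age:.1f}, range={min_age}-{max_age}")
```

In a Databricks workspace the same shape of query would run against the shared tables rather than SQLite. The point here is only the kind of statement a natural-language question can translate into.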

Demo 2: Large-scale single-cell analysis across both systems

Seth then walked through a single-cell workflow using the CZI Cell Census stored in TileDB SOMA. After the data was subset to human macrophages across multiple tissues, it was converted to AnnData and run through PCA, UMAP, and then clustering.

This same workflow ran in multiple ways:

  • Locally with Scanpy

  • On NVIDIA RAPIDS for GPU acceleration

  • Distributed across tissues with Spark on a Photon cluster

  • Orchestrated through TileDB Task Graphs that submit directly into Databricks

The demo showed that teams can run large-scale multimodal analysis with consistent governance and full auditability without having to restructure or copy data.
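To make the last two execution modes in the list above more concrete, here is a minimal sketch of the general pattern: an ordered graph of stages, fanned out once per tissue. It uses only the Python standard library and does not call the TileDB Task Graph or Spark APIs. The stage names, the tissue list, and the helper functions are all hypothetical, and the per-tissue function is a placeholder that only records which stages would run.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter

# Hypothetical stage graph: each stage maps to the stages it depends on.
PIPELINE = {
    "subset_macrophages": set(),
    "to_anndata": {"subset_macrophages"},
    "pca": {"to_anndata"},
    "umap": {"pca"},
    "cluster": {"umap"},
}

TISSUES = ["lung", "liver", "blood"]  # illustrative tissue names only


def stage_order(graph):
    """Return the stages in an order that respects every dependency."""
    return list(TopologicalSorter(graph).static_order())


def run_for_tissue(tissue):
    """Placeholder for per-tissue work: report the stages that would run."""
    return tissue, stage_order(PIPELINE)


def run_all(tissues, max_workers=4):
    """Fan the same pipeline out across tissues, one task per tissue."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(run_for_tissue, tissues))


results = run_all(TISSUES)
for tissue, stages in results.items():
    print(tissue, "->", " > ".join(stages))
```

In a real deployment each stage would do actual work, such as the Scanpy or RAPIDS steps, and the scheduler would be the platform's own orchestration rather than a thread pool. The sketch only shows why partitioning by tissue lets the same workflow run unchanged at larger scale.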

Why the joint architecture of Databricks and TileDB matters

By combining TileDB and Databricks, organizations gain:

  1. A governed foundation for multimodal data

  2. The ability to run secondary and tertiary analysis without moving files

  3. A path toward precision medicine workflows backed by unified data

  4. Flexibility to run compute in either environment

  5. Less fragmentation and fewer one-off pipelines

The Tech Talk closed with an invitation for teams to explore how this joint architecture can support precision health, drug discovery, and other multimodal workflows.
