TileDB + Databricks: Unifying multimodal data for precision health
In this month’s Tech Talk, Mark Lee from Databricks and Seth Shelnutt from TileDB walked through how the two platforms now work together to break down data silos and support more accurate, scalable precision health insights.
The challenge: Multimodal data without a unifying system
Mark opened the webinar with the core issue: Building a true longitudinal view of a patient requires far more than traditional charts and lab results. Modern pipelines need biomarkers, genomic information, IoT vitals, ADT feeds, claims, clinical notes, and even survey and digital-interaction data.
The problem is not a lack of data. It is that all of this information lives in disconnected systems and formats, which often makes scientific data slow and expensive to process. What is more, AI agents and LLMs cannot operate reliably on top of siloed environments. Teams commonly compensate with one-off scripts or shadow IT pipelines that become burdensome to maintain and can increase the risk of errors.
As a result, high-impact use cases such as these are harder than they should be:
Pharmacogenomics and biomarker discovery
Early diagnostics
Newborn genomic screening
Disease prediction and population-level monitoring
A joint approach from multimodal to omnimodal
Seth described the shift from unimodal to multimodal to omnimodal data, in which an organization must govern and analyze many kinds of data through one consistent platform. Most infrastructures were not built for this, which is why TileDB and Databricks designed a joint architecture to handle the full spectrum of data types.
Databricks delivers the lakehouse foundation, including governance through Unity Catalog, scalable compute with Photon and Spark, and built-in AI capabilities with Genie and agent-driven workflows. TileDB contributes an omnimodal intelligence platform that structures, organizes, governs, and analyzes complex scientific data ranging from multi-omics to imaging to large-scale single-cell datasets.
The two catalogs stay in sync, so compute can run where it makes the most sense. The result is a system that lets teams operate and collaborate without constantly moving, duplicating, or reformatting data.
Demo 1: TCGA tertiary analysis inside Databricks
Demo 2: Large-scale single-cell analysis across both systems
Seth then walked through a single-cell workflow using the CZI Cell Census stored in TileDB SOMA. After subsetting to human macrophages across multiple tissues, the data was converted to AnnData and run through PCA, then UMAP, then clustering.
This same workflow ran in multiple ways:
Locally with Scanpy
On NVIDIA RAPIDS for GPU acceleration
Distributed across tissues with Spark on a Photon cluster
Orchestrated through TileDB Task Graphs that submit directly into Databricks
The demo showed that teams can run large-scale multimodal analysis with consistent governance and full auditability without having to restructure or copy data.
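The "distributed across tissues" option in the list above follows a split-apply-combine pattern: partition the cells by tissue, run the same analysis on each partition independently, and gather the results. The sketch below illustrates only that pattern, using the Python standard library. It is not the code from the demo and uses neither Spark nor any TileDB API. The toy records, the function names, and the mean-expression stand-in for the PCA, UMAP, and clustering step are all hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

# Hypothetical toy records: (cell_id, tissue, expression value for one gene).
CELLS = [
    ("c1", "lung", 2.0),
    ("c2", "lung", 4.0),
    ("c3", "liver", 1.0),
    ("c4", "liver", 3.0),
    ("c5", "blood", 5.0),
]


def partition_by_tissue(cells):
    """Group cell records by tissue so each group can be processed on its own."""
    groups = defaultdict(list)
    for cell_id, tissue, value in cells:
        groups[tissue].append((cell_id, value))
    return dict(groups)


def analyze_tissue(item):
    """Stand-in for the per-tissue analysis step; here it only summarizes."""
    tissue, cells = item
    summary = {
        "n_cells": len(cells),
        "mean_expression": mean(value for _, value in cells),
    }
    return tissue, summary


def run_per_tissue(cells, max_workers=4):
    """Fan the per-tissue work out to a worker pool and collect results by tissue."""
    groups = partition_by_tissue(cells)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(analyze_tissue, groups.items()))


if __name__ == "__main__":
    for tissue, summary in sorted(run_per_tissue(CELLS).items()):
        print(tissue, summary)
```

Because each tissue is processed independently, the worker pool could in principle be swapped for a cluster scheduler without changing the per-tissue function. That independence is what makes the workflow a natural fit for distributed execution.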
Why the joint architecture of Databricks and TileDB matters
By combining TileDB and Databricks, organizations gain:
1. A governed foundation for multimodal data
2. The ability to run secondary and tertiary analysis without moving files
3. A path toward precision medicine workflows backed by unified data
4. Flexibility to run compute in either environment
5. Less fragmentation and fewer one-off pipelines
The Tech Talk closed with an invitation for teams to explore how this joint architecture can support precision health, drug discovery, and other multimodal workflows.