Jun 13, 2025

A conversation with Spencer Seale: Why bioinformaticians deserve to use tools they enjoy

Devika Garg

Director, Life Sciences Product Marketing

TileDB was founded by people who love data: people who see the possibilities of multimodal data and build technology to unlock that potential. For bioinformaticians like Spencer Seale, TileDB is enabling a better future for life sciences research by solving many of the chronic problems he has faced in the industry. In the following interview, Seale walks us through his perspective on how TileDB empowers biotech research and the capabilities that directly address today’s bioinformatics challenges.

The promise of next-gen sequencing and joining TileDB

Let’s start with you. Why did you study and make your career in bioinformatics?

Seale: After I studied biology in undergrad, I became really interested in next generation sequencing technologies and started working in the wet lab. There I organically learned about bioinformatics from working with a lot of bioinformaticians and saw all the interesting technologies they were using and the key role that they played in the entire research project. With the explosion of new sequencing technologies like Illumina and more cloud compute available, I saw an opportunity where we needed a lot of people who were trained to work with all these large amounts of data.

This showed me that the future of life sciences research was going to happen through bioinformatics. I studied at the University of Oregon for my master’s because I really enjoy working on these challenging problems. After that, I joined Adaptive Biotechnologies and developed software infrastructure to help them develop assays that could sequence T-cell receptors and use those sequences to identify diseases that a person had been exposed to. That company used this technology to create COVID-19 diagnostics and antigen tests during the pandemic.

What led you to join TileDB as a Solutions Architect and Bioinformatics Engineer?

Seale: I found TileDB organically while trying to solve an ingest problem. I was looking at different embedded databases that could help me take separate files and bring them all together in a new format so I could efficiently query everything. With TileDB, I could create an embedded database and write all those files into one single destination, and then within that destination I could write specific metadata, then store all that information as different attributes within that database. So rather than having hundreds of thousands of separate files and having to build some system on top of that, I could compile all that data into one single point through easy-to-use programmatic APIs and from that destination database I could query it and pull out the specific points of interest. TileDB was so useful for what I was doing, so I kept an eye on the organization and applied for my role as soon as it opened.
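To make the consolidation idea concrete, here is a minimal sketch in plain Python. It does not use TileDB's API: the `ConsolidatedStore` class, its method names, and the per-sample CSV layout are all hypothetical, chosen only to illustrate writing many small files into one destination, attaching metadata, and then querying for specific points of interest.

```python
import csv
import tempfile
from pathlib import Path


class ConsolidatedStore:
    """Toy stand-in for a single destination (hypothetical, not TileDB's API)."""

    def __init__(self):
        self.cells = {}      # (sample, position) -> dict of attribute values
        self.metadata = {}   # store-level key/value metadata

    def ingest_file(self, path):
        """Read one per-sample CSV file and write its rows into the store."""
        sample = Path(path).stem  # the file name is only needed at ingest time
        with open(path, newline="") as handle:
            for row in csv.DictReader(handle):
                key = (sample, int(row["position"]))
                self.cells[key] = {"depth": int(row["depth"])}

    def query(self, samples, start, end):
        """Return cells for the given samples with start <= position <= end."""
        wanted = set(samples)
        return {
            key: attrs
            for key, attrs in sorted(self.cells.items())
            if key[0] in wanted and start <= key[1] <= end
        }


def build_demo_store():
    """Create three small files in a temp directory and ingest them all."""
    store = ConsolidatedStore()
    with tempfile.TemporaryDirectory() as tmp:
        for i, sample in enumerate(["s1", "s2", "s3"]):
            path = Path(tmp) / f"{sample}.csv"
            with open(path, "w", newline="") as handle:
                writer = csv.writer(handle)
                writer.writerow(["position", "depth"])
                for position in (100, 200, 300):
                    writer.writerow([position, position + i])
            store.ingest_file(path)
    store.metadata["assay"] = "demo"  # metadata lives next to the data
    return store


if __name__ == "__main__":
    demo = build_demo_store()
    print(demo.query(["s1", "s3"], 150, 300))
```

After ingestion the temporary files are gone, yet every value is still reachable by sample and position, which is the property the answer above describes.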

Tabular databases are the past. What’s the future of data science?

How well do you feel traditional tabular databases are handling the frontier data needs of today’s life sciences research? What shortcomings do you see?

Seale: So besides the ingesting issue I was trying to solve, the big shortcoming with tabular databases is they require a lot more pre-processing for data like single-cell, population genomics and bioimaging. A database with three billion rows takes a lot of computing to handle any queries. And considering how ubiquitous cloud resources are these days, we're seeing a lot of companies that think they can maintain their current data systems by adding new pieces for specific formats while spending more on memory, CPUs, GPUs and the like. This makes their databases more expensive and less efficient. At TileDB we’re examining the actual data model and providing a better way. I see this as an opportunity for researchers to have a much more enjoyable experience with data storage and access.

“Enjoyable” isn’t a word we usually hear when talking about data science. Why did you choose that word?

Seale: Bioinformaticians use programming languages and libraries in those languages every day. You want to use tools whose syntax you enjoy, and everyone has favorite packages and libraries that they’re comfortable using to filter data. The experience of accessing the data matters, and bioinformaticians deserve to use tools they enjoy and can do their best work with.

I see TileDB providing this kind of experience, where you don't have to develop some in-house library to know where to look for all your files. TileDB eliminates the pre-planning needed to structure the naming scheme of those files. You create your TileDB array once and then, following that array's schema, you can ingest all the data into TileDB without having to write specific file names or worry about any of the caveats that might exist for all the potential pathways people might follow to access that data.

So if TileDB is providing a more enjoyable, better way, why do you think traditional tabular databases still dominate life sciences research?

Seale: There’s a lot of inertia working against change. A lot of computational teams are focused on their research rather than wanting to pivot to restructuring all of their data. At TileDB we address this with easy-to-use ingestion modes that transform data from its original structure into a TileDB array-based structure. But it is still a new way of doing things, and I believe people will eventually feel the kind of burn that makes them look for new technologies.

Let’s talk about feeling that burn. What are some signs that a life sciences firm is beginning to outgrow their current database approach?

Seale: A big sign and pain point I see with a lot of researchers is that they have no system of record to understand where all their data is. Often they want to go back to a unified single source of truth and discover what data they created years ago, then analyze that side by side with data they've created today. But if they have no unified system of record to understand where their data is, then they have to start messaging colleagues and passing files through OneDrive, email or Slack to get that information. In that case, they’re essentially pulling the data from its source and then sharing it, which has obvious usability and privacy concerns.

This is why one of the big bioinformatics value adds of TileDB is taking all your data and putting it in a secure, unified and searchable location. With that data collapsed into a single array, TileDB has the means to quickly understand what's in that array and to do a high level query to see what you're working with.

How the multi-dimensional array tackles bioinformatics problems

You’ve mentioned some of the benefits of TileDB’s multi-dimensional arrays. In your view, what makes multi-dimensional arrays uniquely suited for managing genomics, single cell or other frontier data in life sciences?

Seale: Think about how quickly the landscape of the life sciences field and genomics is changing. We are seeing a lot of exciting discoveries, but these require managing very large data volumes in a wide variety of formats. It’s difficult to continually create database software that supports every new file format, especially if they were invented by the life sciences group inside one company or academic institution.

Take Variant Call Format (VCF) files, which are important for population genomics research. Many academic institutions have developed software that’s specific to VCF files, but that means the software isn’t going to work with single-cell data or other formats. Every time a new key file format emerges, you will need new tools and technologies to deal with it—unless you’re using a multi-dimensional array. An array can be architected to store any type of file or data structure, flexing to fit anything from a table to complex imaging resolution sets. This avoids endless software development chasing new file formats while making storage much easier.

That does sound like it would make ingesting new data dramatically simpler. So the TileDB array brings structure to all kinds of data types, even those traditionally considered unstructured?

Seale: Yes. With a plain file, you can write any type of data without having to define a data type. When you're creating an array, by contrast, you are defining a structure in which that data should exist, with specific attributes. This means you can't write data to that array that isn't supposed to be there. Data that’s not in a tabular structure can still have many different values in it. You can take that data, then ingest the particular parts of it that you want into a structured database that requires users to abide by that array’s schema.
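The schema enforcement described here can be illustrated with a small, hedged sketch in plain Python. It is a toy model, not TileDB's API: `SchemaArray`, `SchemaError`, and the variant-like attribute names are hypothetical and exist only to show a write being rejected when it does not match the declared structure.

```python
class SchemaError(ValueError):
    """Raised when a write does not match the array's declared schema."""


class SchemaArray:
    """Toy sparse array that enforces a schema (hypothetical, not TileDB's API)."""

    def __init__(self, dimensions, attributes):
        self.dimensions = tuple(dimensions)   # e.g. ("sample", "position")
        self.attributes = dict(attributes)    # attribute name -> Python type
        self.cells = {}                       # coordinates -> attribute values

    def write(self, coords, values):
        """Store one cell, refusing anything the schema does not allow."""
        if len(coords) != len(self.dimensions):
            raise SchemaError(f"expected {len(self.dimensions)} coordinates")
        if set(values) != set(self.attributes):
            raise SchemaError(f"expected attributes {sorted(self.attributes)}")
        for name, value in values.items():
            expected = self.attributes[name]
            if not isinstance(value, expected):
                raise SchemaError(f"{name!r} must be {expected.__name__}")
        self.cells[tuple(coords)] = dict(values)


if __name__ == "__main__":
    variants = SchemaArray(
        dimensions=("sample", "position"),
        attributes={"ref": str, "alt": str, "quality": float},
    )
    variants.write(("s1", 1042), {"ref": "A", "alt": "G", "quality": 37.5})
    try:
        # A string where the schema declares a float is refused.
        variants.write(("s1", 1043), {"ref": "A", "alt": "G", "quality": "high"})
    except SchemaError as err:
        print("rejected:", err)
```

The schema is declared once at creation, and every later write is checked against it, so malformed records never reach storage.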

TileDB arrays also follow a columnar format, so you can select a specific dimension or attribute and the array will ignore all of the other attributes or dimensions that aren’t of interest. You're essentially leaving that other data untouched as it sits in your cloud storage and reading only the data that you actually want to study. TileDB's data catalog takes all of your different data types and aggregates them in one location that can be searched by metadata that's attached directly to the asset itself.
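As a rough illustration of why a columnar layout lets a query skip unwanted attributes, consider the following plain-Python sketch. It is not TileDB's implementation or API: `ColumnarStore`, its `columns_read` log, and the attribute names are assumptions made for the example.

```python
class ColumnarStore:
    """Toy column-oriented store (hypothetical, not TileDB's API)."""

    def __init__(self, coords, columns):
        self.coords = list(coords)  # one coordinate per cell
        self.columns = {name: list(vals) for name, vals in columns.items()}
        self.columns_read = []      # log of which attribute columns were touched

    def read(self, attrs, start, end):
        """Return only the requested attributes for start <= coord <= end."""
        hits = [i for i, c in enumerate(self.coords) if start <= c <= end]
        result = {"coord": [self.coords[i] for i in hits]}
        for name in attrs:
            self.columns_read.append(name)
            column = self.columns[name]  # other columns are never accessed
            result[name] = [column[i] for i in hits]
        return result


if __name__ == "__main__":
    store = ColumnarStore(
        coords=[100, 200, 300, 400],
        columns={
            "depth": [10, 20, 30, 40],
            "quality": [0.9, 0.8, 0.7, 0.6],
            "genotype": ["0/0", "0/1", "1/1", "0/1"],
        },
    )
    print(store.read(["depth"], 150, 350))
    print(store.columns_read)
```

Because each attribute is stored separately, asking for `depth` alone never touches `quality` or `genotype`; in a real system that translates into fewer bytes fetched from storage.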

How TileDB is driving discovery in life sciences

That would make searching large data volumes much easier. What else does TileDB do to drive discovery in life sciences?

Seale: Enhancing collaboration is another key goal of TileDB. Besides all the ways we facilitate search, we also support many different kinds of monitoring logs for all that data. So anytime you want to know whether someone's accessed some data that's stored in TileDB at a very low level, whether it's a query read, a metadata update or what have you, you can see who's accessed all of that data.

TileDB also offers teamspaces, where you can invite different groups of colleagues to share views of different data. When you invite them, you assign each person a specific permission level, like read-only, read/write or read/write/delete, to control how they access the data. Once you invite them to that teamspace, they get access to the data you want them to see and collaborate on. They can reach it by authenticating through TileDB directly, without needing any cloud credentials. That's really critical for being able to quickly collaborate, create new data, and share it with all of your colleagues in one data catalog. You can create any number of teamspaces, like one for a particular project, one for all your single-cell research and so on. This lets you instantly see totally different collections of data in one secure location.

Are there any specific advantages that the new TileDB Carrara offers to make bioinformaticians’ work easier and more effective?

Seale: One that I find really useful is Carrara’s ability to essentially mount an S3 bucket as a file system. Once you connect your S3 or other cloud storage to TileDB, you can traverse it as if it were a local file system using Carrara. You have access to all of your data synced through TileDB, and you can read it just as you would from a file system.

That removes all of the barriers of connecting to different cloud providers, like the authentication layer, and allows you to easily work with your data as it currently exists. Then once it's there, you can start to realize the potential of restructuring that data into performant multi-dimensional arrays for even better access to that data. I think what makes TileDB unique is our investment in performance. We've dedicated a lot of resources to developing a really scalable system that works well in the long term. It's been difficult to develop an entirely new database, but this sets us apart from competitors that are just focusing on things like adding a beautiful user interface to current systems. In contrast, TileDB really trims down the infrastructure to create a completely novel and future-focused way of doing things.

Learn more about TileDB Carrara and how it drives life science discovery.
