Jul 20, 2023

TileDB newsletter - July 2023

Newsletters
5 min read
Mike Broberg

Technical Marketing Manager

Hello!

Summer is in full swing, and we have been busy publishing more learning content here on the blog. Check out the full summary further below, but now, on to the tech updates!

TileDB Cloud

New ways to organize and preview data.

Asset type categories

There are schema design patterns in life sciences, earth observation, and other domains that regularly use TileDB arrays to optimize the higher-level data types found in their research. Now, these higher-level types are automatically categorized in TileDB Cloud's left navigation bar. Users still have access to generic TileDB Cloud data types — arrays, files, notebooks, UDFs, etc. — as well as grouped datasets.

Interactive image previews

Raster data can now be interactively previewed in TileDB Cloud. Each record's preview tab lets you pan, zoom, blend color channels, and view tile caching information, so you can quickly explore imaging datasets under Geospatial / Rasters and Life Sciences / Biomedical Imaging.

TileDB Cloud VCF improvements

We're working to make TileDB-VCF code on TileDB Cloud more convenient and less verbose by providing new one-liner functions for VCF ingestion and distributed queries. Here's a preview:

import tiledb.cloud.vcf as vcf
import tiledbvcf

# Initialize the TileDB config with AWS credentials (truncated)
config = {...}

# Set the URI of the VCF dataset
dataset_uri = "s3://bucket/prefix/vcf-dataset"

# One-line distributed VCF ingestion
dag, sample_uris = vcf.ingest(
    dataset_uri,
    config=config,
    search_uri="s3://1000genomes-dragen-v3.7.6/data/individuals/hg38-graph-based",
    pattern="*.vcf.gz",
    max_files=10,
)

# Wait for the ingestion to complete
dag.wait()

# Get a list of samples in the dataset
ds = tiledbvcf.Dataset(dataset_uri, tiledb_config=config)
samples = ds.samples()

TileDB Embedded

Highlights from version 2.16.0.

LiDAR compression

The 2.16.0 release features performance improvements to the existing set of compressors and filters for TileDB arrays, and it adds a new option for delta compression. This compressor will be used as part of a compression pipeline (in TileDB APIs or through PDAL profiles) offering TileDB compression that is equal to or better than LAZ.
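
To illustrate why delta compression suits LiDAR coordinates, here is a minimal pure-Python sketch of delta encoding. It is not TileDB's implementation, and the coordinate values are made up for illustration. The idea is that neighboring points tend to have similar coordinates, so storing differences yields small integers that a downstream compressor can pack tightly.

```python
from itertools import accumulate


def delta_encode(values):
    """Keep the first value, then store each value as the difference from its predecessor."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """Invert delta_encode by taking a running sum."""
    return list(accumulate(deltas))


# Hypothetical scaled integer X coordinates along a scan line (illustrative only)
x_coords = [1_000_000, 1_000_004, 1_000_009, 1_000_011, 1_000_018]

encoded = delta_encode(x_coords)
decoded = delta_decode(encoded)

print(encoded)  # [1000000, 4, 5, 2, 7]
```

After the first entry, every stored value is a small number, which is what makes the later stages of a compression pipeline more effective.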

Query condition improvements

Query conditions (QCs) on array attributes now support negation of the entire set of conditions via the NOT operator. Previously, QCs supported the not-equal-to operator (!=) on individual conditions; however, the new NOT operator negates the entire statement. Also, as part of general performance improvements to dense arrays and local file systems, QCs can now be applied to the dimensions of dense arrays.
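
The difference between negating individual conditions and negating the entire statement can be sketched in plain Python. This is only an illustration of the logic, not TileDB query condition syntax, and the attribute names `a` and `b` are hypothetical.

```python
# Hypothetical records with two attributes, "a" and "b"
records = [
    {"a": 3, "b": "x"},
    {"a": 7, "b": "x"},
    {"a": 7, "b": "y"},
    {"a": 3, "b": "y"},
]


def condition(r):
    """The original statement: a > 5 AND b == "x"."""
    return r["a"] > 5 and r["b"] == "x"


# NOT applied to the entire statement: keeps everything the statement rejects
not_whole = [r for r in records if not condition(r)]

# Negating each condition individually while still combining with AND
negated_each = [r for r in records if r["a"] <= 5 and r["b"] != "x"]

print(len(not_whole))     # 3
print(len(negated_each))  # 1
```

Negating the whole statement is equivalent to `a <= 5 OR b != "x"` (De Morgan's law), which is why it returns more records than negating each condition under AND.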

Updated Azure & GCP integration

As part of TileDB's commitment to maintaining an open, cloud-native array engine, we recently updated TileDB Embedded to support the latest SDKs for Azure and Google Cloud. Beyond picking up the cloud providers' usual security and bug fixes, the new Azure SDK lets TileDB support Azure's premium block blob storage for high-performance workloads.

TileDB open libraries

News from the TileDB open-source ecosystem.

TileDB-Vector-Search

TileDB-Vector-Search provides an open-source Python API for storage and search of vector embeddings, built on top of the TileDB array engine. As part of the TileDB open-source ecosystem, TileDB-Vector-Search is cloud-native, with support for all TileDB backends (AWS S3, Azure Blob Storage, Google Cloud Storage). With TileDB-Vector-Search, it is now possible to store, process, and query data for the entire lifecycle of a vector search project in one unified system — everything from the raw data used for training (as TileDB arrays), to training and fine-tuning with TileDB Cloud task graphs, to indexing and retrieval of embeddings!
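
For readers new to vector search, the core operation is finding the stored embeddings closest to a query vector. The sketch below is a brute-force version in plain Python. It does not use the TileDB-Vector-Search API, and the embeddings are made-up two-dimensional values; real systems use indexes to avoid scanning every vector.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query, vectors, k):
    """Return the indices of the k vectors closest to the query (exhaustive scan)."""
    ranked = sorted(range(len(vectors)), key=lambda i: euclidean(query, vectors[i]))
    return ranked[:k]


# Hypothetical 2-D embeddings (illustrative only)
embeddings = [[0.0, 0.0], [1.0, 1.0], [0.9, 1.1], [5.0, 5.0]]

nearest = top_k([1.0, 1.0], embeddings, k=2)
print(nearest)  # [1, 2]
```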

Geometry support for spatial queries

To complement existing TileDB support for raster data via GDAL and point cloud data via PDAL, TileDB now supports geometry-based queries through a vector driver for GDAL. The TileDB-MariaDB integration also supports query pushdown for common spatial operations like ST_INTERSECTS and ST_CONTAINS.

Geometries are stored in TileDB using the well-known binary (WKB) format as defined by the Open Geospatial Consortium (OGC) and take advantage of novel indexing techniques that use TileDB's existing R-tree structures. Look for more examples on TileDB geometries coming soon on the blog!
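
As a small illustration of the WKB format itself, the sketch below encodes and decodes a 2D point using only Python's standard library. It shows the generic OGC byte layout (a byte-order flag, a geometry type code, then the coordinates as doubles) and makes no claim about how TileDB lays out or indexes these bytes internally. The helper names are hypothetical.

```python
import struct

WKB_LITTLE_ENDIAN = 1  # byte-order flag: 1 = little-endian, 0 = big-endian
WKB_POINT = 1          # OGC geometry type code for a 2D point


def point_to_wkb(x, y):
    """Encode a 2D point as little-endian WKB: 1-byte flag, uint32 type, two doubles."""
    return struct.pack("<BIdd", WKB_LITTLE_ENDIAN, WKB_POINT, x, y)


def wkb_to_point(buf):
    """Decode a 2D point from WKB, honoring the byte-order flag."""
    prefix = "<" if buf[0] == WKB_LITTLE_ENDIAN else ">"
    geom_type, x, y = struct.unpack_from(prefix + "Idd", buf, 1)
    if geom_type != WKB_POINT:
        raise ValueError(f"expected a WKB point, got geometry type {geom_type}")
    return x, y


wkb = point_to_wkb(1.0, 2.0)
print(len(wkb))   # 21
print(wkb.hex())  # 0101000000000000000000f03f0000000000000040
print(wkb_to_point(wkb))  # (1.0, 2.0)
```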

TileDB-BioImaging napari plugin

You can now visualize microscopy images stored as TileDB arrays with the napari n-dimensional image viewer. The napari-tiledb-bioimg plugin supports reading TileDB-BioImaging multi-resolution arrays within napari.

TileDB-SOMA R API

The TileDB-SOMA R API is fast approaching its 1.0 release, featuring the ability to import and export Seurat objects to and from SOMA experiments. Please try out the integration and send us your feedback!

TileDB in action

TileDB content, events, and other happenings.

New how-to articles

There's a range of new introductory content published to the TileDB Blog!

Our take on data mesh & AI

In our latest webinar, we rethink how a modern database system that uses arrays as its foundation can morph to support data mesh implementations that unify tabular and complex data, generative AI, and data products.


This newsletter packed quite the punch. Thank you for reading!

If you'd like to share product feedback, simply reply to this email, join our Slack community, or follow us on Twitter and LinkedIn. We'd love to hear about your TileDB experience and future requirements.

Thank you,

— The TileDB Team