TileDB Cloud is a universal database platform that manages all your data and code assets, and enables fast and collaborative analysis. In this blog post I will walk you through the steps you need to take to get started with TileDB Cloud in just a few minutes.
To take full advantage of the TileDB Cloud capabilities, you will need to have an AWS account and set up an S3 bucket. TileDB Cloud manages compute and offers Jupyter notebooks with a small amount of EBS storage dedicated to you, but it does not host your data and code. Your assets remain under your full ownership and control in your own cloud storage.
Note that you don’t need an AWS account to start exploring TileDB Cloud and the plethora of public data and code that can be found on the platform, or data and code that your friends or colleagues would like to share with you. You need an AWS account only if you’d like to create and manage your own assets.
In the rest of this article, I describe the three TileDB Cloud modes (Explore, Read and Complete) in more detail. To accompany these instructions, we also prepared a video that runs through all of the steps below in well under 4 minutes. So read on, or check out the recording here:
As a new user, you’ll land directly in the TileDB Cloud Explore tab, which mainly lets you view public assets: browse their metadata, inspect an array schema, render a static preview of a Jupyter notebook, and so on. You won’t be able to programmatically read public arrays, register your own TileDB Cloud arrays, run interactive Jupyter notebooks, or experiment with many other features until you advance to Read Mode.
New users are eligible to unlock $10 in free credits, without even entering a credit card. To claim your free credits and enter Read Mode, you just need to tell us a little bit about your use case. To start the process, head to your user account icon in the upper right and find the Unlock $10 credits button.
This button will open a modal window, where you’ll need to tell us a little about the type of data you’re interested in (for example, “VCF files”, “LAS data”, “biomedical images in various file formats”, etc). You’ll also need to describe your high-level use case.
It’s important to note that requests for credits can take up to one business day to process. A member of the TileDB Cloud team personally reviews each request. Once your request has been approved, your account will be in Read Mode.
To check the status of your credits request, return to your user account menu, select Settings, and navigate to the Billing page. Any free credits added to your account will appear under the Current bill widget, displaying your Credits balance in blue.
After confirming that your account is in Read Mode, head back to the Explore tab and browse the Groups section to access Tutorials / Python / Dataframes / dataframes_basics.ipynb. Select the notebook, and you’ll be able to Launch it directly.
In a few seconds, your notebook will launch within the TileDB Cloud console, and you’ll be able to run cells and experiment with your own code.
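If you prefer working programmatically, the same Read Mode capability is available from the TileDB Cloud Python client, which comes preinstalled in the hosted notebooks. Here is a minimal sketch; the API token placeholder is something you create under your account settings, and the public `TileDB-Inc/quickstart_dense` array is used purely for illustration:

```python
import tiledb
import tiledb.cloud

# Authenticate with your TileDB Cloud API token (or username/password).
tiledb.cloud.login(token="<YOUR_API_TOKEN>")

# Public arrays are addressed as tiledb://<namespace>/<array_name>.
# Opening through a cloud-aware context routes the read to TileDB Cloud.
with tiledb.open("tiledb://TileDB-Inc/quickstart_dense",
                 ctx=tiledb.cloud.Ctx()) as A:
    data = A[1:3, 1:3]  # a small slice, served by TileDB Cloud
    print(data["a"])
```

Inside a hosted notebook you are already authenticated, so the `login` call can typically be skipped there.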
You can run this notebook completely in Read Mode. This notebook also demonstrates an important distinction about data storage. When you launch a Jupyter notebook on TileDB Cloud, it is backed by a 2 GB Amazon EBS volume dedicated to your user account. Typically, this space is allocated to allow users to upload custom packages to install into the environment or store relatively small data. Your home directory can be found in the JupyterLab file browser.
When you are done with the tutorial, it is a best practice to shut down the notebook server using the button in the upper right.
Complete Mode requires linking TileDB Cloud to your cloud provider account. In my example, I’ll be using AWS.
First, from your user account menu, select the Account credentials button.
Choose the + Add credentials button, and a modal window will appear where you can enter your cloud provider access key ID and secret. These are the AWS credentials for the S3 buckets you will use to store your arrays and the other assets managed by TileDB Cloud.
The AWS docs have the details on managing access keys. You can quickly access them via the AWS console.
Now that TileDB Cloud can talk to my AWS account, I just need to tell it where to store my arrays and other assets by default. Navigate to your Profile page and scroll down to the Storage path section. Go ahead and select the credentials you just created.
For the last step, I’ll pop back into the AWS console and access the S3 object storage service. It’s as simple as choosing your bucket and hitting the Copy S3 URI button. (If you’re new to AWS, their docs can help you set up Amazon S3.)
Paste in the S3 URI, and that’s it. You have now told TileDB Cloud the default S3 bucket you will use for the assets you create (e.g., Jupyter notebooks, UDFs) and the corresponding AWS credentials that allow TileDB Cloud to access that bucket. TileDB Cloud Complete Mode is successfully enabled!
Now, you can upload your own notebooks and files to TileDB Cloud, register pre-existing TileDB arrays to TileDB Cloud, or programmatically create and register new TileDB Cloud arrays as you go. These assets, and more, are stored in the object storage path that you provided in the steps above. You host this data and retain all control even if you ever decide not to use TileDB Cloud in the future (all your assets are stored in the TileDB open-source format, which is always accessible via the open-source TileDB Embedded library).
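As a sketch of the programmatic path for pre-existing arrays, registration looks roughly like the following with the `tiledb-cloud` Python client. The bucket path, namespace, and array name below are placeholders, not real assets:

```python
import tiledb.cloud

tiledb.cloud.login(token="<YOUR_API_TOKEN>")

# Register a TileDB array that already lives in your own S3 bucket so it
# appears in your TileDB Cloud account. The data itself stays in your
# bucket, accessed via the credentials you added earlier.
tiledb.cloud.register_array(
    "s3://my-bucket/arrays/my_array",  # placeholder: where the array lives
    namespace="my_username",           # placeholder: your TileDB Cloud namespace
    array_name="my_array",
)
```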
From here, you should be able to explore all TileDB Cloud has to offer using your free credits. Feel free to poke around the TileDB Cloud console on your own. But for further reading, here are a couple of resources that helped me get started in my TileDB journey:
Try creating and registering your first TileDB Cloud array, and then slicing it right from the same notebook. It works like magic!
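A hedged sketch of that flow: creating a new array directly at a `tiledb://` URI both writes it to your default bucket and registers it with TileDB Cloud in one step. The namespace, bucket, and array name below are placeholders:

```python
import numpy as np
import tiledb
import tiledb.cloud

tiledb.cloud.login(token="<YOUR_API_TOKEN>")
ctx = tiledb.cloud.Ctx()

# tiledb://<namespace>/s3://<bucket>/<path> creates and registers at once.
create_uri = "tiledb://my_username/s3://my-bucket/arrays/my_first_array"

# A tiny 1-D dense array with a single float attribute.
dom = tiledb.Domain(
    tiledb.Dim(name="d", domain=(0, 9), tile=10, dtype=np.int32, ctx=ctx),
    ctx=ctx,
)
schema = tiledb.ArraySchema(
    domain=dom, attrs=[tiledb.Attr(name="a", dtype=np.float64, ctx=ctx)], ctx=ctx
)
tiledb.Array.create(create_uri, schema)

# After creation the array is addressable by its registered name.
array_uri = "tiledb://my_username/my_first_array"
with tiledb.open(array_uri, "w", ctx=ctx) as A:
    A[:] = np.arange(10, dtype=np.float64)
with tiledb.open(array_uri, ctx=ctx) as A:
    print(A[2:5]["a"])  # slice the registered array back
```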