Historical financial datasets are large, which makes them hard to store and manage and slows down statistical analysis and ML model training. And while opportunities are opening up to curate and share this data with hedge funds and banks, controlling access to proprietary datasets is cumbersome when the data is stored as flat files in assorted formats on FTP sites or standalone cloud object storage.
TileDB stores financial data efficiently in multi-dimensional arrays and follows a cloud-native design for data management and analysis in the cloud. TileDB Cloud speeds up typical data science and ML workflows in Python and R Jupyter notebooks, with support for popular frameworks like PyTorch, TensorFlow, Keras and scikit-learn. Traditional SQL analytics also scale well on TileDB Cloud, a fully serverless platform that uses user-defined functions and task graphs for distributed computing.
Multi-dimensional TileDB arrays are well suited to financial time-series data because they can index on time and ticker symbol, which speeds up queries with conditions on those two fields. Built-in support for strings, real numbers, and datetime objects as array dimensions enables rapid slicing and dicing. Analyses can also extend to order book data, where 4D TileDB arrays accommodate additional dimensions such as transaction ID and buyer/seller details.
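To make the benefit of indexing on ticker and time concrete, here is a minimal pure-Python sketch of the idea. It is not TileDB's API or implementation: the `TickIndex` class, its method names, and the sample prices are all hypothetical, chosen only to show how a query with conditions on both fields can jump straight to the matching records instead of scanning a flat file.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from datetime import datetime


class TickIndex:
    """Toy two-dimensional index (hypothetical, not TileDB): ticker x timestamp."""

    def __init__(self):
        self._times = defaultdict(list)   # ticker -> sorted timestamps
        self._prices = defaultdict(list)  # ticker -> prices aligned with _times

    def insert(self, ticker, ts, price):
        # Keep each ticker's timestamps sorted so range lookups can use bisection.
        pos = bisect_right(self._times[ticker], ts)
        self._times[ticker].insert(pos, ts)
        self._prices[ticker].insert(pos, price)

    def slice(self, ticker, start, end):
        """Return (timestamp, price) pairs with start <= timestamp <= end."""
        times = self._times.get(ticker, [])
        prices = self._prices.get(ticker, [])
        lo = bisect_left(times, start)   # first position with timestamp >= start
        hi = bisect_right(times, end)    # first position with timestamp > end
        return list(zip(times[lo:hi], prices[lo:hi]))


# Made-up sample data, for illustration only.
index = TickIndex()
index.insert("AAPL", datetime(2024, 1, 2, 9, 30), 185.0)
index.insert("AAPL", datetime(2024, 1, 2, 9, 31), 185.2)
index.insert("MSFT", datetime(2024, 1, 2, 9, 30), 370.5)
index.insert("AAPL", datetime(2024, 1, 3, 9, 30), 184.1)

# Condition on both dimensions: one ticker, one trading day.
result = index.slice("AAPL", datetime(2024, 1, 2), datetime(2024, 1, 2, 23, 59))
for ts, price in result:
    print(ts.isoformat(), price)
```

The lookup touches only the selected ticker's records and finds the time range by binary search, which is the same kind of saving a real array engine aims for when time and ticker are dimensions rather than ordinary columns.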