The Power of Voltron Data and Theseus Distributed Query Engine

Much like the fictional Voltron robot from the popular animated show, Voltron Data is a company that combines open source technologies to enhance data access. The company has recently unveiled Theseus, a distributed query engine that aims to revolutionize data queries for AI workloads.

Theseus is designed to accelerate large-scale data pipelines and queries using GPUs and other hardware accelerators. By utilizing modular and composable accelerated libraries, Theseus aims to optimize data systems and provide faster results. According to Josh Patterson, co-founder and CEO of Voltron Data, Theseus is their next product in the journey of becoming the leading designer and builder of data systems.

One of the main goals of Theseus is to accelerate ETL, feature engineering, and other data preparation work to feed downstream AI and analytics systems at a faster rate. As AI systems demand more real-time data transformation, Theseus aims to address this issue by providing faster data access. Patterson highlights the significance of this feature, as many users have expressed their challenges in getting data fast enough for their AI systems.

Typically, data queries are limited by CPU compute capacity and performance. Theseus overcomes this limitation by leveraging accelerated computing technologies, including GPUs, to run queries faster than traditional CPU-based distributed engines. Patterson emphasizes the advantage of an “accelerator native” approach, as it allows faster query processing at scale.

Hyperparameter optimization is one specific use case where Theseus excels. Organizations can fine-tune parameters and perform feature engineering to build better models. By improving the speed of feature engineering, ETL processes, and data ingestion, Theseus empowers users to enhance their AI models and outcomes.

Theseus follows open standards such as Apache Arrow, Apache Parquet, and Ibis for interoperability. Rather than being a proprietary, siloed system, it is designed to work with Apache Arrow-compatible data lakes. Data can be moved in and out of Theseus, allowing integration with machine learning tools and frameworks such as PyTorch and TensorFlow, as well as with graph databases.

While Theseus does not have its own front-end user interface, it supports SQL queries and can be paired with Ibis to connect other front ends. This flexibility enables organizations to integrate Theseus into their existing workflows and systems.
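As an illustration of the sort of SQL-driven feature engineering discussed above, the sketch below aggregates raw events into per-user features. It is only a stand-in: Python's built-in sqlite3 module plays the role of the SQL engine, since how a client connects to Theseus is not covered here. The `build_features` function, the `events` table, and its columns are hypothetical.

```python
import sqlite3


def build_features(rows):
    """Aggregate raw (user_id, amount) events into per-user features with SQL."""
    # sqlite3 is a stand-in engine for this sketch, not Theseus.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE events (user_id INTEGER, amount REAL)")
        conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
        query = """
            SELECT user_id,
                   COUNT(*)    AS n_events,
                   AVG(amount) AS avg_amount
            FROM events
            GROUP BY user_id
            ORDER BY user_id
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


# Made-up events for illustration.
events = [(1, 10.0), (1, 30.0), (2, 5.0)]
features = build_features(events)
```

The query text is plain SQL, so in principle the same statement could be submitted to any engine that accepts SQL, with only the connection code changing.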

Voltron Data has partnered with Hewlett Packard Enterprise (HPE) to bring Theseus to the HPE GreenLake hybrid cloud platform. This collaboration offers the infrastructure for Theseus while providing customers with unified queries across different engines using Ibis. Voltron Data plans to expand Theseus partnerships and enhance functionality, such as user-defined functions, to achieve tighter integration into full data science pipelines in the future.

As data processing demands continue to grow, Theseus and Voltron Data are at the forefront of leveraging the power of distributed query engines to meet the evolving needs of AI-driven industries.
