Treeverse Releases lakeFS 1.0: Bringing Stability, Security, and Performance to Data Lake Version Control

Treeverse, the company behind the open-source lakeFS data version control system, has announced the release of lakeFS 1.0. This major update brings production-level stability, security, and performance enhancements to the data lake version control software.

The Evolution of lakeFS

The lakeFS project was established in 2020 and has been under continuous development since, giving organizations an open-source way to version-control data held in object storage-based data lakes. Treeverse, the leading company behind the technology, secured $23 million in funding in 2021 to develop the concept, delivering capabilities similar to those of the popular open-source Git version control system, but for data lakes.

Expansion to the Cloud and Integration with Other Technologies

In 2022, Treeverse introduced lakeFS Cloud, a managed cloud service for data version control. The approach has attracted interest from notable enterprises including Lockheed Martin, Volvo, and Arm. In addition, lakeFS 1.0 integrates with other data lake technologies such as Databricks and Apache Iceberg, which are increasingly being adopted by cloud data vendors.

“We have a large base of installations and really a product that reflects what people need for data version control over a data lake,” said Einat Orr, Co-founder and CEO at Treeverse, in an exclusive interview with VentureBeat.

Data version control lets users track changes to data over time, much as version control systems like Git track changes to code. lakeFS extends that concept to data stored in data lakes, providing version control across an organization’s entire data lake, including its data pipelines and workflows.
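To make the Git comparison concrete, here is a minimal sketch of the general idea in Python. It is a toy in-memory model written for illustration only, not lakeFS code or its API: the `ToyRepo` class and all of its method names are hypothetical. It assumes that each commit is an immutable snapshot mapping object paths to content hashes, and that a branch is simply a pointer to a commit.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ToyRepo:
    """Toy model of Git-like versioning for objects (hypothetical, not lakeFS)."""
    blobs: dict = field(default_factory=dict)     # content hash -> bytes
    commits: dict = field(default_factory=dict)   # commit id -> (parent, snapshot, message)
    branches: dict = field(default_factory=dict)  # branch name -> commit id or None
    staged: dict = field(default_factory=dict)    # branch name -> {path: content hash}

    def create_branch(self, name, source=None):
        # A new branch starts at the source branch's commit; no data is copied.
        self.branches[name] = self.branches.get(source) if source else None
        self.staged[name] = {}

    def put(self, branch, path, data: bytes):
        # Store the object once, keyed by its hash, and stage the path on the branch.
        digest = hashlib.sha256(data).hexdigest()
        self.blobs[digest] = data
        self.staged[branch][path] = digest

    def snapshot(self, commit_id):
        # Return a copy of the path -> hash mapping recorded by a commit.
        return dict(self.commits[commit_id][1]) if commit_id else {}

    def commit(self, branch, message):
        # A commit is the parent snapshot plus the staged changes, frozen under a new id.
        parent = self.branches[branch]
        snap = self.snapshot(parent)
        snap.update(self.staged[branch])
        payload = repr((parent, sorted(snap.items()), message)).encode()
        commit_id = hashlib.sha256(payload).hexdigest()[:12]
        self.commits[commit_id] = (parent, snap, message)
        self.branches[branch] = commit_id
        self.staged[branch] = {}
        return commit_id

    def read(self, commit_id, path):
        # Read an object exactly as it existed at the given commit.
        return self.blobs[self.commits[commit_id][1][path]]

    def diff(self, old_id, new_id):
        # Paths whose content differs (or exists on only one side) between two commits.
        old, new = self.snapshot(old_id), self.snapshot(new_id)
        return {p for p in old.keys() | new.keys() if old.get(p) != new.get(p)}
```

In this sketch, writing a changed file on an `experiment` branch and committing it leaves `main` untouched, and the earlier commit can still be read back byte for byte, which is the reproducibility property the article describes. A production system has to handle far more (merges, conflicts, garbage collection, object storage backends), none of which is modeled here.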

“Our vision is to be the version control tool that is running over all your data sources, and providing you the ability to version control your data pipelines, no matter where the data is,” Orr explained.

Moreover, lakeFS stores metadata about each version and the changes it contains, which is crucial for reproducibility and integration. Treeverse positions lakeFS as a complementary technology that adds benefits when used alongside tools such as Databricks and Apache Iceberg, as well as data orchestration tools such as Apache Airflow, Prefect, and Dagster.

Diverse Use Cases and Future Perspectives

The lakeFS technology has diverse applications in data analytics and AI. For instance, data scientists can use the new lakeFS local capability to version data locally for model development and testing.

Orr further mentioned that Treeverse is exploring possibilities to integrate and enable data version control capabilities for vector database technologies. This expansion would enable users to have complete control over their data pipelines, regardless of the data’s location.
