Generative AI applications have mostly relied on static sources of data, but what about organizations that want to use real-time streaming data? That is the gap the LangStream open source project, led by DataStax, aims to fill. First launched on September 13, LangStream has evolved quickly, and a new release expands its integration points.
The Power of LangStream
The LangStream project lets developers work with streaming data sources, also known as data in motion, to build event-driven architectures. In an event-driven architecture, each data point arriving on a stream can trigger an action. Real-time applications rely on this pattern to act on data as it arrives on a platform, which lets generative models take the latest contextual data into account when formulating responses or completing tasks.
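To make the event-driven pattern concrete, here is a minimal Python sketch in which each incoming event triggers a handler that refreshes the context a model could draw on. It does not use LangStream's API: `EventBus`, the `support-tickets` topic, and the `remember` handler are hypothetical names chosen for illustration, and a production system would consume from a streaming platform rather than an in-memory dictionary.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# All names here are hypothetical; this is not LangStream's API.
Event = dict
Handler = Callable[[Event], None]


class EventBus:
    """Tiny in-memory stand-in for a streaming platform."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        # Register a handler to run whenever an event lands on the topic.
        self._handlers[topic].append(handler)

    def publish(self, topic: str, event: Event) -> int:
        # Deliver the event to every subscribed handler; return how many ran.
        handlers = self._handlers.get(topic, [])
        for handler in handlers:
            handler(event)
        return len(handlers)


# Context that a generative model could consult when answering.
latest_context: List[str] = []


def remember(event: Event) -> None:
    # The action triggered by each event: record its text as fresh context.
    latest_context.append(event["text"])


bus = EventBus()
bus.subscribe("support-tickets", remember)
bus.publish("support-tickets", {"text": "Checkout page returns HTTP 500"})
print(latest_context)
```

The point of the sketch is that nothing polls for data: the arrival of the event itself drives the update, so the context is current the moment the event has been handled.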
“LangStream is a way to build generative AI applications in an event-driven way.”
Chris Bartholomew, Head of Streaming Engineering at DataStax
LangStream's compatibility now extends beyond DataStax’s AstraDB database to other vector databases, including Milvus and Pinecone. These integrations support Retrieval Augmented Generation (RAG), in which a generative AI model retrieves relevant, up-to-date data and uses it as context for its response.
The Inner Workings of LangStream
LangStream follows a stream processing model: it processes incoming messages or events before passing them on. A key capability is keeping vector databases in sync with that stream. Before new data can be stored in a vector database and used for RAG, a vector embedding must be generated for it. LangStream facilitates the creation of embeddings through a synchronous data pipeline, supporting various vector embedding models, such as those hosted on Hugging Face and Google’s Vertex AI.
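The flow described above (an event arrives, an embedding is generated, the vector is written to a vector database, and later queries retrieve it) can be sketched in plain Python. This is an illustration under stated assumptions, not LangStream's implementation: `toy_embed` is a hashed bag-of-words stand-in for a real embedding model, `InMemoryVectorStore` stands in for a vector database such as Milvus or Pinecone, and `run_pipeline` is a hypothetical name for the processing step.

```python
import hashlib
import math
from typing import Iterable, List, Tuple

DIMENSIONS = 64  # arbitrary size chosen for the toy embedding


def toy_embed(text: str) -> List[float]:
    """Stand-in for a real embedding model: hashed bag of words, L2-normalized."""
    vector = [0.0] * DIMENSIONS
    for word in text.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        vector[int.from_bytes(digest[:4], "big") % DIMENSIONS] += 1.0
    norm = math.sqrt(sum(v * v for v in vector))
    # Empty text has no words, so its vector stays all zeros.
    return [v / norm for v in vector] if norm else vector


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database."""

    def __init__(self) -> None:
        self._rows: List[Tuple[str, List[float]]] = []

    def upsert(self, text: str, vector: List[float]) -> None:
        self._rows.append((text, vector))

    def search(self, query_vector: List[float], top_k: int = 1) -> List[str]:
        # For unit-length vectors the dot product is the cosine similarity.
        def score(row: Tuple[str, List[float]]) -> float:
            return sum(a * b for a, b in zip(row[1], query_vector))

        ranked = sorted(self._rows, key=score, reverse=True)
        return [text for text, _ in ranked[:top_k]]


def run_pipeline(events: Iterable[str], store: InMemoryVectorStore) -> None:
    # Process each event as it arrives: embed it, then write it to the store.
    for text in events:
        store.upsert(text, toy_embed(text))


store = InMemoryVectorStore()
run_pipeline(
    ["shipping delayed by storm in Denver", "new pricing tier announced today"],
    store,
)
# Retrieval step of RAG: find the stored text closest to the query.
print(store.search(toy_embed("storm in Denver"), top_k=1))
```

Because the embedding step is an ordinary function passed over each event, swapping `toy_embed` for a call to a hosted model would not change the shape of the pipeline, which is the sense in which such a design can stay model-agnostic.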
“A lot of what we’re doing is taking the pipeline streaming, event-driven paradigm and we’re taking it to GenAI applications.”
Chris Bartholomew, Head of Streaming Engineering at DataStax
LangStream is agnostic about which vector embedding model is used, leaving that choice to users. By drawing on real-time streaming data, LangStream aims to help developers build scalable, production-ready AI applications across diverse data types.
“LangStream can greatly benefit developers working with generative AI as it helps them to easily build applications and simplifies the process of coordinating data from a variety of sources to enable high-quality prompts for LLMs.”
Davor Bonaci, CTO and Executive Vice President of DataStax
As an open source project, LangStream aligns with DataStax’s commitment to collaborating with open source communities. This approach supports the development of technologies core to DataStax’s commercial efforts, such as Apache Pulsar and the Apache Cassandra database.
“DataStax has a long history of working with open source communities. It only seems fitting to contribute to yet another open source project, especially one that is so relevant to developers working with today’s most popular technologies.”
Davor Bonaci, CTO and Executive Vice President of DataStax