Using LangChain Cassandra Vector Store for Document Storage and Retrieval
Posted: Feb 10, 2025.
The Cassandra vector store in LangChain provides a way to store and search documents using Apache Cassandra® or compatible databases that support vector search capabilities. In this guide, we'll explore how to use this vector store implementation effectively.
What is Cassandra Vector Store?
The Cassandra vector store is a LangChain integration that allows you to store documents and their vector embeddings in Apache Cassandra® or compatible databases (like Astra DB). It supports various search capabilities including:
- Vector similarity search
- Metadata filtering
- Hybrid search (combining vector similarity with text search)
- Maximal marginal relevance (MMR) search
The implementation requires Cassandra 5.0+ or a compatible database that supports vector capabilities.
Reference
Key methods of the Cassandra vector store:
Method | Description |
---|---|
add_texts() | Add raw text documents with optional metadata and IDs |
add_documents() | Add Document objects with metadata |
similarity_search() | Search for similar documents by text query |
similarity_search_with_score() | Like similarity_search(), but also returns a relevance score for each result |
max_marginal_relevance_search() | Search optimizing for both relevance and diversity |
delete() | Remove documents by their IDs |
clear() | Empty the entire vector store |
metadata_search() | Search documents by metadata filters |
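The following dependency-free, in-memory toy makes the table's semantics concrete. It is not the LangChain Cassandra class and does not talk to a database. The names ToyVectorStore, embed, matches and VOCAB are hypothetical, and the word-count "embedding" stands in for a real LangChain embeddings model. Signatures are simplified: searches return plain strings rather than Document objects, and ranking is a brute-force cosine scan rather than the database's vector index.

```python
import math
import uuid

# Hypothetical stand-in for an embeddings model: counts of a few vocabulary words.
VOCAB = ["cat", "dog", "fish", "pet", "car", "engine"]


def embed(text):
    words = text.lower().split()
    return [float(words.count(word)) for word in VOCAB]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def matches(metadata, wanted):
    # Exact equality on every key/value pair in `wanted`.
    return all(metadata.get(key) == value for key, value in wanted.items())


class ToyVectorStore:
    """In-memory toy mirroring the method names in the table (not the real class)."""

    def __init__(self):
        self.rows = {}  # id -> (text, metadata, vector)

    def add_texts(self, texts, metadatas=None, ids=None):
        texts = list(texts)
        metadatas = metadatas or [{} for _ in texts]
        ids = ids or [uuid.uuid4().hex for _ in texts]  # generate IDs if none given
        for doc_id, text, metadata in zip(ids, texts, metadatas):
            self.rows[doc_id] = (text, metadata, embed(text))  # reusing an ID overwrites
        return list(ids)

    def similarity_search_with_score(self, query, k=4, filter=None):
        query_vec = embed(query)
        hits = [
            (text, cosine(query_vec, vec))
            for text, metadata, vec in self.rows.values()
            if filter is None or matches(metadata, filter)
        ]
        hits.sort(key=lambda hit: hit[1], reverse=True)  # most similar first
        return hits[:k]

    def similarity_search(self, query, k=4, filter=None):
        return [text for text, _ in self.similarity_search_with_score(query, k, filter)]

    def metadata_search(self, metadata, n=5):
        # Pure metadata lookup: no vector ranking involved.
        return [text for text, meta, _ in self.rows.values() if matches(meta, metadata)][:n]

    def delete(self, ids):
        for doc_id in ids:
            self.rows.pop(doc_id, None)
        return True

    def clear(self):
        self.rows.clear()
```

For example, after adding "my cat is a pet", "my dog is a pet" and "the car engine", the query "cat" with k=1 returns ["my cat is a pet"]. A metadata filter narrows the candidates before ranking.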
How to Use Cassandra Vector Store
Setup and Initialization
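First, install the langchain-community and cassio packages (for example, pip install langchain-community cassio). CassIO is the library the LangChain integration uses to talk to Cassandra and Astra DB.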
There are two ways to initialize the vector store:
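- Using a Cassandra cluster: open a session with the Cassandra Python driver and register it, together with a keyspace, through CassIO.
- Using Astra DB: give CassIO your database ID and an application token instead of a session.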
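Here is a minimal sketch of both options. It assumes the cassio.init() entry point and the Cassandra class from langchain_community.vectorstores. The contact point, keyspace, table name, database ID and token are placeholders, and embedding can be any LangChain embeddings object. The helper names build_cluster_store and build_astra_store are our own. Each function imports only what its connection path needs, so the Astra DB path never touches the Cassandra driver directly.

```python
def build_cluster_store(embedding, contact_points=("127.0.0.1",),
                        keyspace="my_keyspace", table_name="my_documents"):
    """Connect to a self-managed Cassandra 5.0+ cluster and open a vector table."""
    from cassandra.cluster import Cluster  # from the cassandra-driver package
    import cassio
    from langchain_community.vectorstores import Cassandra

    session = Cluster(list(contact_points)).connect()
    # Register the session and keyspace globally for the Cassandra store to use.
    cassio.init(session=session, keyspace=keyspace)
    return Cassandra(embedding=embedding, table_name=table_name)


def build_astra_store(embedding, database_id, token, keyspace=None,
                      table_name="my_documents"):
    """Connect to Astra DB with a database ID and an application token."""
    import cassio
    from langchain_community.vectorstores import Cassandra

    # keyspace=None asks cassio to use the database's default keyspace.
    cassio.init(database_id=database_id, token=token, keyspace=keyspace)
    return Cassandra(embedding=embedding, table_name=table_name)
```

Recent langchain_community releases let Cassandra() pick up the session and keyspace registered by cassio.init(). Some older releases expected session=None and keyspace=None to be passed explicitly.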
Adding Documents
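You can add documents in two ways: add_texts() takes raw strings with optional metadata and IDs, while add_documents() takes Document objects that carry their own metadata.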
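The sketch below uses both approaches and assumes store is an initialized Cassandra vector store. The helper name add_sample_documents and the sample texts, metadata keys and IDs are illustrative only. Both methods return the IDs of the stored documents. Keep these IDs if you may want to delete the documents later.

```python
def add_sample_documents(store):
    """Add documents both ways and return every stored document's ID."""
    from langchain_core.documents import Document

    # Option 1: raw strings plus parallel lists of metadata dicts and explicit IDs.
    text_ids = store.add_texts(
        texts=[
            "Apache Cassandra is a distributed wide-column database.",
            "Astra DB is a managed database service built on Cassandra.",
        ],
        metadatas=[{"topic": "cassandra"}, {"topic": "astra"}],
        ids=["doc-1", "doc-2"],
    )

    # Option 2: Document objects that already carry their own metadata.
    # No IDs are given here, so the store generates them.
    doc_ids = store.add_documents(
        [Document(page_content="MMR balances relevance and diversity.",
                  metadata={"topic": "search"})]
    )
    return text_ids + doc_ids
```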
Searching Documents
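For a basic similarity search, pass a text query to similarity_search(). similarity_search_with_score() returns the same results paired with their relevance scores.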
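Here is a sketch of the common search calls. store is any initialized Cassandra vector store, and run_similarity_searches is a hypothetical helper name. The filter argument restricts results to documents whose metadata matches every given key/value pair.

```python
def run_similarity_searches(store, query, k=3, metadata_filter=None):
    """Return plain, scored and (optionally) metadata-filtered results for one query."""
    # Top-k Document objects, most similar first.
    docs = store.similarity_search(query, k=k)
    # The same kind of search, returning (Document, relevance score) pairs.
    scored = store.similarity_search_with_score(query, k=k)
    # Vector search limited to documents whose metadata matches the filter.
    filtered = (
        store.similarity_search(query, k=k, filter=metadata_filter)
        if metadata_filter
        else []
    )
    return docs, scored, filtered
```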
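For results that are both relevant and diverse, use max_marginal_relevance_search(). It fetches a larger set of candidates and then picks the ones that match the query but differ from each other.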
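In LangChain the call looks like store.max_marginal_relevance_search(query, k=4, fetch_k=20, lambda_mult=0.5). Here fetch_k is the number of candidates the vector search returns first, and lambda_mult trades relevance (1.0) against diversity (0.0). The plain-Python sketch below shows the standard greedy MMR selection that runs over those candidates. The function name mmr and its list-of-vectors inputs are our own simplification.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(query_vec, candidate_vecs, k=4, lambda_mult=0.5):
    """Greedy maximal marginal relevance; returns indices into candidate_vecs.

    lambda_mult=1.0 ranks purely by relevance to the query; lower values
    increasingly penalize candidates similar to results already chosen.
    """
    relevance = [cosine(query_vec, vec) for vec in candidate_vecs]
    selected, remaining = [], list(range(len(candidate_vecs)))
    while remaining and len(selected) < k:
        best, best_score = None, -math.inf
        for i in remaining:
            # Redundancy: similarity to the closest already-selected result.
            redundancy = max(
                (cosine(candidate_vecs[i], candidate_vecs[j]) for j in selected),
                default=0.0,
            )
            score = lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy
            if score > best_score:  # strict ">" keeps the earlier index on ties
                best, best_score = i, score
        selected.append(best)
        remaining.remove(best)
    return selected
```

For example, take the query [1, 1, 0] and the candidates [[1, 0.8, 0], [1, 0.8, 0], [0.8, 1, 0.3]], where the first two candidates are duplicates. With k=2, lambda_mult=1.0 returns [0, 1], which includes both duplicates. With lambda_mult=0.5, the result is [0, 2], because the duplicate is swapped for the distinct third candidate.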
Hybrid Search
If you are using Astra DB, you can also combine vector similarity with text search over the documents' content.
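As far as we understand it, LangChain's Cassandra class exposes hybrid search through a body_search argument on its search methods (a term or a list of terms). This argument restricts the vector search to documents whose text matches the terms. The table also needs a text index on the document body, configured through the body_index_options constructor argument. Check the reference for your installed version. The sketch below only illustrates the idea in plain Python: keep documents that contain every term, then rank the survivors by cosine similarity. The name hybrid_search is hypothetical. Its naive whole-word matching is a simplification of the database's text analyzer.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_search(query_vec, docs, terms, k=4):
    """Keep docs whose text contains every term, then rank them by vector similarity.

    docs is a list of (text, vector) pairs; matching is a naive lowercase
    whole-word check, unlike a real text analyzer.
    """
    wanted = [term.lower() for term in terms]
    candidates = []
    for text, vec in docs:
        words = set(text.lower().split())
        if all(term in words for term in wanted):  # text-search filter
            candidates.append((text, cosine(query_vec, vec)))
    candidates.sort(key=lambda pair: pair[1], reverse=True)  # vector ranking
    return [text for text, _ in candidates[:k]]
```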
Managing Documents
To delete specific documents, pass their IDs to delete(). To empty the entire store, use clear().
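Here is a short sketch. delete_documents and empty_store are hypothetical helper names wrapping the two methods from the reference table, and store is any initialized Cassandra vector store. The IDs are the ones you passed to, or received from, add_texts() or add_documents().

```python
def delete_documents(store, ids):
    """Remove only the documents with these IDs."""
    return store.delete(ids)


def empty_store(store):
    """Remove every document from the vector store; use with care."""
    store.clear()
```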
The Cassandra vector store provides a robust solution for document storage and retrieval, with support for advanced features like hybrid search and MMR. Its integration with Cassandra and Astra DB makes it suitable for production deployments requiring scalability and high availability.
Close the database connection when you are done, for example by calling shutdown() on the driver's Cluster object if you created one yourself. Which features are available depends on your database version and configuration; hybrid search, for instance, requires Astra DB.
An alternative to LangSmith
Open-source LangChain monitoring, prompt management, and magic. Get started in 2 minutes.