Using ScaNN Vector Store for Efficient Similarity Search in LangChain
Posted: Jan 28, 2025.
ScaNN (Scalable Nearest Neighbors) is Google Research's library for approximate nearest neighbor search, and LangChain wraps it as a vector store. In this guide, we'll explore how to use ScaNN with LangChain for document storage and retrieval.
What is ScaNN?
ScaNN is a method developed by Google Research for performing efficient vector similarity search at scale. It provides:
- Fast approximate nearest neighbor search
- Support for different distance metrics like Euclidean distance and Maximum Inner Product Search (MIPS)
- Optimized implementation for x86 processors with AVX2 support
- Search space pruning and quantization techniques for better performance
The LangChain ScaNN integration allows you to use these capabilities as a vector store for document embeddings.
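To make the two distance metrics above concrete, here is a minimal sketch in plain Python showing that Euclidean distance and inner product can rank the same vectors differently. It is an exhaustive search for illustration only, not how ScaNN is implemented, and the function names are made up for this example.

```python
def squared_l2(a, b):
    """Squared Euclidean distance: smaller means more similar."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def inner_product(a, b):
    """Dot product: larger means more similar (the MIPS objective)."""
    return sum(x * y for x, y in zip(a, b))


def nearest_by_l2(query, vectors):
    """Index of the vector with the smallest squared L2 distance."""
    return min(range(len(vectors)), key=lambda i: squared_l2(query, vectors[i]))


def best_by_inner_product(query, vectors):
    """Index of the vector with the largest inner product."""
    return max(range(len(vectors)), key=lambda i: inner_product(query, vectors[i]))


query = [1.0, 1.0]
vectors = [[1.0, 1.0], [3.0, 3.0]]

print(nearest_by_l2(query, vectors))          # 0: identical to the query
print(best_by_inner_product(query, vectors))  # 1: longer vector, larger dot product
```

The choice of metric therefore matters when your embeddings are not normalized to unit length.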
Reference
Here are the key methods provided by the ScaNN vector store class:
Method | Description |
---|---|
from_documents() | Create a ScaNN index from a list of Documents |
from_texts() | Create a ScaNN index from raw text strings |
from_embeddings() | Create a ScaNN index from pre-computed embeddings |
similarity_search() | Find similar documents using text query |
similarity_search_by_vector() | Find similar documents using embedding vector |
max_marginal_relevance_search() | Find diverse similar documents using MMR |
save_local() | Save the ScaNN index to disk |
load_local() | Load a saved ScaNN index from disk |
How to use ScaNN Vector Store
Basic Setup
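First, install the required dependencies with pip: the `langchain-community` package, which contains the ScaNN wrapper, and the `scann` package itself.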
Creating a ScaNN Index
Here's how to create a ScaNN vector store from documents:
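A minimal sketch, assuming `langchain-community`, `langchain-core`, and `scann` are installed. The helper names (`build_scann_store`, `build_demo_store`) and the sample documents are made up for illustration. `DeterministicFakeEmbedding` is a stand-in that needs no API key, so swap in a real embedding model for meaningful results.

```python
def build_scann_store(documents, embedding):
    """Embed `documents` and index them in a LangChain ScaNN vector store."""
    # Imported inside the function so the optional dependency is only
    # required when an index is actually built.
    from langchain_community.vectorstores import ScaNN

    return ScaNN.from_documents(documents, embedding)


def build_demo_store():
    """Build a tiny store from hypothetical sample documents."""
    from langchain_core.documents import Document
    from langchain_core.embeddings import DeterministicFakeEmbedding

    documents = [
        Document(
            page_content="ScaNN performs approximate nearest neighbor search.",
            metadata={"source": "notes"},
        ),
        Document(
            page_content="LangChain exposes vector stores through a common interface.",
            metadata={"source": "notes"},
        ),
    ]
    # Stand-in embedding: deterministic, but not semantically meaningful.
    embedding = DeterministicFakeEmbedding(size=64)
    return build_scann_store(documents, embedding)
```

Calling `build_demo_store()` returns a store you can query. In a real application you would pass your own documents and embedding model to `build_scann_store()`.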
Similarity Search
Perform similarity search to find relevant documents:
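A short sketch of both search methods from the reference table, assuming `store` is an existing ScaNN vector store. The helper names are illustrative, and `k` is the number of results to return.

```python
def top_matches(store, query, k=3):
    """Return (text, metadata) pairs for the k documents closest to a text query."""
    docs = store.similarity_search(query, k=k)
    return [(doc.page_content, doc.metadata) for doc in docs]


def top_matches_by_vector(store, query_vector, k=3):
    """Same as top_matches, but starting from a pre-computed embedding vector."""
    docs = store.similarity_search_by_vector(query_vector, k=k)
    return [(doc.page_content, doc.metadata) for doc in docs]
```

For example, `top_matches(store, "What is ScaNN?", k=2)` returns the two nearest documents. The vector passed to `top_matches_by_vector()` should come from the same embedding model that built the index.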
Maximum Marginal Relevance Search
Use MMR to get diverse results:
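To show what MMR does, here is a minimal plain-Python sketch of the selection rule: each step picks the candidate that maximizes `lambda_mult * relevance - (1 - lambda_mult) * redundancy`. This illustrates the technique itself rather than the library's code. Whether `max_marginal_relevance_search()` is available on the ScaNN store may depend on your installed `langchain-community` version, so treat that as an assumption to verify.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(query_vec, candidate_vecs, k=2, lambda_mult=0.5):
    """Return indices of k candidates chosen by maximal marginal relevance."""
    selected = []
    remaining = list(range(len(candidate_vecs)))
    while remaining and len(selected) < k:
        best_idx, best_score = None, float("-inf")
        for i in remaining:
            relevance = cosine(query_vec, candidate_vecs[i])
            # Redundancy: similarity to the closest already-selected result.
            redundancy = max(
                (cosine(candidate_vecs[i], candidate_vecs[j]) for j in selected),
                default=0.0,
            )
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        remaining.remove(best_idx)
    return selected


query = [1.0, 0.0]
candidates = [[1.0, 0.0], [0.9, 0.1], [0.6, 0.8]]
print(mmr_select(query, candidates, k=2, lambda_mult=0.7))  # [0, 1]: favors relevance
print(mmr_select(query, candidates, k=2, lambda_mult=0.3))  # [0, 2]: favors diversity
```

A higher `lambda_mult` behaves like plain similarity search, while a lower value pushes near-duplicates of already-selected results down the ranking.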
Saving and Loading
Save your ScaNN index to disk and load it later:
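A sketch of a save-and-reload round trip, assuming `store` is an existing ScaNN vector store and `embedding` is the model that built it. The helper name and the default folder are illustrative. The `allow_dangerous_deserialization` keyword is assumed from recent `langchain-community` releases, where loading involves unpickling stored data. Older releases may not accept it.

```python
def save_and_reload(store, embedding, folder="scann_index", index_name="index"):
    """Persist a ScaNN store to `folder`, then load it back from disk."""
    from langchain_community.vectorstores import ScaNN

    store.save_local(folder, index_name=index_name)

    # Loading unpickles data from disk, so only enable this flag for
    # index files that you created yourself or otherwise trust.
    return ScaNN.load_local(
        folder,
        embedding,
        index_name=index_name,
        allow_dangerous_deserialization=True,
    )
```

Pass the same embedding model when loading that you used when building the index. Otherwise query vectors will not be comparable with the stored ones.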
Using with a Retrieval Chain
Integrate ScaNN with a retrieval chain for question answering:
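One way to wire this up is with LangChain's runnable composition, sketched below under a few assumptions: `store` is an existing ScaNN vector store, `llm` is any LangChain chat model you have configured, and the prompt wording and helper names are illustrative.

```python
def format_docs(docs):
    """Join retrieved documents into one context string for the prompt."""
    return "\n\n".join(doc.page_content for doc in docs)


def build_qa_chain(store, llm, k=3):
    """Build a question-answering chain that retrieves k documents per question."""
    from langchain_core.output_parsers import StrOutputParser
    from langchain_core.prompts import ChatPromptTemplate
    from langchain_core.runnables import RunnablePassthrough

    retriever = store.as_retriever(search_kwargs={"k": k})
    prompt = ChatPromptTemplate.from_template(
        "Answer the question using only this context:\n\n"
        "{context}\n\nQuestion: {question}"
    )
    # The retriever fills {context}; the raw question passes through unchanged.
    return (
        {"context": retriever | format_docs, "question": RunnablePassthrough()}
        | prompt
        | llm
        | StrOutputParser()
    )
```

With a chain built as `chain = build_qa_chain(store, llm)`, calling `chain.invoke("What is ScaNN?")` retrieves the nearest documents, inserts them into the prompt, and returns the model's answer as a string.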
With ScaNN as your vector store, you get approximate nearest neighbor search that scales to large document collections. It is particularly well-suited to applications that can trade a small amount of accuracy for much faster queries.