Using Annoy Vector Store in LangChain
Posted: Feb 11, 2025.
Annoy (Approximate Nearest Neighbors Oh Yeah) is a library for fast approximate similarity search over vectors, and LangChain wraps it as a vector store for document embeddings. In this guide, we'll explore how to build, query, and persist an Annoy vector store in LangChain for document search use cases.
What is Annoy?
Annoy is a C++ library with Python bindings created by Spotify that implements approximate nearest neighbor search. It's particularly useful when you need to find similar documents in a large collection by comparing their vector embeddings. Some key characteristics of Annoy include:
- Read-only after building the index (cannot add new documents incrementally)
- Memory-mapped file format that allows sharing index across processes
- Fast search performance with approximate nearest neighbors
- Support for different distance metrics like angular (cosine) and euclidean
Reference
Here are the main methods available in the Annoy vector store:
| Method | Description |
|---|---|
| `from_texts()` | Create vector store from a list of texts |
| `from_documents()` | Create vector store from a list of Document objects |
| `from_embeddings()` | Create vector store from pre-computed embeddings |
| `similarity_search()` | Find similar documents using text query |
| `similarity_search_by_vector()` | Find similar documents using embedding vector |
| `similarity_search_with_score()` | Get similar documents with similarity scores |
| `save_local()` | Save the index and documents to disk |
| `load_local()` | Load a previously saved index and documents |
How to Use Annoy
Let's explore different ways to use the Annoy vector store:
Creating a Vector Store
You can create an Annoy vector store in several ways:
Performing Similarity Search
You can search for similar documents in different ways:
Saving and Loading
Annoy indexes can be saved to disk and loaded later:
Advanced Configuration
You can customize Annoy's behavior with different parameters:
Working with Metadata
You can include metadata with your documents:
Important Notes
- Annoy is read-only after building the index. If you need to add documents incrementally, consider using a different vector store.
- The similarity scores returned are distances: lower scores indicate more similar documents.
- The default metric is "angular", which is closely related to cosine similarity, but you can also use "euclidean" or "dot" (dot product).
- When loading indexes with `load_local()`, be cautious with the `allow_dangerous_deserialization` parameter: the saved document store is unpickled on load, so enabling it for untrusted files is a security risk.
By using Annoy in LangChain, you can efficiently implement similarity search for your document collections while benefiting from its fast search capabilities and memory-efficient design.