LangChain ApertureDB Vector Store - Store and Search Embeddings

Posted: Nov 17, 2024.

ApertureDB is a versatile database designed for storing and managing multi-modal data like text, images, videos, and embeddings along with their metadata. In this guide, we'll explore how to use ApertureDB as a vector store in LangChain for storing and searching document embeddings.

What is ApertureDB Vector Store?

The ApertureDB vector store implementation in LangChain allows you to:

  • Store document embeddings in ApertureDB descriptor sets
  • Perform similarity searches on stored embeddings
  • Support multiple vector stores within a single ApertureDB instance
  • Configure different engines and metrics for similarity search
  • Handle metadata alongside embeddings

Most operations are available in both synchronous and asynchronous forms.

Reference

Here are the key methods provided by the ApertureDB vector store:

Method                             Description
from_texts()                       Create a new vector store from text strings and embeddings
from_documents()                   Create a vector store from Document objects
add_texts()                        Add new texts to an existing vector store
add_documents()                    Add new documents to an existing vector store
similarity_search()                Find similar documents based on a text query
similarity_search_with_score()     Find similar documents and return relevance scores
max_marginal_relevance_search()    Search with diversity optimization
delete()                           Remove documents from the store
list_vectorstores()                List all vector stores in the database

How to Use ApertureDB Vector Store

Setting Up the Vector Store

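First, initialize ApertureDB with your embeddings model. This assumes the aperturedb Python package is installed and a connection to a running ApertureDB instance is already configured: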

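# OpenAIEmbeddings now lives in the langchain-openai package;
# the langchain_community import path is deprecated.
from langchain_openai import OpenAIEmbeddings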
from langchain_community.vectorstores import ApertureDB

embeddings = OpenAIEmbeddings()

# Create a vector store named "my_store"
vectorstore = ApertureDB(
    embeddings=embeddings,
    descriptor_set="my_store",
    dimensions=1536,  # OpenAI embedding dimensions
    engine="HNSW",    # Indexing engine
    metric="CS"       # Cosine similarity metric
)
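The metric="CS" setting above selects cosine similarity. As a rough illustration of what that metric measures, here is a plain-Python sketch. It is not ApertureDB's implementation, and the function name cosine_similarity is a hypothetical helper used only for this example:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        # Mirrors why `dimensions` must match the embedding model's output size.
        raise ValueError("vectors must have the same dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch; zero vectors have no direction
    return dot / (norm_a * norm_b)

# Vectors pointing the same way score (approximately) 1.0 regardless of length;
# orthogonal vectors score 0.0.
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 3.0])
```

Because the score depends only on direction, not magnitude, cosine similarity is a common choice for text embeddings.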

Adding Documents

You can add documents in several ways:

# Add individual texts
texts = [
    "LangChain is a framework for developing applications powered by language models.",
    "ApertureDB can store and index multi-modal data efficiently."
]
vectorstore.add_texts(texts)

# Add documents with metadata
from langchain_core.documents import Document

docs = [
    Document(
        page_content="Document 1 content",
        metadata={"source": "file1.txt"}
    ),
    Document(
        page_content="Document 2 content", 
        metadata={"source": "file2.txt"}
    )
]
vectorstore.add_documents(docs)

Searching Documents

Find similar documents using different search methods:

# Basic similarity search
results = vectorstore.similarity_search(
    "What is LangChain?",
    k=2  # Return top 2 results
)

# Search with scores
docs_and_scores = vectorstore.similarity_search_with_score(
    "What is LangChain?",
    k=2
)

# Diversity-optimized search
diverse_docs = vectorstore.max_marginal_relevance_search(
    "What is LangChain?",
    k=2,
    fetch_k=10,     # Fetch more candidates
    lambda_mult=0.5  # Balance relevance vs diversity
)
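To make the fetch_k and lambda_mult parameters more concrete, the following is a minimal pure-Python sketch of the general maximal marginal relevance idea. It is an illustration under stated assumptions, not the code LangChain or ApertureDB runs: the helper names cosine and mmr_select and the toy 2-D vectors are hypothetical, and the candidate list stands in for the fetch_k nearest neighbours returned by the database.

```python
import math

def cosine(a, b):
    # Assumes non-zero vectors of equal length.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mmr_select(query_vec, candidate_vecs, k=2, lambda_mult=0.5):
    """Return indices of up to k candidates chosen by maximal marginal relevance."""
    selected = []
    remaining = list(range(len(candidate_vecs)))
    while remaining and len(selected) < k:
        best_idx, best_score = None, -math.inf
        for i in remaining:
            relevance = cosine(query_vec, candidate_vecs[i])
            # Similarity to the closest already-selected result (0.0 if none yet).
            redundancy = max(
                (cosine(candidate_vecs[i], candidate_vecs[j]) for j in selected),
                default=0.0,
            )
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        remaining.remove(best_idx)
    return selected

query = [1.0, 0.0]
candidates = [
    [1.0, 0.1],    # very relevant
    [1.0, 0.12],   # near-duplicate of the first candidate
    [1.0, -1.0],   # less relevant, but points in a different direction
]
balanced = mmr_select(query, candidates, k=2, lambda_mult=0.5)        # [0, 2]
relevance_only = mmr_select(query, candidates, k=2, lambda_mult=1.0)  # [0, 1]
```

With lambda_mult=0.5 the near-duplicate is skipped in favor of the more distinct candidate, while lambda_mult=1.0 reduces to a plain relevance ranking.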

Managing Vector Stores

ApertureDB allows you to manage multiple vector stores:

# List all vector stores
ApertureDB.list_vectorstores()

# Delete a vector store
ApertureDB.delete_vectorstore("my_store")

# Delete specific documents
vectorstore.delete(ids=["doc1", "doc2"])

Using as a Retriever

The vector store can be used as a retriever in LangChain chains:

retriever = vectorstore.as_retriever(
    search_type="mmr",  # Use MMR search
    search_kwargs={
        "k": 5,
        "fetch_k": 20,
        "lambda_mult": 0.5
    }
)

Async Operations

Most operations have async equivalents, named with an "a" prefix, so they can be awaited without blocking the event loop in async applications:

import asyncio

async def main():
    # Async similarity search
    results = await vectorstore.asimilarity_search("What is LangChain?")

    # Async document addition
    await vectorstore.aadd_documents(docs)

asyncio.run(main())
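One place the async API can pay off is running several independent queries concurrently. The sketch below shows the pattern with asyncio.gather. To keep it runnable without a database, fake_asimilarity_search is a hypothetical stand-in for vectorstore.asimilarity_search, and the sleep only simulates network latency:

```python
import asyncio

async def fake_asimilarity_search(query, k=2):
    """Hypothetical stand-in for vectorstore.asimilarity_search."""
    await asyncio.sleep(0.01)  # simulated round trip to the database
    return [f"{query} #{i}" for i in range(k)]

async def search_many(queries):
    # Start all searches at once; gather returns results in input order.
    return await asyncio.gather(*(fake_asimilarity_search(q) for q in queries))

queries = ["What is LangChain?", "What is ApertureDB?"]
results = asyncio.run(search_many(queries))
```

In a real application, the stand-in would be replaced by calls on the vector store itself. Whether the queries actually overlap depends on how the underlying client handles I/O.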

The ApertureDB vector store stores and searches document embeddings, supports several search strategies (plain similarity, scored similarity, and MMR), and plugs into LangChain applications as a retriever.
