Using Weaviate Vector Store in LangChain

Posted: Nov 21, 2024.

The Weaviate vector store integration in LangChain provides a powerful way to store and search vector embeddings. This guide will walk you through using the Weaviate vector store class and its key features.

What is the Weaviate Vector Store?

Weaviate is an open-source vector database that allows you to store data objects alongside their vector embeddings. The LangChain Weaviate integration provides a wrapper around the Weaviate client that makes it easy to:

  • Store documents and their embeddings
  • Perform semantic similarity searches
  • Filter and retrieve documents based on metadata
  • Manage documents with CRUD operations
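Under the hood, all of these operations reduce to storing (text, metadata, vector) records and ranking them by vector similarity. A toy pure-Python model makes the idea concrete; the hand-written 2-D vectors below stand in for real embeddings, and this is only a conceptual sketch, not how Weaviate is implemented:

```python
import math

# Toy store: each record pairs a text and its metadata with an embedding vector.
records = [
    ("LangChain provides LLM tools", {"source": "docs"}, [0.9, 0.1]),
    ("Vector stores enable semantic search", {"source": "blog"}, [0.8, 0.3]),
    ("Unrelated cooking recipe", {"source": "blog"}, [0.1, 0.9]),
]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def similarity_search(query_vec, k=2):
    # Rank records by cosine similarity to the query vector, return the top-k texts
    ranked = sorted(records, key=lambda r: cosine(query_vec, r[2]), reverse=True)
    return [text for text, _meta, _vec in ranked[:k]]

top = similarity_search([1.0, 0.0], k=2)
```

A real vector store does the same ranking at scale, with approximate nearest-neighbor indexes instead of a linear scan.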

Reference

Here are the key methods provided by the Weaviate vector store class:

Method                           Description
from_texts                       Create a Weaviate instance from a list of texts and embeddings
from_documents                   Create a Weaviate instance from a list of Documents
add_texts                        Add new texts and their embeddings to the store
add_documents                    Add new documents to the store
similarity_search                Find similar documents using semantic search
similarity_search_with_score     Get similar documents with relevance scores
max_marginal_relevance_search    Use MMR to get diverse similar documents
delete                           Remove documents by ID

How to Use Weaviate Vector Store

Basic Setup and Initialization

First, install the required packages (weaviate-client, langchain-community, and langchain-openai), then create a client and initialize the vector store:

from langchain_community.vectorstores import Weaviate
from langchain_openai import OpenAIEmbeddings
import weaviate

# Initialize the embedding model
embeddings = OpenAIEmbeddings()

# Create a Weaviate client (v3-style API; the langchain_community wrapper
# expects a weaviate-client v3 Client instance)
client = weaviate.Client(url="http://localhost:8080")

# Initialize Weaviate vector store
vectorstore = Weaviate(
    client=client,
    index_name="Documents",  # Weaviate class name used as the index
    text_key="content",      # Property under which the text is stored
    embedding=embeddings,    # Embedding model used to vectorize texts and queries
)

Adding Documents

You can add documents in two ways:

# Method 1: From raw texts
texts = [
    "LangChain provides large language model tools",
    "Vector stores are important for LLM applications"
]
metadatas = [
    {"source": "docs", "type": "guide"},
    {"source": "blog", "type": "post"}
]

ids = vectorstore.add_texts(texts=texts, metadatas=metadatas)  # returns the new objects' IDs

# Method 2: From Document objects
from langchain_core.documents import Document

docs = [
    Document(
        page_content="LangChain tutorial content",
        metadata={"source": "tutorial", "author": "langchain"}
    )
]

ids = vectorstore.add_documents(docs)  # also returns the new objects' IDs

Performing Searches

Weaviate supports different types of searches:

# Basic similarity search
docs = vectorstore.similarity_search(
    query="What is LangChain?",
    k=4  # Number of documents to return
)

# Search with normalized relevance scores (returns (Document, score) tuples)
docs_and_scores = vectorstore.similarity_search_with_relevance_scores(
    query="What is LangChain?",
    k=4
)

# Maximum Marginal Relevance (MMR) search for diversity
diverse_docs = vectorstore.max_marginal_relevance_search(
    query="What is LangChain?",
    k=4,               # Number of documents to return
    fetch_k=20,        # Number of documents to fetch before filtering
    lambda_mult=0.5    # Diversity factor (0=max diversity, 1=max similarity)
)
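To see what `lambda_mult` actually does, here is a minimal pure-Python implementation of the MMR selection rule (score = lambda * relevance - (1 - lambda) * redundancy), using toy 2-D vectors in place of real embeddings; `mmr` and `cosine` are illustrative helpers, not LangChain APIs:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def mmr(query_vec, doc_vecs, k=2, lambda_mult=0.5):
    """Return indices of k documents chosen by Maximal Marginal Relevance."""
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Penalize similarity to documents already selected
            redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                             default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Two near-duplicate relevant docs plus one distinct doc
query = [1.0, 0.0]
docs = [[0.9, 0.1], [0.91, 0.09], [0.5, 0.8]]
print(mmr(query, docs, k=2, lambda_mult=1.0))  # → [1, 0]: pure relevance keeps both near-duplicates
print(mmr(query, docs, k=2, lambda_mult=0.2))  # → [1, 2]: diversity weighting swaps in the distinct doc
```

Lower `lambda_mult` values push the selection toward diversity, which is why `fetch_k` should be larger than `k`: MMR needs extra candidates to trade away.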

Using Filters and Metadata

You can filter search results using metadata:

# Search with a metadata filter. The community wrapper passes filters to
# Weaviate's GraphQL `where` clause via the `where_filter` kwarg, so the
# filter uses Weaviate's path/operator/value format rather than a flat dict:
filtered_docs = vectorstore.similarity_search(
    query="LangChain guide",
    k=4,
    where_filter={
        "path": ["source"],
        "operator": "Equal",
        "valueText": "docs",
    },
)
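To match on several metadata fields at once, Weaviate's `where` format nests equality conditions under an `And` operator. Building the filter dict is plain Python; `and_filter` below is a hypothetical convenience helper, not a LangChain or Weaviate API:

```python
def and_filter(conditions):
    """Combine equality conditions into one Weaviate GraphQL `where` filter dict."""
    operands = [
        {"path": [field], "operator": "Equal", "valueText": value}
        for field, value in conditions.items()
    ]
    # A single condition needs no And wrapper
    if len(operands) == 1:
        return operands[0]
    return {"operator": "And", "operands": operands}

# e.g. restrict results to guides from the docs source
where_filter = and_filter({"source": "docs", "type": "guide"})
```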

Document Management

# Delete documents by ID (the object UUIDs returned by add_texts / add_documents)
vectorstore.delete(ids=["doc1", "doc2"])

# Fetch documents by ID (available where your integration version implements it)
docs = vectorstore.get_by_ids(ids=["doc1", "doc2"])

Using as a Retriever

The Weaviate vector store can be used directly as a retriever in LangChain chains:

retriever = vectorstore.as_retriever(
    search_type="similarity",  # or "mmr"
    search_kwargs={"k": 4}
)

# To drop weak matches, score_threshold requires the score-threshold search type:
retriever = vectorstore.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={
        "k": 4,
        "score_threshold": 0.8
    }
)

This guide covers the main functionality of the Weaviate vector store in LangChain. The integration provides a robust foundation for building semantic search and retrieval systems with additional features like asynchronous operations (prefixed with 'a') for all main methods.
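The async counterparts (e.g. `asimilarity_search`) follow the standard asyncio calling pattern. Since a live Weaviate instance isn't assumed here, the sketch below uses a stand-in coroutine where real code would call the vector store's `a`-prefixed method:

```python
import asyncio

# Stand-in for vectorstore.asimilarity_search; in real code, replace this
# with the actual async method on the Weaviate vector store.
async def asimilarity_search_stub(query: str, k: int = 4):
    await asyncio.sleep(0)  # simulate non-blocking I/O
    return [f"doc for {query!r}"] * k

async def main():
    # Run several searches concurrently instead of one after another
    return await asyncio.gather(
        asimilarity_search_stub("What is LangChain?"),
        asimilarity_search_stub("vector stores"),
    )

results = asyncio.run(main())
```

Running independent searches with `asyncio.gather` overlaps their network round-trips, which is the main payoff of the async variants.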
