Using PGVecto_rs Vector Store in LangChain

Posted: Jan 30, 2025.

The PGVecto_rs vector store allows you to store and search document embeddings using PostgreSQL with the pgvecto.rs extension. In this guide, we'll explore how to use this vector store in LangChain for document storage and similarity search.

What is PGVecto_rs?

PGVecto_rs is a PostgreSQL-based vector store that leverages the pgvecto.rs extension to enable vector similarity search capabilities. It allows you to:

Store document embeddings in PostgreSQL tables
Perform similarity searches using different distance metrics
Filter search results based on metadata
Run operations synchronously or asynchronously

Reference

Here are the key methods provided by the PGVecto_rs class:

Method	Description
`from_documents()`	Creates a new vector store from a list of documents
`from_texts()`	Creates a new vector store from a list of texts
`from_collection_name()`	Connects to an existing vector store collection
`similarity_search()`	Performs similarity search with optional filtering
`similarity_search_with_score()`	Returns documents with similarity scores
`add_documents()`	Adds new documents to the store
`add_texts()`	Adds new texts to the store
`delete()`	Deletes documents by ID

How to Use PGVecto_rs

Setting up the Vector Store

First, you need to set up your PostgreSQL connection and initialize the vector store:

from langchain_community.vectorstores.pgvecto_rs import PGVecto_rs
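# OpenAIEmbeddings now lives in the separate langchain-openai package;
# the langchain_community import path is deprecated.
from langchain_openai import OpenAIEmbeddings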

# Configure database connection
db_url = "postgresql+psycopg://user:password@localhost:5432/dbname"

# Initialize embeddings
embeddings = OpenAIEmbeddings()

# Create vector store
vectorstore = PGVecto_rs(
    embedding=embeddings,
    dimension=1536,  # Must match the embedding model's output size (1536 for text-embedding-ada-002)
    db_url=db_url,
    collection_name="my_documents",
    new_table=True  # Set to True to create a new table
)
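
A hand-written connection string breaks as soon as the password contains characters such as `@` or `/`. Here is a minimal sketch of one way around that. It assumes a hypothetical helper named `build_db_url` (not part of LangChain or pgvecto.rs) that percent-encodes the credentials before assembling the URL:

```python
from urllib.parse import quote


def build_db_url(user: str, password: str, host: str, port: int, dbname: str) -> str:
    """Assemble a connection URL, percent-encoding the credentials.

    Hypothetical helper for illustration only. The "postgresql+psycopg"
    prefix is the same scheme used in the setup snippet of this guide.
    """
    safe_user = quote(user, safe="")
    safe_password = quote(password, safe="")
    return f"postgresql+psycopg://{safe_user}:{safe_password}@{host}:{port}/{dbname}"


# The "@" and "/" in the password are encoded so they cannot be mistaken
# for URL delimiters.
db_url = build_db_url("user", "p@ss/word", "localhost", 5432, "dbname")
print(db_url)
```

The resulting string can be passed as `db_url` in the same way as the literal above.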

Adding Documents

You can add documents to the vector store in several ways:

from langchain_core.documents import Document

# Adding documents directly
docs = [
    Document(page_content="Sample text 1", metadata={"source": "doc1"}),
    Document(page_content="Sample text 2", metadata={"source": "doc2"})
]
vectorstore.add_documents(docs)

# Adding texts with metadata
texts = ["Sample text 1", "Sample text 2"]
metadatas = [{"source": "text1"}, {"source": "text2"}]
vectorstore.add_texts(texts, metadatas=metadatas)
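
For larger collections it can help to send texts in fixed-size batches rather than in one call, so that a single failure does not waste the whole embedding run. The sketch below is an illustration under assumptions: `add_in_batches`, `batched`, and `FakeStore` are hypothetical names, and the only thing relied on is that the store has an `add_texts(texts, metadatas=...)` method returning a list of IDs.

```python
from typing import Any, Dict, Iterator, List, Optional, Tuple

Metadata = Dict[str, Any]


def batched(
    texts: List[str], metadatas: Optional[List[Metadata]], batch_size: int
) -> Iterator[Tuple[List[str], Optional[List[Metadata]]]]:
    """Yield aligned (texts, metadatas) slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    if metadatas is not None and len(metadatas) != len(texts):
        raise ValueError("texts and metadatas must have the same length")
    for start in range(0, len(texts), batch_size):
        end = start + batch_size
        yield texts[start:end], None if metadatas is None else metadatas[start:end]


def add_in_batches(store, texts, metadatas=None, batch_size=100) -> List[str]:
    """Call store.add_texts once per batch and collect the returned IDs."""
    ids: List[str] = []
    for text_batch, meta_batch in batched(texts, metadatas, batch_size):
        ids.extend(store.add_texts(text_batch, metadatas=meta_batch))
    return ids


class FakeStore:
    """Stand-in for a vector store: records batch sizes instead of embedding."""

    def __init__(self) -> None:
        self.batch_sizes: List[int] = []
        self._next_id = 0

    def add_texts(self, texts, metadatas=None) -> List[str]:
        self.batch_sizes.append(len(texts))
        ids = [f"id-{self._next_id + i}" for i in range(len(texts))]
        self._next_id += len(texts)
        return ids


store = FakeStore()
ids = add_in_batches(store, [f"text {i}" for i in range(5)], batch_size=2)
print(store.batch_sizes, ids)
```

With a real store, the call would look like `add_in_batches(vectorstore, texts, metadatas, batch_size=500)`; the batch size here is arbitrary and worth tuning against your embedding provider's limits.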

Performing Similarity Search

The vector store supports different types of similarity search:

# Basic similarity search
query = "Sample query text"
docs = vectorstore.similarity_search(
    query, 
    k=4,  # Number of results to return
    distance_func='sqrt_euclid'  # Distance function to use
)

# Search with metadata filter
from pgvecto_rs.sdk.filters import meta_contains

docs = vectorstore.similarity_search(
    query,
    k=4,
    filter=meta_contains({"source": "doc1"})
)

# Search with scores
results = vectorstore.similarity_search_with_score(
    query,
    k=4
)
for doc, score in results:
    print(f"Content: {doc.page_content}")
    print(f"Score: {score}")
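
The `distance_func` argument controls how closeness between two embeddings is measured. As a rough, library-independent illustration (not the extension's implementation), here are three common measures written in plain Python: squared Euclidean distance, negative dot product, and cosine distance. The function names are made up for this sketch, and which `distance_func` string corresponds to which measure should be confirmed against the pgvecto.rs documentation.

```python
import math
from typing import Sequence


def squared_euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of squared coordinate differences; 0 means identical vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def negative_dot_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product with the sign flipped, so a smaller value means closer."""
    return -sum(x * y for x, y in zip(a, b))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 minus cosine similarity; ignores vector length (assumes non-zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


query = [1.0, 0.0]
same_direction = [2.0, 0.0]  # points the same way as the query, but longer
orthogonal = [0.0, 1.0]      # at a right angle to the query

for name, metric in [
    ("squared_euclidean", squared_euclidean),
    ("negative_dot_product", negative_dot_product),
    ("cosine_distance", cosine_distance),
]:
    print(name, metric(query, same_direction), metric(query, orthogonal))
```

In all three cases a smaller value means a closer match. Note how cosine distance treats `same_direction` as a perfect match while squared Euclidean distance penalizes its extra length; that difference is usually what drives the choice of metric.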

Using Maximal Marginal Relevance Search

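For more diverse search results, you can use MMR search. This relies on the store class implementing `max_marginal_relevance_search`; LangChain's base `VectorStore` raises `NotImplementedError` for stores that do not override it, so check that your installed version supports it: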

docs = vectorstore.max_marginal_relevance_search(
    query,
    k=4,  # Number of documents to return
    fetch_k=20,  # Number of documents to fetch before reranking
    lambda_mult=0.5  # 0 = maximum diversity, 1 = pure relevance
)
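
To see what the reranking step does, here is a library-independent sketch of maximal marginal relevance over already-fetched candidate vectors. It is a simplified illustration, not LangChain's implementation: `mmr_select`, `cosine_similarity`, and `unit` are names invented for this example, and the candidates stand in for the `fetch_k` results.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr_select(
    query: Sequence[float],
    candidates: List[Sequence[float]],
    k: int = 4,
    lambda_mult: float = 0.5,
) -> List[int]:
    """Return indices of up to k candidates, trading relevance against redundancy."""
    selected: List[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        best_idx, best_score = remaining[0], -math.inf
        for idx in remaining:
            relevance = cosine_similarity(query, candidates[idx])
            # Redundancy is the similarity to the closest already-selected item.
            redundancy = max(
                (cosine_similarity(candidates[idx], candidates[j]) for j in selected),
                default=0.0,
            )
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = idx, score
        selected.append(best_idx)
        remaining.remove(best_idx)
    return selected


def unit(degrees: float) -> List[float]:
    """2-D unit vector at the given angle, used to build toy candidates."""
    radians = math.radians(degrees)
    return [math.cos(radians), math.sin(radians)]


query = unit(0)
candidates = [unit(10), unit(12), unit(-40)]  # two near-duplicates and one outlier

diverse = mmr_select(query, candidates, k=2, lambda_mult=0.5)
relevance_only = mmr_select(query, candidates, k=2, lambda_mult=1.0)
print(diverse, relevance_only)
```

With `lambda_mult=0.5` the second pick skips the near-duplicate at 12 degrees in favor of the less similar vector at -40 degrees, while `lambda_mult=1.0` falls back to plain relevance ordering.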

Async Operations

PGVecto_rs also exposes async variants of its search methods, so a query can be awaited without blocking the event loop:

import asyncio

async def search_documents():
    docs = await vectorstore.asimilarity_search(
        "Sample query",
        k=4
    )
    return docs

# Run async operation
docs = asyncio.run(search_documents())
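
The async methods are most useful when several queries need to run at once. The sketch below shows that pattern with `asyncio.gather`. `FakeAsyncStore` is a stand-in invented for this example so the code runs without a database, and `search_many` is a hypothetical helper; the only assumption about the real store is that it has an `asimilarity_search(query, k=...)` coroutine.

```python
import asyncio
from typing import List


class FakeAsyncStore:
    """Stand-in exposing an asimilarity_search coroutine, for illustration only."""

    async def asimilarity_search(self, query: str, k: int = 4) -> List[str]:
        await asyncio.sleep(0.01)  # simulate database latency
        return [f"{query} :: result {i}" for i in range(k)]


async def search_many(store, queries: List[str], k: int = 4) -> List[List[str]]:
    """Run one search per query concurrently; results keep the input order."""
    return await asyncio.gather(*(store.asimilarity_search(q, k=k) for q in queries))


results = asyncio.run(search_many(FakeAsyncStore(), ["alpha", "beta"], k=2))
for hits in results:
    print(hits)
```

How much the searches actually overlap depends on how the store implements its async methods. If they simply delegate to the synchronous code in a worker thread, the gain comes from not blocking the event loop rather than from faster queries.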

The PGVecto_rs vector store provides a powerful way to store and search document embeddings using PostgreSQL. It's particularly useful when you need a reliable, SQL-based vector store with support for metadata filtering and different similarity metrics.

By supporting both sync and async operations, it can be effectively used in both traditional and high-performance async applications. The ability to choose different distance functions also makes it flexible for various use cases where different similarity metrics might be more appropriate.
