Using PGVecto_rs Vector Store in LangChain

Posted: Jan 30, 2025.

The PGVecto_rs vector store allows you to store and search document embeddings using PostgreSQL with the pgvecto.rs extension. In this guide, we'll explore how to use this vector store in LangChain for document storage and similarity search.

What is PGVecto_rs?

PGVecto_rs is a PostgreSQL-based vector store that leverages the pgvecto.rs extension to enable vector similarity search capabilities. It allows you to:

  • Store document embeddings in PostgreSQL tables
  • Perform similarity searches using different distance metrics
  • Filter search results based on metadata
  • Call the store from both synchronous and asynchronous code
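
To make the idea concrete, here is a minimal in-memory sketch of what a vector store does conceptually: keep an embedding and metadata next to each text, then rank by distance with an optional metadata filter. It is not how PGVecto_rs is implemented (that work happens inside PostgreSQL); the `Record` class and `search` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """One stored item: the text, its embedding, and free-form metadata."""
    text: str
    embedding: list
    metadata: dict = field(default_factory=dict)


def squared_euclidean(a, b):
    """Squared Euclidean distance; smaller means more similar."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def search(records, query_embedding, k=2, metadata_filter=None):
    """Return the k records closest to the query, optionally filtered by metadata."""
    candidates = [
        r for r in records
        if metadata_filter is None
        or all(r.metadata.get(key) == value for key, value in metadata_filter.items())
    ]
    candidates.sort(key=lambda r: squared_euclidean(r.embedding, query_embedding))
    return candidates[:k]


# Toy 2-D embeddings; real embeddings have hundreds or thousands of dimensions.
records = [
    Record("cats purr", [1.0, 0.0], {"source": "doc1"}),
    Record("dogs bark", [0.0, 1.0], {"source": "doc2"}),
    Record("kittens mew", [0.9, 0.1], {"source": "doc2"}),
]

nearest = search(records, [1.0, 0.0], k=2)
filtered = search(records, [1.0, 0.0], k=2, metadata_filter={"source": "doc2"})
```

Without a filter the two nearest records are the cat and kitten texts; restricting to `source == "doc2"` drops the first one and pulls in the next closest match.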

Reference

Here are the key methods provided by the PGVecto_rs class:

Method                          Description
------------------------------  ---------------------------------------------------
from_documents()                Creates a new vector store from a list of documents
from_texts()                    Creates a new vector store from a list of texts
from_collection_name()          Connects to an existing vector store collection
similarity_search()             Performs similarity search with optional filtering
similarity_search_with_score()  Returns documents with similarity scores
add_documents()                 Adds new documents to the store
add_texts()                     Adds new texts to the store
delete()                        Deletes documents by ID

How to Use PGVecto_rs

Setting up the Vector Store

First, you need to set up your PostgreSQL connection and initialize the vector store:

from langchain_community.vectorstores.pgvecto_rs import PGVecto_rs
from langchain_community.embeddings import OpenAIEmbeddings

# Configure database connection
db_url = "postgresql+psycopg://user:password@localhost:5432/dbname"

# Initialize embeddings
embeddings = OpenAIEmbeddings()

# Create vector store
vectorstore = PGVecto_rs(
    embedding=embeddings,
    dimension=1536,  # Dimension of your embeddings
    db_url=db_url,
    collection_name="my_documents",
    new_table=True  # Set to True to create a new table
)
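
If the database password contains characters such as `@` or `/`, pasting it into the URL as-is can break parsing. The sketch below assumes the connection string follows the usual SQLAlchemy-style URL format shown above and percent-encodes the credentials with the standard library; `build_db_url` is a hypothetical helper, not part of LangChain or pgvecto.rs.

```python
from urllib.parse import quote_plus


def build_db_url(user, password, host, port, dbname, driver="psycopg"):
    """Assemble a PostgreSQL connection URL, percent-encoding the credentials."""
    return (
        f"postgresql+{driver}://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{dbname}"
    )


# The "@" and "/" in the password are encoded so they cannot be mistaken
# for URL delimiters.
db_url = build_db_url("user", "p@ss/word", "localhost", 5432, "dbname")
```

The resulting string can be passed wherever `db_url` is used in the setup above.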

Adding Documents

You can add documents to the vector store in several ways:

from langchain_core.documents import Document

# Adding documents directly
docs = [
    Document(page_content="Sample text 1", metadata={"source": "doc1"}),
    Document(page_content="Sample text 2", metadata={"source": "doc2"})
]
vectorstore.add_documents(docs)

# Adding texts with metadata
texts = ["Sample text 1", "Sample text 2"]
metadatas = [{"source": "text1"}, {"source": "text2"}]
vectorstore.add_texts(texts, metadatas=metadatas)

Similarity Search

The vector store supports different types of similarity search:

# Basic similarity search
query = "Sample query text"
docs = vectorstore.similarity_search(
    query,
    k=4,  # Number of results to return
    distance_func='sqrt_euclid'  # Distance function to use
)

# Search with metadata filter
from pgvecto_rs.sdk.filters import meta_contains

docs = vectorstore.similarity_search(
    query,
    k=4,
    filter=meta_contains({"source": "doc1"})
)

# Search with scores
results = vectorstore.similarity_search_with_score(
    query,
    k=4
)
for doc, score in results:
    print(f"Content: {doc.page_content}")
    print(f"Score: {score}")
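
The choice of distance function can change which document ranks first. The sketch below uses plain Python and assumes the usual textbook definitions of squared Euclidean distance, negative dot product, and cosine distance; how these map onto the `distance_func` option names is an assumption to check against the pgvecto.rs documentation, and the helper names are illustrative only.

```python
import math


def squared_euclidean(a, b):
    """Sum of squared coordinate differences; sensitive to vector length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def negative_dot_product(a, b):
    """Negated dot product, so that smaller still means more similar."""
    return -sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    """One minus cosine similarity; depends only on direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


query_vec = [1.0, 0.0]
long_same_direction = [3.0, 0.0]  # same direction as the query, but longer
short_nearby = [1.0, 0.5]         # close in space, slightly different direction

for name, fn in [
    ("squared_euclidean", squared_euclidean),
    ("negative_dot_product", negative_dot_product),
    ("cosine_distance", cosine_distance),
]:
    print(name, fn(query_vec, long_same_direction), fn(query_vec, short_nearby))
```

Squared Euclidean distance prefers `short_nearby`, while the other two metrics prefer `long_same_direction`. If your embeddings are normalized to unit length, the three metrics generally agree on ranking.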

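Maximal Marginal Relevance (MMR) Search

For more diverse search results, LangChain's vector store interface defines maximal marginal relevance (MMR) search. The base class raises NotImplementedError for stores that do not implement it, so confirm that your installed PGVecto_rs version supports this call before relying on it: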

docs = vectorstore.max_marginal_relevance_search(
    query,
    k=4,  # Number of documents to return
    fetch_k=20,  # Number of documents to fetch before reranking
    lambda_mult=0.5  # Diversity factor (0 to 1)
)
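
To show what `fetch_k` and `lambda_mult` control, here is a small pure-Python sketch of greedy MMR selection over an already-fetched candidate list. It follows the common formulation (relevance weighted by `lambda_mult`, redundancy weighted by `1 - lambda_mult`) and is not taken from the PGVecto_rs source; `unit` and `mmr_select` are hypothetical helpers.

```python
import math


def unit(degrees):
    """Unit vector at the given angle, used as a toy 2-D embedding."""
    r = math.radians(degrees)
    return [math.cos(r), math.sin(r)]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def mmr_select(query, candidates, k=2, lambda_mult=0.5):
    """Greedily pick k candidate indices, trading relevance against redundancy."""
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        scores = {}
        for i in remaining:
            relevance = cosine_similarity(query, candidates[i])
            # Redundancy is the highest similarity to anything already chosen.
            redundancy = max(
                (cosine_similarity(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            scores[i] = lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(remaining, key=scores.get)
        selected.append(best)
        remaining.remove(best)
    return selected


query_vec = unit(0)
candidates = [
    unit(10),   # most relevant
    unit(12),   # near-duplicate of the first
    unit(-50),  # less relevant, but points somewhere else
]

diverse = mmr_select(query_vec, candidates, k=2, lambda_mult=0.5)
relevance_only = mmr_select(query_vec, candidates, k=2, lambda_mult=1.0)
```

With `lambda_mult=0.5` the near-duplicate is skipped in favor of the more distinct third candidate; with `lambda_mult=1.0` the selection reduces to plain relevance ranking. In the search call, `fetch_k` sets how many candidates are fetched for this reranking step and `k` how many are kept.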

Async Operations

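PGVecto_rs also exposes the async methods of the LangChain vector store interface, such as asimilarity_search, so you can await searches from async code without blocking the event loop: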

import asyncio

async def search_documents():
    docs = await vectorstore.asimilarity_search(
        "Sample query",
        k=4
    )
    return docs

# Run async operation
docs = asyncio.run(search_documents())
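
The benefit of awaiting searches shows up when several of them run at once. The sketch below illustrates the general pattern of moving a blocking call to a worker thread and overlapping several calls with `asyncio.gather`; `blocking_search` is a hypothetical stand-in for a synchronous vector store call, not the library's implementation.

```python
import asyncio
import time


def blocking_search(query, k=4):
    """Stand-in for a synchronous vector store call (hypothetical)."""
    time.sleep(0.05)  # simulate database latency
    return [f"{query} result {i}" for i in range(k)]


async def search_many(queries):
    # asyncio.to_thread runs each blocking call in a worker thread,
    # so the event loop stays free and the calls overlap.
    tasks = [asyncio.to_thread(blocking_search, q, 2) for q in queries]
    return await asyncio.gather(*tasks)


results = asyncio.run(search_many(["alpha", "beta"]))
```

`asyncio.gather` returns results in the same order as the input queries, regardless of which call finishes first. `asyncio.to_thread` requires Python 3.9 or later.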

Conclusion

The PGVecto_rs vector store lets you store and search document embeddings in PostgreSQL. It is particularly useful when you need a SQL-based vector store with support for metadata filtering and a choice of similarity metrics.

By supporting both sync and async operations, it can be effectively used in both traditional and high-performance async applications. The ability to choose different distance functions also makes it flexible for various use cases where different similarity metrics might be more appropriate.
