Using the Xata Vector Store in LangChain

Posted: Feb 6, 2025.

The Xata Vector Store integration in LangChain allows you to use Xata's serverless database platform as a vector store for semantic search and document retrieval. This guide will show you how to set up and use XataVectorStore effectively.

What is XataVectorStore?

XataVectorStore is a LangChain vector store implementation that uses Xata as the backend storage. Xata is a serverless data platform based on PostgreSQL that provides native vector storage and similarity search capabilities. This integration allows you to:

Store documents and their embeddings in Xata tables
Perform similarity searches on your document collection
Add metadata to your documents
Use any embedding model supported by LangChain

Reference

Here are the key methods available in XataVectorStore:

Method	Description
`__init__()`	Initialize the vector store with API key, database URL, embedding model and table name
`from_documents()`	Create a vector store instance from a list of documents
`from_texts()`	Create a vector store instance from a list of texts
`add_documents()`	Add new documents to the vector store
`add_texts()`	Add new texts with optional metadata to the vector store
`similarity_search()`	Find similar documents based on a query string
`similarity_search_with_score()`	Find similar documents and return similarity scores
`delete()`	Delete documents from the store by ID

How to Use XataVectorStore

Setting Up

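First, you'll need to create a Xata database and a table with the right schema (the required columns are listed at the end of this guide). Once the table exists, initialize the vector store with your API key, database URL, embedding model and table name: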

from langchain_community.vectorstores.xata import XataVectorStore
from langchain_openai import OpenAIEmbeddings

# Initialize the vector store
embeddings = OpenAIEmbeddings()
vector_store = XataVectorStore(
    api_key="your-xata-api-key",
    db_url="your-database-url", 
    embedding=embeddings,
    table_name="vectors"
)
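Hardcoding credentials as in the snippet above is fine for a quick test, but you may prefer to read them from the environment. The following is a minimal sketch of that idea. The helper name `load_xata_settings` and the variable names `XATA_API_KEY`, `XATA_DB_URL` and `XATA_TABLE_NAME` are assumptions made for illustration; neither Xata nor LangChain requires them.

```python
import os


def load_xata_settings(env=os.environ):
    """Collect XataVectorStore connection settings from environment variables.

    The variable names used here are illustrative assumptions, not names
    required by Xata or LangChain.
    """
    required = ("XATA_API_KEY", "XATA_DB_URL")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "api_key": env["XATA_API_KEY"],
        "db_url": env["XATA_DB_URL"],
        # Fall back to the table name used throughout this guide.
        "table_name": env.get("XATA_TABLE_NAME", "vectors"),
    }
```

The returned dictionary uses the same keyword names as the constructor call above, so it could be unpacked as `XataVectorStore(embedding=embeddings, **load_xata_settings())`.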

Adding Documents

You can add documents to the vector store in several ways:

from langchain_core.documents import Document

# From a list of documents (this creates a new vector store instance)
docs = [Document(page_content="Content 1"), Document(page_content="Content 2")]
vector_store = XataVectorStore.from_documents(
    docs,
    embeddings,
    api_key="your-api-key",
    db_url="your-db-url",
    table_name="vectors"
)

# Adding texts with metadata
texts = ["Text 1", "Text 2"]
metadatas = [{"source": "doc1"}, {"source": "doc2"}]
vector_store.add_texts(texts, metadatas=metadatas)
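Because `add_texts` takes texts and metadata as two parallel lists, it helps to build both from the same source so they stay aligned by index. Here is a small sketch, assuming your raw data is a list of dictionaries with a `text` key; the record layout and the helper name `split_records` are hypothetical.

```python
def split_records(records):
    """Split a list of dicts into parallel texts and metadatas lists."""
    texts = []
    metadatas = []
    for record in records:
        texts.append(record["text"])
        # Everything except the text itself is treated as metadata.
        metadatas.append({key: value for key, value in record.items() if key != "text"})
    return texts, metadatas


records = [
    {"text": "Text 1", "source": "doc1"},
    {"text": "Text 2", "source": "doc2"},
]
texts, metadatas = split_records(records)
# texts and metadatas are aligned by index and can be passed to
# vector_store.add_texts(texts, metadatas=metadatas)
```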

Performing Searches

XataVectorStore provides several search methods:

# Basic similarity search
results = vector_store.similarity_search(
    "What is machine learning?",
    k=4  # Number of results to return
)

# Search with scores
results = vector_store.similarity_search_with_score(
    "What is machine learning?",
    k=4
)
for doc, score in results:
    print(f"Content: {doc.page_content}")
    print(f"Score: {score}")
    print("---")

# Search restricted by a metadata filter (here, records whose source is "textbook")
results = vector_store.similarity_search(
    "What is machine learning?",
    filter={"source": "textbook"}
)
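Conceptually, a similarity search embeds the query and ranks the stored embeddings by how close they are to it. The pure-Python sketch below illustrates that ranking with cosine similarity on toy two-dimensional vectors. It is only an illustration of the idea: the similarity function and score scale that Xata actually uses may differ, and the helper names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, stored, k=4):
    """Return the k (text, score) pairs most similar to query_vec."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in stored]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy "embeddings"; real ones have hundreds or thousands of dimensions.
stored = [
    ("about cats", [1.0, 0.0]),
    ("about dogs", [0.8, 0.6]),
    ("about taxes", [0.0, 1.0]),
]
ranked = top_k([1.0, 0.0], stored, k=2)
for text, score in ranked:
    print(f"{text}: {score:.2f}")
```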

Managing Documents

You can also manage the documents in your vector store:

# Delete specific documents
vector_store.delete(ids=["doc1", "doc2"])

# Delete all documents
vector_store.delete(delete_all=True)

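# Get documents by ID (get_by_ids is declared on LangChain's base VectorStore in
# recent langchain-core releases; verify that your installed version supports it for Xata)
docs = vector_store.get_by_ids(["doc1", "doc2"])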

Async Support

XataVectorStore exposes async versions of most methods (the same names prefixed with "a"), so you can await them inside async applications instead of blocking on each call:

# Async similarity search
docs = await vector_store.asimilarity_search("What is machine learning?")

# Async document addition (docs is a list of Document objects, as created earlier)
await vector_store.aadd_documents(docs)
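The `await` calls above have to run inside a coroutine. One common pattern is to issue several searches concurrently with `asyncio.gather`. The sketch below shows that pattern. `search_many` is a hypothetical helper, and `FakeStore` is a stand-in used only so the example runs without a Xata database; in real use you would pass your `vector_store` instead.

```python
import asyncio


async def search_many(store, queries, k=4):
    """Run one asimilarity_search per query concurrently; map query -> results."""
    tasks = [store.asimilarity_search(query, k=k) for query in queries]
    results = await asyncio.gather(*tasks)
    return dict(zip(queries, results))


class FakeStore:
    """Hypothetical stand-in that mimics the asimilarity_search call shape."""

    async def asimilarity_search(self, query, k=4):
        await asyncio.sleep(0)  # yield control, as a real network call would
        return [f"{query} result {i}" for i in range(k)]


results = asyncio.run(search_many(FakeStore(), ["a", "b"], k=2))
print(results)
```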

Remember to properly configure your Xata database schema with the required columns:

content (Text type) - For storing document content
embedding (Vector type) - For storing embeddings
Additional columns for any metadata you want to store
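One detail worth checking when you define the embedding column: a vector column is normally created with a fixed dimension, and it needs to match the length of the vectors your embedding model produces (for example, 1536 for OpenAI's text-embedding-ada-002). The following is a minimal sketch of such a check. The helper name `check_dimension` is hypothetical and it is not part of LangChain or Xata.

```python
def check_dimension(vector, expected_dimension):
    """Raise if an embedding's length does not match the column's dimension."""
    if len(vector) != expected_dimension:
        raise ValueError(
            f"Embedding has {len(vector)} dimensions, "
            f"but the column expects {expected_dimension}"
        )
    return vector


# A toy 3-dimensional vector checked against a 3-dimensional column.
check_dimension([0.1, 0.2, 0.3], expected_dimension=3)
```

With a real model you could obtain a sample vector from `embeddings.embed_query("probe")` and compare its length against the dimension you configured for the table.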

XataVectorStore provides a powerful way to implement semantic search in your applications while leveraging Xata's serverless infrastructure and PostgreSQL capabilities.
