Using the Xata Vector Store in LangChain

Posted: Feb 6, 2025.

The Xata Vector Store integration in LangChain allows you to use Xata's serverless database platform as a vector store for semantic search and document retrieval. This guide will show you how to set up and use XataVectorStore effectively.

What is XataVectorStore?

XataVectorStore is a LangChain vector store implementation that uses Xata as the backend storage. Xata is a serverless data platform based on PostgreSQL that provides native vector storage and similarity search capabilities. This integration allows you to:

  • Store documents and their embeddings in Xata tables
  • Perform similarity searches on your document collection
  • Add metadata to your documents
  • Use any embedding model supported by LangChain

Reference

Here are the key methods available in XataVectorStore:

Method                          Description
__init__()                      Initialize the vector store with API key, database URL, embedding model and table name
from_documents()                Create a vector store instance from a list of documents
from_texts()                    Create a vector store instance from a list of texts
add_documents()                 Add new documents to the vector store
add_texts()                     Add new texts with optional metadata to the vector store
similarity_search()             Find similar documents based on a query string
similarity_search_with_score()  Find similar documents and return similarity scores
delete()                        Delete documents from the store by ID

How to Use XataVectorStore

Setting Up

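First, you'll need to create a Xata database and a table with the right schema. The table should have a content column (Text type), an embedding column (Vector type), and a column for each metadata field you want to store. Once the table exists, initialize the vector store with your credentials: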

from langchain_community.vectorstores.xata import XataVectorStore
from langchain_openai import OpenAIEmbeddings

# Initialize the vector store
embeddings = OpenAIEmbeddings()
vector_store = XataVectorStore(
    api_key="your-xata-api-key",
    db_url="your-database-url",
    embedding=embeddings,
    table_name="vectors"
)
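Hard-coding credentials as in the snippet above is fine for a quick test, but in practice you will probably want to read them from the environment. The following is a minimal sketch of that idea, not part of the integration itself. The helper load_xata_settings and the variable names XATA_API_KEY, XATA_DB_URL and XATA_TABLE_NAME are assumptions made for this example; only the returned keys (api_key, db_url, table_name) mirror the constructor arguments shown above.

```python
import os


def load_xata_settings(env=None):
    """Collect connection settings for the vector store from environment variables.

    XATA_API_KEY, XATA_DB_URL and XATA_TABLE_NAME are hypothetical names
    chosen for this sketch; they are not defined by Xata or LangChain.
    """
    env = os.environ if env is None else env

    # Fail early with a readable message if a required value is absent.
    missing = [name for name in ("XATA_API_KEY", "XATA_DB_URL") if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")

    return {
        "api_key": env["XATA_API_KEY"],
        "db_url": env["XATA_DB_URL"],
        # Fall back to the table name used throughout this guide.
        "table_name": env.get("XATA_TABLE_NAME", "vectors"),
    }
```

With the variables set, the constructor call could then be written as XataVectorStore(embedding=embeddings, **load_xata_settings()).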

Adding Documents

You can add documents to the vector store in several ways:

# From a list of documents
from langchain_core.documents import Document

docs = [Document(page_content="Content 1"), Document(page_content="Content 2")]
vector_store = XataVectorStore.from_documents(
    docs,
    embeddings,
    api_key="your-api-key",
    db_url="your-db-url",
    table_name="vectors"
)

# Adding texts with metadata
texts = ["Text 1", "Text 2"]
metadatas = [{"source": "doc1"}, {"source": "doc2"}]
vector_store.add_texts(texts, metadatas=metadatas)
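When texts and metadata come from different places, it is easy to pass lists that do not line up. The sketch below shows one way to check the pairing before calling add_texts. The helper pair_texts_with_metadata is hypothetical and written for this example only; it uses plain Python and does not touch the vector store.

```python
def pair_texts_with_metadata(texts, metadatas=None):
    """Return (text, metadata) pairs, verifying that both lists line up.

    Hypothetical helper for illustration; not part of LangChain or Xata.
    """
    if metadatas is None:
        # No metadata supplied: give every text an empty metadata dict.
        metadatas = [{} for _ in texts]
    if len(texts) != len(metadatas):
        raise ValueError(
            f"Got {len(texts)} texts but {len(metadatas)} metadata entries"
        )
    return list(zip(texts, metadatas))


pairs = pair_texts_with_metadata(
    ["Text 1", "Text 2"],
    [{"source": "doc1"}, {"source": "doc2"}],
)
```

Running a check like this first turns a silent mismatch into an immediate, readable error.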

Performing Searches

XataVectorStore provides several search methods:

# Basic similarity search
results = vector_store.similarity_search(
    "What is machine learning?",
    k=4  # Number of results to return
)

# Search with scores
results = vector_store.similarity_search_with_score(
    "What is machine learning?",
    k=4
)
for doc, score in results:
    print(f"Content: {doc.page_content}")
    print(f"Score: {score}")
    print("---")

# Search with filters
results = vector_store.similarity_search(
    "What is machine learning?",
    filter={"source": "textbook"}
)
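To make the roles of k, the score, and the filter more concrete, here is a small in-memory sketch of a similarity search. It is an illustration only: Xata performs the search on the server, and its similarity metric, score scale and filter syntax may differ from what is shown. Cosine similarity, the exact-match filter, the toy two-dimensional vectors and the names cosine_similarity and top_k are all assumptions made for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, records, k=4, filter=None):
    """Return the k (content, score) pairs most similar to query_vec.

    filter is a dict of metadata key/value pairs that must match exactly.
    """
    candidates = [
        r for r in records
        if not filter
        or all(r["metadata"].get(key) == value for key, value in filter.items())
    ]
    scored = [
        (r["content"], cosine_similarity(query_vec, r["embedding"]))
        for r in candidates
    ]
    # Highest similarity first, then keep only the first k results.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy records with made-up two-dimensional "embeddings".
records = [
    {"content": "Intro to ML", "embedding": [1.0, 0.0], "metadata": {"source": "textbook"}},
    {"content": "Cooking pasta", "embedding": [0.0, 1.0], "metadata": {"source": "blog"}},
    {"content": "Neural networks", "embedding": [0.8, 0.6], "metadata": {"source": "textbook"}},
]

best_two = top_k([1.0, 0.0], records, k=2)
only_blog = top_k([1.0, 0.0], records, filter={"source": "blog"})
```

Here best_two ranks "Intro to ML" ahead of "Neural networks", and only_blog contains just the record whose source metadata is "blog".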

Managing Documents

You can also manage the documents in your vector store:

# Delete specific documents
vector_store.delete(ids=["doc1", "doc2"])

# Delete all documents
vector_store.delete(delete_all=True)

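# Get documents by ID (get_by_ids is defined on LangChain's base VectorStore
# and raises NotImplementedError for stores that do not implement it)
docs = vector_store.get_by_ids(["doc1", "doc2"])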

Async Support

XataVectorStore includes async versions of most methods, prefixed with "a" (for example asimilarity_search). Awaiting them inside an async application keeps store calls from blocking the event loop:

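# These calls must run inside an async function

# Async similarity search
results = await vector_store.asimilarity_search("What is machine learning?")

# Async document addition (docs is the list of Document objects created earlier)
await vector_store.aadd_documents(docs)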
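The main benefit of the async methods shows up when several requests are in flight at once. The sketch below runs multiple searches concurrently with asyncio.gather. To stay runnable without credentials it uses FakeStore, a stand-in class invented for this example that only mimics the asimilarity_search method name; search_many is likewise a hypothetical helper.

```python
import asyncio


class FakeStore:
    """Stand-in for a vector store; it only mimics the async method name."""

    async def asimilarity_search(self, query, k=4):
        # Yield control to the event loop, as a real network call would.
        await asyncio.sleep(0)
        return [f"result for {query!r}"]


async def search_many(store, queries):
    """Run one search per query concurrently; results keep the query order."""
    return await asyncio.gather(
        *(store.asimilarity_search(q) for q in queries)
    )
```

From synchronous code, this could be driven with asyncio.run(search_many(vector_store, ["first question", "second question"])).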

Remember to properly configure your Xata database schema with the required columns:

  • content (Text type) - For storing document content
  • embedding (Vector type) - For storing embeddings
  • Additional columns for any metadata you want to store
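One schema detail that is easy to miss: a vector column is created with a fixed dimension, so the length of each embedding has to match it. The check below is a minimal sketch of that validation. The helper check_embedding_dimension is hypothetical, and 1536 is used because it is the output size of OpenAI's text-embedding-ada-002 model; treat it as an assumption and substitute the dimension of the model you actually use.

```python
def check_embedding_dimension(vector, expected_dim):
    """Raise ValueError if a vector does not match the column dimension.

    Hypothetical helper for illustration; not part of LangChain or Xata.
    """
    if len(vector) != expected_dim:
        raise ValueError(
            f"Embedding has {len(vector)} dimensions, "
            f"but the column expects {expected_dim}"
        )
    return vector


# Assumed dimension for this sketch; depends on the embedding model in use.
EXPECTED_DIM = 1536

ok_vector = check_embedding_dimension([0.0] * EXPECTED_DIM, EXPECTED_DIM)
```

Running this once against a sample embedding before a bulk insert can catch a model and schema mismatch early.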

XataVectorStore lets you implement semantic search in your applications on top of Xata's serverless infrastructure and PostgreSQL capabilities.
