Vector Storage with LangChain VDMS

Posted: Nov 21, 2024.

VDMS (Visual Data Management System) is a powerful storage solution from Intel Labs designed to efficiently handle and search visual data and vector embeddings at scale. This guide will show you how to use VDMS as a vector store in LangChain.

What is VDMS?

VDMS is a storage system optimized for working with embeddings and visual data. It provides:

  • Fast k-nearest neighbor (KNN) similarity search
  • Support for multiple distance metrics (L2 and inner product)
  • Various indexing engines (Faiss, TileDB, Flinng)
  • Ability to store and search text, image and video embeddings
  • Rich metadata filtering capabilities

The system has a client-server architecture, with a Python client that connects to a VDMS server.
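
Under the hood, the connection goes through the vdms Python client; the VDMS_Client helper used later in this guide essentially wraps this connection. A rough sketch, assuming a server running on the default host and port:

import vdms

# Connect the low-level Python client to a running VDMS server
db = vdms.vdms()
db.connect("localhost", 55555)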

Reference

Key methods for working with VDMS in LangChain:

  • from_documents(): Create a new VDMS vectorstore from documents
  • from_texts(): Create a new VDMS vectorstore from raw texts
  • similarity_search(): Find similar documents using KNN search
  • similarity_search_with_score(): Find similar documents and return scores
  • add_documents(): Add new documents to an existing collection
  • delete(): Delete documents by ID or metadata
  • update_document(): Update an existing document

How to use VDMS

Setting up VDMS

First, you'll need to run a VDMS server. The easiest way is using Docker:

docker run -d -p 55555:55555 intellabs/vdms:latest

Then install the VDMS Python client, along with the LangChain packages used in the examples below:

pip install vdms langchain-community langchain-huggingface

Basic Usage

Here's how to create and query a VDMS vectorstore:

from langchain_community.vectorstores import VDMS
from langchain_community.vectorstores.vdms import VDMS_Client
from langchain_core.documents import Document
from langchain_huggingface import HuggingFaceEmbeddings

# Connect to the VDMS server
client = VDMS_Client("localhost", 55555)

# Create the embedding function
model_name = "sentence-transformers/all-mpnet-base-v2"
embeddings = HuggingFaceEmbeddings(model_name=model_name)

# Documents to index (replace with your own data)
documents = [
    Document(
        page_content="Machine learning lets computers learn patterns from data.",
        metadata={"category": "science", "date": "2024-05-01"},
    ),
    Document(
        page_content="Paris is the capital of France.",
        metadata={"category": "geography", "date": "2023-11-15"},
    ),
]

# Create a vectorstore from the documents
vectorstore = VDMS.from_documents(
    documents,
    client=client,
    embedding=embeddings,
    collection_name="my_collection",
    engine="FaissFlat",        # Faiss flat (exact) index
    distance_strategy="L2",    # Euclidean distance
)

# Perform a similarity search
query = "What is machine learning?"
docs = vectorstore.similarity_search(query, k=3)
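
The reference table above also lists similarity_search_with_score(), which returns each match along with its distance score. A quick sketch building on the vectorstore created above; with the L2 strategy, a lower score generally means a closer match:

# Retrieve documents together with their distance scores
results = vectorstore.similarity_search_with_score(query, k=3)
for doc, score in results:
    print(f"{score:.4f}  {doc.page_content}")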

Advanced Features

Different Indexing Engines

VDMS supports multiple indexing engines:

# Using Faiss IVF Flat indexing
vectorstore = VDMS.from_documents(
    documents,
    client=client,
    embedding=embeddings,
    engine="FaissIVFFlat"
)

# Using FLINNG indexing
vectorstore = VDMS.from_documents(
    documents,
    client=client, 
    embedding=embeddings,
    engine="Flinng"
)

# Using TileDB Dense
vectorstore = VDMS.from_documents(
    documents,
    client=client,
    embedding=embeddings, 
    engine="TileDBDense"
)

Metadata Filtering

You can filter search results based on metadata:

# Search with metadata filter
filter = {
    "date": [">", "2023-01-01"],
    "category": ["==", "science"]
}

docs = vectorstore.similarity_search(
    query,
    k=3,
    filter=filter
)
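
The same constraint dictionary can also be forwarded when the vectorstore is wrapped as a retriever (see the retriever section below); a small sketch, assuming standard search_kwargs forwarding:

# Apply the same metadata constraints through a retriever
# (`filter` is the constraint dict defined above)
filtered_retriever = vectorstore.as_retriever(
    search_kwargs={"k": 3, "filter": filter}
)
docs = filtered_retriever.invoke(query)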

Document Management

Update and delete operations:

# Update a document
vectorstore.update_document(
    collection_name="my_collection",
    document_id="123",
    document=new_document
)

# Delete documents
vectorstore.delete(
    collection_name="my_collection",
    ids=["123", "456"]  # Delete specific documents
)

# Delete entire collection
vectorstore.delete(collection_name="my_collection")
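
The reference table also lists add_documents() for appending to an existing collection. A minimal sketch, assuming new_documents is a list of Document objects:

# Append new documents to the existing collection
# (new_documents is assumed to be a list of Document objects)
vectorstore.add_documents(new_documents)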

Maximal Marginal Relevance Search

Use Maximal Marginal Relevance (MMR) search to retrieve results that are both relevant to the query and diverse from one another:

docs = vectorstore.max_marginal_relevance_search(
    query,
    k=3,  # Number of documents to return
    fetch_k=10,  # Number of documents to fetch before reranking
    lambda_mult=0.5  # Diversity factor (0=max diversity, 1=min diversity)
)

Using as a Retriever

VDMS can be used as a retriever in LangChain chains:

# Create retriever with similarity search
retriever = vectorstore.as_retriever()

# Create retriever with MMR search
retriever = vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={
        'k': 3,
        'fetch_k': 10,
        'lambda_mult': 0.5
    }
)
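
Either retriever follows the standard LangChain Runnable interface, so you can invoke it directly or drop it into a chain:

# Retrieve relevant documents through the retriever
docs = retriever.invoke("What is machine learning?")
for doc in docs:
    print(doc.page_content)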

VDMS provides a robust and scalable solution for vector storage and similarity search in LangChain applications. Its support for multiple indexing engines and distance metrics allows you to optimize for your specific use case, while rich metadata filtering enables complex search scenarios.
