Using VikingDB Vector Store in LangChain

Posted: Nov 17, 2024.

VikingDB is a vector database designed to store, index, and manage large-scale embedding vectors generated by neural networks and other machine learning models. This guide shows how to use VikingDB as a vector store in LangChain.

What is VikingDB?

VikingDB is a specialized database focused on handling vector embeddings efficiently. It provides features for:

  • Storing and indexing high-dimensional vectors
  • Fast similarity search
  • Collection management for organizing different vector sets
  • Asynchronous operations support
  • Flexible search options including MMR (Maximum Marginal Relevance)
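To make "similarity search" concrete, here is a minimal pure-Python sketch of what a vector store does conceptually: score every stored vector against the query and keep the best k. It is a brute-force illustration only, not how VikingDB works internally (a real vector database relies on indexes to avoid scanning everything). The helper names cosine_similarity and top_k and the toy vectors are assumptions made for this example.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)

def top_k(query, vectors, k=2):
    """Return the k (id, score) pairs most similar to the query, best first."""
    scored = [(vid, cosine_similarity(query, vec)) for vid, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy 2-dimensional "embeddings"; real embeddings have hundreds of dimensions.
stored = {
    "doc_a": [1.0, 0.0],
    "doc_b": [0.9, 0.1],
    "doc_c": [0.0, 1.0],
}
hits = top_k([1.0, 0.0], stored, k=2)
```

Here hits contains doc_a followed by doc_b, because those two vectors point in nearly the same direction as the query while doc_c is orthogonal to it.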

Reference

Here are the key methods available in the VikingDB class:

Method                            Description
from_texts()                      Creates a new collection and adds text documents with their embeddings
from_documents()                  Creates a new collection from Document objects
add_texts()                       Adds new text documents to an existing collection
similarity_search()               Performs similarity search for a query string
similarity_search_with_score()    Like similarity_search(), but also returns relevance scores
max_marginal_relevance_search()   Performs MMR search to get diverse results
delete()                          Deletes documents by their IDs

How to Use VikingDB

Initial Setup

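First, install the required packages. The examples below also use OpenAI embeddings, so langchain-openai is included. Drop the leading "!" if you are running this in a shell rather than a notebook:

!pip install volcengine langchain-community langchain-openai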

Connecting to VikingDB

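To connect, pass your host, region, and access key (ak) / secret key (sk) credentials to a VikingDBConfig object, and create the embedding model that will turn text into vectors: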

from langchain_community.vectorstores.vikingdb import VikingDB, VikingDBConfig
from langchain_openai import OpenAIEmbeddings

config = VikingDBConfig(
    host="your-host",
    region="your-region",
    ak="your-access-key",
    sk="your-secret-key",
    scheme="http"
)

embeddings = OpenAIEmbeddings()
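Hardcoding the access key and secret key in source files is easy to get wrong, so one option is to read them from environment variables. The sketch below builds the keyword arguments shown above from the environment. The variable names (VIKINGDB_HOST and so on) and the helper load_vikingdb_settings are hypothetical choices for this example, not something VikingDB or LangChain defines.

```python
import os

# Hypothetical environment variable names; pick whatever fits your deployment.
ENV_KEYS = {
    "host": "VIKINGDB_HOST",
    "region": "VIKINGDB_REGION",
    "ak": "VIKINGDB_AK",
    "sk": "VIKINGDB_SK",
}

def load_vikingdb_settings(env=None):
    """Collect VikingDBConfig keyword arguments from environment variables."""
    env = os.environ if env is None else env
    missing = [name for name in ENV_KEYS.values() if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    settings = {key: env[name] for key, name in ENV_KEYS.items()}
    # The scheme is optional here and falls back to "http", as in the example above.
    settings["scheme"] = env.get("VIKINGDB_SCHEME", "http")
    return settings

# Usage (assumes the variables are exported in your shell):
# config = VikingDBConfig(**load_vikingdb_settings())
```

Failing fast with a list of the missing variables tends to be easier to debug than a connection error raised later by the client.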

Creating a Collection and Adding Documents

You can create a new collection and add documents in one go:

from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import TextLoader

# Load and split documents
loader = TextLoader("your_file.txt")
documents = loader.load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
docs = text_splitter.split_documents(documents)

# Create VikingDB instance with documents
db = VikingDB.from_documents(
    docs,
    embeddings,
    connection_args=config,
    collection_name="my_collection",
    drop_old=True  # Set to True to overwrite existing collection
)
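If the chunk_size and chunk_overlap settings used above are unfamiliar, the following simplified sketch shows the idea with a plain sliding window over characters. Note the assumption: RecursiveCharacterTextSplitter first tries to split on natural separators such as paragraphs and lines, so its real output will differ. The function split_with_overlap is a hypothetical helper that only illustrates the two parameters.

```python
def split_with_overlap(text, chunk_size=1000, chunk_overlap=100):
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reached the end of the text
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
```

Each chunk here starts with the last three characters of the previous one. That overlap is what helps a sentence that straddles a chunk boundary remain retrievable from at least one chunk.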

Searching the Collection

Once the collection is populated, you can run several kinds of search:

# Basic similarity search
query = "What is machine learning?"
results = db.similarity_search(query)

# Search with scores
results_with_scores = db.similarity_search_with_score(query)

# MMR search for diverse results
diverse_results = db.max_marginal_relevance_search(
    query,
    k=4,  # Number of results to return
    fetch_k=20,  # Number of initial results to consider
    lambda_mult=0.5  # Relevance/diversity trade-off: 0 = maximum diversity, 1 = minimum diversity
)
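To see what the MMR parameters do, here is a pure-Python sketch of the standard greedy MMR selection: at each step it picks the candidate with the best balance between relevance to the query and redundancy with what has already been picked. This is an illustration of the general algorithm under the assumption of cosine similarity, not VikingDB's or LangChain's exact code. The names cosine and mmr_select and the toy vectors are made up for the example, and candidates plays the role of the fetch_k initial results.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def mmr_select(query_vec, candidates, k=4, lambda_mult=0.5):
    """Greedy MMR: return the indices of up to k candidates, in pick order."""
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        best_idx, best_score = None, -math.inf
        for i in remaining:
            relevance = cosine(query_vec, candidates[i])
            # Redundancy: similarity to the closest already-selected candidate.
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        remaining.remove(best_idx)
    return selected

query_vec = [1.0, 0.0]
candidates = [
    [1.0, 0.0],    # 0: identical to the query
    [0.99, 0.05],  # 1: near-duplicate of candidate 0
    [0.6, 0.8],    # 2: less relevant, but points in a different direction
]
picked_diverse = mmr_select(query_vec, candidates, k=2, lambda_mult=0.3)
picked_relevant = mmr_select(query_vec, candidates, k=2, lambda_mult=1.0)
```

With lambda_mult=0.3 the near-duplicate is skipped in favor of the more distinct candidate 2, while lambda_mult=1.0 reduces to plain relevance ranking and returns candidates 0 and 1.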

Using Multiple Collections

VikingDB allows you to organize your vectors into different collections:

# Create a new collection
collection1 = VikingDB.from_documents(
    docs,
    embeddings,
    connection_args=config,
    collection_name="collection_1"
)

# Create another collection (other_docs is a second list of Document objects,
# loaded and split the same way as docs above)
collection2 = VikingDB.from_documents(
    other_docs,
    embeddings,
    connection_args=config,
    collection_name="collection_2"
)

# Access existing collection
existing_collection = VikingDB(
    embeddings,
    connection_args=config,
    collection_name="collection_1"
)

Async Operations

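The VikingDB class also exposes the asynchronous counterparts defined by LangChain's vector store interface, so calls can be awaited without blocking the event loop. Run them inside an async function (or in a notebook cell, where top-level await works):

# Async similarity search
results = await db.asimilarity_search(query)

# Async document addition
await db.aadd_texts(["new text 1", "new text 2"])

# Async deletion
await db.adelete(["doc_id_1", "doc_id_2"])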
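Async calls pay off mostly when several of them are in flight at once. The sketch below runs multiple searches concurrently with asyncio.gather. So that it runs without a live database, it uses FakeStore, a hypothetical stand-in whose asimilarity_search just sleeps briefly and returns placeholder strings. In real code you would pass your VikingDB instance instead.

```python
import asyncio

class FakeStore:
    """Hypothetical stand-in for a vector store; replace with your VikingDB instance."""

    async def asimilarity_search(self, query, k=4):
        await asyncio.sleep(0.01)  # simulate network latency
        return [f"result {i} for {query!r}" for i in range(k)]

async def search_many(store, queries, k=2):
    """Run one asimilarity_search per query concurrently, preserving query order."""
    tasks = [store.asimilarity_search(q, k=k) for q in queries]
    return await asyncio.gather(*tasks)

queries = ["What is machine learning?", "What is a vector database?"]
all_results = asyncio.run(search_many(FakeStore(), queries))
```

asyncio.gather returns results in the same order as the queries, so all_results[0] belongs to the first query. In a notebook, where an event loop is already running, use "await search_many(...)" rather than asyncio.run.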

The VikingDB vector store provides a robust solution for managing and searching vector embeddings in LangChain applications. Its collection-based organization and support for async operations make it suitable for both small and large-scale applications.
