Using StarRocks Vector Store with LangChain

Posted: Nov 16, 2024.

StarRocks is a high-performance analytical database that can also serve as a vector store in LangChain. This guide shows how to use the StarRocks vector store integration to store and search document embeddings.

What is StarRocks Vector Store?

StarRocks vector store is a LangChain integration that allows you to use StarRocks as a storage backend for document embeddings. It provides functionality for:

  • Storing document embeddings and metadata in StarRocks tables
  • Performing similarity search using cosine similarity
  • Supporting metadata filtering
  • Async operations for better performance

The main benefit of using StarRocks as a vector store is its excellent query performance thanks to its vectorized execution engine.

Reference

Here are the main methods available in the StarRocks vector store:

  • add_texts(): Add text documents to the vector store
  • add_documents(): Add Document objects to the vector store
  • similarity_search(): Search for similar documents using a text query
  • similarity_search_with_score(): Search with similarity scores
  • similarity_search_by_vector(): Search using embedding vectors directly
  • delete(): Delete documents by ID
  • drop(): Drop the vector store table
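Most of these methods are demonstrated in the sections below, except delete() and drop(). Here is a minimal sketch of both, assuming a vectorstore instance configured as shown in the next section; delete() support may vary by version, so treat this as illustrative:

# Illustrative sketch: assumes `vectorstore` is set up as in the next section
ids = vectorstore.add_texts(["temporary text"])  # add_texts() returns the inserted document IDs

# Delete specific documents by ID (if your installed version implements delete())
vectorstore.delete(ids=ids)

# Drop the entire backing table when the store is no longer needed
vectorstore.drop()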

How to Use StarRocks Vector Store

Setting up the Connection

First, you need to configure the connection to your StarRocks instance:

from langchain_community.vectorstores import StarRocks
from langchain_community.vectorstores.starrocks import StarRocksSettings
from langchain_openai import OpenAIEmbeddings

# Configure StarRocks connection settings
settings = StarRocksSettings(
    host="127.0.0.1",
    port=41003,  # MySQL-protocol query port of the StarRocks FE (9030 by default)
    username="root",
    password="",
    database="default",
    table="langchain",  # Optional, defaults to 'langchain'
)

# Initialize the vector store
embeddings = OpenAIEmbeddings()
vectorstore = StarRocks(embeddings, config=settings)
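If you already have your documents in hand, you can also create the store and insert them in one step with the from_documents() classmethod, passing the same settings object as config:

# Build the table (if needed) and insert documents in a single call
from langchain_core.documents import Document

docs = [Document(page_content="StarRocks is a database", metadata={"source": "intro"})]
vectorstore = StarRocks.from_documents(docs, embeddings, config=settings)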

Adding Documents

You can add documents in two ways:

# Add texts directly
texts = [
    "StarRocks is a database",
    "It has great query performance",
    "You can use it as a vector store"
]
vectorstore.add_texts(texts)

# Add Document objects
from langchain_core.documents import Document
docs = [
    Document(page_content="Some content", metadata={"source": "doc1"}),
    Document(page_content="Other content", metadata={"source": "doc2"})
]
vectorstore.add_documents(docs)

Searching Documents

Perform similarity search queries:

# Basic similarity search
docs = vectorstore.similarity_search(
    "what is starrocks?",
    k=4  # Number of results to return
)

# Search with filter condition
docs = vectorstore.similarity_search(
    "what is starrocks?",
    where_str="metadata.source = 'doc1'"  # Filter on metadata
)

# Search with scores
docs_and_scores = vectorstore.similarity_search_with_score(
    "what is starrocks?"
)
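The reference table above also lists similarity_search_by_vector(), which skips the automatic query embedding and searches with a raw vector instead. A minimal sketch, reusing the embeddings object from the setup to produce the query vector:

# Embed the query yourself, then search with the raw vector
query_vector = embeddings.embed_query("what is starrocks?")
docs = vectorstore.similarity_search_by_vector(query_vector, k=4)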

Async Operations

StarRocks vector store also supports async operations for better performance:

# Async similarity search
docs = await vectorstore.asimilarity_search(
    "what is starrocks?",
    k=4
)

# Async document addition
await vectorstore.aadd_texts(["some text"])

Using as a Retriever

The vector store can be used as a retriever in LangChain chains:

retriever = vectorstore.as_retriever()

# Configure retriever settings
retriever = vectorstore.as_retriever(
    search_type="mmr",  # Use maximal marginal relevance (MMR) ranking
    search_kwargs={
        "k": 6,
        "lambda_mult": 0.25  # 0 = maximum diversity, 1 = minimum diversity
    }
)
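To see the retriever in context, here is a minimal RAG chain sketch using LCEL; the prompt wording and the gpt-4o-mini model are illustrative choices, not part of the StarRocks integration:

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_template(
    "Answer using only this context:\n{context}\n\nQuestion: {question}"
)

def format_docs(docs):
    # Join the retrieved documents into a single context string
    return "\n\n".join(doc.page_content for doc in docs)

chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | ChatOpenAI(model="gpt-4o-mini")  # illustrative model choice
    | StrOutputParser()
)

print(chain.invoke("What is StarRocks?"))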

StarRocks vector store provides a robust solution for storing and searching document embeddings, especially when you need high query performance. The integration with LangChain makes it easy to use StarRocks in your LLM applications.
