Using StarRocks Vector Store with LangChain

Posted: Nov 16, 2024.

StarRocks is a high-performance analytical database that can also serve as a vector store in LangChain. This guide shows how to use the StarRocks vector store integration to store and search document embeddings.

What is StarRocks Vector Store?

StarRocks vector store is a LangChain integration that allows you to use StarRocks as a storage backend for document embeddings. It provides functionality for:

  • Storing document embeddings and metadata in StarRocks tables
  • Performing similarity search using cosine similarity
  • Supporting metadata filtering
  • Async operations for better performance

The main benefit of using StarRocks as a vector store is its excellent query performance thanks to its vectorized execution engine.

Reference

Here are the main methods available in the StarRocks vector store:

  • add_texts(): Add text documents to the vector store
  • add_documents(): Add Document objects to the vector store
  • similarity_search(): Search for similar documents using a text query
  • similarity_search_with_score(): Search with similarity scores
  • similarity_search_by_vector(): Search using embedding vectors directly
  • delete(): Delete documents by ID
  • drop(): Drop the vector store table
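Most of these methods are demonstrated in the sections below, except delete() and drop(). Here is a minimal sketch of both, assuming a vectorstore instance configured as shown in the next section; delete() support may vary by version, so treat this as illustrative:

# Illustrative sketch: assumes `vectorstore` is set up as in the next section
ids = vectorstore.add_texts(["temporary text"])  # add_texts() returns the inserted document IDs

# Delete specific documents by ID (if your installed version implements delete())
vectorstore.delete(ids=ids)

# Drop the entire backing table when the store is no longer needed
vectorstore.drop()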

How to Use StarRocks Vector Store

Setting up the Connection

First, you need to configure the connection to your StarRocks instance:

from langchain_community.vectorstores import StarRocks
from langchain_community.vectorstores.starrocks import StarRocksSettings
from langchain_openai import OpenAIEmbeddings

# Configure StarRocks connection settings
settings = StarRocksSettings(
    host="127.0.0.1",
    port=41003,  # MySQL-protocol query port of the StarRocks FE (9030 by default)
    username="root",
    password="",
    database="default",
    table="langchain",  # Optional, defaults to 'langchain'
)

# Initialize the vector store
embeddings = OpenAIEmbeddings()
vectorstore = StarRocks(embeddings, config=settings)
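If you already have your documents in hand, you can also create the store and insert them in one step with the from_documents() classmethod, passing the same settings object as config:

# Build the table (if needed) and insert documents in a single call
from langchain_core.documents import Document

docs = [Document(page_content="StarRocks is a database", metadata={"source": "intro"})]
vectorstore = StarRocks.from_documents(docs, embeddings, config=settings)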

Adding Documents

You can add documents in two ways:

# Add texts directly
texts = [
    "StarRocks is a database",
    "It has great query performance",
    "You can use it as a vector store"
]
vectorstore.add_texts(texts)

# Add Document objects
from langchain_core.documents import Document
docs = [
    Document(page_content="Some content", metadata={"source": "doc1"}),
    Document(page_content="Other content", metadata={"source": "doc2"})
]
vectorstore.add_documents(docs)

Searching Documents

Perform similarity search queries:

# Basic similarity search
docs = vectorstore.similarity_search(
    "what is starrocks?",
    k=4  # Number of results to return
)

# Search with filter condition
docs = vectorstore.similarity_search(
    "what is starrocks?",
    where_str="metadata.source = 'doc1'"  # Filter on metadata
)

# Search with scores
docs_and_scores = vectorstore.similarity_search_with_score(
    "what is starrocks?"
)
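The reference table above also lists similarity_search_by_vector(), which skips the automatic query embedding and searches with a raw vector instead. A minimal sketch, reusing the embeddings object from the setup to produce the query vector:

# Embed the query yourself, then search with the raw vector
query_vector = embeddings.embed_query("what is starrocks?")
docs = vectorstore.similarity_search_by_vector(query_vector, k=4)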

Async Operations

StarRocks vector store also supports async operations for better performance:

# Async similarity search
docs = await vectorstore.asimilarity_search(
    "what is starrocks?",
    k=4
)

# Async document addition
await vectorstore.aadd_texts(["some text"])

Using as a Retriever

The vector store can be used as a retriever in LangChain chains:

retriever = vectorstore.as_retriever()

# Configure retriever settings
retriever = vectorstore.as_retriever(
    search_type="mmr",  # Use maximal marginal relevance (MMR) ranking
    search_kwargs={
        "k": 6,
        "lambda_mult": 0.25  # 0 = maximum diversity, 1 = minimum diversity
    }
)
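To see the retriever in context, here is a minimal RAG chain sketch using LCEL; the prompt wording and the gpt-4o-mini model are illustrative choices, not part of the StarRocks integration:

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_template(
    "Answer using only this context:\n{context}\n\nQuestion: {question}"
)

def format_docs(docs):
    # Join the retrieved documents into a single context string
    return "\n\n".join(doc.page_content for doc in docs)

chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | ChatOpenAI(model="gpt-4o-mini")  # illustrative model choice
    | StrOutputParser()
)

print(chain.invoke("What is StarRocks?"))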

StarRocks vector store provides a robust solution for storing and searching document embeddings, especially when you need high query performance. The integration with LangChain makes it easy to use StarRocks in your LLM applications.
