LangChain Apache Doris Settings for Vector Storage

Posted: Nov 7, 2024.

Apache Doris is a real-time analytical data warehouse that can also serve as a vector store in LangChain applications. This post covers how to configure that connection with the ApacheDorisSettings class.

What is ApacheDorisSettings?

ApacheDorisSettings is a configuration class that holds the connection and table details for using Apache Doris as a vector store in LangChain. It covers connection parameters (host, port, credentials) as well as table details: the database, the table name, and the column names used for embeddings and metadata.

Reference

Here are the key configuration parameters available in ApacheDorisSettings:

Parameter    Description                             Default Value
host         Host URL for the Doris frontend         'localhost'
port         Frontend query port (MySQL protocol)    9030
username     Login username                          'root'
password     Login password                          None
database     Database name                           'default'
table        Table name for operations               'langchain'
column_map   Dictionary mapping column names         Default identity map
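
To make the defaults concrete, the sketch below models them as a plain dictionary and overlays explicit overrides, which is roughly how a settings object with defaults behaves. It is a stand-in for illustration, not the library's code: DEFAULTS and resolve_settings are hypothetical names, and the host/port keys follow the constructor examples used in this post.

```python
from typing import Any, Dict

# Hypothetical stand-in for the defaults listed in the table above.
DEFAULTS: Dict[str, Any] = {
    "host": "localhost",
    "port": 9030,
    "username": "root",
    "password": None,
    "database": "default",
    "table": "langchain",
}


def resolve_settings(**overrides: Any) -> Dict[str, Any]:
    """Return the defaults with any explicitly passed values applied on top."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        # Reject misspelled keys instead of silently ignoring them.
        raise KeyError(f"unknown setting(s): {sorted(unknown)}")
    return {**DEFAULTS, **overrides}


resolved = resolve_settings(host="doris.example.com", database="vector_db")
print(resolved["host"], resolved["port"], resolved["database"])
```

Only the values you pass change; everything else keeps its default.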

How to Use ApacheDorisSettings

Basic Configuration

Here's a simple example of creating a basic ApacheDorisSettings instance:

from langchain_community.vectorstores.apache_doris import ApacheDorisSettings

settings = ApacheDorisSettings(
    host="doris.example.com",
    port=9030,
    username="doris_user",
    password="secret_password",
    database="vector_db"
)

Custom Column Mapping

You can customize how your data is mapped to Doris columns using the column_map parameter:

settings = ApacheDorisSettings(
    host="doris.example.com",
    port=9030,
    column_map={
        'id': 'doc_id',
        'embedding': 'vector_embedding',
        'document': 'doc_content',
        'metadata': 'doc_metadata'
    }
)

This mapping tells the vector store which table column holds each logical field (id, embedding, document, metadata), replacing the default column names.
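
A column map is only useful if it covers every logical field. The sketch below checks a map for the four keys used in the example above and builds the column list that an INSERT statement would need. REQUIRED_KEYS, validate_column_map, and insert_columns are hypothetical helpers written for this post; the library may validate its column map differently.

```python
from typing import Dict

# Logical fields from the column_map example (assumed here to all be required).
REQUIRED_KEYS = ("id", "document", "embedding", "metadata")


def validate_column_map(column_map: Dict[str, str]) -> None:
    """Raise ValueError if a field is missing or two fields share a column."""
    missing = [key for key in REQUIRED_KEYS if key not in column_map]
    if missing:
        raise ValueError(f"column_map is missing keys: {missing}")
    columns = [column_map[key] for key in REQUIRED_KEYS]
    if len(set(columns)) != len(columns):
        raise ValueError("column_map maps two fields to the same column")


def insert_columns(column_map: Dict[str, str]) -> str:
    """Return the physical column names, comma-separated, in a fixed field order."""
    validate_column_map(column_map)
    return ", ".join(column_map[key] for key in REQUIRED_KEYS)


custom_map = {
    "id": "doc_id",
    "embedding": "vector_embedding",
    "document": "doc_content",
    "metadata": "doc_metadata",
}
column_list = insert_columns(custom_map)
print(column_list)
```

Checking the map before connecting surfaces a typo in a key name early, rather than as a SQL error at insert time.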

Using with Vector Store

Here's how to use ApacheDorisSettings with the Apache Doris vector store:

from langchain_community.vectorstores.apache_doris import ApacheDoris, ApacheDorisSettings
from langchain_openai import OpenAIEmbeddings

# Create settings
settings = ApacheDorisSettings(
    host="doris.example.com",
    port=9030,
    database="langchain",
    table="embeddings"
)

# Initialize embeddings
embeddings = OpenAIEmbeddings()

# Create vector store
vectorstore = ApacheDoris(
    embedding=embeddings,
    config=settings
)

# Or create from documents
docs = [...]  # Your documents
vectorstore = ApacheDoris.from_documents(
    documents=docs,
    embedding=embeddings,
    config=settings
)

Environment-based Configuration

You can also use environment variables to configure ApacheDorisSettings. The class will automatically look for these variables:

APACHE_DORIS_HOST=doris.example.com
APACHE_DORIS_PORT=9030
APACHE_DORIS_USERNAME=root
APACHE_DORIS_PASSWORD=secret
APACHE_DORIS_DATABASE=vectordb
APACHE_DORIS_TABLE=embeddings

Then in your code:

settings = ApacheDorisSettings()  # Will automatically use environment variables
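
To illustrate what prefix-based loading does, here is a minimal stdlib-only sketch that reads APACHE_DORIS_* variables from a mapping and casts the port to an integer. load_from_env, ENV_PREFIX, and FIELDS are hypothetical names used only for this illustration; the real class performs the lookup itself.

```python
import os
from typing import Any, Dict, Mapping

ENV_PREFIX = "APACHE_DORIS_"
FIELDS = ("host", "port", "username", "password", "database", "table")


def load_from_env(environ: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Collect settings from variables such as APACHE_DORIS_HOST."""
    found: Dict[str, Any] = {}
    for field in FIELDS:
        value = environ.get(ENV_PREFIX + field.upper())
        if value is not None:
            # Environment values are always strings, so the port needs a cast.
            found[field] = int(value) if field == "port" else value
    return found


# A plain dict stands in for the process environment in this example.
example_env = {
    "APACHE_DORIS_HOST": "doris.example.com",
    "APACHE_DORIS_PORT": "9030",
    "APACHE_DORIS_PASSWORD": "secret",
}
loaded = load_from_env(example_env)
print(sorted(loaded))
```

Fields with no matching variable are left out of the result, so their defaults would still apply.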

ApacheDorisSettings keeps the connection and table details for your Apache Doris vector store in one place. Keep credentials out of source code; in production, supply sensitive values such as the password through environment variables.
