ECloud Vector Search in LangChain - Using China Mobile's ElasticSearch Service

Posted: Feb 9, 2025.

The ECloud Vector Search integration in LangChain provides a way to store and search document embeddings using China Mobile's ECloud ElasticSearch service. This guide will show you how to use the EcloudESVectorStore class for vector similarity search and document retrieval.

What is EcloudESVectorStore?

EcloudESVectorStore is a vector store implementation that uses China Mobile's ECloud ElasticSearch service as the backend. It allows you to:

  • Store document embeddings in ElasticSearch indices
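  • Perform similarity searches using different search models (such as exact or LSH) and similarity measures (such as cosine)
  • Filter search results with Elasticsearch query clauses
  • Run searches, additions, and deletions asynchronously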
  • Handle document metadata and embeddings

Reference

Key methods of the EcloudESVectorStore class:

Method                            Description
from_documents()                  Create a vector store from a list of documents
from_texts()                      Create a vector store from raw texts
similarity_search()               Find similar documents to a query text
similarity_search_with_score()    Find similar documents with similarity scores
add_documents()                   Add new documents to the store
delete()                          Delete documents from the store
max_marginal_relevance_search()   Search with diversity optimization

How to use EcloudESVectorStore

Basic Setup and Initialization

First, let's set up a connection to ECloud ElasticSearch:

from langchain_community.vectorstores import EcloudESVectorStore
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()

vectorstore = EcloudESVectorStore(
    embedding=embeddings,
    index_name="documents",
    es_url="http://localhost:9200",
    user="username",  # Optional
    password="password"  # Optional
)
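Hard-coding credentials is fine for a local test but not for a shared codebase. The following is a minimal sketch of collecting the same constructor arguments from environment variables. The helper name and the ECLOUD_ES_* variable names are assumptions made for illustration; they are not part of LangChain or ECloud.

```python
import os
from typing import Dict, Mapping, Optional


def ecloud_connection_kwargs(
    env: Optional[Mapping[str, str]] = None,
    index_name: str = "documents",
) -> Dict[str, str]:
    """Build keyword arguments for EcloudESVectorStore from environment variables.

    The ECLOUD_ES_* names are hypothetical; use whatever your deployment defines.
    """
    env = os.environ if env is None else env
    kwargs = {
        "index_name": index_name,
        "es_url": env.get("ECLOUD_ES_URL", "http://localhost:9200"),
    }
    user = env.get("ECLOUD_ES_USER")
    password = env.get("ECLOUD_ES_PASSWORD")
    # user and password are optional, so pass them only when both are set.
    if user and password:
        kwargs["user"] = user
        kwargs["password"] = password
    return kwargs


print(ecloud_connection_kwargs({"ECLOUD_ES_URL": "http://es.internal:9200"}))
```

The result can then be unpacked into the constructor, for example EcloudESVectorStore(embedding=embeddings, **ecloud_connection_kwargs()).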

Adding Documents

You can add documents in a few different ways:

# From raw texts
texts = [
    "First document text",
    "Second document text",
    "Third document text"
]
vectorstore = EcloudESVectorStore.from_texts(
    texts,
    embeddings,
    index_name="documents",
    es_url="http://localhost:9200"
)

# From Document objects
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import CharacterTextSplitter

loader = TextLoader("documents.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)

vectorstore = EcloudESVectorStore.from_documents(
    docs,
    embeddings,
    index_name="documents",
    es_url="http://localhost:9200"
)

Performing Similarity Searches

Basic similarity search:

# Find similar documents
results = vectorstore.similarity_search(
    "What is machine learning?",
    k=4  # Number of results to return
)

# Search with scores
results = vectorstore.similarity_search_with_score(
    "What is machine learning?",
    k=4
)
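The scored variant returns (document, score) pairs instead of bare documents. As a rough sketch of how such pairs can be unpacked, the snippet below uses a small stand-in class with the same two attributes as a LangChain Document, plus hand-written sample scores, so that it runs without a cluster. The Doc class, the summarize_hits helper, and the sample values are all illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class Doc:
    """Stand-in for a LangChain Document (page_content plus metadata)."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def summarize_hits(results: List[Tuple[Doc, float]], width: int = 40) -> List[str]:
    """Turn (document, score) pairs into one printable line per hit."""
    lines = []
    for rank, (doc, score) in enumerate(results, start=1):
        snippet = doc.page_content[:width]
        lines.append(f"{rank}. score={score:.3f} {snippet}")
    return lines


# Hand-written pairs shaped like the output of similarity_search_with_score();
# the scores are made up for the example, not measured.
sample = [
    (Doc("Machine learning is a field of study in AI."), 1.82),
    (Doc("Elasticsearch stores documents in indices."), 1.10),
]
for line in summarize_hits(sample):
    print(line)
```

How scores should be interpreted (range, and whether higher means closer) depends on the similarity measure configured for the index, so check that before applying a threshold.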

Using filters:

# Search with metadata filters
results = vectorstore.similarity_search(
    "What is machine learning?",
    filter={"term": {"category": "technology"}},  # Elasticsearch query DSL clause
    search_params={
        "model": "exact",
        # These field names must match the ones the index was created with
        "vector_field": "my_vec",
        "text_field": "my_text",
    }
)
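The filter argument is a plain dictionary in Elasticsearch query DSL, so it can be assembled programmatically. Here is a small sketch with two hypothetical helpers, term_filter and all_of, that build such dictionaries. The "language" field is invented for the example, and whether the store forwards a compound bool clause unchanged is an assumption worth verifying against your cluster.

```python
from typing import Any, Dict


def term_filter(field_name: str, value: Any) -> Dict[str, Any]:
    """Exact-match clause on a single field, in Elasticsearch query DSL."""
    return {"term": {field_name: value}}


def all_of(*clauses: Dict[str, Any]) -> Dict[str, Any]:
    """Combine clauses so that every one of them must match (bool/filter)."""
    if not clauses:
        raise ValueError("at least one clause is required")
    if len(clauses) == 1:
        return clauses[0]
    return {"bool": {"filter": list(clauses)}}


combined = all_of(
    term_filter("category", "technology"),
    term_filter("language", "en"),  # hypothetical metadata field
)
print(combined)
```

The resulting dictionary would be passed as filter=combined in place of the literal shown earlier.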

Advanced Search Options

Using different search models and similarities:

# Using LSH (Locality Sensitive Hashing) model
vectorstore = EcloudESVectorStore.from_documents(
    docs,
    embeddings,
    index_name="documents",
    es_url="http://localhost:9200",
    vector_type="knn_dense_float_vector",
    vector_params={
        "model": "lsh", 
        "similarity": "cosine", 
        "L": 99, 
        "k": 1
    }
)
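To give a feel for what the L and k parameters usually control in LSH, here is a pure-Python sketch of random-hyperplane hashing for cosine similarity. It assumes the conventional meaning (L hash tables, each keyed by k concatenated hash bits); the source does not spell out ECloud's exact semantics, and this is not the service's implementation. The CosineLSH class and its method names are made up for illustration.

```python
import random
from typing import Dict, List, Sequence, Set, Tuple

Signature = Tuple[int, ...]


def make_hyperplanes(dim: int, L: int, k: int, seed: int = 0) -> List[List[List[float]]]:
    """Draw L groups of k random hyperplane normals (Gaussian components)."""
    rng = random.Random(seed)
    return [[[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(k)] for _ in range(L)]


def signature(vec: Sequence[float], planes: List[List[float]]) -> Signature:
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(int(sum(p * v for p, v in zip(plane, vec)) >= 0.0) for plane in planes)


class CosineLSH:
    """Toy index: L hash tables, each bucketing vectors by a k-bit signature."""

    def __init__(self, dim: int, L: int, k: int, seed: int = 0) -> None:
        self.planes = make_hyperplanes(dim, L, k, seed)
        self.tables: List[Dict[Signature, Set[int]]] = [{} for _ in range(L)]

    def add(self, item_id: int, vec: Sequence[float]) -> None:
        for planes, table in zip(self.planes, self.tables):
            table.setdefault(signature(vec, planes), set()).add(item_id)

    def candidates(self, vec: Sequence[float]) -> Set[int]:
        """Union of the buckets the query falls into, across all tables."""
        found: Set[int] = set()
        for planes, table in zip(self.planes, self.tables):
            found |= table.get(signature(vec, planes), set())
        return found


index = CosineLSH(dim=3, L=5, k=2, seed=42)
index.add(0, [1.0, 0.0, 0.0])
index.add(1, [0.0, 1.0, 0.0])
print(index.candidates([2.0, 0.0, 0.0]))  # same direction as item 0
```

In this scheme, a larger L adds tables and so raises the chance that a true neighbor shares at least one bucket with the query, while a larger k makes each bucket more selective. Candidates found this way are normally re-ranked with the exact similarity.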

# Search with specific parameters
results = vectorstore.similarity_search(
    "query text",
    k=10,
    search_params={
        "model": "exact",
        "similarity": "cosine",
        "vector_field": "my_vec",
        "text_field": "my_text",
    }
)
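Conceptually, an exact search with cosine similarity compares the query vector against every stored vector and keeps the k best matches. The sketch below shows that computation in plain Python on toy two-dimensional vectors. It illustrates the idea only; the scores the service returns may be scaled differently, and the function names are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def exact_top_k(
    query: Sequence[float], vectors: List[Sequence[float]], k: int
) -> List[Tuple[int, float]]:
    """Score every vector against the query and return the k best (index, score) pairs."""
    scored = [(i, cosine_similarity(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(exact_top_k([1.0, 0.0], vectors, k=2))
```

Because every vector is scored, the cost grows linearly with the size of the index, which is the trade-off that approximate models such as LSH try to avoid.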

Async Operations

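The class also exposes async variants of its methods, prefixed with "a", so that searches and writes do not block the event loop in async applications. Call them from inside a coroutine: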

# Async similarity search
results = await vectorstore.asimilarity_search(
    "What is machine learning?",
    k=4
)

# Async document addition
await vectorstore.aadd_documents(documents)

# Async deletion
await vectorstore.adelete(["doc_id1", "doc_id2"])
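One common reason to use the async methods is to issue several searches concurrently. The sketch below does this with asyncio.gather. The search_many helper is an illustrative assumption, and FakeStore is a hypothetical stand-in that mimics asimilarity_search so the example runs without a live cluster.

```python
import asyncio
from typing import Any, List, Sequence


async def search_many(store: Any, queries: Sequence[str], k: int = 4) -> List[List[Any]]:
    """Run one asimilarity_search per query concurrently, keeping input order."""
    tasks = [store.asimilarity_search(q, k=k) for q in queries]
    return await asyncio.gather(*tasks)


class FakeStore:
    """Hypothetical stand-in for a vector store; returns canned strings."""

    async def asimilarity_search(self, query: str, k: int = 4) -> List[str]:
        await asyncio.sleep(0)  # yield control, as a real network call would
        return [f"{query} hit {i}" for i in range(k)]


results = asyncio.run(search_many(FakeStore(), ["first", "second"], k=2))
print(results)
```

With a real store, the same helper would be called as await search_many(vectorstore, queries) from inside an existing coroutine.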

EcloudESVectorStore connects LangChain to China Mobile's ECloud ElasticSearch service and exposes several search models and similarity measures. Exact search, LSH-based approximate search, and filtered queries all go through the same similarity_search interface, so you can switch between them by changing parameters rather than code.
