Using LangChain Infinity Local Embeddings for Text Embeddings

Posted: Feb 15, 2025.

The InfinityEmbeddingsLocal class in LangChain provides a way to generate text embeddings using local models with optimized performance. In this guide, we'll explore how to use this class effectively for your embedding needs.

What is InfinityEmbeddingsLocal?

InfinityEmbeddingsLocal is a class that allows you to run embedding models locally using the michaelfeil/infinity project. It's designed to be used asynchronously and supports various embedding models from Hugging Face, with optimized inference capabilities.

The class provides efficient batching and device-specific optimizations, making it a good choice when you need to generate embeddings without relying on external APIs.

Reference

Here are the key parameters and methods of InfinityEmbeddingsLocal:

  • model: Required. The Hugging Face model ID (e.g., "BAAI/bge-small-en-v1.5")
  • device: Device for inference ('cpu', 'cuda', 'mps'). Defaults to 'auto'
  • batch_size: Batch size for inference. Defaults to 32
  • backend: Inference backend ('torch' recommended for ROCm/Nvidia). Defaults to 'torch'
  • model_warmup: Whether to warm up the model with the maximum batch size. Defaults to True
  • revision: Model version (commit hash from Hugging Face). Defaults to None

Key Methods:

  • aembed_documents(texts): Async method to embed multiple texts
  • aembed_query(text): Async method to embed a single query text
  • embed_documents(texts): Sync counterpart (not recommended; the class is designed for async use)
  • embed_query(text): Sync counterpart (not recommended; the class is designed for async use)

How to Use InfinityEmbeddingsLocal

Installation

First, install the required dependencies:

pip install langchain-community "infinity_emb[torch,optimum]"

Basic Usage

Here's a basic example of using InfinityEmbeddingsLocal to embed documents:

from langchain_community.embeddings import InfinityEmbeddingsLocal

# Initialize the embedder
embeddings = InfinityEmbeddingsLocal(
    model="sentence-transformers/all-MiniLM-L6-v2",
    device="cuda",  # Use GPU if available
    batch_size=32
)

# Example texts
documents = [
    "Artificial intelligence is transforming industries",
    "Machine learning models require quality data",
    "Neural networks can learn complex patterns"
]
query = "What is AI?"

# Create async function to use the embedder
async def embed_texts():
    async with embeddings:
        # Embed documents
        doc_embeddings = await embeddings.aembed_documents(documents)
        # Embed query
        query_embedding = await embeddings.aembed_query(query)
        return doc_embeddings, query_embedding

# Run the coroutine (top-level await works in notebooks and other async environments)
doc_embeddings, query_embedding = await embed_texts()
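
Outside an existing event loop, for example in a plain Python script rather than a notebook, you can drive the same coroutine with asyncio.run:

import asyncio

# asyncio.run creates an event loop, runs the coroutine to completion, and closes the loop
doc_embeddings, query_embedding = asyncio.run(embed_texts())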

Computing Similarities

Once you have your embeddings, you can compute similarities between them. The dot product of two L2-normalized vectors equals their cosine similarity, so normalize the vectors first unless your model already returns unit-length embeddings:

import numpy as np

# Convert embeddings to numpy arrays
doc_embeddings_array = np.array(doc_embeddings)
query_embedding_array = np.array(query_embedding)

# Normalize so the dot product equals cosine similarity
doc_embeddings_array /= np.linalg.norm(doc_embeddings_array, axis=1, keepdims=True)
query_embedding_array /= np.linalg.norm(query_embedding_array)

# Compute each document's cosine similarity against the query
similarity_scores = doc_embeddings_array @ query_embedding_array

# Map each document to its similarity score
results = dict(zip(documents, similarity_scores))
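
To rank the documents against the query, sort the results by score:

# Print documents from most to least similar
for doc, score in sorted(results.items(), key=lambda item: item[1], reverse=True):
    print(f"{score:.3f}  {doc}")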

Advanced Configuration

You can customize the embedder for specific needs:

embeddings = InfinityEmbeddingsLocal(
    model="BAAI/bge-small-en-v1.5",
    device="cuda",  # Run inference on the GPU
    batch_size=64,  # Larger batches raise throughput at the cost of memory
    model_warmup=True,  # Warm up with the max batch size before first use
    backend="torch"  # Recommended backend for Nvidia/ROCm GPUs
)
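
For reproducible results, the revision parameter from the reference above lets you pin the model to a specific Hugging Face commit. The hash below is a placeholder, not a real commit:

embeddings = InfinityEmbeddingsLocal(
    model="BAAI/bge-small-en-v1.5",
    revision="abc123",  # placeholder; copy the actual commit hash from the model's Hugging Face page
)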

Remember that InfinityEmbeddingsLocal is designed for async usage, so you should always use it within an async context manager (async with) and call the async methods (aembed_documents, aembed_query) rather than their sync counterparts.
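
If you do need to call it from synchronous code, a small wrapper can bridge the gap. This is a convenience sketch, not part of the LangChain API:

import asyncio
from typing import List

from langchain_community.embeddings import InfinityEmbeddingsLocal

def embed_documents_sync(
    embedder: InfinityEmbeddingsLocal, texts: List[str]
) -> List[List[float]]:
    """Convenience wrapper: run the async embedding API from sync code."""
    async def _run() -> List[List[float]]:
        async with embedder:
            return await embedder.aembed_documents(texts)
    return asyncio.run(_run())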

The class is particularly useful when you need to:

  • Generate embeddings locally without API dependencies
  • Process large batches of text efficiently
  • Control the specific embedding model and inference settings
  • Get optimized performance on specific hardware (CPU/GPU)
