Using LangChain Infinity Local Embeddings for Text Embeddings
Posted: Feb 15, 2025.
The InfinityEmbeddingsLocal class in LangChain provides a way to generate text embeddings using local models with optimized performance. In this guide, we'll explore how to use this class effectively for your embedding needs.
What is InfinityEmbeddingsLocal?
InfinityEmbeddingsLocal is a class that allows you to run embedding models locally using the michaelfeil/infinity project. It's designed to be used asynchronously and supports various embedding models from Hugging Face, with optimized inference capabilities.
The class provides efficient batching and device-specific optimizations, making it a good choice when you need to generate embeddings without relying on external APIs.
Reference
Here are the key parameters and methods of InfinityEmbeddingsLocal:
Parameter | Description
---|---
`model` | Required. The Hugging Face model ID (e.g., `"BAAI/bge-small-en-v1.5"`)
`device` | Device for inference (`'cpu'`, `'cuda'`, `'mps'`). Defaults to `'auto'`
`batch_size` | Batch size for inference. Defaults to `32`
`backend` | Inference backend (`'torch'` recommended for ROCm/Nvidia). Defaults to `'torch'`
`model_warmup` | Whether to warm up the model with the max batch size. Defaults to `True`
`revision` | Model version (a commit hash from Hugging Face). Defaults to `None`
Key Methods:
- `aembed_documents(texts)`: Async method to embed multiple texts
- `aembed_query(text)`: Async method to embed a single query text
- `embed_documents(texts)`: Sync method (not recommended; the class is async-only)
- `embed_query(text)`: Sync method (not recommended; the class is async-only)
How to Use InfinityEmbeddingsLocal
Installation
First, install the required dependencies:
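The exact packages can vary by LangChain version; a typical setup (assuming the `langchain-community` integration and the `infinity_emb` package with its torch extras) looks like this:

```bash
pip install langchain langchain-community "infinity_emb[torch,optimum]"
```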
Basic Usage
Here's a basic example of using InfinityEmbeddingsLocal to embed documents:
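A minimal sketch, assuming the install above; the model ID and sample texts are illustrative:

```python
import asyncio

from langchain_community.embeddings import InfinityEmbeddingsLocal

# Illustrative model choice; any Hugging Face embedding model ID works.
embeddings = InfinityEmbeddingsLocal(model="BAAI/bge-small-en-v1.5")

documents = [
    "Baguette is a French bread.",
    "The Eiffel Tower is in Paris.",
]
query = "What is a baguette?"

async def embed_all():
    # The async context manager starts and stops the local Infinity engine.
    async with embeddings:
        docs_embedded = await embeddings.aembed_documents(documents)
        query_embedded = await embeddings.aembed_query(query)
    return docs_embedded, query_embedded

docs_embedded, query_embedded = asyncio.run(embed_all())
print(len(docs_embedded), len(docs_embedded[0]))  # number of docs, embedding dim
```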
Computing Similarities
Once you have your embeddings, you can compute similarities between them:
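For example, cosine similarity between the query and each document, reusing `docs_embedded` and `query_embedded` from the sketch above:

```python
import numpy as np

doc_vecs = np.array(docs_embedded)    # shape: (n_docs, dim)
query_vec = np.array(query_embedded)  # shape: (dim,)

# Normalize so the dot product equals cosine similarity
doc_vecs /= np.linalg.norm(doc_vecs, axis=1, keepdims=True)
query_vec /= np.linalg.norm(query_vec)

similarities = doc_vecs @ query_vec
for doc, score in zip(documents, similarities):
    print(f"{score:.4f}  {doc}")
```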
Advanced Configuration
You can customize the embedder for specific needs:
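For instance, you can pin the device, raise the batch size, and skip warmup; the values below are illustrative, not recommendations:

```python
from langchain_community.embeddings import InfinityEmbeddingsLocal

embeddings = InfinityEmbeddingsLocal(
    model="BAAI/bge-small-en-v1.5",
    device="cpu",          # force CPU inference instead of 'auto'
    batch_size=64,         # larger batches for higher throughput
    backend="torch",       # recommended for ROCm/Nvidia GPUs
    model_warmup=False,    # skip the max-batch warmup for faster startup
    revision=None,         # optionally pin a specific Hugging Face commit hash
)
```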
Remember that InfinityEmbeddingsLocal is designed for async usage, so you should always use it within an async context manager (`async with`) and call the async methods (`aembed_documents`, `aembed_query`) rather than their sync counterparts.
The class is particularly useful when you need to:
- Generate embeddings locally without API dependencies
- Process large batches of text efficiently
- Have control over the specific embedding model and inference settings
- Get optimized performance on specific hardware (CPU/GPU)