Using NLPCloud Embeddings with LangChain for Multilingual Text Processing

Posted: Feb 18, 2025.

NLPCloud Embeddings is a powerful integration in LangChain that allows you to generate vector representations of text using NLPCloud's advanced embedding models. It's particularly useful for multilingual applications since it supports text processing in more than 50 languages.

What is NLPCloudEmbeddings?

NLPCloudEmbeddings is a LangChain wrapper for NLP Cloud's embedding service. It provides access to models like paraphrase-multilingual-mpnet-base-v2, which is based on Sentence Transformers and optimized for extracting embeddings across multiple languages. These embeddings can be used for various NLP tasks like semantic search, text similarity, and document clustering.
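To make those use cases concrete, here's a minimal semantic search sketch that indexes a few texts in a FAISS vector store backed by NLPCloudEmbeddings and retrieves the closest matches. It assumes the faiss-cpu and langchain-community packages are installed and that NLPCLOUD_API_KEY is already set (setup is covered below); treat it as an illustration rather than a full recipe.

# Minimal semantic search sketch (assumes faiss-cpu and langchain-community
# are installed and NLPCLOUD_API_KEY is set; setup is covered below)
from langchain_community.embeddings import NLPCloudEmbeddings
from langchain_community.vectorstores import FAISS

embeddings = NLPCloudEmbeddings()

texts = [
    "LangChain makes it easy to build LLM applications.",
    "Les embeddings multilingues facilitent la recherche sémantique.",
    "Vector stores index embeddings for fast retrieval.",
]

# Index the texts with their NLP Cloud embeddings, then search by meaning
store = FAISS.from_texts(texts, embeddings)
for doc in store.similarity_search("How do I search documents by meaning?", k=2):
    print(doc.page_content)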

Reference

Method                                 Description
embed_documents(texts: List[str])      Generates embeddings for a list of text documents
embed_query(text: str)                 Generates an embedding for a single query text
aembed_documents(texts: List[str])     Asynchronous version of embed_documents
aembed_query(text: str)                Asynchronous version of embed_query

Additional parameters:

  • gpu: Boolean indicating whether the model should run on a GPU (defaults to False)
  • model_name: Name of the embedding model to use (defaults to paraphrase-multilingual-mpnet-base-v2)

How to use NLPCloudEmbeddings

Initial Setup

First, you'll need to install the required packages and set up your API key:

# Install the required packages
pip install nlpcloud langchain-community

# Import the embeddings class
from langchain_community.embeddings import NLPCloudEmbeddings

# Set your API key
import os
os.environ["NLPCLOUD_API_KEY"] = "your-api-key"

Basic Usage

Here's how to create embeddings for both individual queries and documents:

# Initialize the embeddings model
embeddings = NLPCloudEmbeddings()

# Generate embeddings for a single query
query_text = "What is artificial intelligence?"
query_embedding = embeddings.embed_query(query_text)

# Generate embeddings for multiple documents
documents = [
    "AI is a broad field of computer science.",
    "Machine learning is a subset of AI.",
    "Deep learning uses neural networks."
]
document_embeddings = embeddings.embed_documents(documents)
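
The returned embeddings are plain lists of floats, so you can work with them directly. As a quick illustration (assuming numpy is installed), here's a sketch that ranks the documents above by cosine similarity to the query:

import numpy as np

# Rank the documents by cosine similarity to the query embedding
query_vec = np.array(query_embedding)
doc_vecs = np.array(document_embeddings)

scores = doc_vecs @ query_vec / (
    np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec)
)
for score, text in sorted(zip(scores, documents), reverse=True):
    print(f"{score:.3f}  {text}")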

Async Operations

For better performance in async applications, you can use the async methods:

import asyncio

async def process_embeddings():
    embeddings = NLPCloudEmbeddings()
    
    # Generate embeddings asynchronously
    query_embedding = await embeddings.aembed_query("What is AI?")
    
    # Process multiple documents asynchronously
    docs = ["First document", "Second document", "Third document"]
    doc_embeddings = await embeddings.aembed_documents(docs)
    
    return query_embedding, doc_embeddings

# Run the async function
query_emb, doc_embs = asyncio.run(process_embeddings())
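
Because both async methods are awaitable, you can also fan several calls out concurrently, for example with asyncio.gather. Here's a small sketch (keep your NLP Cloud plan's rate limits in mind):

import asyncio

from langchain_community.embeddings import NLPCloudEmbeddings

async def embed_queries(queries):
    embeddings = NLPCloudEmbeddings()
    # Launch one aembed_query call per query and await them together
    return await asyncio.gather(*(embeddings.aembed_query(q) for q in queries))

vectors = asyncio.run(embed_queries(["What is AI?", "Qu'est-ce que l'IA ?"]))
print(len(vectors), len(vectors[0]))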

Custom Model Configuration

You can customize the embedding model configuration:

# Initialize with specific model and GPU settings
embeddings = NLPCloudEmbeddings(
    model_name="paraphrase-multilingual-mpnet-base-v2",
    gpu=True  # Enable GPU acceleration
)
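
After initializing, a quick way to confirm the configuration took effect is to embed a short string and inspect the resulting vector. The exact dimensionality depends on the model; paraphrase-multilingual-mpnet-base-v2 typically produces 768-dimensional vectors.

# Quick sanity check: embed a short string and inspect the vector length
vector = embeddings.embed_query("A quick configuration check")
print(len(vector))  # typically 768 for paraphrase-multilingual-mpnet-base-v2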

Error Handling

The class includes built-in environment validation. Here's how to handle potential errors:

from pydantic import ValidationError

try:
    embeddings = NLPCloudEmbeddings()
    result = embeddings.embed_query("Test query")
except ValidationError:
    print("Invalid configuration or missing API key")
except Exception as e:
    print(f"An error occurred: {str(e)}")

NLPCloudEmbeddings is particularly useful when you need to process text in multiple languages or require high-quality embeddings for downstream tasks. The async support makes it suitable for high-performance applications where parallel processing is important.
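
As a final illustration of the multilingual angle, semantically similar sentences in different languages should land close together in the embedding space. Here's a small sketch (assuming numpy) that compares an English and a French sentence:

import numpy as np

from langchain_community.embeddings import NLPCloudEmbeddings

def cosine(a, b):
    a, b = np.array(a), np.array(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

embeddings = NLPCloudEmbeddings()  # multilingual model by default

english = embeddings.embed_query("The weather is beautiful today.")
french = embeddings.embed_query("Il fait très beau aujourd'hui.")
print(f"Cross-lingual similarity: {cosine(english, french):.3f}")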
