Using Jina Embeddings in LangChain for Text and Image Embeddings

Posted: Feb 21, 2025.

Jina AI embeddings provide a powerful way to convert text and images into vector representations that can be used for semantic search and other machine learning tasks. This guide shows you how to use JinaEmbeddings in LangChain.

What is JinaEmbeddings?

JinaEmbeddings is a LangChain integration that allows you to generate embeddings using Jina AI's embedding models. It supports both text and image embeddings, making it versatile for multimodal applications. The default model is 'jina-embeddings-v2-base-en', which is optimized for English text.

Reference

Method                   | Description
embed_documents(texts)   | Converts a list of texts into their vector embeddings
embed_query(text)        | Converts a single text into its vector embedding
embed_images(uris)       | Converts a list of image URIs into their vector embeddings
aembed_documents(texts)  | Async version of embed_documents
aembed_query(text)       | Async version of embed_query

How to Use JinaEmbeddings

Initial Setup

First, you'll need a Jina AI API key and the LangChain community package, which provides the integration:

pip install langchain-community

Then import and initialize the embeddings class:

import os
from langchain_community.embeddings import JinaEmbeddings

# Set your API key
os.environ["JINA_API_TOKEN"] = "your-api-key"

# Initialize the embeddings model
embeddings = JinaEmbeddings(
    jina_api_key="your-api-key",  # Optional if set in env
    model_name="jina-embeddings-v2-base-en"  # Default model
)

Generating Text Embeddings

You can generate embeddings for single or multiple texts:

# Single text embedding
query = "What is artificial intelligence?"
query_embedding = embeddings.embed_query(query)
print(f"Query embedding dimension: {len(query_embedding)}")

# Multiple document embeddings
documents = [
    "AI is a branch of computer science",
    "Machine learning is a subset of AI",
    "Deep learning uses neural networks"
]
doc_embeddings = embeddings.embed_documents(documents)
print(f"Number of document embeddings: {len(doc_embeddings)}")
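Once you have the vectors, you can compare them directly. The helper below is a minimal sketch of cosine similarity in pure Python; the function name and toy vectors are illustrative, not part of the JinaEmbeddings API, and with real output you would pass query_embedding and an entry of doc_embeddings:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 2-d vectors stand in for real embeddings; they just show the call shape.
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # identical direction -> 1.0
```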

Working with Image Embeddings

JinaEmbeddings also supports image embeddings via image URLs. Note that the default text-only model does not handle images; for image inputs you would configure a multimodal model such as jina-clip-v1:

# Generate embeddings for images
image_urls = [
    "https://example.com/image1.jpg",
    "https://example.com/image2.jpg"
]
image_embeddings = embeddings.embed_images(image_urls)
print(f"Number of image embeddings: {len(image_embeddings)}")

Async Operations

For better performance in async applications, you can use the async methods:

import asyncio

async def process_embeddings():
    # Async query embedding
    query = "What is artificial intelligence?"
    query_embedding = await embeddings.aembed_query(query)
    
    # Async document embeddings
    documents = [
        "AI is a branch of computer science",
        "Machine learning is a subset of AI"
    ]
    doc_embeddings = await embeddings.aembed_documents(documents)
    return query_embedding, doc_embeddings

# Run async function
query_emb, doc_embs = asyncio.run(process_embeddings())
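The async methods pay off when several embedding calls can run concurrently via asyncio.gather. The sketch below uses a stand-in async function (fake_embed, hypothetical) in place of a live JinaEmbeddings instance so the pattern is runnable without an API key:

```python
import asyncio

async def fake_embed(text: str) -> list[float]:
    # Stand-in for embeddings.aembed_query(text); returns a dummy 2-d vector.
    await asyncio.sleep(0)  # simulates the network round-trip
    return [float(len(text)), 0.0]

async def embed_all(queries: list[str]) -> list[list[float]]:
    # Fire all requests concurrently instead of awaiting them one by one.
    return await asyncio.gather(*(fake_embed(q) for q in queries))

vectors = asyncio.run(embed_all(["ai", "machine learning"]))
print(len(vectors))  # one vector per query
```

With a real JinaEmbeddings instance you would replace fake_embed with embeddings.aembed_query; the gather pattern is unchanged.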

Integration with Vector Stores

JinaEmbeddings can be easily integrated with vector stores for similarity search:

from langchain_community.vectorstores import FAISS

# Create some example texts
texts = [
    "The sky is blue",
    "The sun is bright",
    "The grass is green"
]

# Create a vector store
vectorstore = FAISS.from_texts(
    texts,
    embeddings
)

# Perform a similarity search
query = "bright day"
docs = vectorstore.similarity_search(query, k=2)

This setup allows you to create powerful semantic search applications using Jina AI's state-of-the-art embedding models. The embeddings can be used for various downstream tasks like document retrieval, clustering, or as input features for machine learning models.
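For small collections you do not strictly need a vector store: the same nearest-neighbour lookup can be sketched in pure Python over precomputed vectors. The top_k helper and the toy 2-d vectors below are illustrative; with JinaEmbeddings you would substitute real query and document embeddings:

```python
import math

def top_k(query_vec: list[float], doc_vecs: list[list[float]], k: int = 2) -> list[int]:
    """Return indices of the k document vectors most similar to the query."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))
    scores = [(cos(query_vec, d), i) for i, d in enumerate(doc_vecs)]
    return [i for _, i in sorted(scores, reverse=True)[:k]]

# Toy vectors standing in for real embeddings.
docs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
print(top_k([1.0, 0.0], docs))  # -> [0, 2]
```

FAISS does the same ranking with indexing structures that scale to millions of vectors, which is why it is the better choice beyond toy sizes.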
