LangChain Cosine Similarity Guide - Calculate Vector Similarities

Posted: Nov 10, 2024.

Cosine similarity is a core calculation when you work with vector embeddings and semantic similarity in LangChain applications. This guide shows how to use the cosine_similarity utility function from the langchain_community package.

What is cosine_similarity?

The cosine_similarity function takes two matrices whose rows are vectors of the same length and returns the cosine similarity between every row of the first matrix and every row of the second. Cosine similarity is the cosine of the angle between two vectors, which gives a score between -1 and 1, where:

  • 1 means the vectors point in the same direction (identical up to a positive scale factor)
  • 0 means they are orthogonal (perpendicular)
  • -1 means they point in opposite directions

This is particularly useful when working with text embeddings, semantic search, or routing between different components based on similarity.
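
To make the definition concrete, here is a minimal pure-Python sketch of the formula for a single pair of vectors: the dot product divided by the product of the two Euclidean norms. The helper name cosine_pair is illustrative only and is not part of LangChain, and returning 0.0 for a zero-length vector is an assumption made here to avoid dividing by zero.

```python
import math

def cosine_pair(a, b):
    """Cosine similarity of two equal-length vectors (illustrative helper)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)

same_direction = cosine_pair([1.0, 2.0], [2.0, 4.0])   # close to 1
orthogonal = cosine_pair([1.0, 0.0], [0.0, 1.0])       # exactly 0
opposite = cosine_pair([1.0, 2.0], [-1.0, -2.0])       # close to -1
print(same_direction, orthogonal, opposite)
```

Note that the first pair scores close to 1 even though the vectors differ in length, because only the direction matters.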

Reference

Parameter   Type                                          Description
X           List[List[float]] | List[ndarray] | ndarray   First matrix of vectors
Y           List[List[float]] | List[ndarray] | ndarray   Second matrix of vectors to compare against
Returns     ndarray                                       Matrix of cosine similarity scores
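
Both arguments are matrices, so a single vector has to be wrapped in an outer list before it is passed in, and every row must have the same width. The sketch below shows one way to prepare inputs under those assumptions. The helpers as_matrix and check_equal_width are hypothetical names introduced for illustration, they only handle plain Python lists of numbers, and they are not part of LangChain.

```python
def as_matrix(vectors):
    """Wrap a single flat vector in a list so it becomes a one-row matrix."""
    if len(vectors) > 0 and isinstance(vectors[0], (int, float)):
        return [list(vectors)]
    return [list(row) for row in vectors]

def check_equal_width(X, Y):
    """Raise ValueError unless every row in X and Y has the same length."""
    widths = {len(row) for row in X} | {len(row) for row in Y}
    if len(widths) > 1:
        raise ValueError(f"rows have different widths: {sorted(widths)}")

# A flat vector becomes a 1 x 3 matrix.
query = as_matrix([0.1, 0.2, 0.3])
# A nested list is already a 2 x 3 matrix and is returned as a copy.
docs = as_matrix([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
check_equal_width(query, docs)
print(len(query), len(docs))
```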

How to use cosine_similarity

Let's look at some practical examples of using the cosine_similarity function in LangChain.

Basic Usage with Numeric Vectors

from langchain_community.utils.math import cosine_similarity
import numpy as np

# Create two simple vectors
vector1 = [[1.0, 0.0, 1.0]]
vector2 = [[1.0, 1.0, 0.0]]

# Calculate similarity
similarity = cosine_similarity(vector1, vector2)
print(similarity)  # 1x1 array; the score for these two vectors is approximately 0.5

Using with Text Embeddings

A common use case is calculating similarity between text embeddings:

from langchain_openai import OpenAIEmbeddings
from langchain_community.utils.math import cosine_similarity

embeddings = OpenAIEmbeddings()

# Generate embeddings for two texts
text1_embedding = embeddings.embed_query("What is artificial intelligence?")
text2_embedding = embeddings.embed_query("Tell me about AI")

# Calculate similarity between the embeddings
similarity = cosine_similarity(
    [text1_embedding],
    [text2_embedding]
)
print(f"Similarity score: {similarity[0][0]}")

Semantic Routing Example

Here's a practical example of using cosine similarity for routing queries to different templates based on their semantic similarity:

from langchain_community.utils.math import cosine_similarity
from langchain_openai import OpenAIEmbeddings

# Define different templates
physics_template = """You are a physics professor..."""
math_template = """You are a mathematician..."""

embeddings = OpenAIEmbeddings()

# Create embeddings for templates
template_embeddings = embeddings.embed_documents([
    physics_template, 
    math_template
])

def route_query(query):
    # Get query embedding
    query_embedding = embeddings.embed_query(query)
    
    # Calculate similarity with all templates
    similarities = cosine_similarity(
        [query_embedding], 
        template_embeddings
    )[0]
    
    # Get most similar template
    most_similar_idx = similarities.argmax()
    templates = [physics_template, math_template]
    
    return templates[most_similar_idx]

# Example usage
query = "What is quantum mechanics?"
best_template = route_query(query)
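
The routing logic can also be tried without an API key. The following is a rough sketch that swaps the real embeddings for hand-written 3-dimensional vectors. The vectors, the route names, and the helpers cosine_pair and pick_route are all made up for illustration; real embeddings come from an embedding model and have many more dimensions.

```python
import math

def cosine_pair(a, b):
    """Cosine similarity of two equal-length vectors (illustrative helper)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Hypothetical stand-ins for template embeddings.
route_vectors = {
    "physics": [0.9, 0.1, 0.0],
    "math": [0.1, 0.9, 0.0],
}

def pick_route(query_vector):
    """Return the route name whose vector is most similar to the query."""
    return max(
        route_vectors,
        key=lambda name: cosine_pair(query_vector, route_vectors[name]),
    )

# Hypothetical query vector that leans toward the "physics" direction.
chosen = pick_route([0.8, 0.3, 0.1])
print(chosen)
```

Picking the highest score always returns some route, even for an unrelated query, so a real router may also want a minimum-score cutoff.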

Batch Processing

You can also calculate similarities for multiple vectors at once:

from langchain_community.utils.math import cosine_similarity
import numpy as np

# Create multiple vectors
vectors1 = np.array([
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 1.0],
    [1.0, 1.0, 0.0]
])

vectors2 = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0]
])

# Calculate similarities for all pairs: the result is a 3x3 matrix
# where entry [i][j] compares vectors1[i] with vectors2[j]
similarities = cosine_similarity(vectors1, vectors2)
print("Similarity matrix:")
print(similarities)
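
To check the batch result by hand, the same all-pairs calculation can be sketched in plain Python. The function pairwise_cosine below is an illustrative stand-in, not LangChain's implementation, and it treats zero vectors as having similarity 0.0 by assumption.

```python
import math

def pairwise_cosine(X, Y):
    """Return a len(X) x len(Y) nested list of cosine similarities."""
    def norm(v):
        return math.sqrt(sum(x * x for x in v))

    result = []
    for a in X:
        row = []
        for b in Y:
            denom = norm(a) * norm(b)
            dot = sum(x * y for x, y in zip(a, b))
            row.append(dot / denom if denom else 0.0)
        result.append(row)
    return result

vectors1 = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0], [1.0, 1.0, 0.0]]
vectors2 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

matrix = pairwise_cosine(vectors1, vectors2)
for row in matrix:
    print([round(value, 4) for value in row])
# [0.7071, 0.0, 0.7071]
# [0.0, 0.7071, 0.7071]
# [0.7071, 0.7071, 0.0]
```

Each row of the output corresponds to one vector in vectors1 and each column to one vector in vectors2, so three input rows on each side produce nine scores.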

This utility is particularly useful when building semantic search applications, recommendation systems, or implementing routing logic based on content similarity in your LangChain applications.
