LangChain Citation Fuzzy Matching - Adding Citations to LLM Responses

Posted: Nov 8, 2024.

When building LLM applications, it's often important to ground the model's responses in source material and provide citations. LangChain's citation fuzzy matching functionality helps you automatically identify and cite relevant passages from your context that support the LLM's responses.

What is Citation Fuzzy Matching?

Citation fuzzy matching lets you build chains that not only answer questions from provided context but also identify and include citations to the specific passages that support each answer. Fuzzy matching is used to locate those passages in the source even when the model's quoted wording doesn't match the context character for character.

Reference

Method: create_citation_fuzzy_match_runnable(llm)
Description: Creates a Runnable chain that processes questions with context and returns answers with citations. Takes a BaseChatModel as input.

How to Use Citation Fuzzy Matching

Basic Usage

Here's a simple example of how to use citation fuzzy matching:

from langchain.chains import create_citation_fuzzy_match_runnable
from langchain_openai import ChatOpenAI

# Initialize the language model
llm = ChatOpenAI(model="gpt-4")

# Create the citation chain
chain = create_citation_fuzzy_match_runnable(llm)

# Define your context and question
context = """
The Golden Gate Bridge was completed in 1937.
It connects San Francisco to Marin County.
The bridge is painted in International Orange color.
"""

question = "When was the Golden Gate Bridge built?"

# Run the chain
response = chain.invoke({
    "question": question,
    "context": context
})

# The response is a structured object containing the answer and its citations
print(response)
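Rather than a plain string, the chain returns a structured result: in the version of langchain.chains.citation_fuzzy_match I'm familiar with, it's a QuestionAnswer-style object whose answer field holds facts, each paired with the substring quotes that support it (field names may differ across LangChain versions, so check your installed release). A minimal stdlib sketch of what consuming such a result can look like, using stand-in dataclasses instead of the real LangChain classes:

```python
from dataclasses import dataclass, field

# Stand-ins mirroring LangChain's FactWithEvidence / QuestionAnswer shapes
# (assumed field names; verify against your installed version).
@dataclass
class FactWithEvidence:
    fact: str
    substring_quote: list[str] = field(default_factory=list)

@dataclass
class QuestionAnswer:
    question: str
    answer: list[FactWithEvidence]

# A response shaped like what chain.invoke(...) might return
response = QuestionAnswer(
    question="When was the Golden Gate Bridge built?",
    answer=[
        FactWithEvidence(
            fact="The Golden Gate Bridge was completed in 1937.",
            substring_quote=["completed in 1937"],
        )
    ],
)

# Iterate over each fact and print the quotes that back it up
for fact in response.answer:
    print(f"Fact: {fact.fact}")
    for quote in fact.substring_quote:
        print(f'  Cited: "{quote}"')
```

With the real chain you would iterate over `chain.invoke(...)` output the same way, which makes it straightforward to render answers alongside their supporting quotes.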

Working with Longer Context

The citation fuzzy matcher works particularly well with longer contexts where you need to trace the source of information:

from langchain.chains import create_citation_fuzzy_match_runnable
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4")
chain = create_citation_fuzzy_match_runnable(llm)

# Multiple paragraphs of context
context = """
The human brain is one of the most complex organs in the body.
It contains approximately 86 billion neurons and weighs about 3 pounds.

The heart pumps about 2,000 gallons of blood per day.
It beats approximately 100,000 times daily.

The human body has 206 bones in total.
The smallest bone is located in the middle ear.
"""

# Ask multiple questions
questions = [
    "How many neurons are in the brain?",
    "How much blood does the heart pump?"
]

for question in questions:
    response = chain.invoke({
        "question": question,
        "context": context
    })
    print(f"Question: {question}")
    print(f"Response: {response}\n")

Best Practices

When using the citation fuzzy matching functionality:

  1. Provide clear, well-structured context that contains factual information
  2. Ask specific questions that can be answered using the provided context
  3. Make sure your context contains the information needed to answer the questions
  4. Use a capable language model that supports function calling (like GPT-4)

The chain will automatically:

  • Analyze the context to find relevant passages
  • Generate an answer based on the context
  • Include citations to support the response
  • Match similar phrases even if they're not exactly identical
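The last point is the "fuzzy" part: a model-produced quote can be located in the source even when it isn't a verbatim substring. A simplified, stdlib-only sketch of that idea (this is an illustration of the technique, not LangChain's actual implementation) slides a quote-sized window over the context and scores each position with difflib:

```python
from difflib import SequenceMatcher

def find_citation_span(quote: str, context: str) -> tuple[int, int, float]:
    """Slide a window the size of the quote across the context and
    return (start, end, similarity) of the best fuzzy match.
    O(len(context) * len(quote)) - fine for short passages."""
    best = (0, 0, 0.0)
    n = len(quote)
    for start in range(max(1, len(context) - n + 1)):
        window = context[start:start + n]
        score = SequenceMatcher(None, quote.lower(), window.lower()).ratio()
        if score > best[2]:
            best = (start, start + n, score)
    return best

context = ("The Golden Gate Bridge was completed in 1937. "
           "It connects San Francisco to Marin County.")
# The model's quote differs slightly from the source ("finished" vs "completed")
quote = "the Golden Gate Bridge was finished in 1937"

start, end, score = find_citation_span(quote, context)
print(context[start:end], round(score, 2))
```

Even though the quote paraphrases the source, the best-scoring window lands on the correct sentence, which is the behavior that lets the chain anchor citations reliably.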

This makes it particularly useful for applications like:

  • Question-answering systems
  • Document analysis tools
  • Educational platforms
  • Research assistants
  • Fact-checking systems

Remember that the quality of citations depends on both the quality of your context and the capabilities of the underlying language model.
