Using merge_chat_runs in LangChain - Chat Message Processing

Posted: Nov 20, 2024.

When working with chat conversations in LangChain, you often need to process and organize messages efficiently. The merge_chat_runs utility helps combine consecutive messages from the same sender into a single message, making conversations more organized and easier to process.

What is merge_chat_runs?

merge_chat_runs is a utility function in LangChain that combines sequences of consecutive messages from the same sender (called a "chat run") into a single message. This is particularly useful when processing chat logs from various platforms like Discord, WhatsApp, Telegram, etc., where users might send multiple short messages in succession.

Reference

The merge_chat_runs function has a simple but powerful interface:

Parameter	Type	Description
chat_sessions	Iterable[ChatSession]	An iterable of chat sessions containing messages to be processed
Returns	Iterator[ChatSession]	Returns an iterator of chat sessions with merged messages

How to use merge_chat_runs

Let's look at practical examples of how to use merge_chat_runs in different scenarios.

Basic Usage with Chat Messages

Here's a simple example of merging consecutive messages from the same sender:

from langchain_community.chat_loaders.utils import merge_chat_runs
from langchain_core.chat_sessions import ChatSession
from langchain_core.messages import HumanMessage

# Create some sample messages
chat_session = ChatSession(
    messages=[
        HumanMessage(content="Hey there!", additional_kwargs={"sender": "Alice"}),
        HumanMessage(content="How are you?", additional_kwargs={"sender": "Alice"}),
        HumanMessage(content="I'm good!", additional_kwargs={"sender": "Bob"}),
        HumanMessage(content="What about you?", additional_kwargs={"sender": "Bob"})
    ]
)

# Merge consecutive messages
merged_sessions = list(merge_chat_runs([chat_session]))

# The result will combine Alice's and Bob's consecutive messages

Using with Chat Platform Data

merge_chat_runs is commonly used when processing chat exports from messaging platforms:

from langchain_community.chat_loaders.utils import merge_chat_runs, map_ai_messages
from langchain_community.chat_loaders.telegram import TelegramChatLoader

# Load chat messages from a platform
loader = TelegramChatLoader(path="chat_export.json")
raw_messages = loader.lazy_load()

# Merge consecutive messages from the same sender
merged_messages = merge_chat_runs(raw_messages)

# Optionally convert certain user's messages to AI messages
messages = list(map_ai_messages(merged_messages, sender="Assistant"))

Fine-tuning Example

When preparing data for fine-tuning a language model, merging messages can help create better training examples:

from langchain_community.chat_loaders.utils import merge_chat_runs
from langchain_community.adapters.openai import convert_messages_for_finetuning

# Load your chat sessions
chat_sessions = loader.load()

# Merge consecutive messages
merged_sessions = list(merge_chat_runs(chat_sessions))

# Convert to training format
training_data = convert_messages_for_finetuning(merged_sessions)

Combining with Other Utilities

merge_chat_runs works well with other LangChain utilities for chat processing:

from langchain_community.chat_loaders.utils import merge_chat_runs, map_ai_messages
from langchain_openai import ChatOpenAI

# Process the chat sessions
raw_messages = loader.lazy_load()
merged_messages = merge_chat_runs(raw_messages)
ai_messages = map_ai_messages(merged_messages, sender="bot")

# Use with a language model
llm = ChatOpenAI()
for session in ai_messages:
    response = llm(session["messages"])

This utility is particularly valuable when:

Processing raw chat exports from messaging platforms
Preparing conversation data for fine-tuning
Creating cleaner conversation histories for LLM context
Analyzing chat patterns and user interactions

Remember that merge_chat_runs preserves the original message metadata while combining content, making it ideal for scenarios where you need both consolidated messages and original message details.

An alternative to LangSmith

Open-source LangChain monitoring, prompt management, and magic. Get started in 2 minutes.

LangChain Docs

Join 10,000+ subscribers

Every 2 weeks, latest model releases and industry news.

An alternative to LangSmith

Open-source LangChain monitoring, prompt management, and magic. Get started in 2 minutes.

LangChain Docs