Combine Document Loaders in LangChain with MergedDataLoader
Posted: Jan 31, 2025.
When working with documents in LangChain, you often need to load data from several sources at once. MergedDataLoader provides a convenient way to combine the documents from different loaders into a single collection.
What is MergedDataLoader?
MergedDataLoader is a utility class in LangChain that allows you to combine multiple document loaders into a single loader. This is particularly useful when you need to process documents from different sources (like PDFs, web pages, or databases) in a unified way.
Reference
Here are the main methods available in MergedDataLoader:
Method | Description |
---|---|
load() | Loads all documents from all loaders and returns them as a list |
lazy_load() | Returns an iterator that yields documents lazily, one loader after another |
aload() | Asynchronously loads all documents from all loaders |
alazy_load() | Returns an async iterator that yields documents lazily |
load_and_split() | Loads all documents and splits them into chunks with a text splitter (RecursiveCharacterTextSplitter by default) |
How to use MergedDataLoader
Let's look at different ways to use MergedDataLoader in your applications.
Basic Usage
The simplest way to use MergedDataLoader is to combine multiple loaders and load all documents at once:
Lazy Loading
When dealing with large documents, you might want to load them lazily to manage memory usage:
Loading and Splitting Documents
If you need to split your documents into smaller chunks, you can use the load_and_split() method:
Asynchronous Loading
In async applications, you can use the asynchronous loading methods (aload() and alazy_load()) so that loading does not block the event loop:
The MergedDataLoader is a powerful tool when you need to work with multiple document sources in your LangChain applications. It provides flexibility in how you load and process documents, whether you need them all at once or prefer to process them one at a time.
Keep in mind that performance and memory usage depend on how you load the documents (lazy vs. eager loading) and on the size of your document sources: load() holds every document in memory at once, while lazy_load() lets you process one document at a time. Choose the loading method that fits your use case.
An alternative to LangSmith
Open-source LangChain monitoring, prompt management, and magic. Get started in 2 minutes.