Using LangChain UnstructuredODTLoader for OpenDocument Files
Posted: Feb 18, 2025.
OpenDocument Text (ODT) files are an open standard format for word processing documents. In this guide, we'll explore how to work with ODT files in LangChain using the UnstructuredODTLoader.
What is UnstructuredODTLoader?
The UnstructuredODTLoader is a document loader class in LangChain that helps you extract and process text content from OpenDocument Text (ODT) files. It uses the Unstructured library under the hood to parse ODT files and convert them into LangChain Document objects that can be used in your document processing pipelines.
Reference
Here are the key methods available in UnstructuredODTLoader:
Method | Description |
---|---|
load() | Loads the ODT file and returns a list of Document objects |
lazy_load() | Loads the file lazily, returning an iterator of Document objects |
aload() | Asynchronously loads the ODT file and returns a list of Document objects |
alazy_load() | Asynchronously loads the file lazily, returning an async iterator of Document objects |
load_and_split() | Loads the document and splits it into chunks using a text splitter |
How to Use UnstructuredODTLoader
Basic Usage
The simplest way to use the UnstructuredODTLoader is to initialize it with a file path and call the load() method:
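A minimal sketch of basic loading might look like the following. The import path assumes the loader lives in the langchain_community package, and the helper name load_odt and the file name example.odt are hypothetical, chosen only for illustration.

```python
from pathlib import Path


def load_odt(path: str):
    """Load an ODT file and return a list of LangChain Document objects."""
    # Fail early with a clear message if the file is not there.
    if not Path(path).is_file():
        raise FileNotFoundError(f"No such ODT file: {path}")

    # Imported inside the function so the optional dependency is only
    # required when a file is actually loaded.
    from langchain_community.document_loaders import UnstructuredODTLoader

    loader = UnstructuredODTLoader(path)
    return loader.load()


# Example usage (assumes example.odt exists in the working directory):
# docs = load_odt("example.odt")
# print(docs[0].page_content[:200])
# print(docs[0].metadata)
```

With the default settings, the returned list typically contains a single Document holding the full text of the file.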
Using Different Modes
The loader's mode parameter controls how the document content is structured. The two modes covered here are "single" (the default) and "elements".
In "single" mode, the whole file is returned as one Document. In "elements" mode, the document is split into separate semantic elements such as titles, paragraphs, and list items, each returned as its own Document. This is useful when you need more granular control over the document structure.
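As a rough sketch, switching modes is a matter of passing mode="elements" to the constructor. The helper names below are hypothetical. The grouping step assumes each element's type is exposed under the "category" key of the Document metadata, which is how the Unstructured-based loaders usually report it.

```python
from collections import defaultdict


def load_elements(path: str):
    """Load an ODT file with one Document per semantic element."""
    from langchain_community.document_loaders import UnstructuredODTLoader

    return UnstructuredODTLoader(path, mode="elements").load()


def group_by_category(docs):
    """Group document texts by the element category stored in metadata."""
    groups = defaultdict(list)
    for doc in docs:
        # Assumption: the element type is stored under "category".
        category = doc.metadata.get("category", "Unknown")
        groups[category].append(doc.page_content)
    return dict(groups)


# Example usage (assumes example.odt exists):
# grouped = group_by_category(load_elements("example.odt"))
# print(grouped.get("Title", []))
```

Grouping by category like this makes it easy to, for example, pull out only the headings or skip list items before indexing.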
Additional Parameters
You can pass additional parameters to customize the document processing:
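One possible customization is sketched below using the post_processors argument, which the Unstructured-based loaders in langchain_community generally accept as a list of functions applied to each element's text. That this loader forwards it in the same way is an assumption here, and collapse_whitespace and load_cleaned are hypothetical helpers written for this example. Other keyword arguments are passed on to the Unstructured partitioning function, so check its documentation for what it supports.

```python
import re


def collapse_whitespace(text: str) -> str:
    """Replace any run of whitespace with a single space and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def load_cleaned(path: str):
    """Load an ODT file in elements mode, cleaning each element's text."""
    from langchain_community.document_loaders import UnstructuredODTLoader

    loader = UnstructuredODTLoader(
        path,
        mode="elements",
        # Assumption: post_processors is forwarded to the base loader and
        # each function receives and returns a plain string.
        post_processors=[collapse_whitespace],
    )
    return loader.load()


# Example usage (assumes example.odt exists):
# for doc in load_cleaned("example.odt"):
#     print(repr(doc.page_content))
```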
Lazy Loading
For large documents, you might want to use lazy loading to conserve memory:
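The sketch below shows one way to consume lazy_load() without building the full list. The take helper is hypothetical, and how much memory is actually saved depends on how much of the file the underlying parser reads up front.

```python
from itertools import islice


def take(loader, n: int):
    """Return at most the first n documents from a loader's lazy iterator."""
    # islice stops pulling from the iterator once n items have been produced,
    # so later documents are never materialized by this function.
    return list(islice(loader.lazy_load(), n))


# Example usage (assumes example.odt exists):
# from langchain_community.document_loaders import UnstructuredODTLoader
# loader = UnstructuredODTLoader("example.odt", mode="elements")
# for doc in take(loader, 5):
#     print(doc.page_content)
```

Lazy loading is most useful in "elements" mode, where a single file can produce many Document objects.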
Async Loading
The loader also supports asynchronous loading:
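A small sketch of asynchronous loading follows, using the aload() method listed in the reference table. The load_many helper and the file names are hypothetical. It loads several files concurrently and flattens the results into one list.

```python
import asyncio


async def load_many(loaders):
    """Run aload() on several loaders concurrently and flatten the results."""
    # gather preserves the order of the loaders in its result list.
    results = await asyncio.gather(*(loader.aload() for loader in loaders))
    return [doc for docs in results for doc in docs]


# Example usage (assumes both files exist):
# from langchain_community.document_loaders import UnstructuredODTLoader
# loaders = [UnstructuredODTLoader(p) for p in ("a.odt", "b.odt")]
# docs = asyncio.run(load_many(loaders))
# print(len(docs))
```

Inside an existing async application, you would simply await loader.aload() rather than calling asyncio.run().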
Splitting Documents
You can automatically split the document into chunks using a text splitter:
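Here is a sketch of splitting with load_and_split(), assuming RecursiveCharacterTextSplitter from the langchain_text_splitters package as the splitter. The split_odt helper and its default chunk sizes are hypothetical values picked for illustration, not recommendations.

```python
def split_odt(path: str, chunk_size: int = 1000, chunk_overlap: int = 100):
    """Load an ODT file and return it as a list of chunked Document objects."""
    # Our own sanity check: an overlap as large as the chunk makes no progress.
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    # Imported here so the optional dependencies are only needed when used.
    from langchain_community.document_loaders import UnstructuredODTLoader
    from langchain_text_splitters import RecursiveCharacterTextSplitter

    splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap,
    )
    loader = UnstructuredODTLoader(path)
    return loader.load_and_split(text_splitter=splitter)


# Example usage (assumes example.odt exists):
# chunks = split_odt("example.odt", chunk_size=500, chunk_overlap=50)
# print(len(chunks), chunks[0].page_content[:100])
```

An equivalent approach is to call load() first and then pass the result to the splitter's split_documents() method, which keeps loading and splitting as separate steps.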
Remember to install the necessary dependencies (unstructured[all-docs] or unstructured[odt]) to work with ODT files. The UnstructuredODTLoader provides a flexible way to integrate ODT documents into your LangChain applications, whether you need basic document loading or more advanced processing features.