Loading Email Files with LangChain UnstructuredEmailLoader

Posted: Jan 31, 2025.

The UnstructuredEmailLoader in LangChain provides an easy way to extract content from email files, supporting both .eml and .msg formats. Let's explore how to use this document loader effectively.

What is UnstructuredEmailLoader?

UnstructuredEmailLoader is a document loader that processes email files and converts them into Document objects that can be used in your LangChain applications. It supports:

  • Reading .eml and .msg file formats
  • Extracting email content and metadata
  • Processing email attachments (optional)
  • Different loading modes for structured content extraction

Reference

Here are the key methods available in UnstructuredEmailLoader:

Method            Description
load()            Loads the email file and returns a list of Document objects
lazy_load()       Iterator that loads Documents one at a time to save memory
aload()           Asynchronous version of load()
alazy_load()      Asynchronous version of lazy_load()
load_and_split()  Loads and splits the documents using a text splitter

How to Use UnstructuredEmailLoader

Basic Usage

Here's a simple example of loading an email file:

from langchain_community.document_loaders import UnstructuredEmailLoader

# Initialize the loader with an email file
loader = UnstructuredEmailLoader("path/to/email.eml")

# Load the document
documents = loader.load()
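
If you don't have an email file at hand, one option is to generate a small test message with Python's standard email package. The sketch below is only an illustration: write_sample_eml and load_email_text are hypothetical helper names, and the addresses and subject are made up. The loader import is deferred inside the function so the file-writing helper works even before the dependencies are installed.

```python
from email.message import EmailMessage
from pathlib import Path


def write_sample_eml(path: str) -> Path:
    """Write a minimal plain-text email to `path` (hypothetical helper)."""
    msg = EmailMessage()
    msg["From"] = "alice@example.com"  # made-up address
    msg["To"] = "bob@example.com"  # made-up address
    msg["Subject"] = "Quarterly report"
    msg.set_content("Hi Bob,\n\nThe report is ready for review.\n")
    target = Path(path)
    target.write_bytes(msg.as_bytes())
    return target


def load_email_text(path: str) -> list[str]:
    """Load the email and return the text of each Document."""
    # Imported lazily: requires langchain-community and unstructured.
    from langchain_community.document_loaders import UnstructuredEmailLoader

    documents = UnstructuredEmailLoader(path).load()
    return [doc.page_content for doc in documents]
```

Calling write_sample_eml("sample.eml") and then load_email_text("sample.eml") should give you something to inspect while you experiment with the options below.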

Retaining Email Elements

By default, the loader runs in "single" mode and combines all text elements into one Document. To keep each element of the email as its own Document, together with its metadata, use the "elements" mode:

loader = UnstructuredEmailLoader(
    "path/to/email.eml",
    mode="elements"
)

documents = loader.load()

# The documents will contain metadata like:
# - sender
# - recipients
# - subject
# - date
# - file information
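
A quick way to see what "elements" mode produced is to count the Documents per element type. This is a minimal sketch, assuming each Document's metadata has a "category" key (which LangChain's Unstructured loaders add in "elements" mode); count_categories is a hypothetical helper name, and it only relies on objects having a metadata dict.

```python
from collections import Counter
from typing import Any, Iterable


def count_categories(documents: Iterable[Any]) -> Counter:
    """Count Documents per element category (hypothetical helper)."""
    # Assumption: metadata["category"] is present in "elements" mode;
    # anything without it is counted as "unknown".
    return Counter(doc.metadata.get("category", "unknown") for doc in documents)
```

For example, count_categories(loader.load()) returns a Counter keyed by category name, which makes it easy to spot how much of the email was parsed as body text versus titles or other elements.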

Processing Attachments

To handle email attachments, enable attachment processing:

loader = UnstructuredEmailLoader(
    "path/to/email.eml",
    mode="elements",
    process_attachments=True
)

# Load email with attachments
documents = loader.load()
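
Once attachments are processed, you may want to tell body content apart from attachment content. The sketch below assumes that attachment elements carry an "attached_to_filename" entry in their metadata; that key is an assumption about the underlying unstructured library and may differ between versions, so check the metadata of your own Documents first. split_attachments is a hypothetical helper name.

```python
from typing import Any, Iterable


def split_attachments(documents: Iterable[Any]) -> tuple[list[Any], list[Any]]:
    """Separate body Documents from attachment Documents (hypothetical helper)."""
    body: list[Any] = []
    attachments: list[Any] = []
    for doc in documents:
        # Assumption: attachment elements have an "attached_to_filename" key.
        if doc.metadata.get("attached_to_filename"):
            attachments.append(doc)
        else:
            body.append(doc)
    return body, attachments
```

Usage would look like body, attachments = split_attachments(documents).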

Async Loading

In async applications, use aload() so that loading an email does not block the event loop:

import asyncio

async def load_email():
    loader = UnstructuredEmailLoader("path/to/email.eml")
    documents = await loader.aload()
    return documents

documents = asyncio.run(load_email())
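
Async loading is most useful when there are several emails to load at once. The following is a sketch under the assumption that each loader exposes the aload() coroutine listed in the reference table; load_all is a hypothetical helper name, not part of LangChain.

```python
import asyncio
from typing import Any, Sequence


async def load_all(loaders: Sequence[Any]) -> list[Any]:
    """Run aload() on every loader concurrently and flatten the results."""
    # gather() preserves the order of the loaders in its results.
    results = await asyncio.gather(*(loader.aload() for loader in loaders))
    return [doc for docs in results for doc in docs]
```

You could call it with a list such as [UnstructuredEmailLoader(p) for p in paths] and run it with asyncio.run(load_all(loaders)).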

Memory-Efficient Loading

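For large email files that yield many Documents (for example in "elements" mode), use lazy loading to process them one at a time instead of holding the full list in memory: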

loader = UnstructuredEmailLoader("path/to/large_email.eml")

# Iterate through documents one at a time
for document in loader.lazy_load():
    # Replace this with your own processing logic
    print(document.page_content[:100])
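
Lazy loading pairs well with batching, for instance when you send Documents to an embedding model a few at a time. Here is a minimal sketch using only the standard library; in_batches is a hypothetical helper name and works on any iterable, including the iterator returned by lazy_load().

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def in_batches(items: Iterable[T], size: int) -> Iterator[list[T]]:
    """Yield lists of at most `size` items without materializing the input."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    # islice pulls the next `size` items; an empty list means we are done.
    while batch := list(islice(iterator, size)):
        yield batch
```

A typical call would be for batch in in_batches(loader.lazy_load(), 16), where 16 is an arbitrary batch size chosen for illustration.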

Before using UnstructuredEmailLoader, make sure you have the required dependencies installed:

pip install langchain-community unstructured
pip install "unstructured[msg]"  # For .msg file support
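
To confirm the packages are importable before running the examples, a small check like the following can help. It is a sketch using only the standard library; missing_modules is a hypothetical helper name, and the module names passed to it are the top-level import names (langchain_community and unstructured).

```python
import importlib.util


def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Prints an empty list when both packages are installed.
print(missing_modules(["langchain_community", "unstructured"]))
```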

This loader is particularly useful when you need to:

  • Extract content from email archives for analysis
  • Process email threads for chatbots or QA systems
  • Build search functionality over email collections
  • Create structured data from email communications

Remember that the extracted content will be converted into Document objects that can be used with other LangChain components like text splitters, embeddings, or vector stores.
