Load Mastodon Posts with LangChain's MastodonTootsLoader

Posted: Feb 15, 2025.

Mastodon is a federated social media platform that offers an alternative to centralized networks. In this guide, we'll explore how to use LangChain's MastodonTootsLoader to extract content from Mastodon posts (known as "toots") for processing in your applications.

What is MastodonTootsLoader?

MastodonTootsLoader is a document loader that fetches posts from Mastodon accounts. It works with public accounts without authentication, and with private accounts or instances when you supply an API access token. The loader converts each post into a Document object that can be used in LangChain's document processing pipelines.

Reference

Here are the key methods and parameters available in MastodonTootsLoader:

Method | Description
load() | Loads toots and returns them as a list of Document objects
lazy_load() | Returns an iterator of Document objects for memory-efficient loading
aload() | Async version of load()
alazy_load() | Async version of lazy_load()
load_and_split() | Loads documents and splits them into chunks

Constructor parameters:

  • mastodon_accounts: List of Mastodon accounts to query (in @username@instance format)
  • number_toots: Number of toots to fetch per account (default: 100)
  • exclude_replies: Whether to exclude replies from the results (default: False)
  • access_token: API token for authenticated access
  • api_base_url: Base URL for the Mastodon instance (default: https://mastodon.social)
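
As a small illustration of the @username@instance format expected by mastodon_accounts, the sketch below checks handles before they are passed to the loader. The parse_handle helper and its lenient pattern are hypothetical and not part of LangChain; real instances may accept or reject handles that this check does not anticipate.

```python
import re

# Lenient pattern for "@username@instance" (an assumption, not an official grammar).
_HANDLE_RE = re.compile(r"^@([A-Za-z0-9_.-]+)@([A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)$")


def parse_handle(handle: str) -> tuple[str, str]:
    """Split a handle into (username, instance) or raise ValueError."""
    match = _HANDLE_RE.match(handle.strip())
    if match is None:
        raise ValueError(f"Not in @username@instance format: {handle!r}")
    return match.group(1), match.group(2)
```

Calling parse_handle on every entry before building the loader surfaces typos such as a missing instance name before any network request is made.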

How to Use MastodonTootsLoader

Basic Usage with Public Accounts

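Here's how to load toots from a public Mastodon account. The loader relies on the Mastodon.py package, so install it first (pip install Mastodon.py) alongside langchain-community: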

from langchain_community.document_loaders import MastodonTootsLoader

# Initialize the loader
loader = MastodonTootsLoader(
    mastodon_accounts=["@username@mastodon.social"],
    number_toots=50
)

# Load the documents
documents = loader.load()

# Process the loaded documents
for doc in documents:
    print(doc.page_content)

Authenticated Access

For private accounts or instances, pass an access token together with the base URL of the instance that issued it:

loader = MastodonTootsLoader(
    mastodon_accounts=["@username@mastodon.social"],
    access_token="your_access_token",
    api_base_url="https://your-instance.social",
    number_toots=50
)

You can also supply the token through the MASTODON_ACCESS_TOKEN environment variable, which the loader reads when access_token is not passed:

export MASTODON_ACCESS_TOKEN="your_access_token"
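
If you rely on the environment variable, it can help to fail early with a clear message when it is missing. The sketch below is one way to do that; require_token is a hypothetical helper, not part of LangChain.

```python
import os


def require_token(env_var: str = "MASTODON_ACCESS_TOKEN") -> str:
    """Return the token from the environment or raise a clear error."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"{env_var} is not set; export it before creating the loader")
    return token
```

You could then pass access_token=require_token() to the constructor, so a missing token is reported at startup and not as an API error later on.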

Async Loading

In async applications, await aload() instead of calling load():

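import asyncio

from langchain_community.document_loaders import MastodonTootsLoader

async def load_toots():
    loader = MastodonTootsLoader(
        mastodon_accounts=["@username@mastodon.social"]
    )
    # aload() is the awaitable counterpart of load()
    documents = await loader.aload()
    return documents

documents = asyncio.run(load_toots())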
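
If you create several loaders (for example, one per instance), an async application can await them together. This is a minimal sketch, assuming each loader exposes the aload() coroutine listed in the reference table; load_all is a hypothetical helper and not part of LangChain.

```python
import asyncio


async def load_all(loaders):
    """Await aload() on every loader concurrently and flatten the results."""
    results = await asyncio.gather(*(loader.aload() for loader in loaders))
    # gather() returns results in the same order as the loaders were given.
    return [doc for docs in results for doc in docs]
```

Whether this is actually faster depends on how each loader implements aload(), so measure before relying on it.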

Memory-Efficient Loading

If you're working with a large number of toots, use lazy loading:

loader = MastodonTootsLoader(
    mastodon_accounts=["@username@mastodon.social"]
)

# Iterate through documents one at a time;
# process_document is a placeholder for your own handler
for document in loader.lazy_load():
    process_document(document)
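
Lazy loading pairs well with batching, for example when documents are written to a store a few at a time. The sketch below works on any iterable, including the iterator returned by lazy_load(); in_batches is a hypothetical helper, not part of LangChain.

```python
from itertools import islice


def in_batches(documents, size):
    """Yield lists of up to `size` items from any iterable, lazily."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(documents)
    # islice pulls at most `size` items per pass; an empty list ends the loop.
    while batch := list(islice(iterator, size)):
        yield batch
```

For instance, for batch in in_batches(loader.lazy_load(), 20) hands your code at most 20 documents per iteration.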

Working with Multiple Accounts

You can load toots from multiple accounts with a single loader. The accounts are queried one after another, and the results are returned together:

loader = MastodonTootsLoader(
    mastodon_accounts=[
        "@user1@mastodon.social",
        "@user2@different-instance.social",
        "@user3@another-instance.social"
    ],
    number_toots=25  # Fetches up to 25 toots from each account
)
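
Because documents from all accounts come back together, you may want to group them by author afterwards. The sketch below assumes each document's metadata holds a user_info mapping with an acct field. That matches the loader's behavior as far as I know, but treat both key names as assumptions and verify them against your installed version. group_by_account is a hypothetical helper.

```python
from collections import defaultdict


def group_by_account(documents):
    """Map each account name to the list of its documents."""
    grouped = defaultdict(list)
    for doc in documents:
        try:
            # "user_info" and "acct" are assumed key names (see note above).
            account = doc.metadata["user_info"]["acct"]
        except (KeyError, TypeError):
            account = "unknown"
        grouped[account].append(doc)
    return dict(grouped)
```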

Note that each document's page_content contains the toot's HTML as returned by the Mastodon API. If you need plain text, strip the markup before further processing.
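
One way to get plain text is Python's built-in html.parser module. The sketch below is a minimal converter that keeps text nodes and turns paragraph and line breaks into newlines; toot_html_to_text is a hypothetical helper, and a dedicated HTML library may handle edge cases better.

```python
from html.parser import HTMLParser


class _TootTextExtractor(HTMLParser):
    """Collects text nodes and turns <br> and </p> into line breaks."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; in text nodes.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag == "p":
            self.parts.append("\n\n")

    def handle_data(self, data):
        self.parts.append(data)


def toot_html_to_text(html: str) -> str:
    """Return the text content of a toot's HTML, without tags."""
    parser = _TootTextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()
```

Applied to the loaded documents, this could look like texts = [toot_html_to_text(doc.page_content) for doc in documents].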
