Using LangChain's Airtable Document Loader

Posted: Nov 20, 2024.

The AirtableLoader in LangChain provides a convenient way to load data from Airtable tables into Document objects that can be used in your LangChain applications. This guide will show you how to effectively use this loader to work with your Airtable data.

What is AirtableLoader?

AirtableLoader is a document loader designed to fetch and process data from Airtable tables. It converts each row in your Airtable table into a Document object, making it easy to integrate Airtable data into your LangChain workflows. The loader supports both synchronous and asynchronous operations, as well as lazy loading capabilities.

Reference

Here are the main methods available in AirtableLoader:

Method            Description
load()            Loads all table data into Document objects synchronously
aload()           Loads all table data into Document objects asynchronously
lazy_load()       Returns an iterator of Documents for memory-efficient loading
alazy_load()      Returns an async iterator of Documents
load_and_split()  Loads Documents and splits them into chunks using a text splitter

How to Use AirtableLoader

Initial Setup

Before using AirtableLoader, install the package that provides the loader along with pyairtable, the Airtable client it depends on:

pip install langchain-community pyairtable

You'll also need:

  • An Airtable API token (a personal access token; Airtable has retired its legacy API keys)
  • Your base ID
  • Your table ID
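One way to keep these three values out of your source code is to read them from environment variables. The following is a minimal sketch, not part of LangChain: the helper name read_airtable_settings and the variable names AIRTABLE_API_TOKEN, AIRTABLE_BASE_ID, and AIRTABLE_TABLE_ID are assumptions chosen for illustration.

```python
import os

# Hypothetical variable names; use whatever your deployment defines.
REQUIRED_VARS = ("AIRTABLE_API_TOKEN", "AIRTABLE_BASE_ID", "AIRTABLE_TABLE_ID")


def read_airtable_settings(env=None):
    """Return the three Airtable settings, failing early if any is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

The returned values can then be passed to AirtableLoader in place of hard-coded strings.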

Basic Usage

Here's how to initialize and use the loader:

from langchain_community.document_loaders import AirtableLoader

# Initialize the loader
# Note: the constructor's keyword for the credential is api_token
loader = AirtableLoader(
    api_token="your_api_token",
    table_id="your_table_id",
    base_id="your_base_id"
)

# Load all documents
docs = loader.load()

Loading with a Specific View

You can specify a particular view of your Airtable table:

# Extra keyword arguments such as view are forwarded to pyairtable
loader = AirtableLoader(
    api_token="your_api_token",
    table_id="your_table_id",
    base_id="your_base_id",
    view="your_view_name"
)

Async Loading

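In async applications, use aload() so that loading does not block the event loop:

import asyncio

from langchain_community.document_loaders import AirtableLoader

async def load_airtable_data():
    loader = AirtableLoader(
        api_token="your_api_token",
        table_id="your_table_id",
        base_id="your_base_id"
    )
    # aload() returns the full list of Documents once loading finishes
    docs = await loader.aload()
    return docs

docs = asyncio.run(load_airtable_data())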
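Async loading is most useful when several tables are loaded at once. The sketch below awaits aload() on multiple loaders concurrently with asyncio.gather. The helper load_all and the FakeLoader stand-in are hypothetical and exist only so the example runs without network access; in practice you would pass real AirtableLoader instances.

```python
import asyncio


async def load_all(loaders):
    """Await aload() on every loader concurrently and flatten the results."""
    results = await asyncio.gather(*(loader.aload() for loader in loaders))
    return [doc for docs in results for doc in docs]


class FakeLoader:
    """Stand-in exposing an aload() coroutine, so the sketch runs offline."""

    def __init__(self, docs):
        self._docs = docs

    async def aload(self):
        return list(self._docs)


all_docs = asyncio.run(load_all([FakeLoader(["a"]), FakeLoader(["b", "c"])]))
```

asyncio.gather preserves the order of its inputs, so documents from the first loader come first in the combined list.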

Memory-Efficient Loading

For large tables, use lazy loading to process rows one at a time:

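import ast

loader = AirtableLoader(
    api_token="your_api_token",
    table_id="your_table_id",
    base_id="your_base_id"
)

# Process documents one at a time
for doc in loader.lazy_load():
    # page_content is the string form of the record dict.
    # ast.literal_eval parses it without executing code, unlike eval().
    record = ast.literal_eval(doc.page_content)
    print(record)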
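When each row feeds a downstream step that works better on groups (for example, bulk inserts), the iterator can be consumed in fixed-size batches. This is a minimal sketch using only the standard library; the helper name in_batches is an assumption, not a LangChain function.

```python
from itertools import islice


def in_batches(iterable, size):
    """Yield lists of up to `size` items from any iterable, in order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch
```

With a loader, this would look like: for batch in in_batches(loader.lazy_load(), 50), where 50 is an arbitrary batch size.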

Working with Document Content

The loader converts each row into a Document where the content is a string representation of a dictionary containing the row data:

import ast

docs = loader.load()
# literal_eval parses the dict string without executing arbitrary code
first_row = ast.literal_eval(docs[0].page_content)
print(first_row)  # Prints: {'id': '...', 'createdTime': '...', 'fields': {...}}
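Since the useful data sits under the 'fields' key, it is often worth turning it into plain text before passing it to a model. The sketch below parses a record string and formats its fields as "name: value" lines. The helper fields_to_text and the sample record are made up for illustration; real records will contain your own column names.

```python
import ast


def fields_to_text(page_content):
    """Parse a record string and render its fields as 'name: value' lines."""
    record = ast.literal_eval(page_content)
    fields = record.get("fields", {})
    return "\n".join(f"{name}: {value}" for name, value in fields.items())


# Made-up sample in the shape described above
sample = (
    "{'id': 'rec123', 'createdTime': '2024-01-01T00:00:00.000Z', "
    "'fields': {'Name': 'Ada', 'Score': 3}}"
)
text = fields_to_text(sample)
```

Here text holds two lines, "Name: Ada" and "Score: 3".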

Loading and Splitting Documents

If you need to split the documents into smaller chunks:

from langchain_text_splitters import CharacterTextSplitter

# Create a text splitter
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)

# Load and split documents
split_docs = loader.load_and_split(text_splitter)

Remember to handle your API keys securely and never commit them to version control. The AirtableLoader provides a flexible way to integrate your Airtable data into LangChain applications, whether you're building a document QA system, a data processing pipeline, or any other LangChain-powered application.
