Load Custom Datasets in LangChain for Model Evaluation

Posted: Feb 5, 2025.

The LangChain library provides tools for working with custom datasets to help evaluate and test language models. In this guide, we'll explore the load_dataset utility, which lets you easily import datasets from the LangChainDatasets collection on Hugging Face.

What is load_dataset?

load_dataset is a utility function in LangChain's evaluation module that provides a simple interface to load pre-made datasets from the LangChainDatasets collection on Hugging Face. These datasets can be used for various purposes like training, testing, and evaluating language models or chains.

Reference

Method: load_dataset
Parameters: uri: str
Return Type: List[Dict]
Description: Loads a dataset from the LangChainDatasets collection on Hugging Face. Returns the data as a list of dictionaries, where each dictionary represents a row in the dataset.

Prerequisites

Before using load_dataset, you need to install the required dependency:

pip install datasets
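
Under the hood, load_dataset is a thin wrapper around the Hugging Face datasets library: it prefixes the name you pass with LangChainDatasets/ and returns the rows of the train split. Roughly, as a sketch:

from typing import Dict, List

def load_dataset(uri: str) -> List[Dict]:
    # Import lazily so the dependency is only needed when the function is called
    from datasets import load_dataset as hf_load_dataset

    # Datasets live under the LangChainDatasets organization on Hugging Face
    dataset = hf_load_dataset(f"LangChainDatasets/{uri}")

    # Flatten the train split into a plain list of row dictionaries
    return [row for row in dataset["train"]]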

How to Use load_dataset

Basic Usage

The simplest way to use load_dataset is to specify the dataset name you want to load:

from langchain.evaluation import load_dataset

# Load the llm-math dataset
dataset = load_dataset("llm-math")

# The dataset is returned as a list of dictionaries
print(dataset[0])  # Print the first row
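
Field names differ between datasets, so it's worth inspecting the structure before writing any processing code:

# Check how many rows were loaded
print(len(dataset))

# List the field names of the first row (these vary from dataset to dataset)
print(dataset[0].keys())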

Working with Different Datasets

You can load other datasets from the LangChainDatasets collection. The names below are illustrative; check the Hugging Face hub for the exact names currently available:

# Load different types of datasets
math_dataset = load_dataset("llm-math")
qa_dataset = load_dataset("question-answering")
summarization_dataset = load_dataset("summarization")

Processing Dataset Entries

Since the data is returned as a list of dictionaries, you can easily process or filter the entries:

from langchain.evaluation import load_dataset

# Load a dataset
dataset = load_dataset("llm-math")

# Filter or process entries
for entry in dataset:
    # Access fields in each entry
    if "question" in entry:
        question = entry["question"]
        answer = entry.get("answer", "No answer available")
        
        # Process the entry
        print(f"Question: {question}")
        print(f"Answer: {answer}")
        print("---")

Using Datasets for Evaluation

The loaded datasets can be used to evaluate your language models or chains:

from langchain.evaluation import load_dataset
from langchain.llms import OpenAI

# Load a dataset
dataset = load_dataset("llm-math")

# Initialize your language model
llm = OpenAI()

# Evaluate model performance
results = []
for entry in dataset:
    question = entry["question"]
    expected_answer = entry["answer"]
    
    # Get the model's prediction (invoke is the current LLM call API)
    prediction = llm.invoke(question)

    # Compare prediction with expected answer; exact matching is strict,
    # so see the grading sketch below for a more forgiving alternative
    is_correct = prediction.strip() == expected_answer.strip()
    results.append(is_correct)

# Calculate accuracy
accuracy = sum(results) / len(results)
print(f"Model accuracy: {accuracy:.2%}")

The load_dataset function provides a convenient way to access pre-made datasets for various language model evaluation tasks. By using these datasets, you can consistently test and evaluate your models using standardized data, making it easier to benchmark performance and compare different approaches.

Remember that the available datasets in the LangChainDatasets collection might change over time, so it's a good practice to check the Hugging Face hub for the most up-to-date list of available datasets.
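
You can enumerate the currently available datasets programmatically with the huggingface_hub client, which is installed as a dependency of datasets:

from huggingface_hub import list_datasets

# List every dataset published under the LangChainDatasets organization
for ds in list_datasets(author="LangChainDatasets"):
    print(ds.id)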
