Analyzing Test Results with LangChain TestResult
Posted: Feb 7, 2025.
When evaluating LLMs and chains in LangChain, you often need to analyze test results and feedback metrics. The TestResult class provides specialized functionality to work with evaluation data by extending Python's dictionary functionality with methods specific to LangChain testing.
What is TestResult?
TestResult is a dictionary subclass designed to store and analyze the results of LangChain evaluations. It provides additional methods to work with feedback scores and convert results into pandas DataFrames for further analysis. This makes it particularly useful when you need to analyze the performance of your LLM applications.
Reference
Here are the key methods specific to TestResult:
| Method | Description |
|---|---|
| get_aggregate_feedback() | Returns a pandas DataFrame containing quantiles for feedback scores across all feedback keys |
| to_dataframe() | Converts the test results into a pandas DataFrame for analysis |
The class also inherits all standard dictionary methods like get(), update(), items(), etc.
How to Use TestResult
Let's look at different ways to work with TestResult objects.
Basic Usage
TestResult objects work like regular dictionaries for storing test data.
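As a rough illustration of the dictionary behavior, here is a minimal sketch that uses a stand-in class rather than the real TestResult, so it runs without LangChain installed. The name ResultSketch and the key layout (project_name, results keyed by example ID) are assumptions made for this example, not a confirmed description of what LangChain stores.

```python
# Stand-in for illustration only: a plain dict subclass, not the real LangChain class.
class ResultSketch(dict):
    """Hypothetical stand-in that mimics TestResult's dictionary behavior."""


# Assumed layout: a project name plus per-example results keyed by example ID.
result = ResultSketch(
    project_name="demo-eval",
    results={
        "example-1": {
            "input": {"question": "2 + 2?"},
            "output": "4",
            "execution_time": 0.8,
        },
    },
)

# Standard dictionary access works unchanged on the subclass.
project = result["project_name"]
first_output = result["results"]["example-1"]["output"]
missing = result.get("aggregate_metrics", {})  # get() with a default value

# Entries can be added as in any other dict.
result["results"]["example-2"] = {
    "input": {"question": "Capital of France?"},
    "output": "Paris",
    "execution_time": 1.1,
}
example_ids = sorted(result["results"])
```

Because the stand-in is still a dict, anything that accepts a mapping (json.dumps, iteration with items(), and so on) works on it without conversion.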
Analyzing Feedback Scores
The get_aggregate_feedback() method is particularly useful for analyzing score distributions.
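To make the idea of per-key quantiles concrete, the sketch below computes quartiles for each feedback key using only the standard library. It is not the library's implementation: the helper aggregate_feedback, the sample scores, and the feedback layout (a list of key/score pairs per example) are all assumptions for illustration, and the real method returns a pandas DataFrame rather than a dict.

```python
from collections import defaultdict
from statistics import quantiles

# Made-up sample scores for five examples (illustrative data only).
correctness = [1, 0, 1, 1, 0]
helpfulness = [0.6, 0.2, 1.0, 0.4, 0.8]

# Assumed layout: each example carries a list of {"key", "score"} feedback entries.
results = {
    f"example-{i}": {
        "feedback": [
            {"key": "correctness", "score": c},
            {"key": "helpfulness", "score": h},
        ]
    }
    for i, (c, h) in enumerate(zip(correctness, helpfulness), start=1)
}


def aggregate_feedback(results):
    """Hypothetical helper: quartiles of the scores for every feedback key."""
    scores = defaultdict(list)
    for row in results.values():
        for fb in row["feedback"]:
            scores[fb["key"]].append(fb["score"])

    summary = {}
    for key, values in scores.items():
        # "inclusive" treats the data as the full population, so the
        # quartiles stay between the observed minimum and maximum.
        q1, median, q3 = quantiles(values, n=4, method="inclusive")
        summary[key] = {
            "count": len(values),
            "min": min(values),
            "25%": q1,
            "50%": median,
            "75%": q3,
            "max": max(values),
        }
    return summary


summary = aggregate_feedback(results)
```

Grouping by feedback key first keeps each evaluator's distribution separate, which is the point of an aggregate view: a median of 1 for correctness and 0.6 for helpfulness describe two different things.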
Converting to DataFrame
For more detailed analysis, you can convert the results to a pandas DataFrame.
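The conversion is essentially a flattening step: one row per example, with nested inputs and feedback spread into columns. The sketch below shows that idea with the standard library only. The helper flatten_results, the nested layout, and the column prefixes (inputs., feedback.) are assumptions for illustration and may not match the columns that to_dataframe() actually produces.

```python
# Assumed nested layout, keyed by example ID (illustrative data only).
results = {
    "example-1": {
        "input": {"question": "2 + 2?"},
        "output": "4",
        "feedback": [{"key": "correctness", "score": 1}],
        "execution_time": 0.8,
    },
    "example-2": {
        "input": {"question": "Capital of France?"},
        "output": "Lyon",
        "feedback": [{"key": "correctness", "score": 0}],
        "execution_time": 1.1,
    },
}


def flatten_results(results):
    """Hypothetical helper: turn nested results into one flat row per example."""
    rows = {}
    for example_id, result in results.items():
        # Spread each input field into its own prefixed column.
        row = {f"inputs.{k}": v for k, v in result["input"].items()}
        row["output"] = result.get("output")
        # One column per feedback key, holding that evaluator's score.
        for fb in result.get("feedback", []):
            row[f"feedback.{fb['key']}"] = fb["score"]
        row["execution_time"] = result.get("execution_time")
        rows[example_id] = row
    return rows


rows = flatten_results(results)

# Flat rows make per-column questions easy, such as finding failed examples.
failed = [eid for eid, row in rows.items() if row["feedback.correctness"] == 0]
```

If pandas is installed, rows shaped like this can be handed to pandas.DataFrame.from_dict(rows, orient="index") to get a table indexed by example ID.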
Working with Multiple Test Results
TestResult supports dictionary operations for combining multiple test results.
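One thing to watch when combining results is that dict.update() is shallow. The sketch below uses plain dicts as stand-ins for two test results, with an assumed layout (project_name plus a nested results mapping), to show the difference between a shallow update and merging the nested per-example entries.

```python
# Plain dicts standing in for two test results (assumed layout, made-up data).
run_a = {"project_name": "run-a", "results": {"example-1": {"output": "4"}}}
run_b = {"project_name": "run-b", "results": {"example-2": {"output": "Paris"}}}

# A shallow update() replaces the whole "results" value with run_b's,
# so the examples from run_a are no longer present in the copy.
shallow = dict(run_a)
shallow.update(run_b)

# Merging the nested mappings keeps the examples from both runs.
# If both runs share an example ID, the entry from run_b wins.
combined = {
    "project_name": "run-a+run-b",
    "results": {**run_a["results"], **run_b["results"]},
}

combined_ids = sorted(combined["results"])
```

Building a new dict instead of mutating run_a also leaves both original results intact, which helps if you want to compare the runs afterwards.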
TestResult provides a structured way to handle evaluation results in LangChain, making it easier to analyze and understand the performance of your language models and chains. The combination of dictionary functionality with specialized analysis methods makes it a powerful tool for LLM evaluation workflows.