Evaluations SDK
The SDK enables you to evaluate your runs directly within your code using your own LLM agents.
Evaluations are built from assertions (grouped into what we call a checklist), which you or your team create via the dashboard.
Examples of evaluation assertions include:
- Ensuring the LLM's response is at least 90% similar to an expected output, using cosine similarity (see the sketch after this list)
- Verifying that the LLM's response contains a specific name or location
- Checking that the LLM's response is valid JSON
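For instance, the similarity assertion above is conceptually equivalent to the following sketch. This is illustrative only (the dashboard block handles it for you): it assumes the `openai` Python client for embeddings, and the embedding model name is a placeholder.

```python
import math
from openai import OpenAI

client = OpenAI()

def embed(text: str) -> list[float]:
    # Embed the text; the embedding model name is a placeholder.
    result = client.embeddings.create(model="text-embedding-3-small", input=text)
    return result.data[0].embedding

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def is_similar(response: str, expected: str, threshold: float = 0.9) -> bool:
    # Passes when the response is at least `threshold` similar to the expected output.
    return cosine_similarity(embed(response), embed(expected)) >= threshold
```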
We are continuously expanding the range of supported assertions.
Prerequisites
- Create a testing dataset on the dashboard.
- Create a checklist on the dashboard.
- Ensure you have an LLM agent that can be tested against the dataset (currently, it should accept an array of messages).
The testing dataset is used to run the evaluations.
Assuming your agent is configured along the following lines (any function that accepts an array of messages and returns a response will work):
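The sketch below assumes the official `openai` Python client; the model name is a placeholder, and your agent can of course be any function with this shape.

```python
from openai import OpenAI

client = OpenAI()

def my_agent(messages):
    # Forward the conversation to the model and return its reply as text.
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=messages,
    )
    return response.choices[0].message.content
```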
Differences from other tools
Our platform distinguishes itself in the LLM testing and evaluation space for several reasons:
- Evaluations are managed via the dashboard rather than in code, which simplifies maintenance and fosters collaboration with non-technical team members. Although this approach offers less flexibility than custom code evaluations, our expanding set of blocks covers most requirements, and a custom code block is coming soon.
- You can test metrics that are not directly accessible in your code, such as OpenAI costs, and you also get a historical record of your evaluations.
- The platform is tightly integrated with features like Prompt Templates and Observability, enabling you, for instance, to test templates before deploying them.
Usage
We offer several methods to run tests within your code.
The overarching idea is to run your agent on a testing dataset and then evaluate the data captured from your agent and its LLM calls.
Follow this step-by-step guide to execute tests in your code:
1. Fetch your testing dataset from the dashboard.
2. Run your agent on every item in the dataset.
3. Evaluate the captured output against your checklist.
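As a rough sketch of that flow: the names `get_dataset` and `evaluate` below are hypothetical placeholders rather than the SDK's actual API, and `"my-testing-dataset"` and `"my-checklist"` stand in for the slugs you created on the dashboard.

```python
# Illustrative only: `get_dataset` and `evaluate` are hypothetical names
# standing in for the SDK's dataset and evaluation helpers.
from my_evals_sdk import get_dataset, evaluate  # hypothetical import

dataset = get_dataset("my-testing-dataset")  # 1. fetch the testing dataset

for item in dataset:
    output = my_agent(item.input)  # 2. run your agent (defined above) on each item
    # 3. evaluate the captured output against your checklist
    passed, results = evaluate(
        "my-checklist",
        input=item.input,
        output=output,
        ideal_output=item.ideal_output,
    )
    print(passed, results)
```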
Example with testing framework
You can integrate the SDK with your testing framework of choice.
Here is an example of how you can integrate the SDK with a testing framework like pytest:
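This sketch reuses the same hypothetical `get_dataset` and `evaluate` helpers from above, along with the `my_agent` function from the earlier example; the import paths are placeholders for your own modules.

```python
import pytest
from my_evals_sdk import get_dataset, evaluate  # hypothetical import

from my_app import my_agent  # hypothetical: your agent from the example above

dataset = get_dataset("my-testing-dataset")

# Run one test case per dataset item, so failures are reported individually.
@pytest.mark.parametrize("item", dataset)
def test_agent_passes_checklist(item):
    output = my_agent(item.input)
    passed, results = evaluate(
        "my-checklist",
        input=item.input,
        output=output,
        ideal_output=item.ideal_output,
    )
    # Show the per-assertion results when the checklist fails.
    assert passed, results
```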