Open-Source
LLM Evaluation Platform
Compare models and prompts to find the best for your use case.
Ensure agents perform as expected.
More than 5,000 AI developers have chosen Lunary to build better chatbots
CI/CD integration
Easily integrate evaluations into your CI/CD pipeline to ensure no regressions are introduced (see the sketch below).
AI-powered checks
Use our library of AI-powered assertors based on industry standards.
No API keys
Run evaluations without the need for inference API keys. We take care of the infrastructure.
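As a rough illustration, a CI step can run a small set of prompts against a model and fail the build when the pass rate regresses. The prompts, model name, and threshold below are assumptions, and this sketch calls the OpenAI API directly rather than a hosted evaluation runner:

```python
# ci_evals.py: a minimal, illustrative regression gate for a CI pipeline.
import sys
from openai import OpenAI

# Hypothetical golden prompts and expected substrings (assumptions for this sketch).
CASES = [
    ("What is the capital of France?", "paris"),
    ("What is 2 + 2?", "4"),
]

def pass_rate(model: str = "gpt-4o-mini") -> float:
    """Run each case against the model and return the fraction that pass."""
    client = OpenAI()  # assumes OPENAI_API_KEY is set in the CI environment
    passed = 0
    for prompt, expected in CASES:
        reply = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        if expected in reply.choices[0].message.content.lower():
            passed += 1
    return passed / len(CASES)

if __name__ == "__main__":
    rate = pass_rate()
    print(f"pass rate: {rate:.0%}")
    # A non-zero exit fails the pipeline if quality drops below the chosen bar.
    sys.exit(0 if rate >= 0.9 else 1)
```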
Powerful evaluation engine
Run benchmarks
Compare models, settings, and prompts to find the best one for your use case.
Define success metrics
Use our set of predefined metrics or define your own to evaluate your models (see the sketch below).
Run benchmarks from the dashboard...
(and expect more soon, as we ship fast)
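To make "define your own" concrete, a custom success metric can be nothing more than a scoring function over a model response. The function and rubric below are illustrative assumptions, not a built-in Lunary metric:

```python
import re

def conciseness_score(response: str, max_words: int = 120) -> float:
    """Toy custom metric: reward answers that stay within a word budget
    and penalize filler phrases. Returns a score between 0 and 1."""
    words = len(response.split())
    length_score = min(1.0, max_words / max(words, 1))
    filler_hits = len(re.findall(r"\b(as an ai|basically|simply put)\b", response.lower()))
    return max(0.0, length_score - 0.1 * filler_hits)

print(conciseness_score("Paris is the capital of France."))  # 1.0
```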
SDKs
Any LLM. Any framework.
Seamless, zero-friction integration: our SDKs are lightweight and slot naturally into your codebase.
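For example, instrumenting an existing OpenAI client with the Python SDK is meant to be a one-line change. The snippet below is a sketch from memory; check the Lunary docs for the current call signature and required environment variables:

```python
from openai import OpenAI
import lunary

client = OpenAI()
lunary.monitor(client)  # assumes LUNARY_PUBLIC_KEY is set in the environment

# Calls made through the wrapped client are traced to your Lunary project.
client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}],
)
```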
Minutes to magic.
Self-host or go cloud and get started in minutes.
Open Source
Self Hostable
1-line Integration
Prompt Templates
Chat Replays
Analytics
Topic Classification
Agent Tracing
Custom Dashboards
Score LLM Responses
PII Masking
Feedback Tracking