LLM Evaluation Platform
Compare models and prompts to find the best fit for your use case.
Ensure agents perform as expected.
Powering the world's best AI teams.
From next-gen startups to established enterprises.
CI/CD integration: Easily integrate into your CI/CD pipeline to ensure no regressions are introduced.
AI-powered checks: Use our library of AI-powered assertors based on industry standards.
No API keys: Run evaluations without needing inference API keys. We take care of the infrastructure.
Powerful evaluation engine

Run benchmarks
Compare models, settings, and prompts to find the best one for your use case.
Define success metrics
Use our set of predefined metrics or define your own to evaluate your models.
How it works
Run evaluations directly from the Lunary dashboard or trigger them as part of your CI pipeline.
We recorded a one-minute demo video (and expect it to keep improving, as we ship often).
SDK examples
Kick off evaluations programmatically using the Lunary SDKs in just a few lines of code.
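As an illustration, here is a minimal sketch of what kicking off an evaluation programmatically might look like. The payload fields, dataset name, and endpoint mentioned in the comments are assumptions for illustration only, not the documented Lunary SDK API; refer to the SDK docs for the real calls.

```python
import json

# Hypothetical evaluation request (field names are illustrative,
# not the documented Lunary API).
payload = {
    "dataset": "support-questions",              # assumed dataset identifier
    "models": ["gpt-4o", "claude-3-5-sonnet"],   # models to compare
    "checks": ["factuality", "tone"],            # assumed AI-powered assertors
}

body = json.dumps(payload)

# In a real run you would send `body` to the evaluations endpoint with
# your project key (endpoint URL is an assumption), e.g.:
#   POST https://api.lunary.ai/v1/evaluations
print(body)
```

The same request could equally be triggered from a CI step, so a failing check blocks the merge.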
SDKs
Any LLM. Any framework.
Seamless integration with zero friction. Our SDKs are designed to be lightweight and integrate naturally into your codebase.








Minutes to magic.
Self-host or go cloud and get started in minutes.
Own Your Data
Self Hostable
1-line Integration
Prompt Templates
Chat Replays
Analytics
Topic Classification
Agent Tracing
Custom Dashboards
Score LLM Responses
PII Masking
Feedback Tracking

