Changelog

Below is a comprehensive list of all changes made to the project. Automatically updated every day.
Have a feature request or bug report? Please reach out to us on the live chat.
See what we are planning to ship in the roadmap.

4 days ago

Allow custom inference API keys for LLM evaluators

Switch ML Evaluators (such as PII) to more scalable infrastructure

To keep up with growth, we're switching the PII scanning and sentiment analysis evaluators to more scalable infrastructure.

10 days ago

New date range filter for logs


Allow manually changing theme from UI

Toggle between auto, light and dark theme manually.


11 days ago

Filtering of Projects

You can now filter projects in the project selector.


Saved Views

Save combination of filters into reusable "Views".

21 days ago

Improved user details


23 days ago

Improve performance of search

Created the database indexes needed to speed up search.

26 days ago

Export Evaluation results to CSV

You can now export Playground evaluation results to a CSV file.


about 1 month ago

SDK methods to pull all prompts at once

about 1 month ago

Button to expand and easily edit long Prompt variables

Changelog email May + April

Add `runtime` option to Lunary JS constructor

about 1 month ago

Analytics v2

The new version of Analytics will contain many new charts and insights, the ability to filter data by tags, metadata or users, as well as a new hourly breakdown of historic data.

about 1 month ago

OpenAI-compliant JSONL fine-tune exports

To make it even easier to fine-tune models from your Lunary data, we're making the export directly compatible with OpenAI's fine-tune format.
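
For reference, OpenAI's chat fine-tuning format is JSONL with one JSON object per line, each holding a messages array. An exported line looks roughly like this (the exact fields Lunary includes aren't shown here, so treat it as an illustrative sketch):

{"messages": [{"role": "user", "content": "How do I reset my password?"}, {"role": "assistant", "content": "Go to Settings > Security and click 'Reset password'."}]}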

about 2 months ago

Button to delete users' data from the dashboard

Make it easier to comply with privacy requests.

about 2 months ago

April+May 2024 Update

Here is a quick rundown of what we've built in April and May ⬇️

📈 Analytics

We heard you: Analytics really needed a revamp. We're releasing the next generation of Analytics for everyone, including free users.

Find new charts and insights, filter data and select any date range you want. It also introduces an hourly breakdown for more granular insights.

πŸ› οΈ Tool calling data in prompts

You can now edit the assistant's tool-calling content in your prompts, allowing you to build chain-of-thought prompts that use tools. This enables many use cases.
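
As a rough illustration (the get_weather function and its arguments are made up for this example), such an assistant turn with tool-calling content follows the OpenAI message format:

messages = [
    {"role": "user", "content": "What's the weather in Paris?"},
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": "call_1",
            "type": "function",
            "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'},
        }],
    },
    {"role": "tool", "tool_call_id": "call_1", "content": "22°C and sunny"},
    {"role": "assistant", "content": "It's currently 22°C and sunny in Paris."},
]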

But also:

  • 💵 New Pricing: A more affordable pricing structure at $20 per seat per month for the cloud version, with more generous feature limits.
  • 🛣️ Public Roadmap: Follow upcoming improvements on our public roadmap.
  • 🗒️ Prompt Notepad: Added a free text field on prompts to provide context and explanations for your prompt designs.
  • 🏗️ Direct Fine-tune Exports: Export data directly to an OpenAI-compatible JSONL file, allowing you to fine-tune GPT models in no time.
  • ⛩️ Templates: View which template was used for an LLM call and filter LLM logs by template. You'll also see your most-used templates in the Analytics section.
  • 👥 Delete Users: Delete users and their associated data in one click from their page to easily comply with privacy requests.
  • 🤖 GPT-4o: We added full support for GPT-4o the day it was released.
  • 🧬 Tool Calls generator: Easily generate OpenAI tool calling schemas with our public generator.
  • 👾 Self-hosting Enhancements: We've refined the Helm charts to be even easier to set up and added support for Azure OpenAI.
  • 🪲 Bug Squashing: Too numerous to mention here, but our engineering team has been extremely busy fixing bugs across the board. Stability is a big near-term priority for us. If something wasn't working for you in the past, there's a good chance it has been resolved.

Short-term roadmap

  • ⛑️ Evaluations: We'll be releasing lots of improvements to our Evaluations (and Real-time Evaluations) for better performance, stability and flexibility when working with custom evaluation logic. For example, you'll be able to push the results of your own evaluators to Lunary.
  • 🤖 Prompts A/B testing: We'll be shipping tools to easily A/B test prompt versions and determine which version performs best.
  • 📦 Integrations: We've been slow to add new integrations as they take our team a lot of time to maintain and keep up to date, but we'll be ramping up the development of new integrations, starting with LlamaIndex and MotleyCrew.

about 2 months ago

Public Roadmap page

See what we're planning to build next with our new public roadmap: https://lunary.ai/roadmap

about 2 months ago

Remove OpenTelemetry dependency from Python package

Langchain Python: Tool tracking not reporting ID and name

about 2 months ago

Fix: exports only included the 50 most recent runs

Notepad: free text comment/description/note field on a prompt


Use cases:

  • provide a longer description to help identify what a prompt is used for.
  • explain decisions made when designing & optimizing the prompt that may seem odd to someone reading it, and prevent it from being rewritten back to a version we know didn't work well (e.g. "we added sentence X to the prompt because otherwise query Y would misbehave in way Z").

2 months ago

Attach tool calls & output when doing "Open in Playground"

2 months ago

Fix: toxic language Radar not showing up in logs when filtering by Radar

2 months ago

All AI evaluators now use GPT-4o

Switch cal.com scheduling to SavvyCal on the signup page


2 months ago

Improve wrapping of long user names in the Users tab

2 months ago

Add `SSL_DO_NOT_VERIFY` env var for Python requests
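
A minimal sketch of how this could be used, assuming the variable just needs to be present in the environment before the SDK is imported (the exact accepted values aren't documented here):

import os

# Assumption: setting this before importing the SDK makes its outbound
# requests skip TLS certificate verification (useful behind self-signed
# certificates or corporate proxies). Don't enable this in production.
os.environ["SSL_DO_NOT_VERIFY"] = "True"

import lunary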

GPT-4o in the playground and pricing

2 months ago

New, more affordable pricing

$20 / seat / month

Add tools schema generator link to Prompts dashboard

3 months ago

Column & filter for prompt template in LLM calls


Prompt playground: allow editing `tool` and `assistant` messages with tool calling data


Fix: deploying a template doesn't work if no prompt is selected

3 months ago

Fix feedback filters sometimes not working

3 months ago

GPT support assistant

We're experimenting with a new support assistant to handle common requests and help our small team regain focus.

Fix: Issue with ChatGPT calls not reported using litellm

Issue description: ChatGPT tool calls were not reported when using litellm. The error involved ChatCompletionMessageToolCall objects not being JSON serializable during event tracking in the Lunary and litellm libraries.

3 months ago

Fix streaming with Azure OpenAI on playground

Fix: removing a prompt from a dataset not working


Fix: some evaluations are not refreshing

When running an eval, not all results would show up immediately; a refresh had to be forced after a couple of seconds.

Improve Evaluations navigation


3 months ago

Fix: single-select filters could select multiple values


3 months ago

Fix: copying the invite link doesn't work

Because of a CVE, we removed single_use_token from payloads, but it's needed for the invite link. It's now only returned if the current user is an admin or owner.

Feedback can now be added from the App

OpenAI tool call schema generator

lunary.ai/tool-calls-generator
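
For context, the generator outputs tool definitions in OpenAI's function-calling schema format; a generated entry looks roughly like this (the get_weather tool below is purely illustrative):

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # illustrative example, not a real tool
        "description": "Get the current weather for a given city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}]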

3 months ago

Website i18n

Starting with French, Spanish, Japanese & Chinese

Grok Tokenizer + improve tokenizer UIs


Button to import runs to datasets

3 months ago

Improve filters UI


Fix users' filter search bar not filtering

3 months ago

Private run URLs now open the side panel instead of a dedicated page for authenticated users


Add documentation for projects API

https://lunary.ai/docs/api/projects

Fix: user cost usage doesn't change when the period changes in analytics

Fix: can't delete dataset


Fix sharing with public link returns "Unauthorized Access"

Fix: numeric filters not working with decimal values

Playground: Add support for Azure OpenAI

3 months ago

New changelog page connected to Linear

https://lunary.ai/changelog

3 months ago

Enterprise landing page

https://lunary.ai/enterprise

Fix: clicking "+ new" dataset in the eval playground doesn't work

3 months ago

Cache Playwright install in the CI pipeline

Python SDK & Langchain: Proper separation of "metadata" and "params"

Params = things not set explicitly by the user, typically model settings (temperature, top_p, etc.). Metadata = custom data set by the user, excluding reserved keys like "name", "userId", "userProps" and "tags" that may be passed via the "metadata" field to LangChain.
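
A minimal sketch of how the two end up separated when using the LangChain integration (the LunaryCallbackHandler name and the exact reserved keys are assumptions based on the description above):

from langchain_openai import ChatOpenAI
from lunary import LunaryCallbackHandler

handler = LunaryCallbackHandler()

# Model settings such as temperature are captured as "params"
llm = ChatOpenAI(temperature=0.7, callbacks=[handler])

llm.invoke(
    "Hello!",
    # Reserved keys (e.g. user_id, tags) are pulled out of metadata;
    # everything else is stored as plain metadata on the run
    config={"metadata": {"user_id": "user-123", "experiment": "v2"}},
)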

4 months ago

March 2024 Update

This month, our focus was on enhancing existing features, particularly evaluations and enterprise-specific functionalities.

Improvements in Prompt Menu

  • Enhanced search and sorting capabilities.

Model Additions in Playground

  • Introduced Mistral Large, Medium, and all Claude models.

Enhancements in Radars

  • Fixed issues with Radars not running ML models properly.
  • Improved PII and Profanity detection across all languages.

Metadata Tracking

  • You can now set and filter by the metadata field within the dashboard for better data tracking.

Filters Enhancements

  • Reintroduced the feedback filter.
  • Significantly improved the Users filter, including a new search feature for managing large user bases.

Private API Keys Update

  • Temporarily disabled last month for security reasons, Private API keys are now re-enabled. Access them in your Settings page. Documentation improvements are forthcoming.

Evaluations Enhancements

  • Enhanced results table for clearer relationships between results.
  • Transitioned from models to "Providers" to allow experimentation with parameters like temperature.
  • Implemented a queue system for enhanced performance and to address Anthropic rate limiting.
  • Added support for running evaluations on large datasets with multiple models.
  • Introduced live progress feedback for lengthy evaluations.
  • Improved error handling for failed evaluations.

New for Enterprises

  • Deployment made simpler with new Helm Charts for Kubernetes and monolithic Docker images.
  • Updated our Role Based Access Control system to allow more granular permission settings.

Fixes

  • JS SDK: Resolved issues with OpenAI streaming types.
  • Addressed cascading feedback issues from thread messages.
  • Fixed display issues with parameters such as maxTokens.

4 months ago

Fix: some tool calls caused the app to crash


Better matrix rendering for evaluation results

Feedback cascading

Show feedback from parent thread messages in the LLM logs data table.

New feedback filter


Fix: comment icon color


Improve User filter


4 months ago

Add filter for metadata

New `metadata` column for storing custom data

Improve login for password managers

By rendering the password field and hiding it, so password managers can still autofill it.

5 months ago

February 2024 Update

After busy months that resulted in an almost complete makeover of the platform, in February we focused on stability improvements across the board. We've also made a number of improvements to the dashboard UI, evaluations, and templates.

Improved UI for Traces

We've improved the UI of the traces to be more readable and work better with smaller screens.


Evaluation SDK in general availability

All Unlimited and Enterprise users now have access to the Evaluation SDK.

Create CI pipelines for your agents, benchmark your RAG pipelines, and more. The SDK is now generally available and ready for production use.

Here is an example of how to use the SDK:

testing_dataset = lunary.get_dataset("test-cases")

for item in testing_dataset:
    # Run your agent on each input of the dataset
    res = support_agent(item.input)

    # Evaluate the output
    passed, details = lunary.evaluate(
        checklist='ci-checks',
        input=item.input,
        output=res,
        ideal_output=item.ideal_output
    )

    if passed:
        print("Test passed!")
    else:
        print("Test failed!")

Text-only datasets

You can now use text-only datasets (instead of chat-messages datasets only before). This makes it easier to create test cases for your custom agents.

get_langchain_template methods

For those who work with LangChain, we've added new methods to the SDK that can pull your templates directly as LangChain's ChatPromptTemplate and PromptTemplate classes.

This makes it much easier to work with chains and LangChain in general.

Example:

template = lunary.get_langchain_template("my-template")
messages = template.format_messages(question="What is the capital of France?")

Make sure to update your SDK to the latest version to use these new methods.

Claude 3 in the playground

Come test your prompts with the new state-of-the-art model. We've found it's better and more concise than GPT-4 in many cases.

Mistral and Claude 3 prices

We've added tracking of Mistral and Claude 3 prices in the dashboard.

Self-hosting

We've removed the dependency on PGroonga to make it easier to self-host the platform with hosted Postgres services like AWS RDS.

This means Lunary is now also compatible with Postgres 16.

3 projects for free users

We've increased the number of projects that free users can have from 1 to 3. This should help you better organize your work and keep your projects separate.

Parsing of LangChain chains

In the Python SDK, we've pushed numerous improvements to the parsing of LangChain traces. The constant updates to the LangChain format have made it difficult to keep up with the changes, but we've made significant progress in this area.

Fixes

We've fixed a number of issues across the platform where:

  • Evaluations would not start correctly from the dashboard
  • Editing datasets would stutter and not work correctly
  • Deleting a template would not work correctly
  • Switching between text and chat templates would break the playground
  • Invalid tools would prevent the playground from running
  • In the JS SDK, `await lunary.flush()` was not working
  • The JS SDK crashed in old Node environments because of the crypto module

6 months ago

January 2024 Update

January has been our most productive month to date.

We've successfully migrated our entire platform to a new, more efficient architecture. Our new Radars and Evaluations features are now publicly available to all users on Unlimited or self-hosted plans, and we've pushed a lot of usability improvements to the app.

Platform Overhaul

We've revamped our entire source code and eliminated the need for Vercel and Supabase, leading to several improvements:

  • Enhanced scalability of the app
  • Access to all dashboard data via the API
  • 10x increase in our speed of rolling out new features
  • Fewer security concerns and a smaller error surface from 3rd-party dependencies
  • Simplified self-hosting setup
  • Quicker dashboard performance


Radars

Radars have moved out of private access.

Radars are AI-powered alerts that monitor your runs for specific conditions, such as personal data, profanity, or negative sentiment.

Internally, we've deployed efficient, lightweight AI models that scan runs without relying on external API queries, perfect for self-hosted setups.

Evaluations

We've reimagined LLM evaluations from scratch, opting for a no-code approach.

You can design and execute evaluations directly from the dashboard without needing to be a Python expert or have prior evaluation experience.

This is made possible through intuitive blocks that assess various aspects like:

  • How closely your model's outputs match an ideal output
  • The presence of hallucination in responses
  • Cost and token usage

You can create evaluations with 20+ models, including open-source options like Mistral, directly from the dashboard.

Some evaluation scenarios still require code, say for testing your custom agents or integrating into a CI pipeline, so we will soon release the Evaluation SDK. This allows dashboard-created evals to be executed from your codebase, enabling contributions from both technical and non-technical team members.

We're continuously refining these features and are open to providing private access to the Evaluations SDK to those interested.

App Enhancements

We've pushed a lot of improvements to the app, such as:

  • Merged "LLM Calls", "Chats" and "Traces" into a single "Logs" page, improving filters and search functionality across all data types.
  • New filtering engine with a lot of additional filters (with more on the way).
  • Add tool calls to templates and preview them in your runs.
  • Button to duplicate templates
  • Improved filter and dashboard speed.
  • Simpler navigation around the app

Alongside these updates, we've made numerous fixes and are focused on extending these advancements to more affordable plans as we further optimize and reduce costs.

We eagerly await your feedback :)

Also, if anyone in San Francisco wants to meet up with the founders, we're here for the month. Hit us up!

8 months ago

December 2023 Update

📙 Prompt Templates in Alpha

Our new Prompt Templates feature is ready in alpha today.

Collaborate with non-technical team members, version your prompts, and decouple them from your source code.

It's available today to all users regardless of plan, and the docs can be found here.

JS integration is ready and we'll release the Python integration later this week.

Any feedback on this is greatly appreciated - we will iterate on this in the coming weeks with your feedback.

🐍 Chat tracking in Python

We heard you and greatly simplified the API for tracking messages.

You can also now track users' conversations directly from your Python or JS backend, with a much simpler API.

Check out the new Chat tracking docs here.

📷 Support for OpenAI vision

We now support OpenAI's vision models and you can see the pictures used in your dashboard.


🤖 More models in Playground

Find Gemini Pro and Mixtral in the prompt playground.

🚄 Faster dashboard

We've turbocharged our dashboard and Postgres database for quicker data rendering. Search, filters and navigation should be much quicker.

It's still a work in progress, but you should already feel the difference with heavy data loads.

There are also many more bug fixes and UI improvements to the dashboard.

Enjoy the holidays!

8 months ago

LLMonitor is now Lunary.ai

Big news from our corner! We're shifting gears and our startup's name is changing from LLMonitor to Lunary.ai.

First off - saying 'LLMonitor' wasn't the smoothest. We heard you trying to pronounce it (and saw some of those puzzled looks). So, we're making it easier. Lunary rolls off the tongue way better, doesn't it?

Also, a bit of a hiccup with Google. Our early SEO experiments, let's just say, were a tad too experimental and abrupt. Google wasn't thrilled, and we got a penalty on our main llmonitor.com domain, making it impossible to find us on Google. Lesson learned.

But hey, every cloud has a silver lining. This change isn't just about a new name. It's a fresh start, a clearer identity. Lunary.ai better reflects our mission: to build the best AI developer platform.

Stay tuned for what's next. We're just getting started.