April 16, 2024
Fix: user cost usage doesn't change when the period changes in analytics
April 15, 2024
Add documentation for projects API

https://lunary.ai/docs/api/projects

Private run URLs now open the side panel instead of a dedicated page for authenticated users


New changelog page connected to Linear

https://lunary.ai/changelog

Fix: sharing with a public link returns "Unauthorized Access"
Fix: can't delete dataset


April 14, 2024
Fix: clicking "+ new" on a dataset in the eval playground doesn't work
April 13, 2024
April 12, 2024
Fix: numeric filters not working with decimal values
April 9, 2024
Add Sentry to Next.js
April 7, 2024
April 6, 2024
April 5, 2024
Cache Playwright install in pipeline
April 4, 2024

March 2024 Update

This month, we focused on improving our existing features, especially evaluations and features tailored for enterprise.

Improve prompt menu: search and sorting

Added Mistral Large, medium and all Claude models to the playground

Radars: improved PII and profanity detection in all languages

Metadata tracking

You can now set a metadata field with the data you're tracking and filter by it in the dashboard.
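As a rough illustration of the idea (plain Python, not the actual SDK or dashboard code; the run shapes and keys are made up for this sketch), filtering tracked runs by a metadata field could look like:

```python
# Hypothetical run records, as the dashboard might receive them.
runs = [
    {"output": "Hi!", "metadata": {"plan": "pro", "region": "eu"}},
    {"output": "Hello!", "metadata": {"plan": "free"}},
]

def filter_by_metadata(runs, key, value):
    """Keep only runs whose metadata contains key == value."""
    return [r for r in runs if r.get("metadata", {}).get(key) == value]

pro_runs = filter_by_metadata(runs, "plan", "pro")
```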

Filters

  • We've (re)added the feedback filter
  • Significantly improved the Users filter, adding search for when you have a lot of users

Private API keys

For security reasons, we had to temporarily disable Private API keys last month. You can now find them again on your Settings page to query our API (we will improve its documentation).

Again, lots of improvements with Evaluations:

  • Better results table, making it clearer how results relate to one another
  • "Providers" instead of models so you can experiment with params (such as temperature)
  • Added a queue for better performance and to fix Anthropic rate limiting
  • Support running evals on big datasets with many models
  • Live progress feedback for long evals
  • Error handling for failing evaluations

New enterprise features

Enterprises can use our new Helm charts to deploy easily with Kubernetes.

We've also released an update to our Role-Based Access Control system to granularly set permissions for the people on your team.

Fixes:

  • JS SDK: Fix OpenAI streaming types
  • Cascading of Feedback from Thread messages
  • Fix params such as maxTokens not always showing
April 1, 2024
Enterprise landing page

https://lunary.ai/enterprise

March 31, 2024
Fix: some tool calls caused the app to crash


March 30, 2024
Fix: comment icon color


Better matrix rendering for evaluation results
March 29, 2024
Feedback cascading

Show feedback from parent thread messages in the LLM logs datatable.

Improve User filter


March 27, 2024
New feedback filter


March 26, 2024
Python SDK & Langchain: Proper separation of "metadata" and "params"

Params are settings not chosen by the end user, typically model settings (temperature, top_p, etc.). Metadata is custom data set by the user, excluding keys like "name", "userId", "userProps", and "tags", which may be passed via the "metadata" field to LangChain.
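To illustrate the distinction (a simplified sketch in plain Python, not the SDK's internal code; the exact key sets are assumptions for illustration), separating a run's options into params and metadata might look like:

```python
# Keys treated as model params vs. reserved keys, per the rules above.
MODEL_PARAM_KEYS = {"temperature", "top_p", "max_tokens"}
RESERVED_KEYS = {"name", "userId", "userProps", "tags"}

def split_params_metadata(options: dict) -> tuple[dict, dict]:
    """Split run options into model params and user-set metadata."""
    params = {k: v for k, v in options.items() if k in MODEL_PARAM_KEYS}
    metadata = {
        k: v for k, v in options.items()
        if k not in MODEL_PARAM_KEYS and k not in RESERVED_KEYS
    }
    return params, metadata

params, metadata = split_params_metadata(
    {"temperature": 0.7, "userId": "u1", "experiment": "v2"}
)
```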

Add filter for metadata
March 24, 2024
New `metadata` column for storing custom data
March 22, 2024
Improve login for password managers

By rendering the password field and hiding it, so it still autofills

March 14, 2024
March 10, 2024

February 2024 Update

After the busy last months that resulted in almost an overall make-over of the platform, in February, we've focused on stability improvements across the board. We've also made a number of improvements to the dashboard UI, the evaluations, and the templates.

Improved UI for Traces

We've improved the UI of the traces to be more readable and work better with smaller screens.

Traces

Evaluation SDK in general availability

All Unlimited and Enterprise users now have access to the Evaluation SDK.

Create CI pipelines for your agents, benchmark your RAG pipelines, and more. The SDK is now generally available and ready for production use.

Here is an example of how to use the SDK:

import lunary

testing_dataset = lunary.get_dataset("test-cases")

for item in testing_dataset:
    # Run your agent on each input of the dataset
    res = support_agent(item.input)

    # Evaluate the output against the checklist
    passed, details = lunary.evaluate(
        checklist="ci-checks",
        input=item.input,
        output=res,
        ideal_output=item.ideal_output,
    )

    if passed:
        print("Test passed!")
    else:
        print("Test failed!")

Text-only datasets

You can now use text-only datasets (previously, only chat-message datasets were supported). This makes it easier to create test cases for your custom agents.

get_langchain_template methods

For those who work with LangChain, we've added new methods to the SDK that pull your templates directly as LangChain's ChatPromptTemplate and PromptTemplate classes.

This makes it much easier to work with chains and LangChain in general.

Example:

template = lunary.get_langchain_template("my-template")
messages = template.format_messages(question="What is the capital of France?")

Make sure to update your SDK to the latest version to use these new methods.

Claude 3 in the playground

Come test your prompts with the new state-of-the-art model. We've found it's better and more concise than GPT-4 in many cases.

Mistral and Claude 3 prices

We've added tracking of Mistral and Claude 3 prices in the dashboard.

Self-hosting

We've removed the dependency on PgGroonga to make it easier to self-host the platform with hosted Postgres services like AWS RDS.

This means Lunary is now also compatible with Postgres 16.

3 projects for free users

We've increased the number of projects that free users can have from 1 to 3. This should help you better organize your work and keep your projects separate.

Parsing of LangChain chains

In the Python SDK, we've pushed numerous improvements to the parsing of LangChain traces. The constant updates to the LangChain format have made it difficult to keep up with the changes, but we've made significant progress in this area.

Fixes

We've fixed a number of issues across the platform where:

  • Evaluations would not start correctly from the dashboard
  • Editing datasets would stutter and not work correctly
  • Deleting a template would not work correctly
  • Switching between text and chat templates would break the playground
  • Invalid tools would prevent the playground from running
  • JS SDK: await lunary.flush() not working
  • JS SDK: crashing in old Node environments because of the crypto module
February 22, 2024
February 20, 2024
February 11, 2024
January 31, 2024

January 2024 Update

January has been our most productive month to date.

We've successfully migrated our entire platform to a new, more efficient architecture. Our new Radars and Evaluations features are now publicly available to all users on Unlimited or self-hosted plans, and we've pushed a lot of usability improvements to the app.

Platform Overhaul

We've revamped our entire source code and eliminated the need for Vercel and Supabase, leading to several improvements:

  • Enhanced scalability of the app
  • Access to all dashboard data via the API
  • 10x increase in our speed of rolling out new features
  • Fewer security concerns and a smaller error surface from third-party dependencies
  • Simplified self-hosting setup
  • Quicker dashboard performance

This is what it looks like in our GitHub graph:

Radars

Radars have moved out of private access.

Radars are AI-powered alerts that monitor your runs for specific conditions, such as personal data, profanity, or negative sentiment.

Internally, we've deployed efficient, lightweight AI models that scan runs without relying on external API queries, perfect for self-hosted setups.
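As a simplified stand-in for the idea (Lunary's actual radars use lightweight AI models, not regexes; the pattern below is a naive example for illustration), a scanner for one condition might look like:

```python
import re

# A naive email pattern, used here only to illustrate scanning run text for PII.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def contains_pii(text: str) -> bool:
    """Flag a run's text if it appears to contain an email address."""
    return bool(EMAIL_RE.search(text))
```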

Evaluations

We've reimagined LLM evaluations from scratch, opting for a no-code approach.

You can design and execute evaluations directly from the dashboard without needing to be a Python expert or have prior evaluation experience.

This is made possible through intuitive blocks that assess various aspects like:

  • How closely your model's outputs match an ideal output
  • The presence of hallucination in responses
  • Cost and token usage
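The first of these checks, closeness to an ideal output, can be approximated with a simple string-similarity sketch (standard-library Python only; the real blocks use models, and the threshold here is an assumption):

```python
from difflib import SequenceMatcher

def similarity_to_ideal(output: str, ideal: str) -> float:
    """Return a 0-to-1 score of how closely the output matches the ideal."""
    return SequenceMatcher(None, output.lower(), ideal.lower()).ratio()

def passes_check(output: str, ideal: str, threshold: float = 0.8) -> bool:
    """Pass the block if the similarity score clears the threshold."""
    return similarity_to_ideal(output, ideal) >= threshold
```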

You can create evaluations with 20+ models, including open-source options like Mistral, directly from the dashboard.

Some evaluation scenarios still require code, for example testing your custom agents or integrating into a CI pipeline, so we will soon release the Evaluation SDK. This allows dashboard-created evals to run from your codebase, enabling contributions from both technical and non-technical team members.

We're continuously refining these features and are open to providing private access to the Evaluations SDK to those interested.

App Enhancements

We've pushed a lot of improvements to the app, such as:

  • Merged "LLM Calls", "Chats" and "Traces" into a single "Logs" page, improving filters and search functionality across all data types.
  • New filtering engine with a lot of additional filters (with more on the way).
  • Add tool calls to templates and preview them in your runs.
  • Button to duplicate templates
  • Improved filter and dashboard speed.
  • Simpler navigation around the app

Alongside these updates, we've made numerous fixes and are focused on extending these advancements to more affordable plans as we further optimize and reduce costs.

We eagerly await your feedback :)

Also, if anyone in San Francisco wants to meet up with the founders, we're here for the month. Hit us up!

December 12, 2023

December 2023 Update

πŸ“™ Prompt Templates in Alpha

Our new Prompt Templates feature is ready in alpha today.

Collaborate with non-technical team members, version prompts and decouple prompts from your source-code.

It's available today to all users, regardless of your plan and the docs can be found here.

JS integration is ready and we'll release the Python integration later this week.

Any feedback on this is greatly appreciated - we will iterate on this in the coming weeks with your feedback.

🐍 Chat tracking in Python

We heard you and greatly simplified the API to track messages.

You can also now track users' conversations directly from your Python or JS backend, with a much simpler API.

Check out the new Chat tracking docs here.

πŸ“· Support for OpenAI vision

We now support OpenAI's vision models and you can see the pictures used in your dashboard.


πŸ€– More models in Playground

Find Gemini Pro and Mixtral in the prompt playground.

πŸš„ Faster dashboard

We've turbocharged our dashboard and Postgres database for quicker data rendering. Search, filters and navigation should be much quicker.

It's still a work in progress, but you should already feel the difference with heavy data loads.

There are also a lot more bug fixes and UI improvements to the dashboard.

Enjoy the holidays!

December 11, 2023

LLMonitor is now Lunary.ai

Big news from our corner! We're shifting gears and our startup's name is changing from LLMonitor to Lunary.ai.

First off - saying 'LLMonitor' wasn't the smoothest. We heard you trying to pronounce it (and saw some of those puzzled looks). So, we're making it easier. Lunary rolls off the tongue way better, doesn't it?

Also, a bit of a hiccup with Google. Our early SEO experiments were, let's just say, a tad too experimental and abrupt. Google wasn't thrilled, and we got a penalty on our main llmonitor.com domain, making it impossible to find us on Google. Lesson learned.

But hey, every cloud has a silver lining. This change isn't just about a new name. It's a fresh start, a clearer identity. Lunary.ai reflects our work better - to build the best AI developer platform.

Stay tuned for what's next. We're just getting started.

December 9, 2023
October 28, 2023
October 23, 2023
September 12, 2023
August 16, 2023
August 3, 2023