Using the Batch API with Azure OpenAI

Posted: Oct 22, 2024.

Dealing with massive datasets or generating content at scale can be resource-intensive and costly. Batch APIs offer a cost-efficient way to process large-scale LLM workloads.

In this guide, we'll walk through how to integrate with Azure OpenAI's Batch API. We will go in-depth on setup and execution, with practical examples and troubleshooting tips along the way.

What is Azure OpenAI's Batch API?

The Azure OpenAI Batch API handles large groups of requests asynchronously at a substantial discount: a 50% cost reduction compared to standard global pricing.

Instead of sending requests one by one, the Batch API lets you bundle very large workloads into a single JSON Lines (JSONL) file.

Batch requests have their own enqueued token quotas, which means they do not interfere with real-time workloads.

Prerequisites

Before diving into batch processing, you will need the following:

  1. Azure Subscription: Create one for free if you don't have one.

  2. Azure OpenAI Resource: Ensure you have a deployed Azure OpenAI model of the Global Batch deployment type (see the setup steps below).

  3. Python: version 3.8 or later.

Setting up the Azure OpenAI Resource

  1. Visit https://ai.azure.com/ and log in using your Azure credentials.
  2. Search for Azure OpenAI in the Azure services menu.
  3. Click Create to set up a new Azure OpenAI resource. Provide the Subscription, Resource Group, and Region.
  4. Click Create new deployment. Configure the Model, Deployment Name, Deployment Type (choose Global Batch), and Model Version.

Now the OpenAI resource is ready for use in your LLM applications.

Creating Your Batch File (.jsonl)

Batch processing in Azure requires formatting your data as JSON Lines (.jsonl). Each line represents an individual request.

Here is a basic example of a JSONL file used for batch requests:

{"custom_id": "task-0", "method": "POST", "url": "/chat/completions", "body": {"model": "REPLACE-WITH-MODEL-DEPLOYMENT-NAME", "messages": [{"role": "system", "content": "You are an AI assistant that helps people find information."}, {"role": "user", "content": "When was Microsoft founded?"}]}}
{"custom_id": "task-1", "method": "POST", "url": "/chat/completions", "body": {"model": "REPLACE-WITH-MODEL-DEPLOYMENT-NAME", "messages": [{"role": "system", "content": "You are an AI assistant that helps people find information."}, {"role": "user", "content": "When was the first XBOX released?"}]}}

Each request contains details such as the method, url, and body, which includes the specific model used and the information to be processed.

JSONL Batch File Parameters

  • custom_id: A unique identifier for each request.
  • method: The HTTP method, usually "POST".
  • url: The API endpoint for the request.
  • body: The body of the request, including the model and the messages with their role and content.
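
Rather than writing this file by hand, you can generate it from your dataset. Here is a minimal sketch, assuming a hypothetical list of questions and the same placeholder deployment name:

import json

# Hypothetical dataset; replace with your own questions.
questions = [
    "When was Microsoft founded?",
    "When was the first XBOX released?",
]

with open("test.jsonl", "w", encoding="utf-8") as f:
    for i, question in enumerate(questions):
        request = {
            "custom_id": f"task-{i}",
            "method": "POST",
            "url": "/chat/completions",
            "body": {
                "model": "REPLACE-WITH-MODEL-DEPLOYMENT-NAME",
                "messages": [
                    {"role": "system", "content": "You are an AI assistant that helps people find information."},
                    {"role": "user", "content": question},
                ],
            },
        }
        # One JSON object per line, with no pretty-printing.
        f.write(json.dumps(request) + "\n")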

Submitting Your Batch File

Once your batch file is ready, you can upload it through Azure AI Studio or programmatically via the API.

Python APIs

We can use the standard OpenAI Python library, along with the azure-identity package for authentication:

pip install openai azure-identity --upgrade

The code first imports the necessary tools and uses DefaultAzureCredential to handle signing in to Azure. It then creates an AzureOpenAI client, which lets us communicate with Azure services from code. Finally, it opens the file "test.jsonl", which contains the tasks you want Azure to perform as a batch, and uploads it.

import os
from openai import AzureOpenAI
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
)

client = AzureOpenAI(
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    azure_ad_token_provider=token_provider,
    api_version="2024-10-21"
)
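
# Alternatively (an assumption, not from the original post): key-based auth
# can be used instead of Entra ID if AZURE_OPENAI_API_KEY is set:
# client = AzureOpenAI(
#     azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
#     api_key=os.getenv("AZURE_OPENAI_API_KEY"),
#     api_version="2024-10-21",
# )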
# Upload a file with a purpose of "batch"

file = client.files.create(
  file=open("test.jsonl", "rb"), 
  purpose="batch"
)

print(file.model_dump_json(indent=2))
file_id = file.id
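
Depending on the API version, Azure may take a moment to import the uploaded file. As a precaution, you can poll the file status until it finishes; a minimal sketch (the exact status names are an assumption and may vary by API version):

import time

# Wait for the uploaded file to finish importing before creating the batch.
while True:
    uploaded = client.files.retrieve(file_id)
    print(f"File status: {uploaded.status}")
    if uploaded.status in ("processed", "error"):
        break
    time.sleep(15)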

After the file has been uploaded successfully, we can submit it for batch processing.

# Submit a batch job with the file
batch_response = client.batches.create(
    input_file_id=file_id,
    endpoint="/chat/completions",
    completion_window="24h",
)

# Save batch ID for later use
batch_id = batch_response.id

print(batch_response.model_dump_json(indent=2))

Azure AI Studio

  1. Log in to AI Studio: Begin by signing in to AI Studio.

  2. Access Batch Jobs: Navigate to your Azure OpenAI resource, locate the Batch jobs (TOOLS) section, and click Create Batch Job.

  3. Upload the JSONL File: Under the Batch data section, click Upload file and choose your prepared JSONL file.

  4. Create the Job: After uploading your batch file, click Create to start the batch job. You can then monitor its progress in AI Studio.

Deployment

Azure will then validate and enqueue the requests, counting them against your enqueued token quota and processing them asynchronously.

Monitoring Batch Job Progress

Once your batch job is underway, you can monitor its status in Azure AI Studio.

Azure provides detailed timestamps and status messages to help track each phase of the job.

If your job fails, error messages will appear to guide you through troubleshooting. You can also view the number of requests processed, those pending, and any failures that occurred.

When monitoring via code, it is recommended to wait at least 60 seconds between status calls.

import time
import datetime 

status = "validating"
while status not in ("completed", "failed", "canceled"):
    time.sleep(60)
    batch_response = client.batches.retrieve(batch_id)
    status = batch_response.status
    print(f"{datetime.datetime.now()} Batch Id: {batch_id},  Status: {status}")

if batch_response.status == "failed":
    for error in batch_response.errors.data:  
        print(f"Error code {error.code} Message {error.message}")

Possible status values are validating, failed, in_progress, finalizing, completed, expired, cancelling, and cancelled.

To cancel a batch job, we can run:

client.batches.cancel("batch_abc123") # replace with the batch_id of your job

Retrieving Batch Job Results

When a job is completed, Azure generates two types of files:

  1. Output File: Contains successfully executed requests and results.

  2. Error File: Details any issues or failures encountered during processing.

Successful jobs can still generate an error_file_id, but it will be associated with an empty file with zero bytes.

You can download these files for further review by clicking the appropriate download icon in the Batch Jobs interface.

Or retrieve the results programmatically, using the output_file_id from the completed batch response:

import json

output_file_id = batch_response.output_file_id

if not output_file_id:
    output_file_id = batch_response.error_file_id

if output_file_id:
    file_response = client.files.content(output_file_id)
    raw_responses = file_response.text.strip().split('\n')  

    for raw_response in raw_responses:  
        json_response = json.loads(raw_response)  
        formatted_json = json.dumps(json_response, indent=2)  
        print(formatted_json)
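
Each output line pairs your custom_id with a response object, so results can arrive in any order. Here is a minimal sketch for mapping answers back to the original requests, assuming the standard batch output shape (a body field containing a chat completion):

# Map each custom_id to the assistant's reply.
answers = {}
for raw_response in raw_responses:
    record = json.loads(raw_response)
    body = record["response"]["body"]
    answers[record["custom_id"]] = body["choices"][0]["message"]["content"]

print(answers.get("task-0"))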

Troubleshoot: Common Issues

  • Invalid JSON Lines: If you get an error about invalid JSON, ensure there are no missing brackets or incorrect characters in your JSONL file. You can use a JSON validator to check your files before submitting (or the sketch after this list).

  • Too Many Requests: The Batch API limits each file to 100,000 requests. If your dataset is larger than that, split it into smaller files to avoid submission errors (also covered in the sketch after this list).

  • Authentication Errors: Ensure your Azure credentials are correct and that you have the necessary permissions to create and manage batch jobs.

  • URL Mismatch: The url in a request line does not match the endpoint specified when the batch job was created; make sure every line uses the same endpoint as the batch job.

  • Quota Exceeded: This means that your current deployment does not have enough tokens left to process the batch. You may need to adjust your quota or split the requests into smaller batches.
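
For the first two issues above, a small script can validate every line and split oversized files into compliant chunks. A minimal sketch (the helper name and file-naming scheme are illustrative):

import json

MAX_REQUESTS = 100_000  # Batch API limit per file

def validate_and_split(path, prefix="batch_part"):
    """Validate each JSONL line and write files of at most MAX_REQUESTS lines."""
    part, count, out = 0, 0, None
    with open(path, encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            if not line.strip():
                continue  # skip blank lines
            try:
                json.loads(line)
            except ValueError as e:
                raise ValueError(f"Invalid JSON on line {line_number}: {e}")
            if count % MAX_REQUESTS == 0:
                if out:
                    out.close()
                out = open(f"{prefix}-{part}.jsonl", "w", encoding="utf-8")
                part += 1
            out.write(line)
            count += 1
    if out:
        out.close()
    return part  # number of files written

validate_and_split("test.jsonl")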


For detailed information on configuring, monitoring, and optimizing your Azure OpenAI deployments, refer to Azure's official documentation or explore Azure AI Studio's interface.

As the need for high-volume data processing continues to grow, Azure's Batch API stands out as a cost-effective option for modern enterprises.

Summary

  • Maximum requests per file: 100,000
  • Supported regions: East US, West US, Sweden Central
  • Dynamic quota: Enabled for optimal resource utilization
  • Input file format: JSON Lines (.jsonl)
  • Cost reduction: 50% lower than standard global pricing
  • Supported models for images: GPT-4o
  • Embeddings models: Not supported
  • Fine-tuned models: Not supported
  • Content filtering: Supported
  • Error file generation: Available for troubleshooting
