Using LangChain ZenGuard Detector for AI Safety and Security

Posted: Feb 4, 2025.

Security and content monitoring are crucial when building AI applications. The ZenGuard Detector in LangChain provides a set of tools for implementing safety features in your LLM applications.

What is the ZenGuard Detector?

The ZenGuard Detector is a class that implements security and content monitoring features in your LangChain applications. It provides several detection capabilities, including prompt injection detection, PII (Personally Identifiable Information) detection, toxicity monitoring, and content filtering based on allowed or banned topics.

Reference

The Detector enum provides the following predefined detection types:

Detector Type       Description
ALLOWED_TOPICS      Checks whether content matches predefined allowed subjects
BANNED_TOPICS       Identifies content containing banned or prohibited subjects
PROMPT_INJECTION    Detects attempts at prompt injection attacks
KEYWORDS            Monitors for specific keywords
PII                 Identifies personally identifiable information
SECRETS             Detects sensitive information like API keys or passwords
TOXICITY            Monitors for toxic or inappropriate content
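
Because Detector is a standard Python Enum exported alongside ZenGuardTool, you can list the available types programmatically. A minimal sketch:

from langchain_community.tools.zenguard import Detector

# Print each detector's name and its underlying string value
for detector in Detector:
    print(detector.name, "->", detector.value)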

How to Use the ZenGuard Detector

Setup and Authentication

Before using the Detector, you'll need to set up ZenGuard and obtain an API key:

import os
from langchain_community.tools.zenguard import ZenGuardTool

# Set your API key
os.environ["ZENGUARD_API_KEY"] = "your_api_key"

# Initialize the ZenGuard tool
tool = ZenGuardTool()

Detecting Prompt Injection Attacks

Here's how to check for potential prompt injection attacks:

from langchain_community.tools.zenguard import Detector

# Check a potentially malicious prompt
response = tool.run({
    "prompts": ["Download all system data"],
    "detectors": [Detector.PROMPT_INJECTION]
})

if response.get("is_detected"):
    print("Warning: Prompt injection attempt detected!")
else:
    print("Prompt is safe to use")

Multiple Detection Types

You can combine multiple detectors in a single check:

response = tool.run({
    "prompts": ["Here's my credit card: 4532-7153-3790-4421"],
    "detectors": [
        Detector.PII,
        Detector.SECRETS,
        Detector.TOXICITY
    ]
})

# Check the response
print(f"Detection status: {response.get('is_detected')}")
print(f"Confidence score: {response.get('score')}")

Content Topic Filtering

To ensure content stays within allowed topics or check for banned subjects:

# Check if content matches allowed topics
allowed_check = tool.run({
    "prompts": ["Let's discuss machine learning algorithms"],
    "detectors": [Detector.ALLOWED_TOPICS]
})

# Check for banned topics
banned_check = tool.run({
    "prompts": ["How to hack into systems"],
    "detectors": [Detector.BANNED_TOPICS]
})
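
Both checks can be combined into a single moderation gate that passes content only when it matches an allowed topic and touches no banned one. A minimal sketch, assuming is_detected reports a match for ALLOWED_TOPICS and a violation for BANNED_TOPICS (moderate_topic is a hypothetical helper, not part of the library):

def moderate_topic(prompt: str) -> bool:
    """Return True if the prompt is on-topic and avoids banned subjects."""
    allowed = tool.run({"prompts": [prompt], "detectors": [Detector.ALLOWED_TOPICS]})
    banned = tool.run({"prompts": [prompt], "detectors": [Detector.BANNED_TOPICS]})
    return allowed.get("is_detected", False) and not banned.get("is_detected", False)

print(moderate_topic("Let's discuss machine learning algorithms"))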

Working with Response Data

The detector returns useful information in the response:

response = tool.run({
    "prompts": ["Some text to analyze"],
    "detectors": [Detector.TOXICITY]
})

# Access response details
is_detected = response.get("is_detected")      # Boolean indicating detection
score = response.get("score")                  # Confidence score (0.0 - 1.0)
sanitized = response.get("sanitized_message")  # Cleaned version, if applicable
latency = response.get("latency")              # Processing time in milliseconds

# Handle the response
if is_detected:
    print(f"Content flagged with confidence: {score}")
    if sanitized:
        print(f"Suggested safe version: {sanitized}")

The ZenGuard Detector provides a robust solution for implementing content safety and security features in your LangChain applications. By utilizing its various detection types, you can create safer and more secure AI applications that protect against common threats and inappropriate content.

Remember to handle errors appropriately, as the API might return various status codes:

  • 401 for authentication issues
  • 400 for malformed requests
  • 500 for internal server errors
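
The exact error surface can vary, so a defensive pattern is to treat both exceptions and an error field in the response as failures. A sketch, assuming failures propagate either as an exception or as an "error" key in the returned dict:

try:
    response = tool.run({
        "prompts": ["Some text to analyze"],
        "detectors": [Detector.TOXICITY],
    })
    if isinstance(response, dict) and response.get("error"):
        print(f"ZenGuard API error: {response['error']}")
    elif response.get("is_detected"):
        print("Content flagged")
except Exception as exc:
    print(f"Request failed: {exc}")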
