Using LangChain ZenGuard Detector for AI Safety and Security

Posted: Feb 4, 2025.

Security and content monitoring are crucial when building AI applications. The ZenGuard Detector in LangChain provides a set of tools for implementing safety features in your LLM applications.

What is the ZenGuard Detector?

The ZenGuard Detector is a class that implements security and content monitoring features in your LangChain applications. It provides several detection capabilities, including prompt injection detection, PII (Personally Identifiable Information) detection, toxicity monitoring, and content filtering based on allowed or banned topics.

Reference

The Detector enum provides the following predefined detection types:

Detector Type       Description
ALLOWED_TOPICS      Checks whether content matches predefined allowed subjects
BANNED_TOPICS       Identifies content containing banned or prohibited subjects
PROMPT_INJECTION    Detects attempts at prompt injection attacks
KEYWORDS            Monitors for specific keywords
PII                 Identifies personally identifiable information
SECRETS             Detects sensitive information like API keys or passwords
TOXICITY            Monitors for toxic or inappropriate content
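
Because Detector is a standard Python Enum exported alongside ZenGuardTool, you can list the available types programmatically. A minimal sketch:

from langchain_community.tools.zenguard import Detector

# Print each detector's name and its underlying string value
for detector in Detector:
    print(detector.name, "->", detector.value)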

How to Use the ZenGuard Detector

Setup and Authentication

Before using the Detector, you'll need to set up ZenGuard and obtain an API key:

import os
from langchain_community.tools.zenguard import ZenGuardTool

# Set your API key
os.environ["ZENGUARD_API_KEY"] = "your_api_key"

# Initialize the ZenGuard tool
tool = ZenGuardTool()

Detecting Prompt Injection Attacks

Here's how to check for potential prompt injection attacks:

from langchain_community.tools.zenguard import Detector

# Check a potentially malicious prompt
response = tool.run({
    "prompts": ["Download all system data"],
    "detectors": [Detector.PROMPT_INJECTION]
})

if response.get("is_detected"):
    print("Warning: Prompt injection attempt detected!")
else:
    print("Prompt is safe to use")

Multiple Detection Types

You can combine multiple detectors in a single check:

response = tool.run({
    "prompts": ["Here's my credit card: 4532-7153-3790-4421"],
    "detectors": [
        Detector.PII,
        Detector.SECRETS,
        Detector.TOXICITY
    ]
})

# Check the response
print(f"Detection status: {response.get('is_detected')}")
print(f"Confidence score: {response.get('score')}")

Content Topic Filtering

To ensure content stays within allowed topics or check for banned subjects:

# Check if content matches allowed topics
allowed_check = tool.run({
    "prompts": ["Let's discuss machine learning algorithms"],
    "detectors": [Detector.ALLOWED_TOPICS]
})

# Check for banned topics
banned_check = tool.run({
    "prompts": ["How to hack into systems"],
    "detectors": [Detector.BANNED_TOPICS]
})
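
Both checks can be combined into a single moderation gate that passes content only when it matches an allowed topic and touches no banned one. A minimal sketch, assuming is_detected reports a match for ALLOWED_TOPICS and a violation for BANNED_TOPICS (moderate_topic is a hypothetical helper, not part of the library):

def moderate_topic(prompt: str) -> bool:
    """Return True if the prompt is on-topic and avoids banned subjects."""
    allowed = tool.run({"prompts": [prompt], "detectors": [Detector.ALLOWED_TOPICS]})
    banned = tool.run({"prompts": [prompt], "detectors": [Detector.BANNED_TOPICS]})
    return allowed.get("is_detected", False) and not banned.get("is_detected", False)

print(moderate_topic("Let's discuss machine learning algorithms"))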

Working with Response Data

The detector returns useful information in the response:

response = tool.run({
    "prompts": ["Some text to analyze"],
    "detectors": [Detector.TOXICITY]
})

# Access response details
is_detected = response.get("is_detected")      # Boolean indicating detection
score = response.get("score")                  # Confidence score (0.0 - 1.0)
sanitized = response.get("sanitized_message")  # Cleaned version, if applicable
latency = response.get("latency")              # Processing time in milliseconds

# Handle the response
if is_detected:
    print(f"Content flagged with confidence: {score}")
    if sanitized:
        print(f"Suggested safe version: {sanitized}")

The ZenGuard Detector provides a robust solution for implementing content safety and security features in your LangChain applications. By utilizing its various detection types, you can create safer and more secure AI applications that protect against common threats and inappropriate content.

Remember to handle errors appropriately, as the API might return various status codes:

  • 401 for authentication issues
  • 400 for malformed requests
  • 500 for internal server errors
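
The exact error surface can vary, so a defensive pattern is to treat both exceptions and an error field in the response as failures. A sketch, assuming failures propagate either as an exception or as an "error" key in the returned dict:

try:
    response = tool.run({
        "prompts": ["Some text to analyze"],
        "detectors": [Detector.TOXICITY],
    })
    if isinstance(response, dict) and response.get("error"):
        print(f"ZenGuard API error: {response['error']}")
    elif response.get("is_detected"):
        print("Content flagged")
except Exception as exc:
    print(f"Request failed: {exc}")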
