Using ImagePromptTemplate in LangChain for Multimodal Models

Posted: Feb 3, 2025.

LangChain's ImagePromptTemplate allows you to create prompts that include image inputs for multimodal language models. Let's explore how to use this class effectively.

What is ImagePromptTemplate?

ImagePromptTemplate is a specialized prompt template class designed for working with multimodal models that can process both text and images. It formats the image part of a prompt, an image URL plus an optional detail level, which makes it easier to pass images to vision-language models.

Reference

Here are the key parameters for ImagePromptTemplate:

Parameter	Type	Description
input_variables	List[str]	Names of the variables the template expects. The names 'url', 'path', and 'detail' are reserved and cannot be used
template	Dict	Template for the image, keyed by 'url' and optionally 'detail'; string values may contain placeholders
template_format	str	Format of the prompt template ('f-string', 'mustache', or 'jinja2'). Defaults to 'f-string'
partial_variables	Dict[str, Any]	Optional variables to partially fill the template
input_types	Dict[str, Any]	Types of the variables the prompt template expects

How to Use ImagePromptTemplate

Let's look at different ways to use ImagePromptTemplate:

Basic Usage

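Here's how to create a simple image prompt template. The template dict describes only the image part of a message: "url" holds the image location and the optional "detail" key holds a resolution hint. Question text belongs in a separate text part of the message.

from langchain_core.prompts.image import ImagePromptTemplate

# Create an image prompt template; "url" is the key that format() reads
template = ImagePromptTemplate(
    input_variables=["image_url"],
    template={"url": "{image_url}", "detail": "low"},
)

# Format the prompt; the result is a dict, not a string
image_part = template.format(image_url="https://example.com/image.jpg")
# image_part == {"url": "https://example.com/image.jpg", "detail": "low"}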
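
The example above uses a hosted image URL. For a local file, one common approach is to encode the image as a base64 data URL and pass that string instead. The sketch below uses only the standard library; the helper name image_file_to_data_url is hypothetical (not part of LangChain), and it assumes the target model accepts data URLs.

```python
import base64
import mimetypes
from pathlib import Path


def image_file_to_data_url(path: str) -> str:
    """Encode a local image file as a base64 data URL (hypothetical helper)."""
    # Guess the MIME type from the file extension, e.g. ".png" -> "image/png"
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Not a recognized image file: {path}")
    # Read the raw bytes and base64-encode them as ASCII text
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

The returned string can then be supplied wherever the examples in this post pass an image URL.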

Using with Vision Models

ImagePromptTemplate formats only the image part, while chat models expect a complete message. The usual route is ChatPromptTemplate, which builds an ImagePromptTemplate internally for each image_url part:

from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

# Create a chat prompt with one text part and one image part
image_prompt = ChatPromptTemplate.from_messages([
    ("human", [
        {"type": "text", "text": "{task}"},
        {"type": "image_url", "image_url": {"url": "{image_url}"}},
    ]),
])

# Set up a vision-capable model
model = ChatOpenAI(model="gpt-4o")

# Create a chain
chain = image_prompt | model

# Use the chain
response = chain.invoke({
    "image_url": "https://example.com/cat.jpg",
    "task": "Describe what you see in this image in detail."
})

# The model's answer is in response.content
print(response.content)

Using with Multiple Images

A single ImagePromptTemplate formats one image. To send several images, add one image_url part per image to a chat message:

from langchain_core.prompts import ChatPromptTemplate

# Create a template for comparing images: one text part, two image parts
compare_template = ChatPromptTemplate.from_messages([
    ("human", [
        {"type": "text", "text": "Compare these two images and describe the differences."},
        {"type": "image_url", "image_url": {"url": "{image_url_1}"}},
        {"type": "image_url", "image_url": {"url": "{image_url_2}"}},
    ]),
])

# Format the prompt into a list of chat messages
comparison_messages = compare_template.format_messages(
    image_url_1="https://example.com/image1.jpg",
    image_url_2="https://example.com/image2.jpg"
)
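
When the number of images is only known at run time, the message content can also be assembled in plain Python. The sketch below is a minimal illustration that assumes the OpenAI-style content format (a list of "text" and "image_url" parts); build_vision_content is a hypothetical helper, not a LangChain function.

```python
from typing import Any, Dict, List


def build_vision_content(text: str, image_urls: List[str]) -> List[Dict[str, Any]]:
    """Build a list of content parts: one text part, then one part per image."""
    parts: List[Dict[str, Any]] = [{"type": "text", "text": text}]
    for url in image_urls:
        # Each image gets its own image_url part (assumed OpenAI-style shape)
        parts.append({"type": "image_url", "image_url": {"url": url}})
    return parts


content = build_vision_content(
    "Compare these images and describe the differences.",
    ["https://example.com/image1.jpg", "https://example.com/image2.jpg"],
)
```

A list like this can serve as the content of a human message, so the same helper covers two images or twenty.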

Using Partial Variables

You can pre-fill some template values with partial variables. With a chat prompt, the partial() method returns a copy of the template that has the given values filled in:

from langchain_core.prompts import ChatPromptTemplate

# Create a chat prompt with a question part and an image part
base_template = ChatPromptTemplate.from_messages([
    ("human", [
        {"type": "text", "text": "{question}"},
        {"type": "image_url", "image_url": {"url": "{image_url}"}},
    ]),
])

# Pre-fill the image URL
fixed_image_template = base_template.partial(
    image_url="https://example.com/fixed_image.jpg"
)

# Now you only need to provide the question
messages = fixed_image_template.format_messages(
    question="What colors are present in this image?"
)
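
Partial variables can be pictured as defaults that are merged with the values passed at format time. The plain-Python sketch below illustrates that idea only; it is not LangChain's code, merge_variables is a hypothetical helper, and it assumes call-time values take precedence over pre-filled ones.

```python
from typing import Any, Dict


def merge_variables(partial: Dict[str, Any], **call_time: Any) -> Dict[str, Any]:
    """Combine pre-filled values with call-time values (call-time wins)."""
    return {**partial, **call_time}


variables = merge_variables(
    {"image_url": "https://example.com/fixed_image.jpg"},
    question="What colors are present in this image?",
)
```

The merged dictionary holds both image_url and question, which is why the caller only has to supply the question.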

Error Handling

ImagePromptTemplate raises a ValueError when the formatted URL is empty or when a reserved name ('url', 'path', 'detail') is used as an input variable, so handle that case explicitly:

from typing import Optional

from langchain_core.prompts.image import ImagePromptTemplate

def create_image_prompt(image_url: str) -> Optional[dict]:
    """Return the formatted image part, or None if it cannot be built."""
    try:
        template = ImagePromptTemplate(
            input_variables=["image_url"],
            template={"url": "{image_url}"},
        )
        # format() raises ValueError when the resulting URL is empty
        return template.format(image_url=image_url)
    except ValueError as e:
        print(f"Error creating prompt: {e}")
        return None
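
A cheap pre-check can catch malformed URLs before a prompt is built at all. The sketch below uses only the standard library; looks_like_image_url is a hypothetical helper, and the set of accepted schemes (http, https, data) is an assumption rather than anything LangChain enforces.

```python
from urllib.parse import urlparse

# Assumed set of acceptable schemes; adjust for your model provider
ALLOWED_SCHEMES = {"http", "https", "data"}


def looks_like_image_url(url: str) -> bool:
    """Return True if the string is plausibly a usable image URL."""
    if not isinstance(url, str) or not url.strip():
        return False
    parsed = urlparse(url.strip())
    if parsed.scheme not in ALLOWED_SCHEMES:
        return False
    if parsed.scheme == "data":
        # Data URLs carry the MIME type right after "data:"
        return parsed.path.startswith("image/")
    # http(s) URLs need a host
    return bool(parsed.netloc)
```

This only checks the shape of the string. It does not confirm that the URL is reachable or that it actually points to an image.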

ImagePromptTemplate handles the image part of a multimodal prompt in LangChain. Combined with text parts in a chat message, it lets you build prompts that mix images and text for a range of vision-language tasks.
