LangChain Structured Query Output Parser Guide

Posted: Feb 2, 2025.

StructuredQueryOutputParser is the component in LangChain's self-querying retrieval systems that turns a language model's response to a natural language query into a structured query. That structured query can then be used to search documents by content and filter them by metadata.

What is StructuredQueryOutputParser?

StructuredQueryOutputParser is a class that parses output from language models into a StructuredQuery format. This structured format allows for both semantic similarity search through document content and precise filtering based on document metadata. It's particularly useful in building self-querying retrieval systems where users can naturally express both what they want to search for and any constraints on metadata fields.
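
To make that shape concrete, here is a minimal sketch of what a structured query carries, using plain dataclasses as stand-ins. The names ComparisonSketch and StructuredQuerySketch are hypothetical and are not LangChain classes; the real objects carry more behavior and also support nested and/or operations.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ComparisonSketch:
    """Hypothetical stand-in for one metadata condition, e.g. year > 2010."""
    comparator: str  # e.g. "eq", "gt", "lt"
    attribute: str   # metadata field name
    value: Any       # value to compare against


@dataclass
class StructuredQuerySketch:
    """Hypothetical stand-in: free-text query plus an optional metadata filter."""
    query: str
    filter: Optional[ComparisonSketch] = None


# "Find action movies from after 2010" splits into two parts:
# text for similarity search, and a condition on metadata.
example = StructuredQuerySketch(
    query="action movies",
    filter=ComparisonSketch(comparator="gt", attribute="year", value=2010),
)
print(example)
```

The text part drives the semantic search, while the filter part narrows the candidate documents by metadata.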

Reference

Here are the key methods of StructuredQueryOutputParser:

Method | Description
from_components() | Creates a parser instance with optional parameters for allowed comparators, operators, and attributes
parse() | Parses a string output from a language model into a StructuredQuery object
get_format_instructions() | Returns instructions on how the LLM output should be formatted

How to Use StructuredQueryOutputParser

Let's look at different ways to use this parser:

1. Basic Setup

First, you'll need to create a parser instance:

from langchain.chains.query_constructor.base import StructuredQueryOutputParser

parser = StructuredQueryOutputParser.from_components()

2. Creating a Parser with Specific Components

You can create a parser with specific allowed components:

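from langchain.chains.query_constructor.base import StructuredQueryOutputParser
# Comparator and Operator are defined in the query constructor's ir module
from langchain.chains.query_constructor.ir import Comparator, Operator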

# Define allowed components
allowed_comparators = [Comparator.EQ, Comparator.GT, Comparator.LT]
allowed_operators = [Operator.AND, Operator.OR]
allowed_attributes = ["year", "rating", "genre"]

parser = StructuredQueryOutputParser.from_components(
    allowed_comparators=allowed_comparators,
    allowed_operators=allowed_operators,
    allowed_attributes=allowed_attributes
)
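
The effect of these allow-lists can be illustrated with a small standalone sketch. This is not LangChain's implementation; the Cmp enum and check_comparison function are hypothetical names used only to show the idea of rejecting anything outside the allowed sets.

```python
from enum import Enum


class Cmp(str, Enum):
    """Hypothetical comparator names used only in this sketch."""
    EQ = "eq"
    GT = "gt"
    LT = "lt"
    CONTAIN = "contain"


ALLOWED_COMPARATORS = {Cmp.EQ, Cmp.GT, Cmp.LT}
ALLOWED_ATTRIBUTES = {"year", "rating", "genre"}


def check_comparison(comparator: Cmp, attribute: str) -> None:
    """Raise ValueError if the comparison uses anything outside the allow-lists."""
    if comparator not in ALLOWED_COMPARATORS:
        raise ValueError(f"Comparator {comparator.value!r} is not allowed")
    if attribute not in ALLOWED_ATTRIBUTES:
        raise ValueError(f"Attribute {attribute!r} is not allowed")


check_comparison(Cmp.GT, "year")  # inside both allow-lists, so nothing is raised

try:
    check_comparison(Cmp.CONTAIN, "genre")  # comparator is not in the allow-list
except ValueError as err:
    rejected = str(err)
    print(rejected)
```

Restricting the allowed components this way keeps the model from producing filters that your vector store cannot execute.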

3. Using with Query Construction Chain

The parser is typically used as part of a query construction chain:

from langchain.chains.query_constructor.base import get_query_constructor_prompt
from langchain.chains.query_constructor.schema import AttributeInfo
from langchain_openai import ChatOpenAI

# Define metadata fields
metadata_field_info = [
    AttributeInfo(
        name="genre",
        description="The genre of the movie",
        type="string"
    ),
    AttributeInfo(
        name="year",
        description="Release year of the movie",
        type="integer"
    )
]

# Create prompt
prompt = get_query_constructor_prompt(
    document_content_description="Movie information",
    metadata_field_info=metadata_field_info
)

# Create chain
llm = ChatOpenAI(temperature=0)
query_constructor = prompt | llm | parser

# Use the chain (the prompt's input variable is "query")
structured_query = query_constructor.invoke(
    {"query": "Find action movies from after 2010"}
)

4. Parsing Query Results

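The parser expects the LLM output to be JSON with "query" and "filter" keys and converts it into a StructuredQuery object. Parsing the filter expression requires the lark package to be installed: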

text_output = """
{
    "query": "action movies",
    "filter": "gt(\\"year\\", 2010)"
}
"""

structured_query = parser.parse(text_output)
# Returns a StructuredQuery object with query and filter components
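
Conceptually, parsing happens in two stages: decode the JSON envelope, then parse the filter expression inside it. The sketch below imitates that with the standard library only. It is a simplification under stated assumptions, not LangChain's implementation: parse_sketch is a hypothetical name, and the regular expression handles a single comparison, while the real parser uses a grammar that also supports nested and/or/not operations.

```python
import json
import re
from typing import Any, Dict, Optional, Tuple

# Matches one comparison such as gt("year", 2010); nested operations are out of scope.
_COMPARISON_RE = re.compile(r'^\s*(\w+)\(\s*"([^"]+)"\s*,\s*(.+?)\s*\)\s*$')


def parse_sketch(text: str) -> Dict[str, Any]:
    """Decode the JSON envelope, then split the filter string into its parts."""
    payload = json.loads(text)
    raw_filter = payload.get("filter")
    parsed_filter: Optional[Tuple[str, str, Any]] = None
    # An empty filter or the literal "NO_FILTER" means no metadata condition.
    if raw_filter and raw_filter != "NO_FILTER":
        match = _COMPARISON_RE.match(raw_filter)
        if match is None:
            raise ValueError(f"Unsupported filter expression: {raw_filter!r}")
        comparator, attribute, raw_value = match.groups()
        # The value is itself a JSON literal (a number or a quoted string).
        parsed_filter = (comparator, attribute, json.loads(raw_value))
    return {"query": payload["query"], "filter": parsed_filter}


result = parse_sketch('{"query": "action movies", "filter": "gt(\\"year\\", 2010)"}')
print(result)
```

The result pairs the search text with a (comparator, attribute, value) triple, which mirrors the query and filter components of the StructuredQuery object.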

5. Integrating with Retrieval Systems

The parser is commonly used in self-querying retrieval systems:

from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain_community.vectorstores import Chroma

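# Create vectorstore (in practice, supply an embedding function and load documents)
vectorstore = Chroma()

# Create retriever
retriever = SelfQueryRetriever.from_llm(
    llm=llm,
    vectorstore=vectorstore,
    document_content_description="Movie information",
    metadata_field_info=metadata_field_info,
    # your_translator is a placeholder: pass your own translator object here.
    # LangChain selects a built-in translator for supported stores such as
    # Chroma, so this argument can usually be omitted.
    # structured_query_translator=your_translator,
)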
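
The translator's job is to convert the parsed filter into whatever filter syntax the vector store understands. As a rough illustration only, the sketch below maps comparisons to a dictionary with $-prefixed operators. The function name to_store_filter and the output format are assumptions made for this example; each vector store defines its own filter syntax, and LangChain's translator classes handle the real conversions.

```python
from typing import Any, Dict, List, Tuple

Comparison = Tuple[str, str, Any]  # (comparator, attribute, value)


def to_store_filter(comparisons: List[Comparison]) -> Dict[str, Any]:
    """Translate comparisons into a hypothetical $-operator filter dict."""
    clauses = [{attr: {f"${op}": value}} for op, attr, value in comparisons]
    if not clauses:
        return {}
    # A single condition needs no wrapper; several are combined with AND.
    return clauses[0] if len(clauses) == 1 else {"$and": clauses}


store_filter = to_store_filter([("gt", "year", 2010), ("eq", "genre", "action")])
print(store_filter)
```

Keeping translation separate from parsing is what lets the same structured query work across different vector stores.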

6. Error Handling

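You can enable the fix_invalid parameter to handle filters that use components you have not allowed:

parser = StructuredQueryOutputParser.from_components(
    allowed_attributes=["year", "rating", "genre"],
    fix_invalid=True
)

# If the filter uses a comparator, operator, or attribute outside the allowed
# lists, the parser drops that part of the filter rather than raising an error.
# Output that is not valid JSON still raises an error.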
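
The idea behind this kind of repair can be sketched as a simple pruning step. This is a standalone illustration, not LangChain's code: prune_invalid is a hypothetical function that works on a flat list of conditions, whereas the real parser walks a nested filter expression.

```python
from typing import Any, List, Set, Tuple

Comparison = Tuple[str, str, Any]  # (comparator, attribute, value)


def prune_invalid(
    comparisons: List[Comparison],
    allowed_comparators: Set[str],
    allowed_attributes: Set[str],
) -> List[Comparison]:
    """Keep only the comparisons that stay inside both allow-lists."""
    return [
        (op, attr, value)
        for op, attr, value in comparisons
        if op in allowed_comparators and attr in allowed_attributes
    ]


# The second condition refers to "director", which is not an allowed attribute.
llm_conditions = [("gt", "year", 2010), ("eq", "director", "Nolan")]
kept = prune_invalid(llm_conditions, {"eq", "gt", "lt"}, {"year", "rating", "genre"})
print(kept)
```

The trade-off is that a dropped condition silently widens the search, so it is worth logging what was removed if filter accuracy matters to you.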

StructuredQueryOutputParser sits between the language model and the retriever. By turning the model's output into a search string plus a metadata filter, it lets users combine semantic search with precise metadata filtering in a single natural language request.
