Using LangChain CubeSemanticLoader for Data Model Metadata

Posted: Feb 16, 2025.

CubeSemanticLoader is a LangChain document loader that pulls data model metadata from Cube's semantic layer, so you can use that metadata as context in LangChain applications.

What is CubeSemanticLoader?

CubeSemanticLoader is a document loader designed to interact with Cube's semantic layer. It retrieves metadata about your data model, including table structures, column information, and dimension values. This metadata can be used to provide context to LLMs, helping them better understand your data structure and generate more accurate queries.

Cube's semantic layer abstracts complex database operations into business-level terminology, which makes it easier for LLMs to work with your data without dealing with complex SQL joins or metric calculations directly.

Reference

Method              Description
load()              Loads data model metadata into Document objects
lazy_load()         Makes a call to Cube's REST API metadata endpoint and returns an iterator of Document objects
aload()             Asynchronously loads data into Document objects
alazy_load()        Asynchronously loads Documents as an iterator
load_and_split()    Loads Documents and splits them into chunks

How to Use CubeSemanticLoader

Basic Setup

First, you'll need to set up your Cube API credentials:

from langchain_community.document_loaders import CubeSemanticLoader
import jwt  # provided by the PyJWT package (pip install PyJWT)

# Configure your Cube API settings
api_url = "https://your-cube-instance.com/cubejs-api/v1/meta"
cubejs_api_secret = "your-api-secret"

# Generate JWT token
security_context = {}  # Add security context if needed
api_token = jwt.encode(security_context, cubejs_api_secret, algorithm="HS256")

# Initialize the loader
loader = CubeSemanticLoader(
    cube_api_url=api_url,
    cube_api_token=api_token
)
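
Hard-coding the API secret is fine for a quick test, but you will usually want to read it from the environment. The sketch below shows one way to do that. The variable names CUBE_API_URL and CUBE_API_SECRET and the helper read_cube_settings are assumptions made for this example; they are not defined by Cube or LangChain.

```python
import os


def read_cube_settings(env=None):
    """Read Cube connection settings from environment variables.

    The variable names used here are hypothetical; pick whatever
    naming convention your deployment already follows.
    """
    env = os.environ if env is None else env
    required = ("CUBE_API_URL", "CUBE_API_SECRET")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "api_url": env["CUBE_API_URL"],
        "api_secret": env["CUBE_API_SECRET"],
    }
```

The returned api_url and api_secret can then replace the literal strings in the setup code above.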

Loading Documents

The simplest way to load documents is using the load() method:

# Load all documents
documents = loader.load()

# Example of accessing a document's content
first_doc = documents[0]
print("Content:", first_doc.page_content)
print("Metadata:", first_doc.metadata)
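
Because each document carries its table and column names in metadata, a common next step is to group columns by table. Here is a minimal sketch of that idea. The helper group_columns_by_table is hypothetical, and the sample objects are simple stand-ins that only mimic the metadata attribute of a loaded document, so the snippet runs without a Cube instance.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_columns_by_table(documents):
    """Map each table name to the list of column names found in the documents."""
    grouped = defaultdict(list)
    for doc in documents:
        meta = doc.metadata
        grouped[meta["table_name"]].append(meta["column_name"])
    return dict(grouped)


# Stand-in objects with the same metadata keys the loader produces
sample_docs = [
    SimpleNamespace(metadata={"table_name": "users_view", "column_name": "users_view.city"}),
    SimpleNamespace(metadata={"table_name": "users_view", "column_name": "users_view.count"}),
    SimpleNamespace(metadata={"table_name": "orders_view", "column_name": "orders_view.status"}),
]
columns_by_table = group_columns_by_table(sample_docs)
```

With real data you would pass the list returned by loader.load() instead of sample_docs.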

Customizing Dimension Values Loading

You can customize how dimension values are loaded:

loader = CubeSemanticLoader(
    cube_api_url=api_url,
    cube_api_token=api_token,
    load_dimension_values=True,  # Enable loading dimension values
    dimension_values_limit=5000,  # Maximum number of dimension values to load
    dimension_values_max_retries=5,  # Maximum number of retry attempts
    dimension_values_retry_delay=2  # Seconds to wait between retries
)
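
To make the retry settings concrete, here is a generic retry loop that shows how a maximum attempt count and a fixed delay interact. This is an illustration only and not the loader's actual implementation; fetch_with_retries and its arguments are hypothetical names invented for this sketch.

```python
import time


def fetch_with_retries(fetch, max_retries=5, retry_delay=2, sleep=time.sleep):
    """Call fetch() until it returns a non-None result or attempts run out.

    Waits retry_delay seconds between attempts and returns None when
    every attempt fails.
    """
    for attempt in range(1, max_retries + 1):
        result = fetch()
        if result is not None:
            return result
        if attempt < max_retries:
            sleep(retry_delay)  # no need to wait after the final attempt
    return None
```

With max_retries=5 and retry_delay=2, a request that never succeeds is attempted five times with four waits in between, so it gives up after roughly eight seconds plus request time.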

Async Loading

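In async applications, use the async variants so that loading metadata does not block the event loop:

import asyncio

async def load_cube_metadata():
    # Load every document in a single awaitable call
    documents = await loader.aload()
    return documents

# Or use lazy loading to handle documents one at a time
async def stream_cube_metadata():
    async for doc in loader.alazy_load():
        # Replace this with your own per-document processing
        print(doc.metadata["column_name"])

# Run from synchronous code
documents = asyncio.run(load_cube_metadata())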

Working with Document Structure

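Each document returned by the loader describes one column of your data model. The loader returns LangChain Document objects; the fields are shown here as a plain dictionary for illustration:

# Document structure example
document = {
    "page_content": "Users View City, City of the user",  # column title and description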
    "metadata": {
        "table_name": "users_view",
        "column_name": "users_view.city",
        "column_data_type": "string",
        "column_title": "Users View City",
        "column_description": "City of the user",
        "column_member_type": "dimension",
        "column_values": ["New York", "San Francisco", "Chicago"],
        "cube_data_obj_type": "view"
    }
}

This structured format makes it easy to use the metadata for various LangChain applications, such as creating embeddings or providing context to LLMs for data analysis tasks.
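
As one example of providing context to an LLM, the metadata fields can be flattened into a short description line per column and joined into a prompt. The sketch below assumes the metadata keys shown in the structure example; format_column_context and the max_values cutoff are hypothetical choices for this illustration.

```python
def format_column_context(metadata, max_values=5):
    """Turn one column's metadata into a single line of prompt context."""
    line = (
        f'{metadata["column_name"]} '
        f'({metadata["column_member_type"]}, {metadata["column_data_type"]}): '
        f'{metadata["column_description"]}'
    )
    # Include a few sample values when the loader collected them
    values = metadata.get("column_values") or []
    if values:
        shown = ", ".join(str(value) for value in values[:max_values])
        line += f" [example values: {shown}]"
    return line


example_metadata = {
    "table_name": "users_view",
    "column_name": "users_view.city",
    "column_data_type": "string",
    "column_title": "Users View City",
    "column_description": "City of the user",
    "column_member_type": "dimension",
    "column_values": ["New York", "San Francisco", "Chicago"],
    "cube_data_obj_type": "view",
}
context_line = format_column_context(example_metadata)
```

Capping the number of sample values keeps the prompt short when a dimension has thousands of distinct values.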

The CubeSemanticLoader is particularly useful when you need to:

  • Build natural language interfaces for data analysis
  • Create context-aware data applications
  • Generate automated documentation for your data models
  • Provide semantic understanding of your data structure to LLMs
