Using Oracle AI Vector Search in LangChain

Posted: Nov 18, 2024.

Oracle AI Vector Search is a feature of Oracle Database that stores vector embeddings and runs similarity queries over them. LangChain exposes it through the OracleVS vector store, which lets you perform semantic search on documents while keeping Oracle's enterprise features such as clustering, partitioning, and security.

What is OracleVS?

OracleVS is LangChain's vector store implementation that uses Oracle AI Vector Search under the hood. It allows you to:

  • Store document embeddings in Oracle Database tables
  • Perform similarity searches using different distance metrics (dot product, cosine, euclidean)
  • Create HNSW and IVF indices for fast approximate nearest neighbor search
  • Filter searches based on document metadata
  • Combine semantic search with Oracle's relational database features

The main advantage is that you can keep your vector embeddings alongside your relational data in Oracle Database, avoiding data fragmentation across multiple systems.
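
To make the three distance metrics listed above concrete, here is a minimal pure-Python sketch. It does not call Oracle or LangChain; the function names and the toy vectors are illustrative assumptions, and a real database computes these measures internally.

```python
import math


def dot_product(a, b):
    """Sum of element-wise products; larger means more similar."""
    return sum(x * y for x, y in zip(a, b))


def euclidean_distance(a, b):
    """Straight-line distance; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 minus the cosine of the angle; ignores vector length."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    return 1.0 - dot_product(a, b) / (norm_a * norm_b)


# Toy 2-dimensional "embeddings" (hypothetical values)
query = [1.0, 0.0]
doc_same_direction = [2.0, 0.0]  # same direction, different length
doc_orthogonal = [0.0, 1.0]      # unrelated direction

for name, doc in [("same direction", doc_same_direction), ("orthogonal", doc_orthogonal)]:
    print(name, dot_product(query, doc), euclidean_distance(query, doc), cosine_distance(query, doc))
```

Cosine distance treats the first document as identical to the query because it only compares direction, while Euclidean distance still sees a gap between them. That difference is why the choice of metric should generally match how the embedding model was trained.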

Reference

Here are the key methods available in OracleVS:

Method                          Description
__init__                        Initialize vector store with Oracle connection, embeddings function, table name and search params
from_documents                  Create vector store from a list of documents
from_texts                      Create vector store from a list of texts
add_texts                       Add new texts to the vector store
delete                          Delete documents by their IDs
similarity_search               Find similar documents using vector similarity
similarity_search_with_score    Find similar documents and return relevance scores
max_marginal_relevance_search   Search optimizing for similarity and diversity

How to Use OracleVS

Setting Up the Connection

First, you'll need to connect to your Oracle database:

import oracledb

connection = oracledb.connect(
    user="username",
    password="password", 
    dsn="host:port/service_name"
)
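
In practice you will probably not want credentials hardcoded in source. The sketch below builds the same keyword arguments from environment variables. The variable names (ORACLE_USER, ORACLE_PASSWORD, ORACLE_HOST, ORACLE_PORT, ORACLE_SERVICE) and both helper functions are hypothetical, chosen only for illustration; 1521 is the usual default Oracle listener port.

```python
import os


def build_dsn(host, port, service_name):
    """Format a connect string in the host:port/service_name form used above."""
    return f"{host}:{port}/{service_name}"


def connection_kwargs(env=None):
    """Collect user, password and dsn from a mapping (defaults to os.environ).

    Raises KeyError if a required variable is missing.
    """
    env = os.environ if env is None else env
    return {
        "user": env["ORACLE_USER"],
        "password": env["ORACLE_PASSWORD"],
        "dsn": build_dsn(
            env.get("ORACLE_HOST", "localhost"),
            int(env.get("ORACLE_PORT", "1521")),
            env["ORACLE_SERVICE"],
        ),
    }


# Usage (requires the oracledb package and a reachable database):
#   import oracledb
#   connection = oracledb.connect(**connection_kwargs())
```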

Creating a Vector Store

You can create a vector store from documents using an embedding model:

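from langchain_community.vectorstores import OracleVS
from langchain_community.vectorstores.utils import DistanceStrategy
# Note: newer LangChain releases deprecate this import in favor of
# langchain_openai.OpenAIEmbeddings
from langchain_community.embeddings import OpenAIEmbeddings
from langchain_core.documents import Document

# Initialize embeddings
embeddings = OpenAIEmbeddings()

# Create documents
documents = [
    Document(page_content="Document 1 content", metadata={"id": "1"}),
    Document(page_content="Document 2 content", metadata={"id": "2"})
]

# Create vector store. The distance metric is passed explicitly because
# some versions of from_documents reject a missing distance_strategy.
vector_store = OracleVS.from_documents(
    documents,
    embeddings,
    client=connection,
    table_name="my_vectors",
    distance_strategy=DistanceStrategy.COSINE,
)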

Searching for Similar Documents

You can search for similar documents in several ways:

# Basic similarity search
results = vector_store.similarity_search(
    "search query",
    k=2  # Return top 2 results
)

# Search with scores
results = vector_store.similarity_search_with_score(
    "search query",
    k=2
)

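# Search with metadata filtering (assumes the stored documents carry a
# "source" metadata key; the sample documents above only define "id")
filter_criteria = {"source": "book"}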
results = vector_store.similarity_search(
    "search query",
    filter=filter_criteria
)

# Maximum marginal relevance search for diversity
results = vector_store.max_marginal_relevance_search(
    "search query",
    k=2,
    fetch_k=20,
    lambda_mult=0.5  # Diversity factor
)
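
To show what lambda_mult trades off, here is a minimal pure-Python sketch of the general maximal marginal relevance technique. It is not OracleVS's internal code; the function names and toy vectors are assumptions made for illustration. Each step picks the candidate that scores highest on relevance to the query minus redundancy with what has already been picked.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mmr_select(query_vec, candidate_vecs, k=2, lambda_mult=0.5):
    """Return indices of k candidates chosen greedily by MMR.

    lambda_mult=1.0 ranks purely by relevance; lower values
    penalize candidates that resemble already-selected ones.
    """
    selected = []
    remaining = list(range(len(candidate_vecs)))
    while remaining and len(selected) < k:
        best_idx, best_score = None, -math.inf
        for i in remaining:
            relevance = cosine_similarity(query_vec, candidate_vecs[i])
            # Highest similarity to anything already chosen (0 if nothing chosen yet)
            redundancy = max(
                (cosine_similarity(candidate_vecs[i], candidate_vecs[j]) for j in selected),
                default=0.0,
            )
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        remaining.remove(best_idx)
    return selected


# Toy data: candidates 0 and 1 are near-duplicates, candidate 2 points elsewhere
query = [1.0, 0.0]
candidates = [[0.9, 0.1], [0.9, 0.2], [0.6, -0.8]]

print(mmr_select(query, candidates, k=2, lambda_mult=1.0))
print(mmr_select(query, candidates, k=2, lambda_mult=0.5))
```

With lambda_mult=1.0 the two near-duplicates are returned; with 0.5 the second slot goes to the less relevant but more distinct candidate. The fetch_k parameter in the call above corresponds to the size of the candidate pool that such a selection step works over.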

Creating Search Indices

Oracle AI Vector Search supports two types of vector indices: HNSW, an in-memory neighbor graph index, and IVF, a neighbor partition index:

from langchain_community.vectorstores import oraclevs

# Create HNSW index with custom parameters
oraclevs.create_index(
    connection,
    vector_store,
    params={
        "idx_name": "my_hnsw_index",
        "idx_type": "HNSW",
        "accuracy": 97,
        "parallel": 16
    }
)

# Create IVF index
oraclevs.create_index(
    connection,
    vector_store,
    params={
        "idx_name": "my_ivf_index", 
        "idx_type": "IVF",
        "neighbor_part": 64
    }
)
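
For intuition about what a partition-based index such as IVF does, here is a toy pure-Python sketch. It is a simplification under stated assumptions, not Oracle's implementation: the centroids are fixed by hand rather than learned, only the single nearest partition is searched, and all names are illustrative. The neighbor_part parameter above presumably controls how many such partitions the real index builds.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_centroid(vec, centroids):
    """Index of the centroid closest to vec."""
    return min(range(len(centroids)), key=lambda i: euclidean(vec, centroids[i]))


def build_partitions(vectors, centroids):
    """Assign each vector id to the partition of its nearest centroid."""
    partitions = {i: [] for i in range(len(centroids))}
    for vec_id, vec in enumerate(vectors):
        partitions[nearest_centroid(vec, centroids)].append(vec_id)
    return partitions


def ivf_search(query, vectors, centroids, partitions, k=1):
    """Approximate search: scan only the partition nearest to the query."""
    candidates = partitions[nearest_centroid(query, centroids)]
    return sorted(candidates, key=lambda i: euclidean(query, vectors[i]))[:k]


# Toy data: two hand-picked centroids and four stored vectors
centroids = [[0.0, 0.0], [10.0, 10.0]]
vectors = [[0.0, 1.0], [1.0, 0.0], [9.0, 10.0], [10.0, 9.0]]
partitions = build_partitions(vectors, centroids)

print(partitions)
print(ivf_search([9.0, 9.5], vectors, centroids, partitions, k=1))
```

The search compares the query against only two of the four stored vectors. That saving is the point of the index, and it is also why results are approximate: a true nearest neighbor sitting in a different partition would be missed.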

Adding and Removing Documents

You can dynamically add or remove documents:

# Add new texts, with one metadata dict per text
texts = ["New document 1", "New document 2"]
metadatas = [{"id": "3"}, {"id": "4"}]
vector_store.add_texts(texts, metadatas=metadatas)

# Delete documents by ID
vector_store.delete(["3", "4"])

By leveraging Oracle AI Vector Search through the OracleVS vector store, you get the benefits of semantic search combined with Oracle's enterprise database features. This makes it ideal for production applications that need to maintain both structured and unstructured data in a single system.
