Vector Database

Learn more about embedding/vector indexes with this in-depth guide.

What are Embedding/Vector Indexes?

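Embedding indexes let you search your data by meaning, not just keywords. A model converts each value in the indexed column into a vector (an embedding). At query time, the query is embedded with the same model and results are ranked by vector similarity. They work with all kinds of content (text, images, audio, video, and documents), so the same approach covers every media type in a search system.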

Multimodal Search Examples

Pixeltable makes it easy to build semantic search for different media types:

Audio

Build semantic search for audio files and podcasts

Document

Search through PDFs and other document formats

Video

Find relevant content within video libraries

Agent Memory

Use metadata to search long-term memory for AI agents

How Pixeltable Makes Embeddings Easy

No infrastructure headaches - embeddings are managed automatically
Works with any media type - text, images, audio, video, or documents
Updates automatically - when data changes, embeddings update too
Compatible with your favorite models - use Hugging Face, OpenAI, or your custom models

Phase 1: Set Up the Embedding Model and Index

The setup phase defines your schema and creates embedding indexes.


pip install pixeltable sentence-transformers

import pixeltable as pxt
from pixeltable.functions.huggingface import sentence_transformer

# Start from a clean directory to organize data (optional).
# Note: force=True deletes any existing 'knowledge_base' directory and its contents.
pxt.drop_dir('knowledge_base', force=True)
pxt.create_dir('knowledge_base')

# Create table
docs = pxt.create_table(
    "knowledge_base.documents",
    {
        "content": pxt.String,
        "metadata": pxt.Json
    }
)

# Create embedding index
embed_model = sentence_transformer.using(
    model_id="intfloat/e5-large-v2"
)
docs.add_embedding_index(
    column='content',
    string_embed=embed_model
)

Supported Index Options

Similarity Metrics

# Available metrics (an embedding function is always required):
docs.add_embedding_index(
    column='content',
    string_embed=embed_model,
    metric='cosine'  # Default
    # Other options:
    # metric='ip'    # Inner product
    # metric='l2'    # L2 (Euclidean) distance
)
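To make the three metrics concrete, here is a minimal pure-Python sketch of how each one scores a pair of vectors. It illustrates the math only; it is not Pixeltable's implementation, and the helper names are hypothetical. For cosine and inner product a larger score means more similar; for L2 a smaller distance means more similar.

```python
import math


def dot(a: list[float], b: list[float]) -> float:
    """Inner product ('ip'): larger means more similar."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: inner product after scaling both vectors to unit length."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def l2(a: list[float], b: list[float]) -> float:
    """Euclidean ('l2') distance: smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length."""
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


query = [1.0, 2.0, 2.0]
doc = [2.0, 1.0, 2.0]

print(round(dot(query, doc), 4))     # prints 8.0
print(round(cosine(query, doc), 4))  # prints 0.8889
print(round(l2(query, doc), 4))      # prints 1.4142

# For unit-length vectors the metrics rank results the same way:
# ip equals cosine, and l2 squared equals 2 - 2 * cosine.
q, d = normalize(query), normalize(doc)
assert math.isclose(dot(q, d), cosine(query, doc))
assert math.isclose(l2(q, d) ** 2, 2 - 2 * cosine(query, doc))
```

This is why many embedding models are paired with cosine: once vectors are normalized, the choice of metric stops affecting the ranking.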

Index Configuration

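# Optional parameters
docs.add_embedding_index(
    column='content',
    idx_name='custom_name',    # Optional name, used to refer to this index later
    string_embed=embed_model,  # Embedding UDF for string data
)

# For an image column, pass image_embed as well; text_model and img_model are
# placeholders for the text and image embedding UDFs of a multimodal model (e.g. CLIP)
# images.add_embedding_index(
#     column='image',
#     string_embed=text_model,  # Lets string queries search the images
#     image_embed=img_model,    # Embeds the images themselves
# )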

Phase 2: Insert

The insert phase populates your indexes with data. Pixeltable automatically computes embeddings and maintains index consistency.

# Single insertion
docs.insert([
    {
        "content": "Your document text here",
        "metadata": {"source": "web", "category": "tech"}
    }
])

# Batch insertion
docs.insert([
    {
        "content": "First document",
        "metadata": {"source": "pdf", "category": "science"}
    },
    {
        "content": "Second document",
        "metadata": {"source": "web", "category": "news"}
    }
])

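# Image insertion (into a separate table with an image column)
images = pxt.create_table('knowledge_base.images', {'image': pxt.Image})
image_urls = [
    'https://example.com/image1.jpg',
    'https://example.com/image2.jpg'
]
images.insert([{'image': url} for url in image_urls])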

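Inserting many rows in one call is more efficient than issuing many single-row inserts. Each row still needs its own embedding, so the total number of embeddings is the same; the gain comes from paying per-call overhead once and letting batched embedding UDFs process many rows per model call.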
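The sketch below illustrates why batching helps, using a hypothetical embedder that counts how often it is called. Both strategies embed the same 100 rows; the batched one makes far fewer calls, so any fixed per-call cost (a model invocation, a transaction) is paid fewer times. `CountingEmbedder`, its toy "embedding", and the batch size of 32 are assumptions for illustration, not Pixeltable internals.

```python
from typing import Iterable


class CountingEmbedder:
    """Hypothetical embedder that records how many calls and rows it processes."""

    def __init__(self) -> None:
        self.calls = 0
        self.rows = 0

    def embed_batch(self, texts: list[str]) -> list[list[float]]:
        self.calls += 1
        self.rows += len(texts)
        # Stand-in "embedding": text length and vowel count (not a real model).
        return [[float(len(t)), float(sum(c in 'aeiou' for c in t))] for t in texts]


def insert_one_by_one(embedder: CountingEmbedder, texts: Iterable[str]) -> None:
    # One embedding call per row.
    for t in texts:
        embedder.embed_batch([t])


def insert_batched(embedder: CountingEmbedder, texts: list[str], batch_size: int = 32) -> None:
    # One embedding call per slice of up to batch_size rows.
    for start in range(0, len(texts), batch_size):
        embedder.embed_batch(texts[start:start + batch_size])


texts = [f'document {i}' for i in range(100)]

single = CountingEmbedder()
insert_one_by_one(single, texts)

batched = CountingEmbedder()
insert_batched(batched, texts)

print(single.rows, single.calls)    # prints 100 100
print(batched.rows, batched.calls)  # prints 100 4
```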

Phase 3: Query

The query phase lets you search your indexed content with the .similarity() method on an indexed column.

Similarity Search

sim = docs.content.similarity("what is the documentation")

# Return the top-k most similar documents; collect() executes the query
results = (
    docs.order_by(sim, asc=False)
    .select(docs.content, docs.metadata, score=sim)
    .limit(10)
    .collect()
)

for row in results:
    print(f"Similarity: {row['score']:.3f}")
    print(f"Text: {row['content']}\n")
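Conceptually, the query above embeds the query string with the index's model, scores rows against it, and keeps the ten highest scores. The brute-force sketch below shows that ranking step with `heapq.nlargest` over hand-written vectors. The vectors and the `top_k` helper are hypothetical, and production vector indexes usually rely on approximate nearest-neighbor structures rather than scoring every row.

```python
import heapq
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query_vec: list[float], rows: list[dict], k: int) -> list[dict]:
    """Score every row against the query and return the k best, highest score first."""
    scored = (
        {'content': r['content'], 'score': cosine(query_vec, r['embedding'])}
        for r in rows
    )
    return heapq.nlargest(k, scored, key=lambda r: r['score'])


# Hypothetical rows whose embeddings were computed at insert time.
rows = [
    {'content': 'Install guide', 'embedding': [0.9, 0.1, 0.0]},
    {'content': 'API reference', 'embedding': [0.7, 0.7, 0.1]},
    {'content': 'Release notes', 'embedding': [0.0, 0.2, 0.9]},
]
query_vec = [1.0, 0.2, 0.0]  # stand-in for the embedded query string

results = top_k(query_vec, rows, k=2)
for row in results:
    print(f"Similarity: {row['score']:.3f}")
    print(f"Text: {row['content']}\n")
```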

Direct Embedding Access

Pixeltable allows direct access to the raw embedding vectors through the .embedding() method. This feature lets you retrieve the actual vector representations that power similarity search.

import numpy as np

# Access embeddings from a column with a single index
results = docs.select(
    docs.content,
    embedding=docs.content.embedding()
).limit(5).collect()

# Access embeddings from a column with multiple indexes
results = docs.select(
    docs.content,
    embedding=docs.content.embedding(idx='custom_name')
).limit(5).collect()

# Embeddings are returned as numpy arrays
assert isinstance(results[0]['embedding'], np.ndarray)

# You can also store embeddings in a computed column
docs.add_computed_column(
    embedding_copy=docs.content.embedding()
)

The .similarity() method cannot be used directly in computed columns
Embedding indices cannot be dropped if there are computed columns that depend on them
When a column has multiple embedding indices, you must specify which index to use with the idx parameter
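The last two rules can be modeled with a small registry. The sketch below is a hypothetical toy, not Pixeltable's code: it resolves an index by name, refuses to guess when a column has several indexes, and refuses to drop an index that a computed column still depends on.

```python
from typing import Optional


class IndexRegistry:
    """Toy model of per-column embedding indexes (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self.indexes: dict[str, str] = {}          # index name -> column name
        self.dependents: dict[str, set[str]] = {}  # index name -> computed columns using it

    def add(self, column: str, idx_name: str) -> None:
        self.indexes[idx_name] = column
        self.dependents[idx_name] = set()

    def resolve(self, column: str, idx: Optional[str] = None) -> str:
        candidates = [name for name, col in self.indexes.items() if col == column]
        if idx is not None:
            if idx not in candidates:
                raise KeyError(f'no index {idx!r} on column {column!r}')
            return idx
        if len(candidates) != 1:
            # Ambiguous (or missing): the caller must name the index explicitly.
            raise ValueError(f'column {column!r} has {len(candidates)} indexes; pass idx=...')
        return candidates[0]

    def add_dependent(self, computed_col: str, column: str, idx: Optional[str] = None) -> None:
        self.dependents[self.resolve(column, idx)].add(computed_col)

    def drop(self, idx_name: str) -> None:
        if self.dependents[idx_name]:
            raise RuntimeError(f'index {idx_name!r} is used by {sorted(self.dependents[idx_name])}')
        del self.indexes[idx_name]
        del self.dependents[idx_name]


reg = IndexRegistry()
reg.add('content', 'e5_idx')
print(reg.resolve('content'))  # prints e5_idx

reg.add('content', 'custom_name')
try:
    reg.resolve('content')  # two indexes now, so idx must be given
except ValueError as e:
    print(e)

reg.add_dependent('embedding_copy', 'content', idx='e5_idx')
try:
    reg.drop('e5_idx')  # blocked by the dependent computed column
except RuntimeError as e:
    print(e)

reg.drop('custom_name')  # no dependents, so this succeeds
```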

Management Operations

Drop Index

# Drop by name
docs.drop_embedding_index(idx_name='e5_idx')

# Drop by column (only if the column has exactly one embedding index)
docs.drop_embedding_index(column='content')

Update Index

# Indexes auto-update on changes
docs.update({
    'content': docs.content + ' Updated!'
})
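The automatic update can be pictured as a derived value tied to its source column: whenever a row's content changes, its embedding is recomputed with it. The toy table below sketches that invariant with a stand-in embedding function. `ToyTable` and `fake_embed` are hypothetical and say nothing about how Pixeltable schedules the work internally.

```python
from typing import Callable


def fake_embed(text: str) -> list[float]:
    """Stand-in for a real model: word count and character count."""
    return [float(len(text.split())), float(len(text))]


class ToyTable:
    def __init__(self, embed: Callable[[str], list[float]]) -> None:
        self.embed = embed
        self.rows: list[dict] = []

    def insert(self, contents: list[str]) -> None:
        # Embeddings are computed as part of the insert.
        for c in contents:
            self.rows.append({'content': c, 'embedding': self.embed(c)})

    def update(self, fn: Callable[[str], str]) -> None:
        # Changing the source column recomputes the derived embedding in the same step,
        # so no row keeps a vector for its old content.
        for row in self.rows:
            row['content'] = fn(row['content'])
            row['embedding'] = self.embed(row['content'])


t = ToyTable(fake_embed)
t.insert(['First document', 'Second document'])
t.update(lambda c: c + ' Updated!')
print(t.rows[0])
```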

Best Practices

Cache embedding models in production UDFs
Use batching for better performance
Consider index size vs. search speed tradeoffs
Monitor embedding computation time
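For the first practice, a common pattern is to load a model once per process and reuse it across calls instead of reloading it for every batch. A minimal sketch using `functools.lru_cache`: `load_model` and `FakeModel` are hypothetical placeholders for an expensive load (for example, constructing a sentence-transformers model), and the counter exists only to show that the load runs once.

```python
import functools

LOAD_COUNT = 0


class FakeModel:
    """Placeholder for a real embedding model."""

    def encode(self, texts: list[str]) -> list[list[float]]:
        return [[float(len(t))] for t in texts]


@functools.lru_cache(maxsize=None)
def load_model(model_id: str) -> FakeModel:
    # In real code the slow model load would happen here, once per model_id.
    global LOAD_COUNT
    LOAD_COUNT += 1
    return FakeModel()


def embed(texts: list[str], model_id: str = 'intfloat/e5-large-v2') -> list[list[float]]:
    # Every call after the first reuses the cached model instance.
    return load_model(model_id).encode(texts)


for batch in (['a', 'bb'], ['ccc'], ['dddd', 'e']):
    embed(batch)

print(LOAD_COUNT)  # prints 1
```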

Additional Resources

SDK Reference

Complete API reference

Embedding Indexes

Learn more about embedding indexes

Model Hub

Connect with your favorite Hugging Face models
