Cloud Storage

Pixeltable supports storing media files (images, videos, audio, documents) in external cloud storage providers instead of local disk. This is essential for production deployments, enabling scalable storage, team collaboration, and integration with existing data infrastructure.

Supported Providers

Amazon S3

Native S3 storage with full feature support

Google Cloud Storage

GCS buckets with gs:// URI scheme

Azure Blob Storage

Azure containers with wasbs:// or abfss:// URI schemes

Cloudflare R2

S3-compatible storage with zero egress fees

Backblaze B2

Cost-effective S3-compatible storage

Tigris

Globally distributed S3-compatible storage

How It Works

When you configure a storage destination, Pixeltable automatically:

Uploads computed media — AI-generated images, extracted video frames, and other computed media files are stored in your bucket
Copies input media — Optionally persists referenced media files for durability
Manages file lifecycle — Cleans up files when table data is deleted
Handles caching — Downloads files on-demand with intelligent local caching

Configuration

There are two ways to configure cloud storage destinations:

Global Default Destinations

Set default destinations for all media columns in your config.toml (see Configuration for details):

[pixeltable]
# For input media (inserted/referenced files)
input_media_dest = "s3://my-bucket/input/"

# For computed media (AI-generated outputs)  
output_media_dest = "s3://my-bucket/output/"

Or via environment variables:

export PIXELTABLE_INPUT_MEDIA_DEST="s3://my-bucket/input/"
export PIXELTABLE_OUTPUT_MEDIA_DEST="s3://my-bucket/output/"

Configure these before creating tables. All media columns will automatically use the configured destinations.
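
The same defaults can also be set from Python by writing the environment variables before Pixeltable is imported. This is a minimal sketch: the variable names come from the section above, the bucket name is a placeholder, and the idea that Pixeltable reads these variables when it initializes is an assumption rather than something confirmed here.

```python
import os

# Placeholder bucket and prefixes; replace with your own.
DESTINATIONS = {
    "PIXELTABLE_INPUT_MEDIA_DEST": "s3://my-bucket/input/",
    "PIXELTABLE_OUTPUT_MEDIA_DEST": "s3://my-bucket/output/",
}

for name, uri in DESTINATIONS.items():
    # setdefault keeps any value that was already exported in the shell.
    os.environ.setdefault(name, uri)

# Import Pixeltable only after the environment is prepared (assumption:
# the variables are read when Pixeltable initializes).
# import pixeltable as pxt
```

Using setdefault rather than plain assignment means a value exported by the deployment environment still takes priority over the hard-coded placeholder.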

Per-Column Destination (Computed Columns Only)

For computed columns, you can override the default with a specific destination:

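import pixeltable as pxt

# Make sure the parent directory exists before creating the table
pxt.create_dir('my_app', if_exists='ignore')

# Create a table with input media column
# (uses global input_media_dest if configured)
t = pxt.create_table('my_app.images', {'image': pxt.Image})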

# Add computed column with explicit destination
t.add_computed_column(
    thumbnail=t.image.resize((128, 128)),
    destination='s3://my-bucket/thumbnails/'
)

The destination parameter only applies to stored computed columns. For input columns, use the global input_media_dest configuration.

Precedence Rules

Destinations are resolved in this order:

Explicit column destination — highest priority (computed columns only)
Global default — input_media_dest for input columns, output_media_dest for computed columns
Local storage — fallback if no destination is configured
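
The resolution order above can be modeled as a small pure-Python function. This is only an illustration of the documented rules: resolve_destination is a hypothetical helper and not part of the Pixeltable API, and None stands in for "local storage".

```python
from typing import Optional


def resolve_destination(
    is_computed: bool,
    column_dest: Optional[str] = None,
    input_media_dest: Optional[str] = None,
    output_media_dest: Optional[str] = None,
) -> Optional[str]:
    """Return the destination URI for a media column, or None for local storage."""
    # 1. An explicit per-column destination wins, but only computed columns accept one.
    if is_computed and column_dest is not None:
        return column_dest
    # 2. Otherwise use the global default that matches the column kind.
    default = output_media_dest if is_computed else input_media_dest
    if default is not None:
        return default
    # 3. Nothing configured: fall back to local storage.
    return None
```

For example, a computed column with no explicit destination resolves to output_media_dest, while an input column ignores any column-level destination and resolves to input_media_dest.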

Provider Configuration

Amazon S3

URI Format

s3://bucket-name/optional/prefix/

Google Cloud Storage

URI Format

gs://bucket-name/optional/prefix/

Azure Blob Storage

URI Formats

Azure supports multiple URI schemes:

wasbs://container@account.blob.core.windows.net/prefix/
abfss://container@account.dfs.core.windows.net/prefix/

Cloudflare R2

URI Format

https://account-id.r2.cloudflarestorage.com/bucket-name/prefix/

Backblaze B2

URI Format

https://s3.region.backblazeb2.com/bucket-name/prefix/

Tigris

URI Format

https://fly.storage.tigris.dev/bucket-name/prefix/
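
Because three of the providers share the https:// scheme, it can help to check which provider a destination URI points at before putting it in your configuration. The sketch below uses only the standard library and the URI formats listed above; guess_provider is a hypothetical helper, and Pixeltable's own URI handling may differ.

```python
from urllib.parse import urlparse

# Schemes and host suffixes taken from the URI formats listed above.
_SCHEMES = {
    "s3": "Amazon S3",
    "gs": "Google Cloud Storage",
    "wasbs": "Azure Blob Storage",
    "abfss": "Azure Blob Storage",
}
_HTTPS_HOST_SUFFIXES = {
    ".r2.cloudflarestorage.com": "Cloudflare R2",
    ".backblazeb2.com": "Backblaze B2",
    ".tigris.dev": "Tigris",
}


def guess_provider(uri: str) -> str:
    """Map a destination URI to a provider name, or 'unknown'."""
    parsed = urlparse(uri)
    if parsed.scheme in _SCHEMES:
        return _SCHEMES[parsed.scheme]
    if parsed.scheme == "https":
        host = parsed.hostname or ""
        # S3-compatible providers are told apart by their endpoint host.
        for suffix, provider in _HTTPS_HOST_SUFFIXES.items():
            if host.endswith(suffix):
                return provider
    return "unknown"
```

A result of "unknown" for an https:// destination usually points to a typo in the endpoint host.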

Complete Example

Here’s a full example using S3 for both input and computed media. First, configure your global destinations in ~/.pixeltable/config.toml:

[pixeltable]
input_media_dest = "s3://my-app-bucket/uploads/"
output_media_dest = "s3://my-app-bucket/generated/"

s3_profile = "my-aws-profile"  # optional, uses default credentials if not set

Then create your table and add computed columns:

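import pixeltable as pxt
from pixeltable.functions import openai

# Make sure the parent directory exists before creating the table
pxt.create_dir('production', if_exists='ignore')

# Create a table; input media automatically goes to input_media_dest
t = pxt.create_table('production.photos', {'photo': pxt.Image})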

# Add a computed column for thumbnails
# Uses output_media_dest by default, or specify a custom destination
t.add_computed_column(
    thumbnail=t.photo.resize((256, 256)),
    destination='s3://my-app-bucket/thumbnails/'  # override default
)

# Add AI-generated descriptions (text output, so it is stored in the table rather than in a media destination)
t.add_computed_column(
    description=openai.vision(
        prompt="Describe this image briefly.",
        image=t.photo,
        model='gpt-4o-mini'
    )
)

# Insert data — Pixeltable handles all uploads automatically
t.insert([
    {'photo': 'https://example.com/image1.jpg'},
    {'photo': '/local/path/to/image2.png'},
])

# Query as usual — files are streamed/cached as needed
t.select(t.photo, t.thumbnail, t.description).collect()

Best Practices

Use prefixes to organize data

Structure your bucket with prefixes that reflect your application:

s3://my-bucket/
  ├── production/
  │   ├── uploads/
  │   └── generated/
  └── staging/
      ├── uploads/
      └── generated/
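
One way to keep this layout consistent is to build destination URIs from a single helper instead of typing them by hand. This is a sketch under the layout shown above; media_dest is a hypothetical helper, and the bucket and environment names are placeholders.

```python
def media_dest(bucket: str, environment: str, kind: str) -> str:
    """Build an s3:// destination such as s3://my-bucket/production/uploads/."""
    if kind not in ("uploads", "generated"):
        raise ValueError(f"kind must be 'uploads' or 'generated', got {kind!r}")
    # The trailing slash marks the value as a prefix, matching the examples above.
    return f"s3://{bucket}/{environment}/{kind}/"


input_media_dest = media_dest("my-bucket", "staging", "uploads")
output_media_dest = media_dest("my-bucket", "staging", "generated")
```

The two resulting values can then be written to config.toml or exported as the environment variables described under Global Default Destinations.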

Separate input and output destinations

Use different prefixes or buckets for input vs computed media:

Easier to set different retention policies
Clearer cost attribution
Simpler backup strategies

Configure lifecycle policies

Set up bucket lifecycle policies to automatically:

Transition old data to cheaper storage tiers
Delete temporary/staging data after a period
Enable versioning for critical data
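
For Amazon S3, the first two policies can be expressed as a lifecycle configuration and applied with boto3. This is a sketch rather than a recommended setup: the rule IDs, prefixes, day counts, and the STANDARD_IA storage class are illustrative choices, and the helper names are hypothetical. Note that put_bucket_lifecycle_configuration replaces any lifecycle configuration already on the bucket.

```python
def build_lifecycle_rules(
    archive_prefix: str,
    archive_after_days: int,
    staging_prefix: str,
    expire_after_days: int,
) -> dict:
    """Build an S3 lifecycle configuration with one transition and one expiration rule."""
    return {
        "Rules": [
            {
                # Move older objects under archive_prefix to a cheaper storage tier.
                "ID": "archive-old-media",
                "Filter": {"Prefix": archive_prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": archive_after_days, "StorageClass": "STANDARD_IA"}
                ],
            },
            {
                # Delete temporary/staging objects after a fixed period.
                "ID": "expire-staging-media",
                "Filter": {"Prefix": staging_prefix},
                "Status": "Enabled",
                "Expiration": {"Days": expire_after_days},
            },
        ]
    }


def apply_lifecycle_rules(bucket: str, rules: dict) -> None:
    """Apply the rules to a bucket; requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so building the rules needs no AWS SDK

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(Bucket=bucket, LifecycleConfiguration=rules)


rules = build_lifecycle_rules("production/", 30, "staging/", 7)
```

Calling apply_lifecycle_rules("my-bucket", rules) would push the configuration to a bucket named my-bucket. Keep expiration rules away from prefixes that hold media your tables still reference.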

Use IAM roles in production

When running on cloud infrastructure, use IAM roles instead of access keys:

More secure (no key rotation needed)
Automatic credential refresh
Better audit trails

Troubleshooting

Access Denied errors

Verify your credentials have the necessary permissions:

s3:GetObject, s3:PutObject, s3:DeleteObject
s3:ListBucket for the bucket

For GCS: storage.objects.create, storage.objects.get, storage.objects.delete
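
The S3 permissions listed above can be captured in an IAM policy document. The sketch below generates one for a given bucket and prefix; build_s3_policy is a hypothetical helper, the bucket and prefix are placeholders, and your organization may require tighter conditions than shown.

```python
import json


def build_s3_policy(bucket: str, prefix: str = "") -> dict:
    """Build an IAM policy granting object access under a prefix plus bucket listing."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Object-level actions apply to keys under the prefix.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            },
            {
                # ListBucket is a bucket-level action, so it targets the bucket ARN.
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
            },
        ],
    }


policy = build_s3_policy("my-bucket", "input/")
policy_json = json.dumps(policy, indent=2)
```

A common cause of Access Denied errors is attaching s3:ListBucket to the object ARN (with the trailing /*) instead of the bucket ARN; splitting the statements as above avoids that.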

Bucket not found

Ensure the bucket exists and the name is spelled correctly
Check that the bucket's region matches your credential configuration
For S3-compatible providers, verify the endpoint URL is correct

Slow uploads

Pixeltable uses connection pooling and parallel uploads automatically
Consider using a bucket in the same region as your compute
Check your network bandwidth and latency

Configuration Reference

See the configuration reference for the complete list of storage configuration options, including profiles for S3, R2, B2, Tigris, and Azure.

Need help setting up cloud storage? Join our Discord community for support.
