Pixeltable supports storing media files (images, videos, audio, documents) in external cloud storage providers instead of local disk. This is essential for production deployments, enabling scalable storage, team collaboration, and integration with existing data infrastructure.

Supported Providers

Amazon S3

Native S3 storage with full feature support

Google Cloud Storage

GCS buckets with gs:// URI scheme

Azure Blob Storage

Azure containers with wasb:// or abfs:// schemes

Cloudflare R2

S3-compatible storage with zero egress fees

Backblaze B2

Cost-effective S3-compatible storage

Tigris

Globally distributed S3-compatible storage

How It Works

When you configure a storage destination, Pixeltable automatically:
  1. Uploads computed media — AI-generated images, extracted video frames, and other computed media files are stored in your bucket
  2. Copies input media — Optionally persists referenced media files for durability
  3. Manages file lifecycle — Cleans up files when table data is deleted
  4. Handles caching — Downloads files on-demand with intelligent local caching
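The on-demand caching step can be illustrated with a small read-through cache. This is a minimal sketch under stated assumptions, not Pixeltable's implementation. `LocalMediaCache` and the `fetch` callable are hypothetical stand-ins (a real setup would download from S3, GCS, or Azure). Cache entries are keyed by a hash of the URI, and there is no eviction policy.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable


class LocalMediaCache:
    """Hypothetical read-through cache: download a remote file once, then reuse the local copy."""

    def __init__(self, cache_dir: Path, fetch: Callable[[str], bytes]):
        self.cache_dir = cache_dir
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.fetch = fetch  # stand-in for a real cloud download function
        self.downloads = 0  # counts cache misses

    def _path_for(self, uri: str) -> Path:
        # Name the local file after a hash of the URI, keeping the original extension
        digest = hashlib.sha256(uri.encode()).hexdigest()
        return self.cache_dir / f'{digest}{Path(uri).suffix}'

    def get(self, uri: str) -> Path:
        path = self._path_for(uri)
        if not path.exists():
            # Cache miss: fetch the bytes and write them to the cache directory
            path.write_bytes(self.fetch(uri))
            self.downloads += 1
        return path


# Demo with a fake fetch function, so no network access is needed
cache = LocalMediaCache(Path(tempfile.mkdtemp()), fetch=lambda uri: b'fake-bytes')
first = cache.get('s3://my-bucket/output/frame_001.jpg')
second = cache.get('s3://my-bucket/output/frame_001.jpg')
print(first == second, cache.downloads)  # True 1
```

The second `get` call finds the file already on disk, so only one download happens.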

Configuration

There are two ways to configure cloud storage destinations:

Global Default Destinations

Set default destinations for all media columns in your config.toml (see Configuration for details):
[pixeltable]
# For input media (inserted/referenced files)
input_media_dest = "s3://my-bucket/input/"

# For computed media (AI-generated outputs)  
output_media_dest = "s3://my-bucket/output/"
Or via environment variables:
export PIXELTABLE_INPUT_MEDIA_DEST="s3://my-bucket/input/"
export PIXELTABLE_OUTPUT_MEDIA_DEST="s3://my-bucket/output/"
Configure these before creating tables. All media columns will automatically use the configured destinations.
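When you configure Pixeltable from Python (for example, in a notebook), you can set the same variables with `os.environ`. This is a minimal sketch. It assumes Pixeltable reads these variables when it first initializes in the process, so it sets them before `import pixeltable`. The bucket paths are placeholders.

```python
import os

# Equivalent to the two `export` lines above; set these before Pixeltable initializes
os.environ['PIXELTABLE_INPUT_MEDIA_DEST'] = 's3://my-bucket/input/'
os.environ['PIXELTABLE_OUTPUT_MEDIA_DEST'] = 's3://my-bucket/output/'

# Import Pixeltable (and create tables) only after the variables are set:
# import pixeltable as pxt
```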

Per-Column Destination (Computed Columns Only)

For computed columns, you can override the default with a specific destination:
import pixeltable as pxt

# Create a table with input media column
# (uses global input_media_dest if configured)
t = pxt.create_table('my_app.images', {'image': pxt.Image})

# Add computed column with explicit destination
t.add_computed_column(
    thumbnail=t.image.resize((128, 128)),
    destination='s3://my-bucket/thumbnails/'
)
The destination parameter only applies to stored computed columns. For input columns, use the global input_media_dest configuration.

Precedence Rules

Destinations are resolved in this order:
  1. Explicit column destination — highest priority (computed columns only)
  2. Global default: input_media_dest for input columns, output_media_dest for computed columns
  3. Local storage — fallback if no destination is configured
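The precedence rules can be expressed as a small function. `resolve_destination` is a hypothetical helper written only to illustrate the order described above. It is not part of the Pixeltable API, and `None` stands in for "local storage".

```python
from typing import Optional


def resolve_destination(
    column_destination: Optional[str],
    is_computed: bool,
    input_media_dest: Optional[str],
    output_media_dest: Optional[str],
) -> Optional[str]:
    """Hypothetical helper mirroring the documented precedence; None means local storage."""
    # 1. An explicit destination wins, but only computed columns accept one
    if column_destination is not None:
        if not is_computed:
            raise ValueError('destination is only supported for computed columns')
        return column_destination
    # 2. Otherwise use the global default that matches the column kind
    # 3. If that default is not configured, this is None, i.e. local storage
    return output_media_dest if is_computed else input_media_dest


print(resolve_destination('s3://my-bucket/thumbnails/', True, 's3://my-bucket/input/', 's3://my-bucket/output/'))
print(resolve_destination(None, False, 's3://my-bucket/input/', 's3://my-bucket/output/'))
print(resolve_destination(None, True, None, None))
```

The three calls print `s3://my-bucket/thumbnails/`, `s3://my-bucket/input/`, and `None`, matching rules 1, 2, and 3.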

Provider Configuration

Amazon S3

URI format:
s3://bucket-name/optional/prefix/

Google Cloud Storage

URI format:
gs://bucket-name/optional/prefix/

Azure Blob Storage

Azure supports multiple URI schemes:
wasbs://container@account.blob.core.windows.net/prefix/
abfss://container@account.dfs.core.windows.net/prefix/

Cloudflare R2

URI format:
https://account-id.r2.cloudflarestorage.com/bucket-name/prefix/

Backblaze B2

URI format:
https://s3.region.backblazeb2.com/bucket-name/prefix/

Tigris

URI format:
https://fly.storage.tigris.dev/bucket-name/prefix/
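To catch typos in a destination before creating tables, you can check which provider a URI refers to. `classify_destination` is a hypothetical helper that recognizes only the URI shapes listed above. It uses the URI scheme for S3, GCS, and Azure, and the endpoint hostname for the HTTPS-style S3-compatible providers. It does not contact the provider or verify that the bucket exists.

```python
from urllib.parse import urlparse

# Schemes shown in this page (plus the non-TLS Azure variants mentioned in the overview)
_SCHEME_PROVIDERS = {
    's3': 'Amazon S3',
    'gs': 'Google Cloud Storage',
    'wasb': 'Azure Blob Storage',
    'wasbs': 'Azure Blob Storage',
    'abfs': 'Azure Blob Storage',
    'abfss': 'Azure Blob Storage',
}

# Endpoint host suffixes for the HTTPS-style S3-compatible providers shown above
_HTTPS_PROVIDERS = {
    'r2.cloudflarestorage.com': 'Cloudflare R2',
    'backblazeb2.com': 'Backblaze B2',
    'storage.tigris.dev': 'Tigris',
}


def classify_destination(uri: str) -> str:
    """Hypothetical helper: guess the provider from a destination URI (no network calls)."""
    parsed = urlparse(uri)
    if parsed.scheme in _SCHEME_PROVIDERS:
        return _SCHEME_PROVIDERS[parsed.scheme]
    if parsed.scheme == 'https':
        host = parsed.hostname or ''
        for suffix, provider in _HTTPS_PROVIDERS.items():
            if host == suffix or host.endswith('.' + suffix):
                return provider
    raise ValueError(f'unrecognized storage destination: {uri}')


print(classify_destination('https://account-id.r2.cloudflarestorage.com/bucket-name/prefix/'))  # Cloudflare R2
```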

Complete Example

Here’s a full example using S3 for both input and computed media. First, configure your global destinations in ~/.pixeltable/config.toml:
[pixeltable]
input_media_dest = "s3://my-app-bucket/uploads/"
output_media_dest = "s3://my-app-bucket/generated/"

s3_profile = "my-aws-profile"  # optional, uses default credentials if not set
Then create your table and add computed columns:
import pixeltable as pxt
from pixeltable.functions import openai

# Create a table — input media automatically goes to input_media_dest
t = pxt.create_table('production.photos', {'photo': pxt.Image})

# Add a computed column for thumbnails
# Uses output_media_dest by default, or specify a custom destination
t.add_computed_column(
    thumbnail=t.photo.resize((256, 256)),
    destination='s3://my-app-bucket/thumbnails/'  # override default
)

# Add AI-generated text descriptions (a string column, so no media file is written)
t.add_computed_column(
    description=openai.vision(
        prompt="Describe this image briefly.",
        image=t.photo,
        model='gpt-4o-mini'
    )
)

# Insert data — Pixeltable handles all uploads automatically
t.insert([
    {'photo': 'https://example.com/image1.jpg'},
    {'photo': '/local/path/to/image2.png'},
])

# Query as usual — files are streamed/cached as needed
t.select(t.photo, t.thumbnail, t.description).collect()

Best Practices

Structure your bucket with prefixes that reflect your application:
s3://my-bucket/
  ├── production/
  │   ├── uploads/
  │   └── generated/
  └── staging/
      ├── uploads/
      └── generated/
Use different prefixes or buckets for input vs computed media:
  • Easier to set different retention policies
  • Clearer cost attribution
  • Simpler backup strategies
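If you generate configuration per environment, you can derive both destinations from the bucket layout above. `media_destinations` and `to_config_toml` are hypothetical helpers for illustration. The prefixes follow the `production/` and `staging/` tree shown earlier, and the output uses the `input_media_dest` and `output_media_dest` keys from the `config.toml` examples on this page.

```python
def media_destinations(bucket: str, env: str) -> dict:
    """Hypothetical helper: separate input and computed media under one prefix per environment."""
    base = f's3://{bucket}/{env}'
    return {
        'input_media_dest': f'{base}/uploads/',
        'output_media_dest': f'{base}/generated/',
    }


def to_config_toml(dests: dict) -> str:
    """Render the destinations as a [pixeltable] section for config.toml."""
    lines = ['[pixeltable]'] + [f'{key} = "{value}"' for key, value in dests.items()]
    return '\n'.join(lines)


print(to_config_toml(media_destinations('my-bucket', 'staging')))
```

For the staging environment this prints a `[pixeltable]` section with `s3://my-bucket/staging/uploads/` as the input destination and `s3://my-bucket/staging/generated/` as the output destination.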
Set up bucket lifecycle policies to automatically:
  • Transition old data to cheaper storage tiers
  • Delete temporary/staging data after a period
Separately, enable bucket versioning for critical data.
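On Amazon S3, you can apply these policies with boto3. This is a sketch that assumes the `production/` and `staging/` layout above. The day counts and storage class are placeholders. Choose an instant-access class such as `STANDARD_IA` so Pixeltable can still read archived media. Only expire data that no table still references, because Pixeltable cannot load objects that have been deleted. The rule builder runs without boto3. `apply_policies` requires boto3 and AWS credentials.

```python
def lifecycle_rules(archive_after_days: int = 90, staging_expire_days: int = 30) -> dict:
    """Build an S3 lifecycle configuration for the production/ and staging/ prefixes."""
    return {
        'Rules': [
            {
                # Move older production media to a cheaper, still instantly readable tier
                'ID': 'archive-production-media',
                'Filter': {'Prefix': 'production/'},
                'Status': 'Enabled',
                'Transitions': [{'Days': archive_after_days, 'StorageClass': 'STANDARD_IA'}],
            },
            {
                # Delete staging media after a fixed period
                'ID': 'expire-staging-media',
                'Filter': {'Prefix': 'staging/'},
                'Status': 'Enabled',
                'Expiration': {'Days': staging_expire_days},
            },
        ]
    }


def apply_policies(bucket: str) -> None:
    """Apply the lifecycle rules and enable versioning (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so lifecycle_rules() works without boto3 installed

    s3 = boto3.client('s3')
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=lifecycle_rules()
    )
    # Versioning is a bucket setting, separate from lifecycle rules
    s3.put_bucket_versioning(
        Bucket=bucket, VersioningConfiguration={'Status': 'Enabled'}
    )
```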
When running on cloud infrastructure, use IAM roles instead of access keys:
  • More secure (no key rotation needed)
  • Automatic credential refresh
  • Better audit trails

Troubleshooting

Permission errors: verify that your credentials have the necessary permissions:
  • For S3: s3:GetObject, s3:PutObject, s3:DeleteObject, plus s3:ListBucket on the bucket
  • For GCS: storage.objects.create, storage.objects.get, storage.objects.delete
Bucket not found or access errors:
  • Ensure the bucket exists and the name is spelled correctly
  • Check that the region matches your credential configuration
  • For S3-compatible providers, verify the endpoint URL is correct
Slow uploads or downloads:
  • Pixeltable uses connection pooling and parallel uploads automatically
  • Consider using a bucket in the same region as your compute
  • Check your network bandwidth and latency
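The S3 permissions listed above can be granted with an IAM policy like the one below. `media_bucket_policy` is a hypothetical helper, and the bucket name and prefix are placeholders. Object-level actions apply to object ARNs (`bucket/prefix*`). `s3:ListBucket` applies to the bucket ARN itself, which is a common source of "access denied" errors when the two are mixed up.

```python
import json


def media_bucket_policy(bucket: str, prefix: str = '') -> dict:
    """Hypothetical helper: IAM policy granting the S3 actions listed in Troubleshooting."""
    return {
        'Version': '2012-10-17',
        'Statement': [
            {
                # Read, write, and delete objects under the given prefix
                'Effect': 'Allow',
                'Action': ['s3:GetObject', 's3:PutObject', 's3:DeleteObject'],
                'Resource': f'arn:aws:s3:::{bucket}/{prefix}*',
            },
            {
                # Listing is granted on the bucket ARN, not on object ARNs
                'Effect': 'Allow',
                'Action': 's3:ListBucket',
                'Resource': f'arn:aws:s3:::{bucket}',
            },
        ],
    }


print(json.dumps(media_bucket_policy('my-app-bucket'), indent=2))
```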

Configuration Reference

See the complete list of storage configuration options including profiles for S3, R2, B2, Tigris, and Azure.
Need help setting up cloud storage? Join our Discord community for support.