Data Sharing


Learn how to publish datasets to Pixeltable Cloud and replicate datasets from the cloud to your local environment.

Overview

Pixeltable Cloud enables you to:

Publish your datasets for sharing with teams or the public
Replicate datasets from the cloud to your local environment
Share multimodal AI datasets (images, videos, audio, documents) without managing infrastructure

This guide demonstrates both publishing and replicating datasets.

Setup

Data sharing functionality requires Pixeltable version 0.4.24 or later.

%pip install -qU pixeltable

Replicating Datasets

You can replicate any public dataset from Pixeltable Cloud to your local environment without needing an account or API key.

Replicate a Public Dataset

Let’s replicate a mini-version of the COCO-2017 dataset from Pixeltable Cloud. You can find this dataset at pixeltable.com/t/pixeltable:fiftyone/coco_mini_2017, or browse for other public datasets. When calling replicate():

remote_uri (required): The URI of the cloud dataset you want to replicate
local_path (your choice): The local directory/table name where you want to store the replica
Variable name (your choice): The Python variable in your session/script to reference the table (e.g., coco_copy)

See the replicate() SDK reference for full documentation.

import pixeltable as pxt

# The remote_uri is the specific cloud dataset you want to replicate
# The local_path and variable name are yours to choose
coco_copy = pxt.replicate(
    remote_uri='pxt://pixeltable:fiftyone/coco_mini_2017',
    local_path='coco-copy'
)

You can check that the replica exists at the local path with list_tables().

pxt.list_tables()

Working with Replicas

Replicated datasets are read-only locally, but you can still query them, run similarity searches against them, and use them as the source for new tables.

1. Query and explore the data

# View the replicated data
coco_copy.limit(3).collect()

2. Perform similarity searches

Replicas include embedding indexes, so you can run similarity searches immediately:

# Get a sample image to search with
sample_img = coco_copy.select(coco_copy.image).limit(1).collect()[0]['image']
sample_img

# Perform image-based similarity search
sim = coco_copy.image.similarity(sample_img)
results = (
    coco_copy
    .order_by(sim, asc=False)
    .limit(5)
    .select(coco_copy.image, sim)
    .collect()
)
results
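Ordering by sim in descending order and keeping the first five rows is a top-k nearest-neighbor query. To make that ranking step concrete, here is a minimal brute-force sketch in plain Python. It assumes cosine similarity as the metric and uses made-up 3-dimensional vectors in place of real CLIP embeddings; `cosine_similarity` and `top_k` are hypothetical helpers, not Pixeltable functions. It is not how Pixeltable executes the query: an embedding index is there so the database does not have to score every row, whereas this sketch scores them all for clarity.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction, 0.0 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: list[float], rows: dict[str, list[float]], k: int) -> list[tuple[str, float]]:
    """Score every row against the query, sort highest first, and keep k rows.

    This mirrors the shape of order_by(sim, asc=False).limit(k).
    """
    scored = [(row_id, cosine_similarity(query, emb)) for row_id, emb in rows.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # descending, like asc=False
    return scored[:k]


# Toy "embeddings"; real CLIP embeddings have hundreds of dimensions.
embeddings = {
    'surfer.jpg': [0.9, 0.1, 0.0],
    'beach.jpg': [0.7, 0.3, 0.1],
    'kitchen.jpg': [0.0, 0.2, 0.9],
}
query = [1.0, 0.0, 0.0]
print(top_k(query, embeddings, k=2))
```

Because a multimodal model such as CLIP places images and text in the same vector space, the same scoring works whether the query vector came from an image or from a text string.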

Because the COCO dataset uses CLIP embeddings, which map images and text into the same embedding space, you can also search using text queries:

# Perform text-based similarity search
sim = coco_copy.image.similarity('surfing')
results = (
    coco_copy
    .order_by(sim, asc=False)
    .limit(4)
    .select(coco_copy.image, sim)
    .collect()
)
results

3. Access replicas in new sessions

In a new Python session, use list_tables() and get_table() to access your replicas:

# List all tables to see your replica
pxt.list_tables()

# Assign a handle to the replica
coco_copy = pxt.get_table('coco-copy')

4. Create an independent copy

To work with the data in new ways, create an independent table with the replica as its source:

# Create a fresh table with values only
my_coco = pxt.create_table('my-coco-table', source=coco_copy)

This copies the values from the source but drops its computational definitions (such as computed columns), and the new table is not updated when the source table changes.

Updating Replicas with Pull

If the upstream table changes, you can update your local replica using pull():

# Update your local replica with changes from the cloud
coco_copy.pull()

This synchronizes your local replica with any updates made to the source dataset.

Publishing Datasets

Requirements:

A Pixeltable Cloud account (the Community Edition includes 1 TB of storage; see pricing)
Your API key from the account dashboard

Publishing allows you to share your datasets with your team or make them publicly available.

Configure Your API Key

Pixeltable looks for your API key in the PIXELTABLE_API_KEY environment variable or in the Pixeltable config file. Choose one of these methods:

Option 1: In your notebook (secure and convenient)

Run this cell to enter your API key without echoing it to the screen (get the key from pixeltable.com/dashboard):

from getpass import getpass
import os

os.environ['PIXELTABLE_API_KEY'] = getpass('Pixeltable API Key:')

Option 2: Environment variable

Add this line to your ~/.zshrc or ~/.bashrc:

export PIXELTABLE_API_KEY='your-api-key-here'

Option 3: Config file

Add this to ~/.pixeltable/config.toml:

[pixeltable]
api_key = 'your-api-key-here'

See the Configuration Guide for details.

Create a Sample Dataset

Let’s create a table to publish, using images hosted in the Pixeltable GitHub repository. The comment parameter provides a description that will be visible on Pixeltable Cloud:

# Create a fresh directory
pxt.drop_dir('sample-images', force=True)
pxt.create_dir('sample-images')

t = pxt.create_table(
    'sample-images.photos',
    schema={'image': pxt.Image, 'description': pxt.String},
    comment='Sample image dataset for demonstrating Pixeltable Cloud publishing'
)

base_url = 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images'
t.insert([
    {'image': f'{base_url}/000000000009.jpg', 'description': 'Kitchen scene'},
    {'image': f'{base_url}/000000000025.jpg', 'description': 'Street view'},
    {'image': f'{base_url}/000000000042.jpg', 'description': 'Indoor setting'},
])

Publish Your Dataset

Publish your table to Pixeltable Cloud. When calling publish():

source (required): An existing local table, given either as a table path string (e.g., 'sample-images.photos') or as a table handle (e.g., t)
- If you use a local table path string, it must match a table in your local database (you can verify with pxt.list_tables())
destination_uri (required): The cloud URI where you want to publish, in the format pxt://orgname/dataset
- Pixeltable automatically creates any directory structure in the cloud based on this URI
- Your local directory structure doesn’t need to match the cloud structure

See the publish() SDK reference for full documentation.

# Option 1: Publish using table path (string)
pxt.publish(
    source='sample-images.photos',  # Table path from list_tables()
    destination_uri='pxt://your-orgname/sample-images'
)

# Option 2: Publish using table handle
# pxt.publish(
#     source=t,  # Table handle you assigned
#     destination_uri='pxt://your-orgname/sample-images'
# )

Understanding Destination URIs

The destination_uri in publish() uses the format pxt://org:database/path, with these URI components:

org (required): Your organization name
database (optional): Database name - defaults to main if omitted
path (required): Directory and table path in the cloud

Examples:

pxt://orgname/my-dataset → Uses the default main database
pxt://orgname:main/my-dataset → Explicitly specifies the main database
pxt://orgname:analytics/my-dataset → Uses the analytics database
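These component rules can be captured in a small parser, which is handy for sanity-checking a URI before passing it to publish() or replicate(). This is a sketch only: `parse_pxt_uri` and `PxtUri` are hypothetical names, not Pixeltable SDK functions, and the parser just splits the string. It does not check whether the org, database, or table exists, and it assumes org and database names contain no `:` or `/`.

```python
from dataclasses import dataclass


@dataclass
class PxtUri:
    org: str
    database: str
    path: list[str]  # directory components, with the table name last


def parse_pxt_uri(uri: str) -> PxtUri:
    """Split a pxt://org[:database]/path URI; the database defaults to 'main'."""
    prefix = 'pxt://'
    if not uri.startswith(prefix):
        raise ValueError(f'not a pxt:// URI: {uri!r}')
    # Everything before the first '/' is org[:database]; the rest is the path.
    authority, sep, path = uri[len(prefix):].partition('/')
    if not sep or not path:
        raise ValueError(f'missing path in URI: {uri!r}')
    org, _, database = authority.partition(':')
    if not org:
        raise ValueError(f'missing org in URI: {uri!r}')
    return PxtUri(org=org, database=database or 'main', path=path.split('/'))


print(parse_pxt_uri('pxt://orgname/my-dataset'))
```

For example, parse_pxt_uri('pxt://orgname:analytics/my-dataset') yields org orgname, database analytics, and path ['my-dataset'], while the first example in the list above falls back to the main database.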

About databases:

Every Pixeltable Cloud account includes a main database by default
Each database has its own storage bucket
You can create additional databases in your Pixeltable dashboard

Updating Published Datasets with Push

After you’ve published a dataset, you can update the published copy on Pixeltable Cloud with your local changes using push():

# Make some changes to your local table
t.insert([{'image': f'{base_url}/000000000049.jpg', 'description': 'Outdoor scene'}])

# Push the changes to your published dataset
t.push()

This updates the published dataset on Pixeltable Cloud with your local changes.

Your dataset is now published and can be replicated by others using:

import pixeltable as pxt

sample_images = pxt.replicate(
    remote_uri='pxt://your-orgname/sample-images',
    local_path='sample-images-copy'
)

Note: If you are the owner of a published table, you cannot use replicate() to create a replica of your own table. This is because the table already exists in your Pixeltable database. The replicate() function is intended for pulling datasets published by others into your environment.

Access Control

The access parameter in publish() controls who can replicate your dataset:

access='private' (default): Only your team members can access the dataset
access='public': Anyone can replicate your dataset

You can set access control either at the time of publish using the access parameter, or change it later in the Pixeltable Cloud UI. You can also manage team members and permissions in your dashboard.

Deleting Published Tables

To delete a published table, you have two options:

Option 1: Using the Pixeltable SDK

Use drop_table() with your table’s destination URI (the same pxt:// URI you used when publishing):

pxt.drop_table('pxt://your-orgname/sample-images')

Option 2: Using the Pixeltable Cloud dashboard

Navigate to your Pixeltable Cloud dashboard and delete the table from the UI.

Get Help

Have questions or need support? Join our community:

Discord Community: Ask questions, get community support, and share what you build with Pixeltable
YouTube: Watch tutorials, demos, and feature walkthroughs
GitHub Issues: Report bugs or request features
