Learn how to publish datasets to Pixeltable Cloud and replicate datasets
from the cloud to your local environment.
Overview
Pixeltable Cloud enables you to:
- Publish your datasets for sharing with teams or the public
- Replicate datasets from the cloud to your local environment
- Share multimodal AI datasets (images, videos, audio, documents)
without managing infrastructure
This guide demonstrates both publishing and replicating datasets.
Setup
Data sharing functionality requires Pixeltable version 0.4.24 or later.
%pip install -qU pixeltable
Replicating Datasets
You can replicate any public dataset from Pixeltable Cloud to your local
environment without needing an account or API key.
Replicate a Public Dataset
Let’s replicate a mini version of the COCO-2017 dataset from Pixeltable
Cloud. You can find this dataset at
pixeltable.com/t/pixeltable:fiftyone/coco_mini_2017,
or browse Pixeltable Cloud for other public datasets.
When calling replicate():
- remote_uri (required): The URI of the cloud dataset you want to
replicate
- local_path (your choice): The local directory/table path where you
want to store the replica
- Variable name (your choice): The Python variable in your session or
script that references the table (e.g., coco_copy)
See the replicate() SDK reference for full documentation.
import pixeltable as pxt
# The remote_uri is the specific cloud dataset you want to replicate
# The local_path and variable name are yours to choose
coco_copy = pxt.replicate(
remote_uri='pxt://pixeltable:fiftyone/coco_mini_2017',
local_path='coco-copy'
)
You can check that the replica exists at the local path with
list_tables().
Working with Replicas
Replicas are read-only locally, but you can still query and explore
them, run similarity searches against their embedding indexes, and use
them as the source for new tables of your own:
1. Query and explore the data
# View the replicated data
coco_copy.limit(3).collect()
2. Perform similarity searches
Replicas include embedding indexes, so you can immediately perform
similarity searches:
# Get a sample image to search with
sample_img = coco_copy.select(coco_copy.image).limit(1).collect()[0]['image']
sample_img
# Perform image-based similarity search
sim = coco_copy.image.similarity(sample_img)
results = (
coco_copy
.order_by(sim, asc=False)
.limit(5)
.select(coco_copy.image, sim)
.collect()
)
results
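Conceptually, ordering by sim in descending order and applying limit(5) ranks rows by how close each row's embedding is to the query embedding and keeps the best matches. The pure-Python sketch below illustrates only that ranking step, using made-up 3-dimensional vectors and assuming cosine similarity as the metric. It is not how Pixeltable evaluates the query internally; there, the embedding index does this work.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def top_k(query: list[float], rows: dict[str, list[float]], k: int) -> list[tuple[str, float]]:
    """Score every row against the query, sort by score descending, keep the first k."""
    scored = [(row_id, cosine_similarity(query, emb)) for row_id, emb in rows.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # analogous to order_by(sim, asc=False)
    return scored[:k]  # analogous to limit(k)


# Toy 3-dimensional "embeddings"; real CLIP embeddings have hundreds of dimensions.
rows = {
    'img_a': [1.0, 0.0, 0.0],
    'img_b': [0.9, 0.1, 0.0],
    'img_c': [0.0, 1.0, 0.0],
    'img_d': [0.0, 0.0, 1.0],
}
query = rows['img_a']  # query with a vector that is also in the "table"
for row_id, score in top_k(query, rows, k=2):
    print(row_id, round(score, 3))
```

Because the query vector is one of the rows, it ranks first with a score of 1.0. You may see the same effect in the image search above, since the sample image comes from the table itself.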
Because this dataset's embedding index uses CLIP, a multimodal model
that embeds images and text into the same vector space, you can also
search using text queries:
# Perform text-based similarity search
sim = coco_copy.image.similarity('surfing')
results = (
coco_copy
.order_by(sim, asc=False)
.limit(4)
.select(coco_copy.image, sim)
.collect()
)
results
3. Access replicas in new sessions
In a new Python session, use list_tables() and get_table() to access
your replicas:
# List all tables to see your replica
pxt.list_tables()
# Assign a handle to the replica
coco_copy = pxt.get_table('coco-copy')
4. Create an independent copy
To work with the data in new ways, create an independent table with the
replica as the source:
# Create a fresh table with values only
my_coco = pxt.create_table('my-coco-table', source=coco_copy)
The new table contains a copy of the source's values but not its
computational definitions, and it is not updated when the source table
changes.
Updating Replicas with Pull
If the upstream table changes, you can update your local replica using
pull():
# Update your local replica with changes from the cloud
coco_copy.pull()
This synchronizes your local replica with any updates made to the source
dataset.
Publishing Datasets
Publishing allows you to share your datasets with your team or make them
publicly available.
Requirements:
- A Pixeltable Cloud account (the Community Edition includes 1 TB of
storage; see pricing)
- Your API key from the account dashboard
Pixeltable looks for your API key in the PIXELTABLE_API_KEY
environment variable. Choose one of these methods:
Option 1: In your notebook (secure and convenient)
Run this cell to securely enter your API key (get it from
pixeltable.com/dashboard):
from getpass import getpass
import os
os.environ['PIXELTABLE_API_KEY'] = getpass('Pixeltable API Key:')
Option 2: Environment variable
Add this line to your ~/.zshrc or ~/.bashrc, then open a new terminal
session (or source the file) so the variable is set:
export PIXELTABLE_API_KEY='your-api-key-here'
Option 3: Config file
Add to ~/.pixeltable/config.toml:
[pixeltable]
api_key = 'your-api-key-here'
See the Configuration
Guide for details.
Create a Sample Dataset
Let’s create a table to publish, using sample images from the
Pixeltable GitHub repository. The comment parameter provides a
description that will be visible on Pixeltable Cloud:
# Start from a fresh directory; if_not_exists='ignore' avoids an error on the first run
pxt.drop_dir('sample-images', force=True, if_not_exists='ignore')
pxt.create_dir('sample-images')
t = pxt.create_table(
'sample-images.photos',
schema={'image': pxt.Image, 'description': pxt.String},
comment='Sample image dataset for demonstrating Pixeltable Cloud publishing'
)
base_url = 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images'
t.insert([
{'image': f'{base_url}/000000000009.jpg', 'description': 'Kitchen scene'},
{'image': f'{base_url}/000000000025.jpg', 'description': 'Street view'},
{'image': f'{base_url}/000000000042.jpg', 'description': 'Indoor setting'},
])
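For more than a few images, you may prefer to build the row dictionaries programmatically rather than writing each one out. This plain-Python sketch reuses base_url and the three file names from the cell above. The images mapping is an illustrative name, and the resulting list has the same shape as the literal list passed to t.insert() in that cell.

```python
base_url = 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images'

# Illustrative mapping of file name -> description (same three images as above).
images = {
    '000000000009.jpg': 'Kitchen scene',
    '000000000025.jpg': 'Street view',
    '000000000042.jpg': 'Indoor setting',
}

# One dict per row, with keys matching the table's columns ('image', 'description').
rows = [{'image': f'{base_url}/{name}', 'description': desc} for name, desc in images.items()]

# In the notebook you would then call: t.insert(rows)
print(rows[0])
```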
Publish Your Dataset
Publish your table to Pixeltable Cloud. When calling publish():
- source (required): An existing local table, given either as a table
path string (e.g., 'sample-images.photos') or as a table handle (e.g.,
t)
  - If you pass a table path string, it must match a table in your
local database (you can verify with pxt.list_tables())
- destination_uri (required): The cloud URI where you want to publish,
in the format pxt://orgname/dataset
  - Pixeltable automatically creates any directory structure in the
cloud based on this URI
  - Your local directory structure doesn’t need to match the cloud
structure
See the publish() SDK reference for full documentation.
# Option 1: Publish using table path (string)
pxt.publish(
source='sample-images.photos', # Table path from list_tables()
destination_uri='pxt://your-orgname/sample-images'
)
# Option 2: Publish using table handle
# pxt.publish(
# source=t, # Table handle you assigned
# destination_uri='pxt://your-orgname/sample-images'
# )
Understanding Destination URIs
The destination_uri in publish() uses the format:
pxt://org:database/path
URI components:
- org (required): Your organization name
- database (optional): Database name; defaults to main if omitted
- path (required): Directory and table path in the cloud
Examples:
- pxt://orgname/my-dataset → Uses the default main database
- pxt://orgname:main/my-dataset → Explicitly specifies the main
database
- pxt://orgname:analytics/my-dataset → Uses the analytics database
About databases:
- Every Pixeltable Cloud account includes a
main database by default
- Each database has its own storage bucket
- You can create additional databases in your Pixeltable
dashboard
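To make the URI structure concrete, here is a hypothetical parse_pxt_uri helper (not part of the Pixeltable SDK) that splits a URI into the components listed above and fills in main when the database is omitted. It is a sketch of the documented format only; Pixeltable's own URI validation may accept or reject inputs differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PxtUri:
    org: str
    database: str
    path: str


def parse_pxt_uri(uri: str) -> PxtUri:
    """Split a pxt://org[:database]/path URI into its parts (hypothetical helper)."""
    prefix = 'pxt://'
    if not uri.startswith(prefix):
        raise ValueError(f'expected a URI starting with {prefix!r}: {uri!r}')
    # Everything before the first '/' is "org" or "org:database"; the rest is the path.
    authority, sep, path = uri[len(prefix):].partition('/')
    if not sep or not path:
        raise ValueError(f'missing path component: {uri!r}')
    org, _, database = authority.partition(':')
    if not org:
        raise ValueError(f'missing org component: {uri!r}')
    # An omitted (or empty) database falls back to the default "main" database.
    return PxtUri(org=org, database=database or 'main', path=path)


for uri in ['pxt://orgname/my-dataset', 'pxt://orgname:analytics/my-dataset']:
    print(parse_pxt_uri(uri))
```

Applied to the replication example earlier on this page, pxt://pixeltable:fiftyone/coco_mini_2017 splits into org pixeltable, database fiftyone, and path coco_mini_2017.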
Updating Published Datasets with Push
After you’ve published a dataset, you can update the cloud replica with
local changes using push():
# Make some changes to your local table
t.insert([{'image': f'{base_url}/000000000049.jpg', 'description': 'Outdoor scene'}])
# Push the changes to your published dataset
t.push()
This updates the published dataset on Pixeltable Cloud with your local
changes.
Your dataset is now published and can be replicated by others using:
import pixeltable as pxt
sample_images = pxt.replicate(
remote_uri='pxt://your-orgname/sample-images',
local_path='sample-images-copy'
)
Note: If you are the owner of a published table, you cannot use
replicate() to create a replica of your own table. This is because the
table already exists in your Pixeltable database. The replicate()
function is intended for pulling datasets published by others into your
environment.
Access Control
The access parameter in publish() controls who can replicate your
dataset:
- access='private' (default): Only your team members can access the
dataset
- access='public': Anyone can replicate your dataset
You can set access control when you publish, using the access
parameter, or change it later in the Pixeltable Cloud UI. You can also
manage team members and permissions in your dashboard.
Deleting Published Tables
If you want to delete a published table, you have two options:
Option 1: Using the Pixeltable SDK
Use drop_table() with your table’s destination URI (the same pxt://
URI you used when publishing):
pxt.drop_table('pxt://your-orgname/sample-images')
Option 2: Using the Pixeltable Cloud dashboard
Navigate to your Pixeltable Cloud
dashboard and delete the table from
the UI.
Get Help
Have questions or need support? Join our community:
- Discord Community: Ask questions, get community support, and share
what you build with Pixeltable
- YouTube: Watch tutorials, demos, and feature walkthroughs
- GitHub Issues: Report bugs or request features
Resources