Tables are the fundamental data storage units in Pixeltable. They function similarly to SQL database tables but with enhanced capabilities designed specifically for AI and ML workflows. Each table consists of columns with defined data types and can store both structured data and unstructured media assets.In Pixeltable, tables:
Persist across sessions, meaning your data remains available even after restarting your environment
Maintain strong typing for data consistency
Support operations like filtering, querying, and transformation
Can handle specialized data types for machine learning and media processing
Group logically into directories (namespaces) for organization
Creating a table requires defining a name and schema that describes its structure:
Copy
Ask AI
import pixeltable as pxt# Create a directory to organize tablespxt.create_dir('example')# Create a table with a defined schemafilms = pxt.create_table('example.films', { 'title': pxt.String, 'year': pxt.Int, 'revenue': pxt.Float})
Primary keys uniquely identify rows in a table and enable efficient lookups and updates. You can designate one or more columns as primary keys when creating a table:
Pixeltable allows you to explicitly cast column values to ensure they conform to the expected type. This is particularly useful when working with computed columns or transforming data from external sources.
Copy
Ask AI
# Cast columns to different typestable.update({ 'int_score': table.score.astype(pxt.Int), # Cast float to integer 'string_count': table.count.astype(pxt.String), # Cast integer to string})# Using casting in computed columnsfilms.add_computed_column( budget_category=films.budget.astype(pxt.String) + ' million')# Casting in expressionsfilms.where(films.revenue.astype(pxt.Int) > 100).collect()
Column casting helps maintain data consistency and prevents type errors when processing your data.
# Basic row countfilms.count() # Returns total number of rows# Basic filteringfilms.where(films.budget >= 200.0).collect()# Select specific columnsfilms.select(films.title, films.year).collect()# Limit resultsfilms.limit(5).collect() # First 5 rows (no specific order)films.head(5) # First 5 rows by insertion orderfilms.tail(5) # Last 5 rows by insertion order# Order resultsfilms.order_by(films.budget, asc=False).limit(5).collect()
Create tables or insert data directly from external sources:
Copy
Ask AI
import pixeltable as pxtimport pandas as pd# Create a table from a CSV filetable = pxt.create_table('world_population', source='https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/world-population-data.csv')# Create a table from a pandas DataFramedf = pd.DataFrame({ 'cca3': ['FRA', 'DEU', 'ITA'], 'country': ['France', 'Germany', 'Italy'], 'continent': ['Europe', 'Europe', 'Europe'], 'pop_2023': [68_000_000, 83_000_000, 59_000_000]})table = pxt.create_table('europe_population', source=df)# Insert data from a pandas DataFrame into an existing tablenew_df = pd.DataFrame({ 'cca3': ['ESP', 'GBR', 'POL'], 'country': ['Spain', 'United Kingdom', 'Poland'], 'continent': ['Europe', 'Europe', 'Europe'], 'pop_2023': [47_000_000, 67_000_000, 38_000_000]})table.insert(new_df)
Pixeltable supports importing from various data sources:
CSV files (.csv)
Excel files (.xls, .xlsx)
Parquet files (.parquet, .pq, .parq)
JSON files (.json)
Pandas DataFrames
Pixeltable Queries
Hugging Face datasets
Update Operations
Modify existing data:
Copy
Ask AI
# Update all rowsfilms.update({ 'budget': films.budget * 1.1 # Increase all budgets by 10%})# Conditional updatesfilms.where( films.year < 2000).update({ 'plot': films.plot + ' (Classic Film)'})# Batch updates for multiple rowsupdates = [ {'id': 1, 'budget': 175.0}, {'id': 2, 'budget': 185.0}]films.batch_update(updates)
Delete Operations
Remove data with conditions:
Copy
Ask AI
# Delete specific rowsfilms.where( films.year < 1995).delete()# Delete with complex conditionsfilms.where( (films.budget < 100.0) & (films.year < 2000)).delete()# WARNING: Delete all rows (use with caution!)# films.delete() # Without where clause deletes all rows
Column Operations
Manage table structure:
Copy
Ask AI
# Add new columnfilms.add_column(rating=pxt.String)# Drop columnfilms.drop_column('rating')# View schemafilms.describe()
Versioning
Manage table versions:
Copy
Ask AI
# Revert the last operationfilms.revert() # Cannot be undone!# Revert multiple times to go back furtherfilms.revert()films.revert() # Goes back two operations
Export Operations
Extract data for analysis:
Copy
Ask AI
# Get results as Python objectsresult = films.limit(5).collect()first_row = result[0] # Get first row as dicttimestamps = result['timestamp'] # Get list of values for one column# Convert to Pandasdf = resultdf['revenue'].describe() # Get statistics for revenue column
Join Tables
Combine data from multiple tables using different join types.
Returns all records from the left table and matching records from the right table.
Copy
Ask AI
left_join_result = customers.join( orders, on=customers.customer_id == orders.customer_id, how='left').select( customers.name, orders.amount)left_df = left_join_result.collect()print(left_df)# Output will show all customers (1-5), with null for amount where no order exists
Returns all records from both tables, with nulls where there’s no match.
Copy
Ask AI
full_join_result = customers.join( orders, on=customers.customer_id == orders.customer_id, how='full_outer').select( customers.name, orders.amount)full_df = full_join_result.collect()print(full_df)# Output will show all customers and all orders, with nulls where no match exists
When inserting data or evaluating computed columns, errors may occur (e.g., invalid media files, API failures, type mismatches). Pixeltable provides the on_error parameter to control how these errors are handled:
Copy
Ask AI
# Abort on any error (default behavior)t.insert([...], on_error='abort')# Continue processing and log errorst.insert([...], on_error='ignore')
When using on_error='ignore', cells with errors will have None values, but you can access error details through special column properties:
Copy
Ask AI
# Create a table with a computed column that might failt = pxt.create_table('app.docs', {'url': pxt.String})t.add_computed_column(content=some_function_that_might_fail(t.url))# Insert data, ignoring errorst.insert([{'url': 'https://example.com/valid'}, {'url': 'invalid-url'}], on_error='ignore')# Query error information for failed rowserrors = t.where(t.content == None).select( t.url, t.content.errortype, # Exception class name t.content.errormsg # Error message).collect()print(errors)# Shows which URLs failed and why
Use on_error='ignore' in production pipelines to ensure partial failures don’t halt your entire workflow. You can then query and retry failed rows later.
Move tables between directories or rename them using pxt.move():
Copy
Ask AI
# Move a table to a different directorypxt.move('old_dir.my_table', 'new_dir.my_table')# Rename a table (same directory)pxt.move('my_dir.old_name', 'my_dir.new_name')# Handle existing paths gracefullypxt.move('source.table', 'dest.table', if_exists='ignore')
List Contents
Browse your Pixeltable namespace with pxt.ls():
Copy
Ask AI
# List contents of root directorypxt.ls()# Name Kind Version Base# my_dir dir# my_table table 5# List contents of a subdirectorypxt.ls('my_dir')# Programmatic access to directory contentscontents = pxt.get_dir_contents('my_dir')print(contents['tables']) # List of table pathsprint(contents['dirs']) # List of subdirectory paths
Data Export
Export data to various formats for external use:
Copy
Ask AI
import pixeltable as pxtfrom pixeltable import iot = pxt.get_table('my_app.data')# Export to Parquet fileio.export_parquet(t, 'output.parquet')# Export a query result to Parquetquery = t.where(t.score > 0.9).select(t.id, t.embedding)io.export_parquet(query, 'high_scores.parquet')# Export to LanceDB for vector searchio.export_lancedb(t, 'lance_db_path', 'table_name')# Convert to Pandas DataFramedf = t.collect()df.to_csv('output.csv')# Export images as FiftyOne datasetio.export_images_as_fo_dataset(t, 'fiftyone_dataset')
# table.py - Run once to set uppxt.create_table(..., if_exists="ignore")# app.py - Production codetable = pxt.get_table("myapp.mytable")if table is None: raise RuntimeError("Run table.py first!")