In Pixeltable, all media data (videos, images, audio) resides in
external files, and Pixeltable stores references to them. The files can
be local or remote (e.g., in S3). Remote files are automatically cached
locally when they are accessed.
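The caching behavior described above can be pictured with a minimal, stdlib-only sketch. This is not Pixeltable's implementation. The function name `resolve_media`, the cache layout (SHA-256 of the URL as the file name, original extension kept) and the injected `download` callable are assumptions made for illustration. The object path under the `multimedia-commons` bucket is also hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable
from urllib.parse import urlparse
from urllib.request import url2pathname


def resolve_media(url: str, cache_dir: Path,
                  download: Callable[[str, Path], None]) -> str:
    """Hypothetical helper: map a stored media reference to a local path.

    Local paths (and file:// URLs) are returned as-is. Remote URLs are
    downloaded into cache_dir on first access and reused afterwards.
    """
    parsed = urlparse(url)
    if parsed.scheme == '':
        return url  # plain local path, nothing to cache
    if parsed.scheme == 'file':
        return url2pathname(parsed.path)  # file:// URL -> local path
    # Remote file: stable cache name derived from the URL, extension preserved
    suffix = Path(parsed.path).suffix
    target = cache_dir / (hashlib.sha256(url.encode()).hexdigest() + suffix)
    if not target.exists():
        download(url, target)  # only runs on a cache miss
    return str(target)


# Usage with a fake downloader that records how often it is called
calls = []


def fake_download(url: str, target: Path) -> None:
    calls.append(url)
    target.write_bytes(b'not really a video')


cache_dir = Path(tempfile.mkdtemp())
first = resolve_media('s3://multimedia-commons/data/clip.mp4', cache_dir, fake_download)
second = resolve_media('s3://multimedia-commons/data/clip.mp4', cache_dir, fake_download)
local = resolve_media('/tmp/my_video.mp4', cache_dir, fake_download)
```

The second lookup of the same S3 URL reuses the cached file, and the local path comes back unchanged. This mirrors what the query results later in this page show.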
When interacting with media data via Pixeltable, either through queries
or UDFs, the user sees the following Python types:
- ImageType: PIL.Image.Image
- VideoType: str (local file path)
- AudioType: str (local file path)
Let’s create a table and load some data to see what that looks like:
%pip install -qU pixeltable boto3
import tempfile
import random
import shutil
import pixeltable as pxt
# First drop the `external_data` directory if it exists, to ensure
# a clean environment for the demo
pxt.drop_dir('external_data', force=True)
pxt.create_dir('external_data')
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/asiegel/.pixeltable/pgdata
Created directory `external_data`.
<pixeltable.catalog.dir.Dir at 0x176646bb0>
v = pxt.create_table('external_data.videos', {'video': pxt.Video})
prefix = 's3://multimedia-commons/'
paths = [
'data/videos/mp4/ffe/ffb/ffeffbef41bbc269810b2a1a888de.mp4',
'data/videos/mp4/ffe/feb/ffefebb41485539f964760e6115fbc44.mp4',
'data/videos/mp4/ffe/f73/ffef7384d698b5f70d411c696247169.mp4'
]
# insert() expects a list of row dictionaries
v.insert([{'video': prefix + p} for p in paths])
Created table `videos`.
Computing cells: 0%| | 0/6 [00:00<?, ? cells/s]
Inserting rows into `videos`: 3 rows [00:00, 1004.62 rows/s]
Computing cells: 100%|████████████████████████████████████████████| 6/6 [00:00<00:00, 79.14 cells/s]
Inserted 3 rows with 0 errors.
UpdateStatus(num_rows=3, num_computed_values=6, num_excs=0, updated_cols=[], cols_with_excs=[])
UpdateStatus(num_rows=3, num_computed_values=0, num_excs=0, updated_cols=[], cols_with_excs=[])
We just inserted 3 rows whose video files reside in S3. When we query
them, we get the paths of their locally cached copies.
(Note: we don't display the output of collect() directly here, because
it is rendered as an HTML table with a media player, which would hide
the file path.)
rows = list(v.select(v.video).collect())
rows[0]
{'video': '/Users/asiegel/.pixeltable/file_cache/682f022a704d4459adb2f29f7fe9577c_0_1fcfcb221263cff76a2853250fbbb2e90375dd495454c0007bc6ff4430c9a4a7.mp4'}
Let’s make a local copy of the first file and insert that separately.
First, the copy:
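# tempfile.mktemp() is deprecated (race-prone); create the file safely instead
with tempfile.NamedTemporaryFile(suffix='.mp4', delete=False) as tmp:
    local_path = tmp.name
shutil.copyfile(rows[0]['video'], local_path)
local_path
'/var/folders/hb/qd0dztsj43j_mdb6hbl1gzyc0000gn/T/tmp1jo4a7ca.mp4'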
Now the insert:
v.insert([{'video': local_path}])
Computing cells: 0%| | 0/2 [00:00<?, ? cells/s]
Inserting rows into `videos`: 1 rows [00:00, 725.78 rows/s]
Computing cells: 100%|████████████████████████████████████████████| 2/2 [00:00<00:00, 53.23 cells/s]
Inserted 1 row with 0 errors.
UpdateStatus(num_rows=1, num_computed_values=2, num_excs=0, updated_cols=[], cols_with_excs=[])
When we query the table again, we see that the local path is preserved as-is:
rows = list(v.select(v.video).collect())
rows
[{'video': '/Users/asiegel/.pixeltable/file_cache/682f022a704d4459adb2f29f7fe9577c_0_1fcfcb221263cff76a2853250fbbb2e90375dd495454c0007bc6ff4430c9a4a7.mp4'},
{'video': '/Users/asiegel/.pixeltable/file_cache/682f022a704d4459adb2f29f7fe9577c_0_fc11428b32768ae782193a57ebcbad706f45bbd9fa13354471e0bcd798fee3ea.mp4'},
{'video': '/Users/asiegel/.pixeltable/file_cache/682f022a704d4459adb2f29f7fe9577c_0_b9fb0d9411bc9cd183a36866911baa7a8834f22f665bce47608566b38485c16a.mp4'},
{'video': '/var/folders/hb/qd0dztsj43j_mdb6hbl1gzyc0000gn/T/tmp1jo4a7ca.mp4'}]
UDFs also see local paths:
@pxt.udf
def f(video: pxt.Video) -> int:
    # video values arrive in the UDF as local file paths (str)
    print(f'{type(video)}: {video}')
    return 1

v.select(f(v.video)).show()
<class 'str'>: /Users/asiegel/.pixeltable/file_cache/682f022a704d4459adb2f29f7fe9577c_0_1fcfcb221263cff76a2853250fbbb2e90375dd495454c0007bc6ff4430c9a4a7.mp4
<class 'str'>: /Users/asiegel/.pixeltable/file_cache/682f022a704d4459adb2f29f7fe9577c_0_fc11428b32768ae782193a57ebcbad706f45bbd9fa13354471e0bcd798fee3ea.mp4
<class 'str'>: /Users/asiegel/.pixeltable/file_cache/682f022a704d4459adb2f29f7fe9577c_0_b9fb0d9411bc9cd183a36866911baa7a8834f22f665bce47608566b38485c16a.mp4
<class 'str'>: /var/folders/hb/qd0dztsj43j_mdb6hbl1gzyc0000gn/T/tmp1jo4a7ca.mp4
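Because a Video value reaches a UDF as an ordinary local path string, the function body can use any file-based tooling. The sketch below assumes you want the file size. It is written as plain Python so it runs without Pixeltable. The name `video_size_bytes` is hypothetical. In a notebook you would decorate it with `@pxt.udf` and annotate the parameter as `pxt.Video`, like `f` above.

```python
import os
import tempfile


# In Pixeltable (not executed here) this would be declared as:
#   @pxt.udf
#   def video_size_bytes(video: pxt.Video) -> int: ...
def video_size_bytes(video: str) -> int:
    # `video` is a local file path (str), so standard filesystem calls work on it
    return os.path.getsize(video)


# Stand-in for a cached media file: any local file will do for this sketch
with tempfile.NamedTemporaryFile(suffix='.mp4', delete=False) as tmp:
    tmp.write(b'\x00' * 2048)
    sample_path = tmp.name

size = video_size_bytes(sample_path)
```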
Dealing with errors
When working with media data in Pixeltable, you can assume that
the underlying files exist, are local, and are valid for their respective
data type. In other words, queries and UDFs don't need to handle missing
or corrupted files.
To that end, Pixeltable validates media data on ingest. The default
behavior is to reject invalid media files:
v.insert([{'video': prefix + 'bad_path.mp4'}])
Computing cells: 0%| | 0/2 [00:01<?, ? cells/s]
Error: Failed to download s3://multimedia-commons/bad_path.mp4: An error occurred (404) when calling the HeadObject operation: Not Found
The same happens for corrupted files:
# create an invalid .mp4 file containing random bytes
with tempfile.NamedTemporaryFile(mode='wb', suffix='.mp4', delete=False) as temp_file:
    temp_file.write(random.randbytes(1024))
    corrupted_path = temp_file.name

v.insert([{'video': corrupted_path}])
Computing cells: 100%|██████████████████████████████████████████| 2/2 [00:00<00:00, 1084.64 cells/s]
Error: Not a valid video: /var/folders/hb/qd0dztsj43j_mdb6hbl1gzyc0000gn/T/tmp3djgfyjp.mp4
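The error for the corrupted file comes from Pixeltable's video type check. The traceback shows it catching `av.AVError`, so it appears to open the file with PyAV. The hypothetical `looks_like_mp4` below is a much weaker, dependency-free sniff test that shows why 1024 arbitrary bytes cannot pass as a video. It relies on the fact that an MP4/ISO-BMFF file normally begins with a box whose 4-byte type field (bytes 4 to 8) is `ftyp`. It is only a heuristic, not a substitute for actually decoding the stream.

```python
import tempfile


def looks_like_mp4(path: str) -> bool:
    # Hypothetical heuristic: ISO-BMFF files (MP4, MOV, ...) usually start with
    # a 4-byte box size followed by the box type b'ftyp'.
    with open(path, 'rb') as f:
        header = f.read(8)
    return len(header) == 8 and header[4:8] == b'ftyp'


def write_temp(data: bytes) -> str:
    # Write `data` to a new temporary .mp4 file and return its path
    with tempfile.NamedTemporaryFile(suffix='.mp4', delete=False) as f:
        f.write(data)
        return f.name


# A minimal ftyp box header ("isom" brand) vs. bytes that are clearly not a video
good = write_temp(b'\x00\x00\x00\x18ftypisom' + b'\x00' * 12)
bad = write_temp(b'\xde\xad\xbe\xef' * 256)
results = (looks_like_mp4(good), looks_like_mp4(bad))
```

Note that `good` passes the sniff test while still not being a playable video. A real validator therefore has to parse or decode the container rather than just check a header.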
Alternatively, Pixeltable can record error conditions and proceed with
the ingest. This is controlled by the on_error parameter
(default: 'abort'):
v.insert([{'video': prefix + 'bad_path.mp4'}, {'video': corrupted_path}], on_error='ignore')
Computing cells: 100%|████████████████████████████████████████████| 4/4 [00:00<00:00, 20.98 cells/s]
Inserting rows into `videos`: 2 rows [00:00, 671.63 rows/s]
Computing cells: 100%|████████████████████████████████████████████| 4/4 [00:00<00:00, 20.13 cells/s]
Inserted 2 rows with 4 errors across 2 columns (videos.video, videos.None).
UpdateStatus(num_rows=2, num_computed_values=4, num_excs=4, updated_cols=[], cols_with_excs=['videos.video', 'videos.None'])
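The difference between the two on_error modes can be modeled in a few lines of plain Python. This is a hypothetical sketch of the semantics, not Pixeltable's code. The `ingest` and `validate` names are illustrative. With `'abort'`, the first failing row raises and, in this sketch, nothing from the batch is kept. With `'ignore'`, each failure is kept as a row whose value is `None`. The row also records an error type and message, loosely mirroring the errortype/errormsg properties Pixeltable exposes on media columns.

```python
from typing import Callable


def ingest(values: list, validate: Callable[[str], None],
           on_error: str = 'abort') -> list:
    """Hypothetical model of insert(..., on_error=...) for a single media column."""
    if on_error not in ('abort', 'ignore'):
        raise ValueError(f'unknown on_error mode: {on_error!r}')
    rows = []
    for value in values:
        try:
            validate(value)
            rows.append({'video': value, 'errortype': None, 'errormsg': None})
        except Exception as exc:
            if on_error == 'abort':
                raise  # the whole batch fails, nothing is returned
            # 'ignore': keep the row, store None, remember what went wrong
            rows.append({'video': None,
                         'errortype': type(exc).__name__,
                         'errormsg': str(exc)})
    return rows


def validate(path: str) -> None:
    # Stand-in validator: pretend only paths ending in 'ok.mp4' are valid
    if not path.endswith('ok.mp4'):
        raise ValueError(f'Not a valid video: {path}')


inputs = ['clip_ok.mp4', 's3://multimedia-commons/bad_path.mp4', 'corrupted.mp4']
ignored = ingest(inputs, validate, on_error='ignore')
try:
    ingest(inputs, validate)  # default mode: 'abort'
    aborted = False
except ValueError:
    aborted = True
```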
Every media column has the properties errortype and errormsg (both
strings), which indicate whether the column value is valid. Invalid
values show up as None and have a non-null errortype and errormsg:
v.select(v.video == None, v.video.errortype, v.video.errormsg).collect()
Errors can now be inspected (and corrected) after the ingest:
v.where(v.video.errortype != None).select(v.video.errormsg).collect()
Accessing the original file paths
In some cases you need the file path itself rather than, say, a
PIL.Image.Image. For that purpose, Pixeltable provides the column
properties fileurl and localpath:
v.select(v.video.fileurl, v.video.localpath).collect()
Note that for local media files, the fileurl property still returns a
parsable URL.
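For a local file, a parsable URL is typically a `file://` URL. The round trip between a local path and such a URL can be sketched with the standard library. Whether Pixeltable builds its fileurl exactly this way is an assumption, so treat this as an illustration of the conversion, not of Pixeltable internals. The helper names `path_to_url` and `url_to_path` are hypothetical.

```python
import tempfile
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import url2pathname


def path_to_url(local_path: str) -> str:
    # Path.as_uri() requires an absolute path and percent-encodes as needed
    return Path(local_path).resolve().as_uri()


def url_to_path(url: str) -> str:
    # Inverse direction: only file:// URLs map back to a local path
    parsed = urlparse(url)
    if parsed.scheme != 'file':
        raise ValueError(f'not a local file URL: {url}')
    return url2pathname(parsed.path)


# A file name containing a space shows the percent-encoding in the URL
with tempfile.NamedTemporaryFile(suffix=' clip.mp4', delete=False) as tmp:
    local_path = str(Path(tmp.name).resolve())

url = path_to_url(local_path)
round_trip = url_to_path(url)
```

The URL form is convenient when local and remote references must be handled uniformly (every value has a scheme). The path form is what file-based libraries expect.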