gedidb.GEDIDatabase
class gedidb.GEDIDatabase(config: Dict[str, Any], credentials: dict | None = None)
Manages creation and operation of global TileDB arrays for GEDI data storage.
Performance design decisions
Hilbert pre-sort is opt-in via config ('hilbert_presort': true). It improves compression and read locality, but costs an argsort plus a fancy-index copy per granule. Leave it disabled when write throughput matters more than read performance.
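To make the cost of the pre-sort concrete, here is a minimal sketch of a Hilbert pre-sort using NumPy. It is not gedidb's implementation: the function names (hilbert_index, hilbert_presort), the order parameter, and the way latitude and longitude are quantised onto an integer grid are all assumptions made for illustration. It shows the two steps named above: an argsort over the curve index, then one fancy-index copy per column.

```python
import numpy as np


def hilbert_index(x, y, order):
    """Map integer grid cells (x, y) in [0, 2**order) to Hilbert curve positions."""
    x = np.asarray(x, dtype=np.int64).copy()
    y = np.asarray(y, dtype=np.int64).copy()
    n = 1 << order
    d = np.zeros_like(x)
    s = n >> 1
    while s > 0:
        rx = ((x & s) > 0).astype(np.int64)
        ry = ((y & s) > 0).astype(np.int64)
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the next level sees a canonical orientation.
        flip = (ry == 0) & (rx == 1)
        x = np.where(flip, n - 1 - x, x)
        y = np.where(flip, n - 1 - y, y)
        swap = ry == 0
        x, y = np.where(swap, y, x), np.where(swap, x, y)
        s >>= 1
    return d


def hilbert_presort(lat, lon, columns, order=16):
    """Reorder every column so that rows follow the Hilbert curve (hypothetical helper)."""
    n = 1 << order
    # Assumption: quantise degrees onto an n x n grid covering the whole globe.
    xi = np.clip(((lon + 180.0) / 360.0 * n).astype(np.int64), 0, n - 1)
    yi = np.clip(((lat + 90.0) / 180.0 * n).astype(np.int64), 0, n - 1)
    perm = np.argsort(hilbert_index(xi, yi, order), kind="stable")  # the argsort
    return {name: values[perm] for name, values in columns.items()}  # the copies
```

The dictionary comprehension allocates one new array per column, which is why the cost grows with the number of attributes written per granule.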
TileDBFilterPolicy defaults to fast-write mode ('use_filters': false). In fast mode only Zstd(1) is applied, with no ByteShuffle or BitWidthReduction pre-processors. Set 'use_filters': true in config to enable the full compression pipeline (ByteShuffle+Zstd for floats, BitWidthReduction+Zstd for narrow ints). DoubleDelta is kept on time/timestamp in both modes.
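The selection rules above can be sketched as a small decision function. This is an illustration only, not the TileDBFilterPolicy code: choose_filters, its arguments, and the "float" / "narrow_int" / "other" labels are hypothetical. The text does not state the Zstd level used in full mode, what follows DoubleDelta, or how other dtypes are handled in full mode, so those parts are marked as assumptions in the comments.

```python
def choose_filters(attr_name, kind, use_filters=False):
    """Return filter names in pipeline order for one attribute (hypothetical helper).

    kind is one of "float", "narrow_int" or "other" (labels invented for this sketch).
    """
    if attr_name in ("time", "timestamp"):
        # DoubleDelta is kept in both modes; any filter after it is not
        # specified in the text, so none is listed here.
        return ["DoubleDelta"]
    if not use_filters:
        # Fast-write mode: a single low-level Zstd pass, no pre-processors.
        return ["Zstd(1)"]
    if kind == "float":
        return ["ByteShuffle", "Zstd"]        # Zstd level unstated in the text
    if kind == "narrow_int":
        return ["BitWidthReduction", "Zstd"]  # Zstd level unstated in the text
    return ["Zstd"]  # assumption: other dtypes get plain Zstd in full mode


fast = choose_filters("rh_98", "float")
full = choose_filters("rh_98", "float", use_filters=True)
```

Here fast holds only the Zstd(1) entry, while full adds the ByteShuffle pre-processor in front of Zstd.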
dtype coercion in _extract_variable_data is skipped when the source Series already matches the target dtype, which avoids a full array copy for every attribute on every granule.
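A minimal sketch of the copy-avoiding check, assuming a plain NumPy array stands in for the pandas Series. The helper name coerce_if_needed is hypothetical and this is not the body of _extract_variable_data.

```python
import numpy as np


def coerce_if_needed(values, target_dtype):
    """Return values as target_dtype, copying only when the dtype differs."""
    target = np.dtype(target_dtype)
    if values.dtype == target:
        return values              # already the right dtype: no copy at all
    return values.astype(target)   # dtype differs: one full conversion copy


a = np.arange(5, dtype=np.float32)
same = coerce_if_needed(a, "float32")      # the same object as a
converted = coerce_if_needed(a, "float64") # a new float64 array
```

With many attributes per granule, skipping the conversion whenever the dtypes already agree removes one array allocation per attribute.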
Spatial domain bounds and array domain metadata are cached in __init__, so write_granule and spatial_chunking never re-open the array just for config lookups.
allows_duplicates=True preserves all valid GEDI shots, including co-located shots within the same UTC day. The old drop_duplicates() silently discarded valid data.
write_batch() amortises the TileDB open/close cost across many granules. Prefer it over calling write_granule() in a loop for large ingestion jobs.
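The amortisation argument can be illustrated with a toy store that counts how often it is opened. Everything in this sketch (CountingStore, write_one_by_one, write_batched) is hypothetical and only models the open/close pattern; it does not use TileDB or the gedidb API.

```python
from contextlib import contextmanager


class CountingStore:
    """Stand-in for an array whose open/close cycle is the expensive part."""

    def __init__(self):
        self.opens = 0
        self.rows = []

    @contextmanager
    def open(self):
        self.opens += 1      # each open models one costly open/close cycle
        yield self.rows


def write_one_by_one(store, granules):
    # One open/close per granule: the cost scales with the number of granules.
    for granule in granules:
        with store.open() as rows:
            rows.extend(granule)


def write_batched(store, granules):
    # One open/close for the whole batch: the cost is paid once.
    with store.open() as rows:
        for granule in granules:
            rows.extend(granule)


granules = [[1, 2], [3], [4, 5, 6]]
loop_store, batch_store = CountingStore(), CountingStore()
write_one_by_one(loop_store, granules)
write_batched(batch_store, granules)
```

Both stores end up with the same rows, but loop_store is opened once per granule while batch_store is opened once in total.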
mark_granule_as_processed() now retries on failure; the old version had no retry logic.
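A generic retry wrapper of the kind this describes might look like the sketch below. The number of attempts, the exponential backoff, and the choice to catch every Exception are assumptions; the text does not say which policy mark_granule_as_processed() uses.

```python
import time


def with_retry(fn, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call fn(), retrying with exponential backoff (hypothetical helper)."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == attempts:
                raise                                # out of attempts: re-raise
            sleep(base_delay * 2 ** (attempt - 1))   # wait 0.1 s, 0.2 s, ...


calls = {"n": 0}


def flaky_metadata_write():
    # Fails twice, then succeeds, to mimic a transient storage error.
    calls["n"] += 1
    if calls["n"] < 3:
        raise OSError("transient failure")
    return "ok"


delays = []
result = with_retry(flaky_metadata_write, sleep=delays.append)
```

Passing sleep as a parameter keeps the wrapper easy to test without real waiting. In production code a narrower exception type is usually preferable to a bare Exception.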
timestamp_ns is stored as true int64 nanoseconds. The old version divided by 1000 (yielding microseconds), which broke nanosecond-precision deduplication.
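A small worked example shows why the division mattered for deduplication. The timestamps are made up, and integer division is assumed for the old behaviour: two shots whose timestamps differ only below the microsecond collapse to the same key once divided by 1000.

```python
def dedup_keys(timestamps_ns, divisor=1):
    """Return the set of distinct timestamp keys after integer division."""
    return {ts // divisor for ts in timestamps_ns}


# Two hypothetical shots 333 ns apart (int64 nanoseconds since the epoch).
shots_ns = [1_700_000_000_000_000_123, 1_700_000_000_000_000_456]

full_precision = dedup_keys(shots_ns)             # keys stay in nanoseconds
truncated = dedup_keys(shots_ns, divisor=1000)    # old behaviour: microseconds
```

With nanosecond keys the two shots stay distinct; after division by 1000 they share one key and would be treated as duplicates.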
Methods
__init__(config[, credentials])
    Initialise GEDIDatabase.
check_granules_status(granule_ids[, full_only])
    Check processed status for a list of granule IDs in a single metadata read.
consolidate_fragments([consolidation_type, ...])
    Consolidate fragments, metadata, and commit logs.
mark_granule_as_processed(granule_key)
    Mark a granule as processed in TileDB metadata (with retry).
mark_granules_as_processed_batch(granule_keys)
    Mark multiple granules with the given status in a single TileDB open/close.
spatial_chunking(dataset[, ...])
    Yield ((lat_min, lat_max, lon_min, lon_max), view) pairs.
write_granule(granule_data)
    Write the parsed GEDI granule data to the global TileDB arrays, filtering out shots that are outside the spatial domain.
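The spatial filtering step in write_granule can be pictured as a boolean mask over the shot coordinates. The sketch below is an assumption about the general technique, not the method's actual code: in_domain_mask is a hypothetical helper, the bounds are example values, inclusive comparisons are assumed, and the tuple order simply follows the (lat_min, lat_max, lon_min, lon_max) convention used by spatial_chunking.

```python
import numpy as np


def in_domain_mask(lat, lon, bounds):
    """Boolean mask of shots inside the domain (bounds assumed inclusive)."""
    lat_min, lat_max, lon_min, lon_max = bounds
    return (lat >= lat_min) & (lat <= lat_max) & (lon >= lon_min) & (lon <= lon_max)


# Example bounds only; the real values come from the database config.
bounds = (-52.0, 52.0, -180.0, 180.0)
lat = np.array([0.0, 60.0, -10.0])
lon = np.array([10.0, 10.0, 200.0])

mask = in_domain_mask(lat, lon, bounds)
kept_lat, kept_lon = lat[mask], lon[mask]   # only in-domain shots are written
```

The same mask would be applied to every attribute array of the granule so that all columns stay aligned.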