TileDB Global Database for GEDI Data#

Important

If you use the database for your publications, please acknowledge that the dataset has been processed using gediDB:

Besnard, S., Dombrowski, F., & Holcomb, A. gediDB [Computer software]. https://doi.org/10.5281/zenodo.13885229.

Overview#

The publicly available TileDB global database, managed by the Global Land Monitoring group at GFZ-Potsdam, stores all processed GEDI version 2 data in a scalable architecture. All granules for the L2A, L2B, L4A, and L4C products have been ingested. The data is held in Ceph object storage managed by the GFZ data center and currently occupies approximately 20 TB. The database supports efficient spatial, temporal, and attribute-based queries. This page describes the database setup, its configuration, and how to access it with the gediDB package.

Ceph Object Storage Configuration#

The TileDB global database utilizes a Ceph object storage backend to efficiently manage and distribute GEDI data. Below are the key characteristics of the Ceph bucket:

  • Bucket Name: dog.gedidb.gedi-l2-l4-v002

  • Access Endpoint: https://s3.gfz-potsdam.de

  • Region: eu-central-1

  • Total Storage Used: ~20 TB

  • Access Control: Public

  • Query Support: Optimized for spatial and temporal queries

For users accessing the database programmatically, interactions with the Ceph bucket are abstracted by the gediDB package, which retrieves data seamlessly from TileDB. Advanced users with direct access to the Ceph storage layer may utilize S3-compatible tools (such as aws s3api or rclone) to interact with the data.
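For readers curious what the abstraction hides, the sketch below turns the endpoint and region listed above into TileDB-style S3 settings. The `vfs.s3.*` keys are standard TileDB configuration parameters, but whether gediDB sets exactly these options is an assumption, as are the path-style addressing and unsigned-request choices. The helper name `tiledb_s3_options` is hypothetical.

```python
from urllib.parse import urlparse


def tiledb_s3_options(endpoint_url: str, region: str) -> dict:
    """Translate an S3 endpoint URL into TileDB-style VFS settings (hypothetical helper)."""
    parsed = urlparse(endpoint_url)
    return {
        # TileDB expects the endpoint host without the scheme.
        "vfs.s3.endpoint_override": parsed.netloc,
        "vfs.s3.scheme": parsed.scheme,
        "vfs.s3.region": region,
        # Assumption: non-AWS services such as Ceph are addressed path-style.
        "vfs.s3.use_virtual_addressing": "false",
        # Assumption: a public bucket can be read without signed requests.
        "vfs.s3.no_sign_request": "true",
    }


options = tiledb_s3_options("https://s3.gfz-potsdam.de", "eu-central-1")
for key, value in options.items():
    print(f"{key} = {value}")
```

A dictionary like this could be passed to a TileDB configuration object by anyone opening the arrays without gediDB. The array names inside the bucket are not listed on this page, so they are left out here.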

TileDB Database Configuration#

The database configuration defines key parameters for data storage, tiling, and query efficiency. A critical aspect of the database is spatial consolidation: fragments that fall in the same 10° × 10° spatial window are consolidated together. This significantly improves query performance by reducing the number of fragments that a spatial query has to open.
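The grouping step behind spatial consolidation can be illustrated with a small sketch. It assumes that windows are aligned to multiples of 10 degrees and that each fragment can be represented by a single coordinate. Both are simplifications for illustration, not a description of the gediDB implementation, and the fragment names are made up.

```python
import math
from collections import defaultdict

WINDOW_DEG = 10  # window edge length in degrees


def window_origin(lat, lon, size=WINDOW_DEG):
    """Return the south-west corner of the window containing (lat, lon)."""
    return (math.floor(lat / size) * size, math.floor(lon / size) * size)


def group_by_window(fragments):
    """Collect fragment names per spatial window."""
    groups = defaultdict(list)
    for name, lat, lon in fragments:
        groups[window_origin(lat, lon)].append(name)
    return dict(groups)


# Hypothetical fragments: (name, latitude, longitude)
fragments = [
    ("frag_a", 52.4, 13.1),
    ("frag_b", 55.9, 19.7),
    ("frag_c", -3.2, -60.5),
]
groups = group_by_window(fragments)
for origin, names in groups.items():
    print(origin, names)
```

Fragments that share a window key are candidates for consolidation into one larger fragment, so a query over that window touches fewer objects.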

Below is the structure of the configuration file used to build the TileDB database:

# database parameters
tiledb:
  storage_type: 's3'
  s3_bucket: "dog.gedidb.gedi-l2-l4-v002"
  url: "https://s3.gfz-potsdam.de"
  overwrite: false
  temporal_tiling: "weekly"
  chunk_size: 10
  time_range:
    start_time: "2018-01"
    end_time: "2030-12-31"
  spatial_range:
    lat_min: -56.0
    lat_max: 56.0
    lon_min: -180.0
    lon_max: 180.0
  dimensions: ['latitude', 'longitude', 'time']
  s3_settings:
    connect_timeout_ms: "300000"
    request_timeout_ms: "600000"
    connect_max_tries: "10"
    multipart_part_size: "52428800"
    backoff_scale: "2.0"
    backoff_max_ms: "120000"
  cell_order: "hilbert"
  capacity: 100000

The configuration file contains:

  • Storage Type: Specifies s3 for cloud-based storage.

  • Time Range: Defines the global temporal coverage.

  • Spatial Range: Sets the global bounding box for latitude and longitude.

  • S3 Settings: Configures connection and request parameters for S3.
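The S3 settings are written as strings of raw milliseconds and bytes, which are hard to read at a glance. The sketch below copies the values from the configuration above and converts them to friendlier units. The helper name `describe` and the suffix-based unit detection are illustrative assumptions.

```python
# Values copied from the s3_settings block of the configuration file.
s3_settings = {
    "connect_timeout_ms": "300000",
    "request_timeout_ms": "600000",
    "connect_max_tries": "10",
    "multipart_part_size": "52428800",
    "backoff_scale": "2.0",
    "backoff_max_ms": "120000",
}


def describe(settings):
    """Convert millisecond and byte settings to minutes and MiB (hypothetical helper)."""
    readable = {}
    for key, raw in settings.items():
        if key.endswith("_ms"):
            readable[key] = f"{int(raw) / 60000:g} min"
        elif key.endswith("_size"):
            readable[key] = f"{int(raw) / 2**20:g} MiB"
        else:
            readable[key] = raw
    return readable


readable = describe(s3_settings)
for key, value in readable.items():
    print(f"{key}: {value}")
```

Read this way, the configuration allows 5 minutes to connect, 10 minutes per request, uploads in 50 MiB parts, and caps the retry backoff at 2 minutes.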

Note

The current database architecture is experimental, and other designs may prove faster for spatial and temporal queries. Users are encouraged to share feedback and suggestions for optimizing the TileDB database configuration.

Data structure of the TileDB array

Figure 2: The data structure in the TileDB Global Database for GEDI Data.#

List of the available variables#

The database includes a wide range of variables, covering spatial coordinates, elevation data, vegetation metrics, biomass estimates, and quality flags across multiple GEDI products (L2A, L2B, L4A, L4C). Below is a table of available variables stored in the database:

Variable Descriptions#

| Variable Name | Description | Units | Product |
|---|---|---|---|
| agbd | Aboveground biomass density | Mg/ha | L4A |
| agbd_pi_lower | Lower prediction interval for aboveground biomass density | Mg/ha | L4A |
| agbd_pi_upper | Upper prediction interval for aboveground biomass density | Mg/ha | L4A |
| agbd_se | Standard error of aboveground biomass density | Mg/ha | L4A |
| agbd_t | Model prediction in fit units | adimensional | L4A |
| agbd_t_se | Model prediction standard error in fit units | adimensional | L4A |
| algorithmrun_flag | The L2B algorithm run flag | adimensional | L2B |
| beam_name | Name of the beam | adimensional | L2A |
| beam_type | Type of beam used | adimensional | L2A |
| cover | Total canopy cover | Percent | L2B |
| cover_z | Cumulative canopy cover vertical profile | Percent | L2B |
| degrade_flag | Flag indicating degraded state of pointing and/or positioning information | adimensional | L2A |
| digital_elevation_model | TanDEM-X elevation at GEDI footprint location | Meters | L2A |
| digital_elevation_model_srtm | SRTM elevation at GEDI footprint location | Meters | L2A |
| dz | Vertical step size of foliage profile | Meters | L2B |
| elev_highestreturn_a1 | Elevation of the highest return detected using algorithm 1, relative to reference ellipsoid | Meters | L2A |
| elev_highestreturn_a2 | Elevation of the highest return detected using algorithm 2, relative to reference ellipsoid | Meters | L2A |
| elev_lowestmode | Elevation of center of lowest mode relative to reference ellipsoid | Meters | L2A |
| energy_total | Total energy detected in the waveform | adimensional | L2A |
| fhd_normal | Foliage Height Diversity | adimensional | L2B |
| l2_quality_flag | Flag identifying the most useful L2 data for biomass predictions | adimensional | L4A |
| l2a_quality_flag | L2A quality flag | adimensional | L2B |
| l2b_quality_flag | L2B quality flag | adimensional | L2B |
| l4_quality_flag | Flag simplifying selection of most useful biomass predictions | adimensional | L4A |
| landsat_treecover | Tree cover in the year 2010, defined as canopy closure for all vegetation taller than 5 m in height as a percentage per output grid cell | Percent | L2A |
| landsat_water_persistence | Percent UMD GLAD Landsat observations with classified surface water | Percent | L2A |
| leaf_off_doy | GEDI 1 km EASE 2.0 grid leaf-off start day-of-year | adimensional | L2A |
| leaf_off_flag | GEDI 1 km EASE 2.0 grid flag | adimensional | L2A |
| leaf_on_cycle | Flag that indicates the vegetation growing cycle for leaf-on observations | adimensional | L2A |
| leaf_on_doy | GEDI 1 km EASE 2.0 grid leaf-on start day-of-year | adimensional | L2A |
| modis_nonvegetated | Percent non-vegetated from MODIS MOD44B V6 data | Percent | L2A |
| modis_nonvegetated_sd | Percent non-vegetated standard deviation from MODIS MOD44B V6 data | Percent | L2A |
| modis_treecover | Percent tree cover from MODIS MOD44B V6 data | Percent | L2A |
| modis_treecover_sd | Percent tree cover standard deviation from MODIS MOD44B V6 data | Percent | L2A |
| num_detectedmodes | Number of detected modes in rxwaveform | adimensional | L2A |
| omega | Foliage Clumping Index | adimensional | L2B |
| pai | Total Plant Area Index | m²/m² | L2B |
| pai_z | Plant Area Index profile | m²/m² | L2B |
| pavd_z | Plant Area Volume Density profile | m²/m³ | L2B |
| pft_class | GEDI 1 km EASE 2.0 grid Plant Functional Type (PFT) | adimensional | L2A |
| pgap_theta | Total Gap Probability (theta) | adimensional | L2B |
| pgap_theta_error | Total Pgap (theta) error | adimensional | L2B |
| predict_stratum | Prediction stratum name for the 1 km cell | adimensional | L4A |
| predictor_limit_flag | Predictor value outside bounds of training data (0=in bounds, 1=lower bound, 2=upper bound) | adimensional | L4A |
| quality_flag | Flag simplifying selection of most useful data | adimensional | L2A |
| region_class | GEDI 1 km EASE 2.0 grid world continental regions | adimensional | L2A |
| response_limit_flag | Prediction value outside bounds of training data (0=in bounds, 1=lower bound, 2=upper bound) | adimensional | L4A |
| rg | Integral of the ground component in the RX waveform | adimensional | L2B |
| rh | Relative height metrics at 1% interval | Meters | L2A |
| rh100 | Height above ground of the received waveform signal start | cm | L2B |
| rhog | Volumetric scattering coefficient (rho) of the ground | adimensional | L2B |
| rhog_error | Rho (ground) error | adimensional | L2B |
| rhov | Volumetric scattering coefficient (rho) of the canopy | adimensional | L2B |
| rhov_error | Rho (canopy) error | adimensional | L2B |
| rossg | Ross-G function | adimensional | L2B |
| rv | Integral of the vegetation component in the RX waveform | adimensional | L2B |
| rx_algrunflag | Flag indicating signal was detected and algorithm ran successfully | adimensional | L2A |
| rx_maxamp | Maximum amplitude of rxwaveform relative to mean noise level | adimensional | L2A |
| rx_range_highestreturn | Range to signal start | Meters | L2B |
| sd_corrected | Noise standard deviation, corrected for odd/even digitizer bin errors based on pre-launch calibrations | adimensional | L2A |
| selected_algorithm | Identifier of algorithm selected as identifying the lowest non-noise mode | adimensional | L2A |
| selected_l2a_algorithm | Selected L2A algorithm setting | adimensional | L2B |
| selected_rg_algorithm | Selected R (ground) algorithm | adimensional | L2B |
| sensitivity | Maximum canopy cover that can be penetrated | adimensional | L2A |
| sensitivity_a1 | Geolocation sensitivity factor A1 | adimensional | L2A |
| sensitivity_a2 | Geolocation sensitivity factor A2 | adimensional | L2A |
| shot_number | Unique identifier for each shot | adimensional | L4C |
| solar_azimuth | Solar azimuth angle at the time of the shot | Degrees | L2A |
| solar_elevation | Solar elevation angle at the time of the shot | Degrees | L2A |
| stale_return_flag | Flag indicating return signal above detection threshold was not detected | adimensional | L2B |
| surface_flag | Flag indicating that elev_lowestmode is within 300 m of the DEM or mean sea surface (MSS) elevation | adimensional | L2A |
| toploc | Sample number of highest detected return | adimensional | L2A |
| urban_proportion | The percentage proportion of land area within a focal area surrounding each shot that is urban land cover | Percent | L2A |
| wsci | Waveform Structural Complexity Index | adimensional | L4C |
| wsci_pi_lower | Waveform Structural Complexity Index lower prediction interval | adimensional | L4C |
| wsci_pi_upper | Waveform Structural Complexity Index upper prediction interval | adimensional | L4C |
| wsci_quality_flag | Waveform Structural Complexity Index quality flag | adimensional | L4C |
| wsci_xy | Horizontal Structural Complexity | adimensional | L4C |
| wsci_xy_pi_lower | Horizontal Structural Complexity lower prediction interval | adimensional | L4C |
| wsci_xy_pi_upper | Horizontal Structural Complexity upper prediction interval | adimensional | L4C |
| wsci_z | Vertical Structural Complexity | adimensional | L4C |
| wsci_z_pi_lower | Vertical Structural Complexity lower prediction interval | adimensional | L4C |
| wsci_z_pi_upper | Vertical Structural Complexity upper prediction interval | adimensional | L4C |
| zcross | Sample number of center of lowest mode above noise level | Nanoseconds | L2A |

Accessing the database#

The gediDB Python package simplifies access to the TileDB global database. Below is an example workflow for querying data.

Example Code:

import geopandas as gpd
import gedidb as gdb

# Instantiate the GEDIProvider
provider = gdb.GEDIProvider(
    storage_type='s3',
    s3_bucket="dog.gedidb.gedi-l2-l4-v002",
    url="https://s3.gfz-potsdam.de"
)

# Load region of interest (ROI)
region_of_interest = gpd.read_file('path/to/test.geojson')

# Define the variables to query
vars_selected = ["agbd", "rh"]

# Query data
gedi_data = provider.get_data(
    variables=vars_selected,
    query_type="bounding_box",
    geometry=region_of_interest,
    start_time="2018-01-01",
    end_time="2024-07-25",
    return_type='xarray'
)

Explanation:

  • GEDIProvider: Initializes the provider with S3 storage details.

  • Region of Interest: Defines the geographic area for the query using a GeoJSON file.

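  • Variables: Specifies the variables to extract (e.g., agbd, rh).

  • Query: Calls get_data with the query type, the time window (start_time and end_time), and the return format (here, an xarray object).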
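Before sending a query, it can help to check that the region of interest lies inside the spatial range the database was built with (latitude -56 to 56, longitude -180 to 180 in the configuration above). The sketch below does this for a plain GeoJSON polygon using only the standard library. The function names and the sample polygon are hypothetical, and how gediDB reduces a geometry to a bounding box internally is not covered here.

```python
def bounding_box(geometry):
    """Return (lon_min, lat_min, lon_max, lat_max) of a GeoJSON Polygon."""
    ring = geometry["coordinates"][0]  # exterior ring, as [lon, lat] pairs
    lons = [point[0] for point in ring]
    lats = [point[1] for point in ring]
    return (min(lons), min(lats), max(lons), max(lats))


def within_database_extent(bbox, lat_min=-56.0, lat_max=56.0,
                           lon_min=-180.0, lon_max=180.0):
    """Check a bounding box against the spatial_range of the configuration."""
    west, south, east, north = bbox
    return (lon_min <= west and east <= lon_max
            and lat_min <= south and north <= lat_max)


# Hypothetical region of interest
roi = {
    "type": "Polygon",
    "coordinates": [[
        [13.0, 52.3], [13.8, 52.3], [13.8, 52.7], [13.0, 52.7], [13.0, 52.3],
    ]],
}
bbox = bounding_box(roi)
print(bbox, within_database_extent(bbox))
```

With GeoPandas, the same extent is available as `region_of_interest.total_bounds`, which returns the values in the same (minx, miny, maxx, maxy) order.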

Examples and use cases#

Here are some example use cases:

  1. Retrieve Aboveground Biomass Density (AGBD) for a region:

    gedi_data = provider.get_data(
        variables=["agbd"],
        query_type="bounding_box",
        geometry=region_of_interest,
        start_time="2018-01-01",
        end_time="2024-07-25",
        return_type='xarray')
    
  2. Analyze Relative Heights (RH) with additional quality filters:

    gedi_data = provider.get_data(
        variables=["rh"],
        query_type="bounding_box",
        geometry=region_of_interest,
        start_time="2018-01-01",
        end_time="2024-07-25",
        quality_filters={
            'sensitivity': '>= 0.9 and <= 1.0',
            'beam_type': "== 'full'",
        },
        return_type='xarray')
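The filter strings used above combine comparison operators with `and`. To make their meaning concrete, the sketch below evaluates such expressions against plain Python values. It is an illustrative parser written for this page, not the one gediDB uses, and it only covers the syntax that appears in the example.

```python
import operator
import re

OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq,
       "!=": operator.ne, ">": operator.gt, "<": operator.lt}

# Two-character operators are listed first so ">=" is not read as ">".
CLAUSE = re.compile(r"^(>=|<=|==|!=|>|<)\s*(.+)$")


def parse_filter(expression):
    """Split 'op value and op value' into (function, value) pairs."""
    clauses = []
    for part in expression.split(" and "):
        match = CLAUSE.match(part.strip())
        if match is None:
            raise ValueError(f"unsupported clause: {part!r}")
        symbol, raw = match.groups()
        raw = raw.strip()
        # Quoted operands are strings, everything else is numeric.
        value = raw.strip("'\"") if raw[0] in "'\"" else float(raw)
        clauses.append((OPS[symbol], value))
    return clauses


def passes(value, expression):
    """Return True when the value satisfies every clause."""
    return all(op(value, target) for op, target in parse_filter(expression))


filters = {"sensitivity": ">= 0.9 and <= 1.0", "beam_type": "== 'full'"}
shots = [
    {"sensitivity": 0.95, "beam_type": "full"},
    {"sensitivity": 0.85, "beam_type": "full"},
    {"sensitivity": 0.97, "beam_type": "coverage"},
]
kept = [s for s in shots if all(passes(s[k], expr) for k, expr in filters.items())]
print(kept)
```

Only the first hypothetical shot satisfies both conditions: the second fails the sensitivity range and the third is not a full-power beam.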
    

Resources#