Using Pandas and GeoPandas

The here-geopandas-adapter package provides a GeoPandasAdapter that makes it easier to work with the Pandas and GeoPandas libraries. Once it is imported, instantiated and enabled in the Platform, many read and write functions of the HERE Data SDK for Python accept and return pd.DataFrame, pd.Series, gpd.GeoDataFrame and gpd.GeoSeries in place of Python list and dict objects.

Enabling the Adapter

To use the HERE GeoPandas Adapter, initialize a Platform object with a new GeoPandasAdapter as shown below:

from here.platform import Platform
from here.geopandas_adapter import GeoPandasAdapter

platform = Platform(adapter=GeoPandasAdapter())

The adapter applies to all the catalogs and other entities created through that Platform object.

It's also possible to enable the adapter only for selected catalogs by passing it to the corresponding get_catalog call:

from here.platform import Platform
from here.geopandas_adapter import GeoPandasAdapter

platform = Platform()
adapter = GeoPandasAdapter()

# These catalogs use the adapter
weather_eu = platform.get_catalog('hrn:here:data::olp-here:live-weather-eu', adapter=adapter)
weather_na = platform.get_catalog('hrn:here:data::olp-here:live-weather-na', adapter=adapter)

# This catalog does not
sdii = platform.get_catalog("hrn:here:data::olp-here:olp-sdii-sample-berlin-2")

Lastly, you can enable the adapter for individual function calls:

from here.platform import Platform
from here.geopandas_adapter import GeoPandasAdapter

platform = Platform()
adapter = GeoPandasAdapter()

weather_na = platform.get_catalog('hrn:here:data::olp-here:live-weather-na')
live_layer = weather_na.get_layer('latest-data')

# This function uses the adapter
weather_df = live_layer.read_partitions([75477, 75648, 75391, 75562], adapter=adapter)

# This function does not
weather_msgs = live_layer.read_partitions([75477, 75648, 75391, 75562])

Read to DataFrame

To read data and metadata from versioned, volatile, index and stream layers, first familiarize yourself with the read functions described in the corresponding section of this user guide.

All the standard parameters of get_partitions_metadata, read_partitions, get_stream_metadata and read_stream are supported, in addition to adapter-specific parameters that are forwarded to the adapter and its data decoder.

When reading and decoding data, adapter-specific parameters are passed to pd.read_csv, pd.read_parquet and similar functions, which perform the actual decoding of each partition. You can use them to fine-tune how single partitions are decoded, including how to handle the (Geo)DataFrame index, if present in the data. The GeoPandasAdapter then combines the decoded partitions into a single resulting dataframe. For more information on supported content types and exact parameters, see the documentation of GeoPandasDecoder.
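The following sketch illustrates, with plain pandas, the kind of work described above: each partition's payload is decoded on its own, extra keyword arguments are forwarded unchanged to the decoding function, and the pieces are stacked into one frame with a partition_id column. It is not the GeoPandasAdapter implementation; the helper decode_and_combine and the sample CSV payloads are hypothetical.

```python
import io

import pandas as pd


def decode_and_combine(raw_partitions, **read_kwargs):
    """Decode CSV bytes per partition and stack them into one DataFrame.

    Hypothetical helper: raw_partitions maps partition id -> CSV bytes.
    Extra keyword arguments are forwarded unchanged to pd.read_csv.
    """
    frames = []
    for partition_id, payload in raw_partitions.items():
        df = pd.read_csv(io.BytesIO(payload), **read_kwargs)
        # Tag every row with the partition it came from
        df.insert(0, "partition_id", partition_id)
        frames.append(df)
    # ignore_index=True gives the result a default 0..n-1 integer index
    return pd.concat(frames, ignore_index=True)


raw = {
    "p1": b"name,value\na,1\nb,2\n",
    "p2": b"name,value\nc,3\n",
}
# usecols is a pd.read_csv parameter, forwarded through **read_kwargs
combined = decode_and_combine(raw, usecols=["value"])
print(combined)
```

Here usecols plays the role of an adapter-specific parameter: only the value column is decoded from each partition.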

If decode=False is passed to read_partitions or read_stream, no decoding takes place, the adapter is not used, and a plain Python collection containing bytes is returned.

Get Partitions Data and Metadata from Versioned Layer in a DataFrame

Use get_partitions_metadata to obtain partitions metadata. When the GeoPandasAdapter is enabled, a pd.DataFrame is returned instead of a list or dict as shown in the example below.

Example: getting versioned metadata in a DataFrame

from here.platform import Platform
from here.geopandas_adapter import GeoPandasAdapter

platform = Platform(adapter=GeoPandasAdapter())
sdii_catalog = platform.get_catalog("hrn:here:data::olp-here:olp-sdii-sample-berlin-2")
versioned_layer = sdii_catalog.get_layer("sample-versioned-layer")

partitions_df = versioned_layer.get_partitions_metadata(partition_ids=[377893751])

# In Jupyter, this prints the first rows of the result
partitions_df.head()

Partition metadata is returned in a DataFrame with a default integer index; no column is used as the index.

   id         data_handle                           checksum  data_size  crc
0  377893751  17b42bde-02c3-461e-a290-3c487cd316cf            8813
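Because the metadata frame has a default integer index, looking up a partition by id means filtering on the id column. If you do this often, you can index the frame by id instead. A minimal pandas sketch, using a hand-built stand-in for partitions_df with the values from the sample output (checksum and crc are empty there, so they are None here):

```python
import pandas as pd

# Hand-built stand-in for the metadata frame returned above
partitions_df = pd.DataFrame(
    {
        "id": [377893751],
        "data_handle": ["17b42bde-02c3-461e-a290-3c487cd316cf"],
        "checksum": [None],
        "data_size": [8813],
        "crc": [None],
    }
)

# Use the partition id as index so rows can be looked up directly with .loc
by_id = partitions_df.set_index("id")
size = by_id.loc[377893751, "data_size"]
print(size)
```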

Use read_partitions to fetch and decode the data. When the GeoPandasAdapter is enabled, a pd.DataFrame or a gpd.GeoDataFrame, depending on the content, is returned instead of a list or dict as shown in the example below.

Example: reading versioned data in a DataFrame

partitions_df = versioned_layer.read_partitions(partition_ids=[377893751])

# In Jupyter, this prints the first rows of the result
partitions_df.head()

Partition data is returned in a single pd.DataFrame or gpd.GeoDataFrame with a default integer index. Data from multiple partitions is included in the same output, and a partition_id column is added to tell the partitions apart. The column names depend on the content type, schema and actual content of the layer.

   partition_id  messageId                             message               metadata
0  377893751     c6d07e2e-1b6d-48c4-b544-383d3956d742  {'envelope': {'versi  {'receivedTime': 1507151512426}
1  377893751     77a3f90f-2327-4395-a0d3-dcd84fbccb45  {'envelope': {'versi  {'receivedTime': 1507151512447}
2  377893751     dfeeb878-20bd-4b3e-b7bf-39de7a18ca9d  {'envelope': {'versi  {'receivedTime': 1507151512447}
3  377893751     5dc72402-1c43-47a3-b896-3035b7bcab85  {'envelope': {'versi  {'receivedTime': 1507151512448}
4  377893751     fefe0784-0770-484e-b4fa-6dbc2b4f4685  {'envelope': {'versi  {'receivedTime': 1507151512448}

(some text truncated for clarity)
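Since all partitions end up in one frame, the partition_id column is the way to get per-partition views back. A minimal pandas sketch, assuming a hand-built stand-in for the read_partitions result; the second partition id and the messageId values are illustrative only:

```python
import pandas as pd

# Hand-built stand-in for a read_partitions result covering two partitions
partitions_df = pd.DataFrame(
    {
        "partition_id": [377893751, 377893751, 377893752],
        "messageId": ["m1", "m2", "m3"],
    }
)

# Number of decoded rows contributed by each partition
counts = partitions_df.groupby("partition_id").size()
print(counts.to_dict())

# One DataFrame per partition, keyed by partition id
per_partition = {pid: group for pid, group in partitions_df.groupby("partition_id")}
print(sorted(per_partition))
```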

Get Partitions Data and Metadata from Volatile Layer in a DataFrame

Use get_partitions_metadata to obtain partitions metadata. When the GeoPandasAdapter is enabled, a pd.DataFrame is returned instead of a list or dict as shown in the example below.

Example: getting volatile metadata in a DataFrame

from here.platform import Platform
from here.geopandas_adapter import GeoPandasAdapter

platform = Platform(adapter=GeoPandasAdapter())
weather_catalog = platform.get_catalog('hrn:here:data::olp-here:live-weather-na')
volatile_layer = weather_catalog.get_layer('latest-data')

partitions_df = volatile_layer.get_partitions_metadata(partition_ids=[81150])

# In Jupyter, this prints the first rows of the result
partitions_df.head()

Partition metadata is returned in a DataFrame with a default integer index; no column is used as the index.

   id     data_handle  checksum  data_size  crc
0  81150  81150

Use read_partitions to fetch and decode the data. When the GeoPandasAdapter is enabled, a pd.DataFrame or a gpd.GeoDataFrame, depending on the content, is returned instead of a list or dict as shown in the example below.

Example: reading volatile data in a DataFrame

partitions_df = volatile_layer.read_partitions(partition_ids=[81150])

# In Jupyter, this prints the first rows of the result
partitions_df.head()

Partition data is returned in a single pd.DataFrame or gpd.GeoDataFrame with a default integer index. Data from multiple partitions is included in the same output, and a partition_id column is added to tell the partitions apart. The column names depend on the content type, schema and actual content of the layer.

partition_id tile_id tile_level center_point_geohash timestamp air_temperature dew_point_temperature humidity air_pressure visibility iop pavement_temperature wfd wind_velocity precipitation_type pavement_type
0 81150 332391765 14 g7ybpf00 1623861204152 {'value': 4.62, 'confidence': 0.17, 'confidence_category_type': 3} {'value': -1.22, 'confidence': 0.17, 'confidence_category_type': 3} {'value': 67.74, 'confidence': 0.17, 'confidence_category_type': 3} {'value': 1009.51, 'confidence': 0.17, 'confidence_category_type': 3} {'value': 9.99, 'confidence': 0.17, 'confidence_category_type': 3} {'confidence': 0.5, 'confidence_category_type': 2} nan nan {'value': 28.03, 'direction': 26.27, 'confidence': 0.17, 'confidence_category_type': 3} {'precipitation_type': 1, 'confidence': 0.5, 'confidence_category_type': 2} nan
1 81150 332391767 14 g7ybpy00 1623861204152 {'value': 4.6, 'confidence': 0.17, 'confidence_category_type': 3} {'value': -1.2, 'confidence': 0.17, 'confidence_category_type': 3} {'value': 67.94, 'confidence': 0.17, 'confidence_category_type': 3} {'value': 1009.52, 'confidence': 0.17, 'confidence_category_type': 3} {'value': 9.99, 'confidence': 0.17, 'confidence_category_type': 3} {'confidence': 0.5, 'confidence_category_type': 2} nan nan {'value': 28.05, 'direction': 26.33, 'confidence': 0.17, 'confidence_category_type': 3} {'precipitation_type': 1, 'confidence': 0.5, 'confidence_category_type': 2} nan
2 81150 332391760 14 g7ybn600 1623861204152 {'value': 4.67, 'confidence': 0.16, 'confidence_category_type': 3} {'value': -1.25, 'confidence': 0.16, 'confidence_category_type': 3} {'value': 67.34, 'confidence': 0.16, 'confidence_category_type': 3} {'value': 1009.5, 'confidence': 0.16, 'confidence_category_type': 3} {'value': 9.99, 'confidence': 0.16, 'confidence_category_type': 3} {'confidence': 0.5, 'confidence_category_type': 2} nan nan {'value': 27.98, 'direction': 26.14, 'confidence': 0.16, 'confidence_category_type': 3} {'precipitation_type': 1, 'confidence': 0.5, 'confidence_category_type': 2} nan
3 81150 332391761 14 g7ybnf00 1623861204152 {'value': 4.66, 'confidence': 0.16, 'confidence_category_type': 3} {'value': -1.24, 'confidence': 0.16, 'confidence_category_type': 3} {'value': 67.47, 'confidence': 0.16, 'confidence_category_type': 3} {'value': 1009.5, 'confidence': 0.16, 'confidence_category_type': 3} {'value': 9.99, 'confidence': 0.16, 'confidence_category_type': 3} {'confidence': 0.5, 'confidence_category_type': 2} nan nan {'value': 28.0, 'direction': 26.18, 'confidence': 0.16, 'confidence_category_type': 3} {'precipitation_type': 1, 'confidence': 0.5, 'confidence_category_type': 2} nan
4 81150 332391764 14 g7ybp600 1623861204152 {'value': 4.64, 'confidence': 0.17, 'confidence_category_type': 3} {'value': -1.23, 'confidence': 0.17, 'confidence_category_type': 3} {'value': 67.6, 'confidence': 0.17, 'confidence_category_type': 3} {'value': 1009.51, 'confidence': 0.17, 'confidence_category_type': 3} {'value': 9.99, 'confidence': 0.17, 'confidence_category_type': 3} {'confidence': 0.5, 'confidence_category_type': 2} nan nan {'value': 28.01, 'direction': 26.22, 'confidence': 0.17, 'confidence_category_type': 3} {'precipitation_type': 1, 'confidence': 0.5, 'confidence_category_type': 2} nan

(some text truncated for clarity)
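Several columns in this output hold nested dicts such as {'value': 4.62, 'confidence': 0.17, ...}, while missing values show up as nan. A minimal pandas sketch for pulling one field out of such columns into a flat column; the frame is a hand-built stand-in using two rows of the sample output, and the helper field is hypothetical:

```python
import pandas as pd

# Hand-built stand-in for two rows of the volatile-layer result
weather_df = pd.DataFrame(
    {
        "tile_id": [332391765, 332391767],
        "air_temperature": [
            {"value": 4.62, "confidence": 0.17, "confidence_category_type": 3},
            {"value": 4.6, "confidence": 0.17, "confidence_category_type": 3},
        ],
        "pavement_temperature": [float("nan"), float("nan")],
    }
)


def field(column, key):
    """Return `key` from dict cells; non-dict cells (e.g. NaN) become None."""
    return column.apply(lambda cell: cell.get(key) if isinstance(cell, dict) else None)


weather_df["air_temperature_value"] = field(weather_df["air_temperature"], "value")
weather_df["pavement_temperature_value"] = field(weather_df["pavement_temperature"], "value")
print(weather_df[["tile_id", "air_temperature_value", "pavement_temperature_value"]])
```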

Get Partitions Data and Metadata from Index Layer in a DataFrame

Use get_partitions_metadata to obtain partitions metadata. When the GeoPandasAdapter is enabled, a pd.DataFrame is returned instead of a list or dict as shown in the example below.

Example: getting index metadata in a DataFrame

from here.platform import Platform
from here.geopandas_adapter import GeoPandasAdapter

platform = Platform(adapter=GeoPandasAdapter())
sdii_catalog = platform.get_catalog("hrn:here:data::olp-here:olp-sdii-sample-berlin-2")
index_layer = sdii_catalog.get_layer("sample-index-layer")

partitions_df = index_layer.get_partitions_metadata(query="hour_from=ge=10")

# In Jupyter, this prints the first rows of the result
partitions_df.head()

Partition metadata is returned in a DataFrame with a default integer index. Since index layer partitions have no proper identifier, the data handle is used in place of the partition id.

   id                                    data_handle                           checksum                                  data_size  crc
0  1d63cfb6-5b79-455a-8fda-1503b99253e3  1d63cfb6-5b79-455a-8fda-1503b99253e3  0353f45622ac843ccabbc8af4ce6739d5baf171a  290391
1  1f9c8d0a-2519-4cd8-af4a-0fd0fa16b047  1f9c8d0a-2519-4cd8-af4a-0fd0fa16b047  1a1472a4de647291da7498407b59a2011af6c25c  113261
2  2f9c978d-b6bc-4889-b7d4-a47849fb6a17  2f9c978d-b6bc-4889-b7d4-a47849fb6a17  74b94f931c3bda3a7500eadaf34506445c0a10ba  356674
3  2fed9456-7275-4786-b600-0c4865854b79  2fed9456-7275-4786-b600-0c4865854b79  ad68c63881bfeae3635d64270df4e13202049f54  115175
4  3b0c053b-8988-4621-92d7-9daf65e7d4a7  3b0c053b-8988-4621-92d7-9daf65e7d4a7  e7aca6afb0a37ed46d9e11a8c2ed73afa9eae1d0  114945

Use read_partitions to fetch and decode the data. When the GeoPandasAdapter is enabled, a pd.DataFrame or a gpd.GeoDataFrame, depending on the content, is returned instead of a list or dict as shown in the example below.

Example: reading index data in a DataFrame

partitions_df = index_layer.read_partitions(query="hour_from=ge=10")

# In Jupyter, this prints the first rows of the result
partitions_df.head()

Partition data is returned in a single pd.DataFrame or gpd.GeoDataFrame with a default integer index. Data from multiple partitions is included in the same output, and a partition_id column is added to tell the partitions apart. The column names depend on the content type, schema and actual content of the layer. Since index layer partitions have no proper identifier, the data handle is used in place of the partition id.

   partition_id                          envelope                        path                            pathEvents                      pathMedia
0  1d63cfb6-5b79-455a-8fda-1503b99253e3  {'version': '1.0', 'submitter'  {'positionEstimate': array([{'  {'vehicleStatus': None, 'vehic  None
1  1d63cfb6-5b79-455a-8fda-1503b99253e3  {'version': '1.0', 'submitter'  {'positionEstimate': array([{'  {'vehicleStatus': None, 'vehic  None
2  1d63cfb6-5b79-455a-8fda-1503b99253e3  {'version': '1.0', 'submitter'  {'positionEstimate': array([{'  {'vehicleStatus': None, 'vehic  None
3  1d63cfb6-5b79-455a-8fda-1503b99253e3  {'version': '1.0', 'submitter'  {'positionEstimate': array([{'  {'vehicleStatus': None, 'vehic  None
4  1d63cfb6-5b79-455a-8fda-1503b99253e3  {'version': '1.0', 'submitter'  {'positionEstimate': array([{'  {'vehicleStatus': None, 'vehic  None

(some text truncated for clarity)
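For index layers, the id column of the metadata and the partition_id column of the data both hold the data handle, so the two frames can be joined on it, for example to attach data_size to each decoded row. A minimal pandas sketch with hand-built stand-ins; the handles and sizes come from the sample output, while the data columns are simplified:

```python
import pandas as pd

# Hand-built stand-in for the index-layer metadata frame
meta_df = pd.DataFrame(
    {
        "id": [
            "1d63cfb6-5b79-455a-8fda-1503b99253e3",
            "1f9c8d0a-2519-4cd8-af4a-0fd0fa16b047",
        ],
        "data_size": [290391, 113261],
    }
)

# Hand-built stand-in for the decoded data frame (two rows from one partition)
data_df = pd.DataFrame(
    {
        "partition_id": ["1d63cfb6-5b79-455a-8fda-1503b99253e3"] * 2
        + ["1f9c8d0a-2519-4cd8-af4a-0fd0fa16b047"],
        "pathMedia": [None, None, None],
    }
)

# Join each data row to the metadata of the partition it came from
joined = data_df.merge(meta_df, left_on="partition_id", right_on="id", how="left")
print(joined[["partition_id", "data_size"]])
```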

Get Partitions Data and Metadata from Stream Layer in a DataFrame

Use get_stream_metadata to consume partitions metadata from a stream subscription. When the GeoPandasAdapter is enabled, a pd.DataFrame is returned instead of a list or dict as shown in the example below.

Example: getting stream metadata in a DataFrame

from here.platform import Platform
from here.geopandas_adapter import GeoPandasAdapter

platform = Platform(adapter=GeoPandasAdapter())
sdii_catalog = platform.get_catalog("hrn:here:data::olp-here:olp-sdii-sample-berlin-2")
stream_layer = sdii_catalog.get_layer("sample-streaming-layer")
subscription = stream_layer.subscribe()

partitions_df = stream_layer.get_stream_metadata(subscription=subscription)

# In Jupyter, this prints the first rows of the result
partitions_df.head()

Partition metadata (stream messages) is returned in a DataFrame with a default integer index; no column is used as the index.

   id         data_handle                           checksum  data_size  crc
0  377893751  17b42bde-02c3-461e-a290-3c487cd316cf            8813

Use read_stream to consume, fetch and decode the data from a stream subscription. When the GeoPandasAdapter is enabled, a pd.DataFrame or a gpd.GeoDataFrame, depending on the content, is returned instead of a list or dict as shown in the example below.

Example: reading stream data in a DataFrame

This example shows how adapter-specific parameters, such as paths, can be used to customize decoding. Here, only a selection of the properties in the data is decoded.

paths = [ "path.positionEstimate.timeStampUTC_ms",
          "path.positionEstimate.latitude_deg",
          "path.positionEstimate.longitude_deg",
          "path.positionEstimate.heading_deg",
          "path.positionEstimate.speed_mps"
        ]

partitions_df = stream_layer.read_stream(subscription=subscription, paths=paths)

# In Jupyter, this prints the first rows of the result
partitions_df.head()

Partition data is returned in a single pd.DataFrame or gpd.GeoDataFrame with a default integer index. Data from multiple partitions is included in the same output, and a partition_id column is added to tell the partitions apart. The column names depend on the content type, schema and actual content of the layer.

partition_id timeStampUTC_ms latitude_deg longitude_deg heading_deg speed_mps
0 388cdc55-c78d-487d-935b-4275c20ace6d 1623857370779 52.5313 13.3614 320.648 16
1 388cdc55-c78d-487d-935b-4275c20ace6d 1623857372779 52.5315 13.3612 320.648 16
2 388cdc55-c78d-487d-935b-4275c20ace6d 1623857374779 52.5319 13.361 320.648 16
3 388cdc55-c78d-487d-935b-4275c20ace6d 1623857376779 52.5321 13.3607 320.648 16
4 388cdc55-c78d-487d-935b-4275c20ace6d 1623857378779 52.5324 13.3605 320.648 16

(some text truncated for clarity)
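The timeStampUTC_ms column holds epoch milliseconds. A minimal pandas sketch for turning it into timezone-aware timestamps and computing the interval between consecutive position estimates; the frame is a hand-built stand-in using the first rows of the sample output:

```python
import pandas as pd

# Hand-built stand-in for the first rows returned by read_stream with the paths above
track_df = pd.DataFrame(
    {
        "timeStampUTC_ms": [1623857370779, 1623857372779, 1623857374779],
        "latitude_deg": [52.5313, 52.5315, 52.5319],
        "longitude_deg": [13.3614, 13.3612, 13.361],
    }
)

# Convert epoch milliseconds to timezone-aware UTC timestamps
track_df["time"] = pd.to_datetime(track_df["timeStampUTC_ms"], unit="ms", utc=True)

# Seconds elapsed since the previous position estimate (NaN for the first row)
track_df["interval_s"] = track_df["time"].diff().dt.total_seconds()
print(track_df[["time", "interval_s"]])
```

If GeoPandas is installed, gpd.points_from_xy(track_df["longitude_deg"], track_df["latitude_deg"]) can turn the coordinates into point geometries for a GeoDataFrame.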

Write DataFrame to Layer

To write data and metadata to versioned, volatile, index and stream layers, first familiarize yourself with the write functions described in the corresponding section of this user guide.

For content types supported by the GeoPandas Adapter (see Table), the contents of a DataFrame or GeoDataFrame can be encoded and written to a layer with a single function call. For unsupported content types, you need to encode the data yourself and pass encode=False.

All the standard parameters of set_partitions_metadata, write_partitions, append_stream_metadata and write_stream are supported, in addition to adapter-specific parameters that are forwarded to the adapter and its data encoder.

When writing and encoding data, the GeoPandasAdapter splits the (Geo)DataFrame into partitions according to its partition_id column. Each group of rows is then encoded and stored as a standalone partition; rows without a partition identifier are discarded. Adapter-specific parameters are passed to DataFrame.to_csv, DataFrame.to_parquet and similar functions, which perform the actual encoding of each partition. You can use them to fine-tune how single partitions are encoded, including how to handle the (Geo)DataFrame index. For more information on supported content types and exact parameters, see the documentation of GeoPandasEncoder.
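The following plain-pandas sketch illustrates the splitting described above: rows are grouped by partition_id, rows without an id are skipped, and each group is encoded separately with keyword arguments forwarded to DataFrame.to_csv. It is not the GeoPandasEncoder implementation; the helper split_and_encode is hypothetical, and dropping the partition_id column from each payload is a choice of this sketch, not documented adapter behavior.

```python
import pandas as pd


def split_and_encode(df, **to_csv_kwargs):
    """Group rows by partition_id and encode each group as CSV bytes.

    Hypothetical helper. groupby skips missing keys by default, so rows
    whose partition_id is None/NaN are discarded. Extra keyword arguments
    are forwarded unchanged to DataFrame.to_csv.
    """
    encoded = {}
    for partition_id, rows in df.groupby("partition_id"):
        # Assumption of this sketch: the partition id is not stored in the payload
        payload = rows.drop(columns="partition_id").to_csv(**to_csv_kwargs)
        encoded[partition_id] = payload.encode("utf-8")
    return encoded


df = pd.DataFrame(
    {
        "partition_id": ["a", "a", "b", None],
        "value": [1, 2, 3, 4],
    }
)
# index=False is a DataFrame.to_csv parameter: do not write the DataFrame index
encoded = split_and_encode(df, index=False)
print(encoded)
```

The last row has no partition id, so it does not appear in any encoded partition.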

If encode=False is passed to write_partitions or write_stream, the adapter is not used and no encoding takes place, so you must pass a plain Python collection containing bytes instead of a (Geo)DataFrame.
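When you encode the data yourself, for example for a content type the adapter does not support, you can still use pandas to prepare the per-partition bytes. A minimal sketch, assuming records are encoded as a JSON array per partition and collected in a dict of partition id to bytes; the input frame is hypothetical, and the exact shape of the collection that write_partitions expects should be checked in its reference documentation.

```python
import json

import pandas as pd

# Hypothetical input: one row per record, target partition in partition_id
df = pd.DataFrame(
    {
        "partition_id": ["p1", "p1", "p2"],
        "name": ["a", "b", "c"],
        "value": [1, 2, 3],
    }
)

# Encode each partition yourself (here: a JSON array of records) into bytes
raw_partitions = {
    partition_id: rows.drop(columns="partition_id").to_json(orient="records").encode("utf-8")
    for partition_id, rows in df.groupby("partition_id")
}

# Decode one payload again to check its contents
print(json.loads(raw_partitions["p2"]))
```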

Write examples are symmetric to the read examples shown above.
