Layer Operations

Overview

A layer contains semantically related data of the same type and format. Layers are subdivided into smaller units called partitions, which are reasonably sized units for caching and processing. Partitions can store any binary data: it is published and retrieved without modification. However, you can optionally define the structure and encoding of the data by associating a content type and schema with the layer. The content type, such as application/geo+json, describes the encoding of the data stored in each partition according to internet standards. The schema, for content types that need one such as application/x-protobuf, defines the exact data structure of the partitions in the layer, so that data producers and consumers know how to generate and interpret the encoded data.

See the Data User Guide for more information on layers and partitions.

The HERE Data SDK for Python allows for interaction with these layer types:

  • Versioned layer
  • Volatile layer
  • Index layer
  • Stream layer
  • Interactive Map layer

Each layer type has its own usage and storage patterns, as well as a separate class with methods for that type. For information about all methods, see the layer module API documentation.
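For instance, a minimal sketch showing that the object returned by get_layer is an instance of the class matching the layer type, using the example catalog from the read section below:

from here.platform import Platform

platform = Platform()

# a versioned layer yields a VersionedLayer object, a stream layer a StreamLayer object, and so on
catalog = platform.get_catalog('hrn:here:data::olp-here:oma-3')
layer = catalog.get_layer('topology_geometry_segment')
print(type(layer).__name__)  # expected: VersionedLayer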

  • The Read from Layer section below explains how to read data from each of the supported layer types.
  • The Write to Layer section explains how to write data to each of these layer types.
  • For information on writing to multiple layers of a catalog simultaneously, see the Catalog Operations section.
  • For information on creating new layers and modifying existing ones, see the Catalog Operations section.

Adapters

Adapters encode and decode data and convert it between different representations, for example pandas DataFrame, geopandas GeoDataFrame, Python list and dict objects, and GeoJSON objects from the geojson Python library. Additional adapters may be added in the future, and users can develop and use their own to interface the HERE Data SDK for Python with a variety of systems and content representations.

The HERE Data SDK for Python includes the following adapters:

  • Default adapter, used automatically when no other adapter is specified
  • GeoPandas Adapter

The default adapter is configured out of the box. Users can also specify a different adapter in various functions; however, not every adapter supports every content type and representation. It's always possible to pass the parameters encode=False or decode=False when writing or reading to disable encoding, decoding, and format transformation. In this case, reading a layer returns raw bytes; similarly, users have to provide already-encoded data when writing.
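For example, a minimal sketch of the same read performed with and without decoding (versioned_layer is defined in the read examples further below):

# with decoding: the type of partition_data depends on the content type and adapter in use
decoded = versioned_layer.read_partitions(partition_ids=[377893751])

# without decoding: partition_data is raw bytes
raw = versioned_layer.read_partitions(partition_ids=[377893751], decode=False)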

Supported formats

The following table summarizes the data formats currently supported by HERE Data SDK for Python, both natively and via additional data adapters.

Layer and content type        encode=False / decode=False   DefaultAdapter                   GeoPandasAdapter

Versioned Layer
  any                         read/write bytes
  application/(x-)protobuf                                  read Message                     read DataFrame
  application/x-parquet                                                                      read/write DataFrame
  application/json                                          read/write dict                  read/write DataFrame
  application/(vnd.)geo+json                                read/write FeatureCollection     read/write GeoDataFrame
  text/csv                                                  read/write List[dict]            read/write DataFrame

Volatile Layer
  any                         read/write bytes
  application/(x-)protobuf                                  read Message                     read DataFrame
  application/x-parquet                                                                      read/write DataFrame
  application/json                                          read/write dict                  read/write DataFrame
  application/(vnd.)geo+json                                read/write FeatureCollection     read/write GeoDataFrame
  text/csv                                                  read/write List[dict]            read/write DataFrame

Index Layer
  any                         read/write bytes
  application/(x-)protobuf                                  read Message                     read DataFrame
  application/x-parquet                                                                      read/write* DataFrame
  application/json                                          read/write dict                  read/write DataFrame
  application/(vnd.)geo+json                                read/write FeatureCollection     read/write GeoDataFrame
  text/csv                                                  read/write List[dict]            read/write DataFrame

Stream Layer
  any                         read/write bytes
  application/(x-)protobuf                                  read Message                     read DataFrame
  application/x-parquet                                                                      read DataFrame
  application/json                                          read dict                        read DataFrame
  application/(vnd.)geo+json                                read FeatureCollection           read GeoDataFrame
  text/csv                                                  read List[dict]                  read DataFrame

Interactive Map Layer
  application/(vnd.)geo+json                                read/write FeatureCollection     read/write GeoDataFrame

(*) writing is supported for only one partition at a time

Read from Layer

You can read partition metadata and the content of each partition (referred to as partition data) from the following layers:

  • Versioned
  • Volatile
  • Index
  • Stream

The functions read_partitions and read_stream return iterators, so results can be consumed one value at a time.
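For example, a sketch of lazy consumption, assuming the iterator yields (partition, data) pairs as in the index layer examples below (index_layer and the hour_from field come from the setup code that follows):

from itertools import islice

# stop after the first 10 results; remaining partition data is never downloaded
for partition, partition_data in islice(index_layer.read_partitions(query="hour_from=ge=10"), 10):
    print(partition.fields)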

The Interactive Map layer exposes an API based on GeoJSON features, without the concept of partitioning or content encoding/decoding.

To execute the read examples below, first run the following code, which defines some example catalogs and layers:

from here.platform import Platform

platform = Platform()

oma_catalog = platform.get_catalog('hrn:here:data::olp-here:oma-3')
versioned_layer = oma_catalog.get_layer('topology_geometry_segment')

wx_catalog = platform.get_catalog('hrn:here:data::olp-here:live-weather-eu')
volatile_layer = wx_catalog.get_layer('latest-data')

sdii_catalog = platform.get_catalog('hrn:here:data::olp-here:olp-sdii-sample-berlin-2')
index_layer = sdii_catalog.get_layer('sample-index-layer')

traffic_catalog = platform.get_catalog('hrn:here:data::olp-here:olp-traffic-1')
stream_layer = traffic_catalog.get_layer('traffic-incidents-delta-stream')

test_catalog = platform.get_catalog(...)
interactive_map_layer = test_catalog.get_layer(...)

Read Partitions/Data from Versioned Layer

To read data stored in a versioned layer, refer to the read_partitions function of the VersionedLayer class. Given a set of partition identifiers, the function returns the data associated with the specified partitions. If no partitions are specified, the content of the whole layer is returned.

It's possible to specify which version of the catalog to query; by default, the latest is used.

While read_partitions queries layer metadata and downloads the data associated with each partition in one single call, it's also possible to query just the partition metadata and obtain and decode the associated data manually at a later time, if needed. To query just the partition metadata, use the get_partitions_metadata function.

For additional information, an exhaustive list of parameters, and adapter-specific parameters, consult the documentation of VersionedLayer.

Example: reading two partitions

Data is decoded according to the content type specified in the layer configuration. The actual type of partition_data depends on the content type and adapter used, according to the matrix above.

Each returned partition is of type VersionedPartition.

partitions = versioned_layer.read_partitions(partition_ids=[377893751, 377893752])

for partition, partition_data in partitions.items():
    print(partition.id, partition_data)

Example: reading a specific catalog version and skipping decoding

In this example, data is not decoded and partition_data is bytes.

partitions = versioned_layer.read_partitions(partition_ids=[377893751, 377893752], version=200, decode=False)

for partition, partition_data in partitions.items():
    print(partition.id, partition_data)

Example: obtaining the metadata and fetching the data manually

partitions = versioned_layer.get_partitions_metadata(partition_ids=[377893751, 377893752])

for partition in partitions:
    print(partition.id)
    data = partition.get_blob()
    print(data)

Read Partitions/Data from Volatile Layer

To read data stored in a volatile layer, refer to the read_partitions function of the VolatileLayer class. Given a set of partition identifiers, the function returns the data associated with the specified partitions. If no partitions are specified, the content of the whole layer is returned.

While read_partitions queries layer metadata and downloads the data associated with each partition in one single call, it's also possible to query just the partition metadata and obtain and decode the associated data manually at a later time, if needed. To query just the partition metadata, use the get_partitions_metadata function.

For additional information, an exhaustive list of parameters, and adapter-specific parameters, consult the documentation of VolatileLayer.

Example: reading two partitions

Data is decoded according to the content type specified in the layer configuration. The actual type of partition_data depends on the content type and adapter used, according to the matrix above.

Each returned partition is of type VolatilePartition.

partitions = volatile_layer.read_partitions(partition_ids=[377893751, 377893752])

for partition, partition_data in partitions.items():
    print(partition.id, partition_data)

Example: skipping decoding

In this example, data is not decoded and partition_data is bytes.

partitions = volatile_layer.read_partitions(partition_ids=[377893751, 377893752], decode=False)

for partition, partition_data in partitions.items():
    print(partition.id, partition_data)

Example: obtaining the metadata and fetching the data manually

partitions = volatile_layer.get_partitions_metadata(partition_ids=[377893751, 377893752])

for partition in partitions:
    print(partition.id)
    data = partition.get_blob()
    print(data)

Read Partitions/Data from Index Layer

To read data stored in an index layer, refer to the read_partitions function of the IndexLayer class. The function returns all the data associated with index partitions matching an RSQL query.

While read_partitions queries index metadata via RSQL and downloads the data associated with each partition in one single call, it's also possible to query just the partition metadata via RSQL and obtain and decode the associated data manually at a later time, if needed. To query just the partition metadata, use the get_partitions_metadata function.

For additional information, an exhaustive list of parameters, and adapter-specific parameters, consult the documentation of IndexLayer.

Example: read all the data from selected partitions

Data is decoded according to the content type specified in the layer configuration. The actual type of partition_data depends on the content type and adapter used, according to the matrix above.

Each returned partition is of type IndexPartition and contains custom fields, as defined by the user who created the index layer.

partitions = index_layer.read_partitions(query="hour_from=ge=10")

for partition, partition_data in partitions:
    print(partition.fields, partition_data)

Example: read all the data from selected partitions skipping decoding

In this example, data is not decoded and partition_data is bytes.

partitions = index_layer.read_partitions(query="hour_from=ge=10", decode=False)

for partition, partition_data in partitions:
    print(partition.fields, partition_data)

Example: obtaining the metadata and fetching the data manually

partitions = index_layer.get_partitions_metadata(query="hour_from=ge=10")

for partition in partitions:
    print(partition.fields)
    data = partition.get_blob()
    print(data)
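
RSQL also allows combining conditions; as a sketch over the same sample field, `;` acts as a logical AND between two constraints:

partitions = index_layer.read_partitions(query="hour_from=ge=10;hour_from=le=12")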

Read Partitions/Data from Stream Layer

To read from a stream layer, a subscription must first be created via the subscribe function of the StreamLayer class. The function instantiates a Kafka consumer on the HERE platform, which is later queried via its REST API to read messages from the layer. The subscribe function returns a StreamSubscription. Please unsubscribe when reading is complete to free resources on the platform.

To consume data stored in a stream layer, refer to the read_stream function. The function consumes the stream and returns the messages and corresponding content; it requires a StreamSubscription.

While read_stream consumes the messages and downloads the data associated with each message in one single call, it's also possible to consume just the messages (partition metadata) and obtain and decode the associated data manually at a later time, if needed. To consume just the metadata, use the get_stream_metadata function. This function also requires a StreamSubscription.

For additional information, an exhaustive list of parameters, and adapter-specific parameters, consult the documentation of StreamLayer.

Example: consuming content

Data is decoded according to the content type specified in the layer configuration. The actual type of partition_data depends on the content type and adapter used, according to the matrix above.

Each returned partition is of type StreamPartition.

subscription = stream_layer.subscribe()

try:
    partitions = stream_layer.read_stream(subscription=subscription)

    for partition, partition_data in partitions.items():
        print(partition.id, partition.timestamp, partition_data)

finally:
    subscription.unsubscribe()

Example: skipping decoding

In this example, data is not decoded and partition_data is bytes.

subscription = stream_layer.subscribe()

try:
    partitions = stream_layer.read_stream(subscription=subscription, decode=False)

    for partition, partition_data in partitions.items():
        print(partition.id, partition.timestamp, partition_data)

finally:
    subscription.unsubscribe()

Example: obtaining the metadata and fetching the data manually

A distinguishing characteristic of the stream layer, compared for example to versioned and volatile layers, is that partition metadata (messages on the Kafka stream) can contain the data inline if it is small enough. The data may be included directly in each message, instead of being stored through the Blob API.

get_blob always retrieves the data from the Blob API. get_data, specific to the stream layer, returns the data directly if it is inline and retrieves it from the Blob API only when needed. It is therefore recommended to use get_data.

subscription = stream_layer.subscribe()

try:
    partitions = stream_layer.get_stream_metadata(subscription=subscription)

    for partition in partitions:
        print(partition.id, partition.timestamp)
        data = partition.get_data()
        print(data)

finally:
    subscription.unsubscribe()

Example: consuming and producing content using direct Kafka

Direct Kafka support enables users to retrieve instances of a Kafka consumer and a Kafka producer. Using these instances, users can read from and write to the stream layer directly. Both instances are configurable.

For more details on consumer configuration, see https://kafka.apache.org/11/documentation.html#newconsumerconfigs.

For more details on producer configuration, see https://kafka.apache.org/11/documentation.html#producerconfigs.

import json

# produce: serialize dict values as UTF-8 encoded JSON
producer = stream_layer.kafka_producer(value_serializer=lambda x: json.dumps(x).encode('utf-8'))
topic = stream_layer.get_kafka_topic()
for x in range(10):
    data = {'x': x, '2x': x * 2}
    producer.send(topic, value=data)
producer.close()

# consume: deserialize each message value from UTF-8 encoded JSON
consumer = stream_layer.kafka_consumer(value_deserializer=lambda x: json.loads(x.decode('utf-8')))
for message in consumer:
    print(f"Message is {message.value}")
consumer.close()
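
The instances can be further configured; below is a hedged sketch, assuming that additional keyword arguments such as auto_offset_reset are forwarded to the underlying kafka-python client (verify this against the StreamLayer API documentation):

import json

# assumption: extra kwargs are passed through to kafka-python's KafkaConsumer
consumer = stream_layer.kafka_consumer(
    auto_offset_reset="earliest",  # start from the earliest available message
    value_deserializer=lambda x: json.loads(x.decode("utf-8")),
)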

Read Features from Interactive Map Layer

This layer type has no concept of partitions or encoded data. There are no functions that read raw data or support decode parameters. Interactive Map layers are designed around the concept of features, in the sense of a GeoJSON FeatureCollection, instead of partitions.

When using the default adapter, FeatureCollection objects or iterators of Feature objects (both GeoJSON concepts) are returned directly.

Some of the functions to retrieve features from the layer:

  • get_features
  • search_features
  • list_features
  • get_features_in_bounding_box
  • spatial_search
  • spatial_search_geometry

Example: Read multiple features from an Interactive Map layer using get_features

features = interactive_map_layer.get_features(feature_ids=["feature_id1", "feature_id2", "feature_id3"])

Example: Search for and retrieve features based on properties using search_features

features = interactive_map_layer.search_features(params={"p.name": "foo", "p.type": "bar"})

Example: Retrieve all features in a layer using list_features

for feature in interactive_map_layer.list_features():
    print(feature)

Example: Find and retrieve all features in a bounding box using get_features_in_bounding_box

bbox_features = interactive_map_layer.get_features_in_bounding_box(bounds=(-171.791110603, 18.91619, -66.96466, 71.3577635769))

Example: Find and retrieve all features within given radius of input point using spatial_search

features = interactive_map_layer.spatial_search(lng=-95.95417, lat=41.6065, radius=1000)

Example: Find and retrieve all features within arbitrary geometry using spatial_search_geometry

from shapely.geometry import Polygon

polygon = Polygon([(0, 0), (1, 1), (1, 0)])
features = interactive_map_layer.spatial_search_geometry(geometry=polygon)

Write to Layer

The examples below show how to publish data to each supported layer type.

The functions write_partitions and write_stream accept iterators, so values can be produced one at a time. When an Adapter is used, the type to pass to the write functions is adapter-specific.

In the examples below, actual data is represented by the placeholder ... while partition identifiers are plausible.
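
As a hedged sketch, a generator can supply partitions one at a time, assuming the write functions accept an iterable of (partition identifier, data) pairs equivalent to the dict items used in the examples below (layer and publication are as in those examples):

def generate_partitions():
    # hypothetical source of already-encoded partition content
    for partition_id in ("a1", "a2", "a3"):
        yield partition_id, b"..."

layer.write_partitions(publication, generate_partitions(), encode=False)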

Write to Versioned Layer

To write to one or more versioned layers, a Publication must first be created. Publications for versioned layers work like transactions. It's possible to complete a publication, or to cancel it and drop all the metadata uploaded up to that moment.

Once a Publication is available, one or more write_partitions calls can be used to write data to a layer. Each write function is layer-specific. It's possible to call each write function more than once, for the same or multiple layers. See write_partitions for additional details.

Use set_partitions_metadata to update and delete partition metadata of a layer without uploading the content at the same time. The content has to be uploaded separately beforehand. The function also provides a way to delete partitions as part of a publication.

Completing a publication involving one or more versioned layers creates a new version of the catalog. Please see init_publication for additional details.

Example: writing multiple layers and partitions

The following snippet creates a new catalog version in which partitions of multiple layers are added or modified. The content of each partition is encoded and uploaded. The transaction is committed when the Publication is completed.

Data is encoded according to the content type set in the layer configuration.

layerA = catalog.get_layer("A")
layerB = catalog.get_layer("B")

publication = catalog.init_publication(layers=[layerA, layerB])  # ["A", "B"] also accepted

try:
    layerA.write_partitions(publication, { "a1": ..., "a2": ..., "a3": ... })
    layerB.write_partitions(publication, { 377893751: ..., 377893752: ... })
    layerB.write_partitions(publication, { 377893753: ..., 377893754: ... })
    publication.complete()

except Exception:
    publication.cancel()
    raise

Example: writing multiple layers and partitions skipping encoding

Users can provide already-encoded data in the form of bytes compatible with the content type configured in each layer. In this case, encode=False should be specified.

layerA = catalog.get_layer("A")
layerB = catalog.get_layer("B")

publication = catalog.init_publication(layers=[layerA, layerB])  # ["A", "B"] also accepted

try:
    layerA.write_partitions(publication, { "a1": bytes(...), "a2": bytes(...) }, encode=False)
    layerB.write_partitions(publication, { 377893751: bytes(...), 377893752: bytes(...) }, encode=False)
    publication.complete()

except Exception:
    publication.cancel()
    raise

Write to Volatile Layer

To write to one or more volatile layers, a Publication must first be created. Publications for volatile layers should be closed when no longer needed, via the complete function, to free resources. Due to the nature of volatile layers, the cancel function has no effect: the layer is not versioned, it doesn't support transactions, and successful writes cannot be rolled back.

Once a Publication is available, one or more write_partitions calls can be used to write data to a layer. Each write function is layer-specific. It's possible to call each write function more than once, for the same or multiple layers. See write_partitions for additional details.

Use set_partitions_metadata to update and delete partition metadata of a layer without uploading the content at the same time. The content has to be uploaded separately beforehand. The function also provides a way to delete partitions. Another way to delete partitions is delete_partitions.

Example: writing multiple layers and partitions

The following snippet writes to two volatile layers. The content of each partition is encoded and uploaded.

Data is encoded according to the content type set in the layer configuration.

layerA = catalog.get_layer("A")
layerB = catalog.get_layer("B")

publication = catalog.init_publication(layers=[layerA, layerB])  # ["A", "B"] also accepted

try:
    layerA.write_partitions(publication, { "a1": ..., "a2": ..., "a3": ... })
    layerB.write_partitions(publication, { 377893751: ..., 377893752: ... })

finally:
    publication.complete()

Example: writing multiple layers and partitions skipping encoding

Users can provide already-encoded data in the form of bytes compatible with the content type configured in each layer. In this case, encode=False should be specified.

layerA = catalog.get_layer("A")
layerB = catalog.get_layer("B")

publication = catalog.init_publication(layers=[layerA, layerB])  # ["A", "B"] also accepted

try:
    layerA.write_partitions(publication, { "a1": bytes(...), "a2": bytes(...) }, encode=False)
    layerB.write_partitions(publication, { 377893751: bytes(...), 377893752: bytes(...) }, encode=False)

finally:
    publication.complete()

Write to Index Layer

Writing to an index layer does not require a publication, and is currently supported for one partition at a time.

Use the write_single_partition function to add data and corresponding metadata to the index layer. Use the delete_partitions function to remove partitions of the index layer that match an RSQL query.
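
For example, a sketch of deletion by query, assuming delete_partitions accepts the same query parameter as the read functions (=lt= is the RSQL less-than operator):

index_layer.delete_partitions(query="hour_from=lt=10")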

It is also possible to operate on the index layer metadata only, via set_partitions_metadata. This adds and deletes index partitions at once. The content for the added partitions has to be uploaded separately beforehand.

Example: adding one partition to a layer

Data is encoded according to the content type set in the layer configuration.

fields = {
    "f1": 100,
    "f2": 500
}

index_layer.write_single_partition(data=..., fields=fields)

Example: adding one partition to a layer skipping encoding

Users can provide already-encoded data in the form of bytes compatible with the content type configured in each layer. In this case, encode=False should be specified.

fields = {
    "f1": 100,
    "f2": 500
}

index_layer.write_single_partition(data=bytes(...), fields=fields, encode=False)

Write to Stream Layer

To write to one or more stream layers, a Publication must first be created. Publications for stream layers should be closed when no longer needed, via the complete function, to free resources. Due to the nature of stream layers, the cancel function has no effect, as it is not possible to delete messages from a Kafka stream once written.

Once a Publication is available, one or more write_stream calls can be used to write data to a layer. Each write function is layer-specific. It's possible to call each write function more than once, for the same or multiple layers. See write_stream for additional details.

Use append_stream_metadata to write just the metadata (messages) of a layer to the stream, without uploading the content at the same time. The content has to be uploaded separately beforehand, or included in the data field of the messages when small enough.

Example: writing to streams

The content of each partition is encoded and uploaded.

Data is encoded according to the content type set in the layer configuration.

layerA = catalog.get_layer("A")
layerB = catalog.get_layer("B")

publication = catalog.init_publication(layers=[layerA, layerB])  # ["A", "B"] also accepted

try:
    layerA.write_stream(publication, { "a1": ..., "a2": ..., "a3": ... })
    layerB.write_stream(publication, { 377893751: ..., 377893752: ... })

finally:
    publication.complete()

Example: writing to streams skipping encoding

Users can provide already-encoded data in the form of bytes compatible with the content type configured in each layer. In this case, encode=False should be specified.

layerA = catalog.get_layer("A")
layerB = catalog.get_layer("B")

publication = catalog.init_publication(layers=[layerA, layerB])  # ["A", "B"] also accepted

try:
    layerA.write_stream(publication, { "a1": bytes(...), "a2": bytes(...) }, encode=False)
    layerB.write_stream(publication, { 377893751: bytes(...), 377893752: bytes(...) }, encode=False)

finally:
    publication.complete()

Write to Interactive Map Layer

This layer type has no concept of partitions or encoded data. There are no functions that write raw data or support encode parameters. The Interactive Map layer API is modeled on the concept of a GeoJSON FeatureCollection. Consult the API documentation for full details.

When using the default adapter, FeatureCollection objects or iterators of Feature objects (both GeoJSON concepts) are passed directly as parameters.

Example: writing GeoJSON features

from geojson import Feature, FeatureCollection, Point, Polygon

# geometries are wrapped in Features; properties belong to the Feature, not the geometry,
# and Polygon coordinates are a list of linear rings
f1 = Feature(geometry=Polygon([[(0, 0), (0, 1), (1, 0), (0, 0)]]), properties={"a": 100, "b": 200})
f2 = Feature(geometry=Point((-1.5, 2.32)), properties={"a": 50, "b": 95})
features = FeatureCollection([f1, f2])

interactive_map_layer.write_features(features)

Example: writing GeoJSON features from a file

geojson_file_path = "~/example.geojson"

interactive_map_layer.write_features(from_file=geojson_file_path)
