HERE Map Content (HMC) Attribute Retriever

Overview

HERE Map Content (HMC) Attribute Retriever enriches the user's data with chosen attributes from the HERE Map Content catalog. It accepts data in GeoJSON format with the "Point" geometry type as input. Currently, this customizable pipeline template is designed to read data from a volatile layer only. At execution time, users choose which HMC attributes to pull from the catalog and add to their data.

For each provided GeoJSON feature, this pipeline template takes the point coordinates, map matches them to the HMC topology to determine which segment the feature belongs to, and pulls the requested attributes. The output is written to a versioned layer in GeoJSON format, where all original properties of each feature are preserved and the requested HMC attributes are added on top of the initial data.

Currently supported attributes include:

  • freeFlowSpeed
  • functionalClass
  • isoCountryCodeAttribute
  • overpassUnderpass
  • physicalAttribute
  • roadAccessType
  • roadClass
  • segmentGeometry
  • segmentLength
  • specialTrafficAreaCategory
  • speedLimit

      Data coverage for these attributes is subject to HMC specifications.

Regardless of which attributes have been requested by the user, this pipeline template will add the following information to each feature:

  • tileId: HERE Tile ID to which the GeoJSON feature belongs (see the sketch after this list)
  • segmentUri: for example, "here:cm:segment:155520884" - the segment ID in the HMC topology (read more about the topology-geometry model in HMC)
  • offset: for example, 0.78 - the position of the feature on the segment, measured from the beginning of the segment
  • hmcRefTileId: HERE Tile ID of the tile where the segment starts (can be different from tileId)
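
The tileId values follow the HERE tiling scheme: the tile's column and row bits are interleaved into a quadtree key, with a leading 1 bit marking the level. The following minimal Python sketch (an illustration of the tiling math, not part of the pipeline template itself) shows the computation at level 12, the partitioning level commonly used for HMC road topology; it reproduces the tileId 23611420 from the output example below.

def lonlat_to_tile_id(lon, lat, level=12):
    # Column and row in the HERE tile grid: 2**level columns span 360 degrees
    # of longitude, 2**(level - 1) rows span 180 degrees of latitude.
    x = min(int((lon + 180.0) / 360.0 * (1 << level)), (1 << level) - 1)
    y = min(int((lat + 90.0) / 180.0 * (1 << (level - 1))), (1 << (level - 1)) - 1)
    # Interleave the bits of x (even positions) and y (odd positions),
    # then set a leading 1 bit as the level marker.
    tile_id = 1 << (2 * level)
    for i in range(level):
        tile_id |= ((x >> i) & 1) << (2 * i)
        tile_id |= ((y >> i) & 1) << (2 * i + 1)
    return tile_id

print(lonlat_to_tile_id(11.789017, 48.054792))  # 23611420, as in the example below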

Structure

HERE Map Content (HMC) Attribute Retriever operates as a batch pipeline that reads input data from a volatile layer and publishes the result to a versioned layer. All original information about each GeoJSON feature is preserved, and the requested attributes are added to the "properties" section of the GeoJSON feature.
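
Conceptually, the enrichment step amounts to merging the retrieved attributes into each feature's "properties" map. A minimal Python sketch of that merge, where the retrieved dictionary stands in for a hypothetical result of map matching plus attribute lookup:

import json

def enrich_feature(feature, retrieved):
    # Deep copy so the original input feature is left untouched.
    enriched = json.loads(json.dumps(feature))
    # Original properties are preserved; retrieved HMC attributes are added on top.
    enriched["properties"].update(retrieved)
    return enriched

feature = {
    "type": "Feature",
    "properties": {"myId": 42},  # illustrative user property
    "geometry": {"type": "Point", "coordinates": [11.789017, 48.054792]},
}
# Illustrative values, as returned by map matching plus attribute lookup.
retrieved = {
    "tileId": 23611420,
    "segmentUri": "here:cm:segment:140151132",
    "offset": 0.6491530936803867,
    "speedLimit": 100,
}
print(json.dumps(enrich_feature(feature, retrieved), indent=2))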

Application Flow Diagram
Figure 1. Application Flow Diagram


Example of expected input format
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {
        ...
      },
      "geometry": {
        "type": "Point",
        "coordinates": [
          11.789017,
          48.054792
        ]
      }
    }
  ]
}
Example of output data
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {
        ...,
        "tileId": 23611420,
        "hmcRefTileId": 23611420,
        "segmentUri": "here:cm:segment:140151132",
        "offset": 0.6491530936803867,
        "functionalClass": 5,
        "speedLimit": 100
      },
      "geometry": {
        "type": "Point",
        "coordinates": [
          11.789017,
          48.054792
        ]
      }
    }
  ]
}

Prerequisites

  • This pipeline template expects the user to provide an existing catalog with data published to a volatile layer as input to the pipeline. Make sure that your input data is in GeoJSON format (FeatureCollection), published to a volatile layer, and that the "RETENTION" setting on the volatile layer won't let your data expire before this pipeline template is executed. Refer to the Create Input Catalog and Layer section below for more info.
  • Make sure that your input catalog is shared with the GROUP that you are going to use for deployment of this pipeline template.
  • Confirm that your local credentials (~/.here/credentials.properties) are added to the same group (see the sketch after this list for a quick local check).
  • If you are planning to extract any HMC attributes that belong to a "premium" layer, make sure your credentials have access to those layers.
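
As a quick local check for the credentials prerequisite, you can verify that ~/.here/credentials.properties exists and contains the expected keys. A minimal Python sketch; the key names reflect the usual layout of that file, so verify them against your own copy:

import os

def check_credentials(path=os.path.expanduser("~/.here/credentials.properties")):
    # Keys usually present in a platform credentials file (verify against yours).
    expected = {"here.access.key.id", "here.access.key.secret", "here.token.endpoint.url"}
    with open(path) as f:
        keys = {line.split("=", 1)[0].strip()
                for line in f
                if "=" in line and not line.lstrip().startswith("#")}
    missing = expected - keys
    if missing:
        raise SystemExit(f"credentials file is missing keys: {sorted(missing)}")

check_credentials()
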
Create Input Catalog and Layer

This example requires you to have a catalog with a volatile layer for the input data. For instructions on how to create a catalog, please refer to Create a Catalog. For instructions on how to create a layer, please refer to Create a Layer.

HERE platform pipelines are managed by group. Therefore, please grant read access to your group id so your pipeline can read from the input catalog. For instructions on how to manage groups, please refer to Manage Groups in the Teams and Permissions User Guide. For instructions on how to share your catalog, please refer to Share a Catalog.

For instructions on how to publish your input data into a volatile layer, please see the links below. The easiest option is to use the CLI.
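
Before publishing, it can also help to sanity check that the input is a GeoJSON FeatureCollection whose features all carry Point geometry, since that is what this pipeline template expects. A minimal Python sketch (the file name is illustrative):

import json

def check_input(path):
    with open(path) as f:
        data = json.load(f)
    assert data.get("type") == "FeatureCollection", "input must be a FeatureCollection"
    for i, feature in enumerate(data.get("features", [])):
        geometry = feature.get("geometry") or {}
        assert geometry.get("type") == "Point", f"feature {i} is not a Point"
        lon, lat = geometry["coordinates"][:2]
        assert -180 <= lon <= 180 and -90 <= lat <= 90, \
            f"feature {i} has out-of-range coordinates"

check_input("input.geojson")  # illustrative file name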

Execution

Running on the HERE platform

In order to deploy and run this pipeline template, you will need the Wizard Deployer. The Wizard runs interactively, asking questions about the application and expecting the user to provide the needed answers. Follow the Wizard's documentation to set up the needed parameters, then follow these steps:

  1. Execute the script as ./wizard.sh
  2. Follow the prompts and provide needed answers

You can use your existing output layer or let the Wizard create a new catalog/layer for you. If using an existing catalog, make sure it is shared with the GROUP_ID that will be used for this deployment.

PLEASE NOTE: In order to process your data in a reasonable time frame, you may need to tune the number of cores used for processing. The factors contributing to this decision are:

  • Number and geographical density of GeoJSON features in your data: In order to pull the requested attributes, each feature needs to be map matched to the HMC topology. Not only does the number of features matter, but their location does too. The same number of features may result in different processing times depending on their placement: features that are placed densely and fall into fewer HMC tiles will require fewer network calls to map match, resulting in faster processing (a rough way to gauge this is sketched at the end of this section).
  • Number of attributes requested: Attributes are stored in different layers, which means more network calls will be required to fetch the needed data from the HERE platform, and therefore processing time will increase.

You can start with a default configuration of one core and use the Spark Web UI to monitor memory utilization and data distribution in your running pipeline. You should also monitor Splunk logs for errors and exceptions; for instance, an OutOfMemoryError exception most likely indicates that more processing power is needed for the submitted amount of input data. In the current version, when the Wizard is used to deploy this pipeline template, each worker has 1 CPU, 7 GB RAM, and 8 GB of disk space.
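
Since the number of distinct HMC tiles touched by your input drives the number of network calls during map matching, counting distinct level-12 tiles gives a rough, local estimate of that density before you deploy. A minimal, self-contained Python sketch (the grid math matches the tiling sketch shown earlier):

from collections import Counter

def tile_xy(lon, lat, level=12):
    # Column and row in the HERE tile grid: 2**level columns over 360 degrees
    # of longitude, 2**(level - 1) rows over 180 degrees of latitude.
    x = min(int((lon + 180.0) / 360.0 * (1 << level)), (1 << level) - 1)
    y = min(int((lat + 90.0) / 180.0 * (1 << (level - 1))), (1 << (level - 1)) - 1)
    return x, y

def distinct_tile_count(features, level=12):
    # Fewer distinct tiles generally means fewer network calls to map match.
    tiles = Counter(tile_xy(*f["geometry"]["coordinates"][:2], level)
                    for f in features)
    return len(tiles)

Fewer distinct tiles for the same number of features suggests the default one-core configuration will go further before you need to scale up.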

Verification

In the Platform Portal, select the Pipelines tab, where you should be able to see your pipeline deployed and running. After your pipeline finishes and your data is published, you can find your output catalog under the Data tab, or query/retrieve your data programmatically, for example using the Data Client Library or the CLI.

Cost Estimation

Executing this pipeline template will incur the following costs:

Storage-Volatile

Cost will depend on the settings of your input Volatile layer.

Storage-Blob

Cost will depend on the amount of data that will be published to a Versioned layer as an output from execution.

Data Transfer IO

Cost will depend on the amount of:

  • input data published to the volatile layer (published before execution of this pipeline template)
  • the same input data retrieved from the volatile layer (during pipeline template execution)
  • data written out to the versioned layer
  • premium map data read from premium HMC layers, in case you are requesting "premium" attributes
  • data read from the Optimized Map for Location Library for the needs of map matching

Metadata

Cost will depend on the amount and size of partitions (metadata) stored in the versioned layer.

Compute Core and Compute RAM

Cost will depend on the amount of data that needs to be processed. More data will require more processing power and will take longer to finish.

Log Search IO

Cost will depend on the log level set for the execution of this pipeline template. To minimize this cost, set the log level to WARN.

Data

Cost will be incurred if the user selects attributes that belong to premium HMC layers, and will depend on the amount of data pulled from those layers.

Support

If you need support with this pipeline template, please contact us.
