HERE Map Content (HMC) Topology Filter

Overview

HERE Map Content (HMC) Topology Filter is a batch pipeline designed to extract topology segment data and related attributes from the HERE Map Content catalog, based on attribute filters provided by the user. One example would be to find all topology segments that belong to roads of a specific functional class and/or have a certain number of lanes. See the FILTERS.md file for detailed information on the supported attribute filters. Before deploying this pipeline template, the user needs to provide filter criteria in the filter.properties file in the config folder. One or more filters can be provided at a time; if two or more filters are provided, they are combined using AND logic.

Consider the following example, where the user provides the following filters:

# Attribute Filter #1:
here.platform.data-processing.compiler.attributeFilters.0.className=FunctionalClassAttribute
here.platform.data-processing.compiler.attributeFilters.0.methodName=functionalClass
here.platform.data-processing.compiler.attributeFilters.0.attributeCriteria=equals
here.platform.data-processing.compiler.attributeFilters.0.attributeValue=FUNCTIONAL_CLASS_1
here.platform.data-processing.compiler.attributeFilters.0.publishAttribute=true
###
# Attribute Filter #2:
here.platform.data-processing.compiler.attributeFilters.1.className=LaneCountAttribute
here.platform.data-processing.compiler.attributeFilters.1.methodName=laneCount
here.platform.data-processing.compiler.attributeFilters.1.attributeCriteria=moreThan
here.platform.data-processing.compiler.attributeFilters.1.attributeValue=2
here.platform.data-processing.compiler.attributeFilters.1.publishAttribute=true

The output of this pipeline will be topology segments that have the FUNCTIONAL_CLASS_1 attribute and more than two lanes.

Structure

Figure 1. Application Flow Diagram

Legend: Legend Diagram

Prerequisites

  • If you wish to provide an existing catalog for output data, make sure it is created using the same configuration as specified in the config/output-catalog.json file (a sketch of such a configuration follows this list) and is shared with the same GROUP that will be used to deploy this pipeline template.
  • Confirm that your local credentials (~/.here/credentials.properties) are added to the same GROUP.
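For orientation, the following is a hedged sketch of what a versioned-layer catalog configuration of this kind typically looks like; the catalog and layer IDs, names, tile levels, and content type below are illustrative placeholders, and the authoritative definition is the config/output-catalog.json shipped with this template:

{
  "id": "topology-filter-output",
  "name": "Topology Filter Output",
  "summary": "Output catalog for the HMC Topology Filter pipeline",
  "description": "Filtered topology segments and their published attributes",
  "layers": [
    {
      "id": "filtered-topology",
      "name": "Filtered Topology",
      "summary": "Topology segments matching the configured attribute filters",
      "description": "Topology segments matching the configured attribute filters",
      "layerType": "versioned",
      "contentType": "application/json",
      "partitioning": {
        "scheme": "heretile",
        "tileLevels": [12]
      },
      "volume": {
        "volumeType": "durable"
      }
    }
  ]
}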

Execution

To deploy and run this pipeline template you will need the Wizard Deployer. The Wizard runs interactively, asking questions about the application and expecting the user to provide the needed answers. Follow the instructions in the Wizard's documentation, set up the filters as described in FILTERS.md, and then follow these steps:

  1. Execute the script as ./wizard.sh (see the example invocation below)
  2. Follow the prompts and provide the needed answers
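For example, assuming the Wizard script is located in (or has been copied to) the root of this pipeline template:

chmod +x wizard.sh   # ensure the script is executable
./wizard.sh          # start the interactive deployment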

PLEASE NOTE: During deployment with the Wizard, you will be asked for the number of workers (cores) you wish to use for processing. This number depends on the size of the area (bounding box) you wish to process and on the layers used as input. Some layers (for example, adas-attributes) contain more data than others and therefore require more processing power to parse. We recommend starting with a small bounding box to assess the processing time required to extract data according to the provided filters, and then tuning the number of workers for your bounding box until you are satisfied with the processing time. When tuning the performance of your pipeline, the Spark Web UI is a useful tool for monitoring resource allocation and data distribution.

Rerunning pipeline with different configuration

Once deployed, this pipeline can be re-executed with a different configuration by copying the pipeline version and providing different runtime parameters and a different number of workers; there is no need to go through the Wizard deployment process again. The same pipeline can be reused to extract different attributes for a different bounding box by changing the corresponding runtime parameters during version creation/copying, as shown in the example below. Make sure to tune the number of workers to the size of the requested bounding box.
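For example, a copied pipeline version could extract a different functional class by overriding the filter values from config/filter.properties as runtime parameters. The property format is the same as in the example above; FUNCTIONAL_CLASS_2 is only an illustrative value, and the supported filters and values are listed in FILTERS.md:

# Attribute Filter #1 (runtime parameters for a copied pipeline version):
here.platform.data-processing.compiler.attributeFilters.0.className=FunctionalClassAttribute
here.platform.data-processing.compiler.attributeFilters.0.methodName=functionalClass
here.platform.data-processing.compiler.attributeFilters.0.attributeCriteria=equals
here.platform.data-processing.compiler.attributeFilters.0.attributeValue=FUNCTIONAL_CLASS_2
here.platform.data-processing.compiler.attributeFilters.0.publishAttribute=true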

Tuning Data Processing Library parameters

This pipeline template uses the Data Processing Library (DPL) under the hood, which allows the user to tune DPL-related parameters to achieve better performance. Read more on DPL configuration here. The performance tuning section of the DPL documentation can also be useful when tuning Spark-related configuration. All of these parameters can be provided in the config/filter.properties file before the initial deployment, or later as runtime parameters when creating a new version of a deployed pipeline.
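As an illustration only, standard Spark settings such as the ones below could be supplied alongside the filter properties. Treat these keys and values purely as placeholders, since which parameters are honored (and their appropriate values) depends on your DPL and pipeline configuration; consult the DPL configuration reference and its performance tuning section for the supported set:

# Illustrative Spark tuning settings (placeholders; see the DPL documentation):
spark.default.parallelism=200
spark.executor.memory=4g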

Verification

In the Platform Portal, select the Pipelines tab, where you should see your pipeline deployed and running. The Spark Web UI provides important details about the running pipeline. After the pipeline finishes and the data is published, you will be able to see it in the output catalog.

Cost Estimation

Executing this pipeline template will incur the following costs:

Storage-Blob

Cost will depend on the amount of output data stored in the versioned layer.

Metadata

Cost will depend on the number and size of partitions (metadata) stored in the versioned layer.

Data Transfer IO

Cost will depend on the amount of:

  • input data read from the HMC catalog
  • data published to the output catalog

Compute Core and Compute RAM

Cost will depend on the number of workers selected by the user.

Log Search IO

Cost will depend on the log level set for the execution of this pipeline template. To minimize this cost, the user can set the log level to WARN.

Support

If you need support with this pipeline template, please contact us.

results matching ""

    No results matching ""