HERE Map Content (HMC) Topology Filter is a batch pipeline designed to extract topology segment data and related attributes from the HERE Map Content catalog, based on attribute filters provided by the user. For example, it can find all topology segments that belong to roads of a specific functional class and/or have a certain number of lanes. See the FILTERS.md file for detailed information on supported attribute filters.

Before deploying this pipeline template, provide the filter criteria in the filter.properties file in the config folder. One or more filters can be provided at a time; if two or more filters are provided, they are combined with AND logic.
Consider the following example, where the user provides the following filters:

```properties
# Attribute Filter #1:
here.platform.data-processing.compiler.attributeFilters.0.className=FunctionalClassAttribute
here.platform.data-processing.compiler.attributeFilters.0.methodName=functionalClass
here.platform.data-processing.compiler.attributeFilters.0.attributeCriteria=equals
here.platform.data-processing.compiler.attributeFilters.0.attributeValue=FUNCTIONAL_CLASS_1
here.platform.data-processing.compiler.attributeFilters.0.publishAttribute=true
###
# Attribute Filter #2:
here.platform.data-processing.compiler.attributeFilters.1.className=LaneCountAttribute
here.platform.data-processing.compiler.attributeFilters.1.methodName=laneCount
here.platform.data-processing.compiler.attributeFilters.1.attributeCriteria=moreThan
here.platform.data-processing.compiler.attributeFilters.1.attributeValue=2
here.platform.data-processing.compiler.attributeFilters.1.publishAttribute=true
```
The output of this pipeline will be topology segments that have the FUNCTIONAL_CLASS_1 attribute and more than two lanes.
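The AND semantics of combined filters can be sketched as follows. This is an illustration only, not the template's actual implementation; the segment fields and criteria names simply mirror the example filter.properties above.

```python
# Minimal sketch of how configured attribute filters combine with AND logic.
# Hypothetical in-memory segments stand in for HMC topology data.

OPERATORS = {
    "equals": lambda value, expected: value == expected,
    "moreThan": lambda value, expected: value > expected,
}

def matches(segment: dict, filters: list) -> bool:
    """A segment passes only if every configured filter passes (AND logic)."""
    return all(
        OPERATORS[f["attributeCriteria"]](segment[f["methodName"]], f["attributeValue"])
        for f in filters
    )

filters = [
    {"methodName": "functionalClass", "attributeCriteria": "equals",
     "attributeValue": "FUNCTIONAL_CLASS_1"},
    {"methodName": "laneCount", "attributeCriteria": "moreThan",
     "attributeValue": 2},
]

segments = [
    {"id": "a", "functionalClass": "FUNCTIONAL_CLASS_1", "laneCount": 3},
    {"id": "b", "functionalClass": "FUNCTIONAL_CLASS_1", "laneCount": 2},
    {"id": "c", "functionalClass": "FUNCTIONAL_CLASS_4", "laneCount": 4},
]

# Only segment "a" satisfies both filters: FUNCTIONAL_CLASS_1 AND more than 2 lanes.
print([s["id"] for s in segments if matches(s, filters)])
```

Note that `moreThan` with value 2 excludes segment "b" (exactly two lanes), matching the behavior described above.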
To deploy and run this pipeline template, you will need the Wizard Deployer. The Wizard runs interactively, asking questions about the application and expecting the user to provide the needed answers. Follow the instructions in the Wizard's documentation, set up the filters as described in FILTERS.md, and then follow these steps:
PLEASE NOTE: During deployment with the Wizard, you will be asked to provide the number of workers (cores) you wish to use for processing. This number depends on the size of the area (bounding box) you wish to process and the layers used as input. Some layers (for example, adas-attributes) contain more data than others and thus require more processing power to parse. We recommend starting with a small bounding box to assess the processing time required to extract data according to the provided filters, then tuning the number of workers for your bounding box until you are satisfied with the processing time. When tuning the performance of your pipeline, the Spark Web UI is a great tool for monitoring resource allocation and data distribution.
Once deployed, this pipeline can be re-executed with a different configuration by simply copying the pipeline version and providing different runtime parameters and a different number of workers; the user does not have to go through the Wizard deployment process again. The same pipeline can be reused to extract different attributes for a different bounding box by changing the corresponding runtime parameters during version creation/copying. Make sure to tune the number of workers to the size of the requested bounding box.
This pipeline template uses the Data Processing Library (DPL) under the hood, which allows the user to tune DPL-related parameters to achieve better performance. Read more on DPL configuration here. The performance tuning section of the DPL documentation can also be useful when tuning Spark-related configuration. All of these parameters can be provided in the config/filter.properties file before initial deployment, or later as runtime parameters when creating a new version of a deployed pipeline.
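As a sketch, Spark tuning parameters can sit alongside the filters in config/filter.properties. The keys below are standard Spark configuration and the values are example starting points only; the exact DPL-specific key names should be taken from the DPL documentation rather than guessed.

```properties
# Example Spark tuning starting points (values are illustrative, not recommendations):
spark.executor.memory=4g
spark.executor.cores=2
spark.default.parallelism=64
```

When adjusting these values, use the Spark Web UI to check whether executors are underutilized or tasks are unevenly distributed before increasing the worker count.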
In the Platform Portal, select the Pipelines tab, where you should see your pipeline deployed and running. The Spark Web UI provides important details about your running pipeline. After your pipeline finishes, you will be able to see your published data in the output catalog.
Executing this pipeline template will incur the following costs:

- Storage cost, which depends on the amount of data stored in the Versioned layer.
- Metadata cost, which depends on the number and size of partitions (metadata) stored in the Versioned layer.
- Processing cost, which depends on the number of workers selected by the user.
- Log storage cost, which depends on the log level set for the execution of this pipeline template. To minimize this cost, set the log level to WARN.
If you need support with this pipeline template, please contact us.