Map Matcher

Overview

The Map Matcher pipeline template reads SDII data from either a Versioned or an Index layer. It aggregates the messages, sorts them by timestamp, and then map matches each message. The map-matching results are written to an Index layer in Parquet format. The output Index layer is indexed by time (1-hour windows) and HERE Tile (zoom level 12). If the path-based map matcher is used, additional routing information is written to a separate layer, also in Parquet format and with the same time and HERE Tile indexing.
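The time-window indexing described above can be illustrated with a minimal Python sketch. It is not the template's implementation: the message keys `timestamp_ms` and `tile_id` and both function names are hypothetical, and the tile ID is assumed to be already computed for each message. The sketch only shows how a timestamp maps to a 1-hour window and how messages could be grouped per index key and sorted by timestamp.

```python
from collections import defaultdict

HOUR_MS = 60 * 60 * 1000  # one-hour time window, in milliseconds


def hour_window_start(timestamp_ms: int) -> int:
    """Round an epoch-millisecond timestamp down to the start of its hour."""
    return timestamp_ms - (timestamp_ms % HOUR_MS)


def group_for_index(messages):
    """Group messages by (hour window, tile id); sort each group by timestamp.

    Each message is a dict with the hypothetical keys 'timestamp_ms' and
    'tile_id'. The real SDII schema and the tile computation are not shown.
    """
    groups = defaultdict(list)
    for msg in messages:
        key = (hour_window_start(msg["timestamp_ms"]), msg["tile_id"])
        groups[key].append(msg)
    for bucket in groups.values():
        bucket.sort(key=lambda m: m["timestamp_ms"])
    return dict(groups)
```

Each key of the returned dictionary corresponds to one combination of time window and tile, which is the granularity at which the output layer is indexed.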

Structure

Architectural Diagram

Figure 1. Application Flow Diagram (image and legend not included)

Prerequisites

  • This pipeline template writes its output to Index layers of the output catalog. You can use an existing output layer or let the Wizard script create a new catalog/layer for you. Refer to the "Execution" section below for further details.
  • If you plan to use an existing catalog/layer, make sure that your output catalog is shared with the GROUP that you will use to deploy this pipeline template.
  • Confirm that your local credentials (~/.here/credentials.properties) are added to the same group.

Execution

To deploy and run this pipeline template, you need the Wizard Deployer. The Wizard runs interactively: it asks questions about the application and expects you to provide the answers. Assuming you have followed the Wizard's documentation and set up the required parameters beforehand, follow these steps:

  1. Run the script: ./wizard.sh
  2. Follow the prompts and provide the required answers.

You can use an existing output layer or let the Wizard script create a new catalog/layer for you. If you use an existing catalog, make sure it is shared with the GROUP_ID that will be used for this deployment.

PLEASE NOTE:

  1. During deployment, the Wizard script asks you for a bounding box of the area you wish to process, specified as four coordinates.
  2. Answer the questions that apply to the layer type of your input layer, and leave the questions about the other layer type at their defaults.
  3. To run the pipeline against the simulated-data sample input catalog, accept the default answer for every Wizard question.
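Before answering the bounding box prompt, it can help to sanity-check the four coordinates. The following is a minimal Python sketch under stated assumptions: the coordinates are WGS84 degrees, and the names `BoundingBox` and `parse_bounding_box` as well as the west/south/east/north order are illustrative only. The order and format that the Wizard actually expects are defined by its prompts.

```python
from typing import NamedTuple


class BoundingBox(NamedTuple):
    west: float
    south: float
    east: float
    north: float


def parse_bounding_box(west, south, east, north) -> BoundingBox:
    """Validate four WGS84 coordinates in degrees and return them as a box."""
    box = BoundingBox(float(west), float(south), float(east), float(north))
    if not (-180.0 <= box.west <= 180.0 and -180.0 <= box.east <= 180.0):
        raise ValueError("longitudes must lie within [-180, 180]")
    if not (-90.0 <= box.south <= 90.0 and -90.0 <= box.north <= 90.0):
        raise ValueError("latitudes must lie within [-90, 90]")
    if box.south >= box.north:
        raise ValueError("south must be smaller than north")
    if box.west >= box.east:
        # Boxes that cross the antimeridian are rejected in this sketch.
        raise ValueError("west must be smaller than east")
    return box
```

A smaller bounding box means less data is read from the input catalog, so checking the box before deployment also helps keep the run within the intended area.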

Output Catalog

If you do not use the Wizard to create the output catalog, refer to the file config/output-catalog.json for the structure of the output layers that the output catalog needs in order to publish the results of the pipeline.

The output catalog of the pipeline consists of 2 layers:

    - layer_1:
            - stores map-matched information in Parquet format
            - Index layer
            - indexed by time window (1 hour) and HERE Tile (zoom level 12)
    - layer_2:
            - stores routing information in Parquet format
            - Index layer
            - indexed by time window (1 hour) and HERE Tile (zoom level 12)

Verification

In the Platform Portal, select the Pipelines tab, where you should see your pipeline deployed and running. After the pipeline finishes and the data is published, you can find your output catalog under the Data tab and inspect the data visually, or query and retrieve it programmatically using one of the following options:

Cost Estimation

Executing this pipeline template will incur the following costs:

Storage-Blob

The cost depends on the amount of data that the execution publishes to the output Index layers.

Data Transfer IO

The cost depends on:

  • The amount of data read from the input catalog and the HMC catalog, which in turn depends on the bounding box that you specified
  • The amount of data published to your output layers

Metadata

The cost depends on the number and size of the partitions (metadata) stored in the output Index layers.

Compute Core and Compute RAM

The cost depends on the number of workers that you specify when answering the Wizard questions.

Log Search IO

To minimize this cost, the log level is set to WARN.

Support

If you need support with this pipeline template, please contact us.