# Map Matcher

## Overview

The Map Matcher pipeline template can be used to perform two types of map matching on probe data:

1. Path Matcher (carPathMatcherWithTransitions)
2. Point Matcher (Proximity Search) with configurable search radius

It supports map-matching on the following two input data formats:

1. Sensor Data Ingestion Interface (SDII) data stored in either a Versioned or an Index layer.
2. Generic/CSV data stored in parquet format in an Index layer.

The Map Matcher publishes results to an Index layer in parquet format. The output Index layer is indexed by a timewindow of 1 hour and by HERETile at zoom level 12. If the Path Matcher is used, additional routing information (messageId, route, hourKey, idx_time and idx_tileId) is published to a separate layer in parquet format with the same indices.
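
To illustrate how a single probe maps to these two index values, here is a minimal Python sketch. It assumes epoch-millisecond timestamps and the HERE tile quadkey scheme (tile side of 360/2^level degrees, column and row bits interleaved behind a leading 1 bit). The function names are hypothetical and not part of the template, and whether the pipeline stores the window start in idx_time in exactly this form is an assumption.

```python
HOUR_MS = 60 * 60 * 1000  # length of the 1-hour time window in milliseconds


def hour_window_start(timestamp_ms: int) -> int:
    """Truncate an epoch-millisecond timestamp to the start of its hour."""
    return timestamp_ms - (timestamp_ms % HOUR_MS)


def here_tile_id(lat: float, lon: float, level: int = 12) -> int:
    """Return the ID of the HERE tile containing (lat, lon) at `level`."""
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("latitude or longitude out of range")
    if level < 1:
        raise ValueError("level must be at least 1")
    columns = 1 << level        # tiles along the longitude axis
    rows = 1 << (level - 1)     # tiles along the latitude axis
    size = 360.0 / columns      # tile side length in degrees
    # Clamp so that lon=180 and lat=90 fall into the last column/row.
    x = min(int((lon + 180.0) / size), columns - 1)
    y = min(int((lat + 90.0) / size), rows - 1)
    # Interleave bits: x goes to even positions, y to odd positions.
    morton = 0
    for bit in range(level):
        morton |= ((x >> bit) & 1) << (2 * bit)
        morton |= ((y >> bit) & 1) << (2 * bit + 1)
    # The leading 1 bit encodes the zoom level in the tile ID.
    return (1 << (2 * level)) | morton
```

For example, `here_tile_id(52.5163, 13.3777)` (a point in central Berlin) evaluates to `23618402` at the default level 12.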

## Structure

[Figure: architectural diagram with legend]

## Prerequisites

• This pipeline template writes output to an indexed layer of the catalog. You can use your existing output layer or let the Wizard script create a new catalog/layer for you. Please refer to the section "Execution" below for further details.
• If you plan to use an existing catalog/layer, make sure that your output catalog is shared with the GROUP that you will use to deploy this pipeline template.
• Confirm that your local credentials (~/.here/credentials.properties) are added to the same group.
• When the input data is in generic parquet format, the index layer must contain the following required columns:
  • For Path Matching: messageId, timestamp, latitude, longitude, messageTimestamp, messageTileId
  • For Point Matching: latitude, longitude
• When the input data is read from an index layer:
  • It must be indexed by timewindow and HERETile.
  • The start time and end time must fall within the time window range of the input data.
• To process the entire input data, set the bounding box to the minimum and maximum latitude and longitude values, as follows:
  resultsBoundingBox.southLatitude=-90
  resultsBoundingBox.northLatitude=90
  resultsBoundingBox.westLongitude=-180
  resultsBoundingBox.eastLongitude=180
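
The column and bounding box requirements above can be checked before deployment. The following is a minimal Python sketch, not part of the template: the helper names are hypothetical, the column lists are copied from the prerequisites, and the check that west is smaller than east is a simplifying assumption that rules out boxes crossing the antimeridian.

```python
# Required input columns per matcher type, as listed in the prerequisites.
REQUIRED_COLUMNS = {
    "path": ["messageId", "timestamp", "latitude", "longitude",
             "messageTimestamp", "messageTileId"],
    "point": ["latitude", "longitude"],
}


def missing_columns(columns, matcher):
    """Return the required columns absent from `columns`, in listed order."""
    present = set(columns)
    return [name for name in REQUIRED_COLUMNS[matcher] if name not in present]


def bounding_box_properties(south, north, west, east):
    """Validate a bounding box and render it as resultsBoundingBox.* lines."""
    if not (-90 <= south < north <= 90):
        raise ValueError("latitudes must satisfy -90 <= south < north <= 90")
    if not (-180 <= west < east <= 180):
        raise ValueError("longitudes must satisfy -180 <= west < east <= 180")
    values = {
        "southLatitude": south,
        "northLatitude": north,
        "westLongitude": west,
        "eastLongitude": east,
    }
    return "\n".join(f"resultsBoundingBox.{key}={value}"
                     for key, value in values.items())
```

Calling `bounding_box_properties(-90, 90, -180, 180)` reproduces the four whole-world lines shown above, and `missing_columns(["latitude", "longitude"], "point")` returns an empty list.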


## Execution

In order to deploy and run this pipeline template, you will need the Wizard Deployer. The Wizard executes interactively, asking questions about the application, and expects the user to provide needed answers. Assuming you followed the Wizard's documentation instructions and set up the needed parameters beforehand, follow these steps:

1. Execute the script as ./wizard.sh

You can use your existing output layer or let the Wizard script create a new catalog/layer for you. If using existing catalog, make sure it is shared with GROUP_ID which will be used for this deployment.

2. During deployment, the Wizard script asks for the bounding box of the area you want to process. Supply its four coordinates.
3. Depending on the type of your input layer, keep the default answers for the questions that concern the other layer types.
4. To run the pipeline against the simulated sample input catalog, accept the default answer for every Wizard question.

#### Output Catalog

If you do not use the Wizard to create the output catalog, refer to the file config/output-catalog.json for the structure of the output layers that the output catalog needs in order to receive the results of the pipeline.

The output catalog of the pipeline consists of 2 layers:

- layer_1:
  - stores map-matched information in parquet format
  - index layer
  - indexed by timewindow (1 hour) and HERE Tiles (zoom level 12)
- layer_2:
  - stores routing information in parquet format
  - index layer
  - indexed by timewindow (1 hour) and HERE Tiles (zoom level 12)

##### Output format

For input data type SDII Data stored in a Versioned or Index layer, the output Layer 1 contains the map matched information with the following columns:

messageId timestamp speed rawLat rawLon mmLat mmLon fraction distInMeters vertexTileId vertexIndex hmcTileId segmentId segmentDir hourKey idx_time idx_tileId

For input data type Generic/CSV stored as parquet format in an Index layer, the output Layer 1 contains the map matched information with the following new columns added to the existing input data:

mmLat mmLon fraction distInMeters vertexTileId vertexIndex hmcTileId segmentId segmentDir

When path matching is performed on any supported input data format, the output Layer 2 contains the routing information with the following columns:

messageId route hourKey idx_time idx_tileId

NOTE:

1. The route column contains an empty sequence when no route is found for a message during map matching.

2. When the map matcher does not return any result for a probe, the columns mmLat, mmLon, fraction, distInMeters, vertexTileId, vertexIndex, hmcTileId and segmentId are set to -999, and segmentDir is set to NA.
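
When consuming the output, these sentinel values usually need to be separated from real matches. The sketch below shows one way to do that in plain Python. It assumes each row has already been read into a dict keyed by column name and that mmLat is numeric. The helper names are hypothetical.

```python
UNMATCHED_NUMERIC = -999    # sentinel documented for the numeric match columns
UNMATCHED_DIRECTION = "NA"  # sentinel documented for segmentDir


def is_matched(row):
    """Return True if the map matcher produced a result for this probe."""
    return (row["mmLat"] != UNMATCHED_NUMERIC
            and row["segmentDir"] != UNMATCHED_DIRECTION)


def split_by_match(rows):
    """Split Layer 1 rows into (matched, unmatched) lists."""
    matched, unmatched = [], []
    for row in rows:
        (matched if is_matched(row) else unmatched).append(row)
    return matched, unmatched


def has_route(row):
    """Return True if a Layer 2 row carries a non-empty route sequence."""
    return len(row["route"]) > 0
```

Filtering on the sentinels before aggregating avoids treating -999 as a real coordinate or distance.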

## Verification

In the Platform Portal, select the Pipelines tab, where you should see your pipeline deployed and running. After the pipeline finishes and your data is published, you can find your output catalog under the Data tab and inspect your data visually, or query and retrieve it programmatically.

## Cost Estimation

Executing this pipeline template will incur the following costs:

##### Storage-Blob

Cost will depend on the amount of data that the execution publishes to the output layer.

##### Data Transfer IO

Cost will depend on:

• Amount of data read from input catalog and HMC catalog - this will depend on the bounding box that you specified
• Amount of data published to your output layer

Metadata cost will depend on the number and size of partitions (metadata) stored in the output layer.

##### Compute Core and Compute RAM

Cost depends on the number of workers you specify in the Wizard questions.

##### Log Search IO

To minimize this cost, the log level is set to WARN.