Use Cases

There are three main ways of completing pipeline tasks:

  1. HERE platform portal
  2. Command Line Interface (CLI)
  3. Pipeline API

For further details on the Pipeline API, see the API Reference.

Using any of these approaches, you can carry out the following tasks:

Create Catalogs

The Real-Time Anonymizer requires two streaming layers, one for input data and one for output data:

  1. Input streaming layer - Used as a queue of raw, non-anonymized, real-time location data.
  2. Output streaming layer - Used as a queue of anonymized, real-time location data produced by the anonymizer.

HERE recommends placing the input and output streaming layers in separate catalogs. Keeping raw input data and anonymized output data in different catalogs reduces the likelihood of raw input data being shared externally (on HERE Marketplace).

Steps

  1. Create two HERE platform catalogs (recommended approach), one for input data and one for output data.
  2. Share the catalogs with the required HERE platform group, granting read, write, and manage permissions.
  3. In each catalog (input and output), create a new streaming layer with the following configuration options:
       • Layer Type = Stream
       • Content Type = application/x-protobuf
       • Schema = Sensoris Specification 1.0.0 or SDII v3 Schema, Message (supported version is 4.1.0)

Create Pipelines

The process for preparing a new pipeline version for a new pipeline involves the following steps:

  • Create a pipeline
  • Create a template
  • Create a pipeline version

Depending on the tool you use, these steps are performed separately or as a single step. In the portal, you can create a pipeline, pipeline template, and first pipeline version in a single workflow.

Before creating a pipeline, download the Real-Time Anonymizer template zip archive and extract all of its contents. The zip archive contains the following:

  • .JAR file (used to create the pipeline template)
  • README.md file (providing an overview of the pipeline template)
  • config (configuration) folder - contains pipeline-config.conf, which details runtime parameters for the pipeline

Create New Pipeline

A new pipeline is created with the following properties:

  • Pipeline in a project - This pipeline can be created within or outside of a project. A project is an access-controlled collection of resources (catalogs, pipelines and schemas).
  • Shared with a group - The HERE platform group whose members can access the pipeline. To run this pipeline, members also need access to the input and output catalogs.
  • Pipeline name and description - details for this specific pipeline, for which pipeline versions can be created.
  • Notification Email - Used to distribute information on outages and service incidents.

Create New Template

A new pipeline template is created with the following properties:

  • Pipeline template name - name for the pipeline template created.
  • Runtime environment - stream 2.0.0 or 3.0.0
  • Pipeline template - a new template created by uploading the JAR file (provided in the zip archive), or an existing pipeline template (previously created from the template JAR file).
  • Pipeline template group - the HERE platform group from which members are able to access the pipeline template.
  • Multi-region support - a secondary region is available in case the primary region fails.
  • Entry point class name - com.here.platform.extensions.anonymization.stream.AnonymizationStreamingApp
  • Input and output catalogs - the source and output catalogs for the location data streams.

Create New Version

A new pipeline version is created with the following properties:

  • Version name - unique name of this specific pipeline version.
  • Pipeline template - the specific pipeline template to be used.
  • Pipeline ID - the specific pipeline for which the version is created.
  • Input and output catalogs - the catalogs specified here will override the catalogs defined for the pipeline template.
  • Cluster configuration - Flink job manager and task manager size must be configured.
  • Runtime parameters - specific configuration to be used for this Real-Time Anonymizer version including streaming layers and anonymization method. For more details, see Configure Real-Time Anonymizer Version.
  • Cost allocation tag - used for allocating costs of this pipeline version.

Configure Pipelines

A new Real-Time Anonymizer version requires configurations for the following:

  • Streaming layers in input and output catalogs (when using Wizard Deployer, catalog HRN and layer ID are required for access).
  • Use case information including one or more use cases of location data, data format and minimum use case data requirements.
  • Anonymization strategy including anonymization methods and parameters for this method.

This configuration information is provided as runtime parameters when starting the pipeline. Depending on whether you use the Wizard Deployer or the HERE platform portal and CLI to set up and start this pipeline, you provide the runtime parameters either as a .config file or as a list of key-value pairs. An anonymization-pipeline.config template file is available in the Pipeline Template zip archive.

Data Configuration

The data configuration allows the input and output layers to be specified (when using the HERE platform portal and CLI only).

Note: No Data Configuration for Wizard Deployer

When using the Wizard Deployer, the input and output layers should not be included in the anonymization-pipeline.config file. These details are requested by the Wizard.

  • Input layer ID (Required) - Raw input data streaming layer ID. The layer must exist in the specified input catalog.
  • Output layer ID (Required) - Anonymized output data streaming layer ID. The layer must exist in the specified output catalog.

This is the format for the input and output data configuration properties:

#Raw input data streaming layer (layer must exist in specified input catalog) [Required]
pipeline.input.layer.id=
#Anonymized output data streaming layer (layer must exist in specified output catalog) [Required]
pipeline.output.layer.id=
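
As a minimal sketch, assuming the runtime parameters are kept in Java properties format (key=value lines with # comments, as in the template above), the following hypothetical Java class checks that both required layer IDs are filled in before you start the pipeline. The class name LayerConfigCheck and its methods are illustrative only; they are not part of the Real-Time Anonymizer or any HERE SDK. The check applies to the portal and CLI only, because the Wizard Deployer requests the layer IDs itself.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical pre-flight check, not part of the Real-Time Anonymizer or any HERE SDK.
// Assumes the runtime parameters use Java properties format (key=value, '#' comments).
final class LayerConfigCheck {
    static final String INPUT_KEY = "pipeline.input.layer.id";
    static final String OUTPUT_KEY = "pipeline.output.layer.id";

    /** Parses properties-format text into key/value pairs. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // StringReader performs no real I/O, so this is not expected in practice.
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Returns the required layer ID keys that are missing or left empty. */
    static List<String> missingLayerIds(Properties props) {
        List<String> missing = new ArrayList<>();
        for (String key : new String[] {INPUT_KEY, OUTPUT_KEY}) {
            String value = props.getProperty(key);
            if (value == null || value.isBlank()) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String config = String.join("\n",
                "# Raw input data streaming layer [Required]",
                "pipeline.input.layer.id=",
                "# Anonymized output data streaming layer [Required]",
                "pipeline.output.layer.id=");
        System.out.println(missingLayerIds(parse(config)));
    }
}
```

Running main prints [pipeline.input.layer.id, pipeline.output.layer.id], because both layer IDs are still empty in the template.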

Use Case Configuration

The use case configuration allows the anonymization use case, data (format and type) and minimum data requirements to be specified for the Real-Time Anonymizer.

  • Use case type (Required) - Use case that anonymization is applied for. Supported use case type is TrafficInformation.
  • Data type (Required) - Data type of the input and output data for anonymization. Supported data type is NearRealTime.
  • Data format (Required) - Data format of the input and output data. Supported data formats are SENSORIS and SDII.
  • Min. input points count (Optional) - Minimum number of points required in an input trajectory chunk for anonymization to be applied. Value must be at least 2. Default value is 2.
  • Min. output points count (Optional) - Minimum number of points required in an output trajectory chunk for anonymization to be applied. Value must be at least 2. Default value is 2.
  • Data retention time (Optional) - How long information about an anonymized sub-trajectory is preserved after anonymization. Default value is 10 minutes.

This is the format for the use case configuration properties:

# Use case type that anonymization is to be applied for. Supported use case type is `TrafficInformation` [Required]
pipeline.config.useCase.type=TrafficInformation
# Data type of input and output data for anonymization. Supported data type is `NearRealTime` [Required]
pipeline.config.useCase.dataType=NearRealTime
# Data format of input and output data. Supported data formats are `SENSORIS` and `SDII` [Required]
pipeline.config.useCase.dataFormat=SENSORIS
# Minimum number of points required in input trajectory chunk, for anonymization to be applied. Value must be at least 2. Default value is "2" [Optional]
pipeline.config.useCase.minInputPointsCount=2
# Minimum number of points required in output trajectory chunk, for anonymization to be applied. Value must be at least 2. Default value is "2" [Optional]
pipeline.config.useCase.minOutputPointsCount=2
# Retention time defines how long information about anonymized sub-trajectory is preserved after anonymization. Default value is 10 minutes [Optional]
pipeline.config.useCase.retentionTimeMinutes=10
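
As a minimal sketch, assuming the use case parameters have already been read into a map of key to value, the hypothetical UseCaseConfigCheck class checks them against the supported values and defaults listed above. It treats the documented default of 2 as the smallest accepted points count; that reading is an assumption. The class is not part of the Real-Time Anonymizer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical validator for the use case parameters; not part of the Real-Time Anonymizer.
// Supported values and defaults come from the use case table. Assumption: 2 is the
// smallest accepted points count, because 2 is the documented default.
final class UseCaseConfigCheck {
    private static final String PREFIX = "pipeline.config.useCase.";

    /** Returns one message per invalid parameter; an empty list means no problems were found. */
    static List<String> validate(Map<String, String> params) {
        List<String> errors = new ArrayList<>();
        requireOneOf(params, "type", Set.of("TrafficInformation"), errors);
        requireOneOf(params, "dataType", Set.of("NearRealTime"), errors);
        requireOneOf(params, "dataFormat", Set.of("SENSORIS", "SDII"), errors);
        // Optional counts: a missing or empty value falls back to the default of 2.
        checkMinimum(params, "minInputPointsCount", 2, 2, errors);
        checkMinimum(params, "minOutputPointsCount", 2, 2, errors);
        return errors;
    }

    private static void requireOneOf(Map<String, String> params, String name,
                                     Set<String> allowed, List<String> errors) {
        String value = params.get(PREFIX + name);
        if (value == null || !allowed.contains(value.trim())) {
            errors.add(PREFIX + name + ": expected one of " + allowed + ", got " + value);
        }
    }

    private static void checkMinimum(Map<String, String> params, String name,
                                     int defaultValue, int minimum, List<String> errors) {
        String raw = params.get(PREFIX + name);
        if (raw == null || raw.isBlank()) {
            raw = String.valueOf(defaultValue);
        }
        try {
            if (Integer.parseInt(raw.trim()) < minimum) {
                errors.add(PREFIX + name + ": must be at least " + minimum);
            }
        } catch (NumberFormatException e) {
            errors.add(PREFIX + name + ": not an integer: " + raw);
        }
    }
}
```

Retention time is not validated here, because the section documents only its default, not its allowed range.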

Anonymization Strategy Configuration

The anonymization strategy configuration sets the anonymization method and the parameters for that method. For the Split and Gap method, these include start cutting, sub-trajectory length, and gap length. The anonymized output data is produced in accordance with the configured strategy.

Anonymization Strategy Value Types

In this configuration, most parameters are set as a range: a min value, a max value, and a unit. This approach allows either of the following:

  • constant values to be set - with the same min and max values defined
  • random value (within set range) to be used - random value chosen within min and max values

Anonymizing data using random values reduces the privacy risk of the anonymized data: because the anonymization pattern is not constant, it is harder for an attacker to exploit.
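
The following hypothetical Java helper illustrates the min/max/unit convention: a range with equal min and max always yields the same value, while a wider range yields a uniformly random integer within it. The class name RangeValue is illustrative, and drawing a fresh value each time one is needed (for example, per sub-trajectory or per gap) is an assumption; the docs only state that a random value within the range is used.

```java
import java.util.Random;

// Hypothetical helper illustrating the min/max/unit convention; not the anonymizer's code.
final class RangeValue {
    final int min;
    final int max;
    final String unit;

    RangeValue(int min, int max, String unit) {
        if (min > max) {
            throw new IllegalArgumentException("min (" + min + ") must not exceed max (" + max + ")");
        }
        this.min = min;
        this.max = max;
        this.unit = unit;
    }

    /** Constant when min == max; otherwise a uniformly random integer in [min, max]. */
    int draw(Random random) {
        if (min == max) {
            return min;
        }
        // nextInt(bound) returns a value in [0, bound), so add 1 to include max.
        return min + random.nextInt(max - min + 1);
    }

    public static void main(String[] args) {
        Random random = new Random();
        RangeValue subTrajectory = new RangeValue(120, 120, "seconds"); // constant value
        RangeValue gap = new RangeValue(40, 80, "seconds");             // random value
        for (int i = 0; i < 3; i++) {
            System.out.println("sub-trajectory " + subTrajectory.draw(random) + " " + subTrajectory.unit
                    + ", gap " + gap.draw(random) + " " + gap.unit);
        }
    }
}
```

With the values from the example configuration in this guide (subTrajectorySize 120 to 120, gapSize 40 to 80 seconds), main prints a constant sub-trajectory size and a gap size that varies between runs.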

Anonymization Strategy Parameters

The anonymization strategy configuration allows you to define the anonymization algorithm and parameters for this algorithm.

Note: Anonymization Strategy Values

Carefully choose the anonymization strategy values and review the output data to ensure that you have achieved an acceptable level of anonymization.

The following parameters are available for the anonymization strategy.

  • Anonymization type (type, Required) - Anonymization algorithm to be applied to the raw input data. Supported value is SplitAndGap.
  • Sub-trajectory size (subTrajectorySize) - Sub-trajectories are the sets of positions output by anonymization.
      - min (Required) - Min. size of anonymized sub-trajectories
      - max (Required) - Max. size of anonymized sub-trajectories
      - unit (Required) - Unit of measurement for subTrajectorySize. Supported unit is "seconds".
  • Gap size (gapSize) - Gaps are the spans between sub-trajectories in which positions are removed.
      - min (Required) - Min. size of gaps between anonymized sub-trajectories
      - max (Required) - Max. size of gaps between anonymized sub-trajectories
      - unit (Required) - Unit of measurement for gapSize. Supported unit is "seconds".
  • Skip first - time (skipFirst.time) - Removes positions at the start of a journey based on travel time.
      - min (Optional) - Min. duration to be removed at the start of the raw trajectory
      - max (Optional) - Max. duration to be removed at the start of the raw trajectory
      - unit (Optional) - Unit of measurement for the min and max duration. Supported unit is "seconds".
  • Skip first - speed (skipFirst.speed) - Removes positions at the start of a journey based on driving speed.
      - min (Optional) - Min. speed of positions to be removed at the start of the raw trajectory
      - max (Optional) - Max. speed of positions to be removed at the start of the raw trajectory
      - unit (Optional) - Unit of measurement for the min and max speed. Supported unit is "km/h".
  • Skip first - proximity (skipFirst.proximity) - Removes positions at the start of a journey based on distance from the start point.
      - min (Optional) - Min. distance to be removed at the start of the raw trajectory
      - max (Optional) - Max. distance to be removed at the start of the raw trajectory
      - unit (Optional) - Unit of measurement for the min and max proximity. Supported unit is "meters".
  • Skip until (skipFirst.skipUntil, Rule) - Combines multiple skipFirst conditions (time, speed, proximity) into a single rule using the operators 'and' and 'or'. Required when multiple skipFirst conditions are set; optional for a single condition. Example: skipUntil = (time and speed) or proximity
  • Sampling rate (samplingRate) - Distance, in seconds, between adjacent points in anonymized output sub-trajectories. Default value is 0 seconds.
      - min (Optional) - Min. distance between adjacent points in anonymized trajectories
      - max (Optional) - Max. distance between adjacent points in anonymized trajectories
      - unit (Optional) - Unit of measurement for the sampling rate min and max values. Supported unit is "seconds".
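
The skipUntil rule is a small Boolean expression over the condition names time, speed, and proximity. As an illustration only, the hypothetical SkipUntilRule class parses such a rule and evaluates it, given whether each individual skipFirst condition currently holds. The docs use parentheses in their examples but do not state operator precedence; this sketch assumes that 'and' binds more tightly than 'or', as in most languages.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical evaluator for a skipUntil rule; not part of the Real-Time Anonymizer.
// Assumption: "and" binds more tightly than "or".
final class SkipUntilRule {
    private final List<String> tokens;
    private int pos = 0;

    private SkipUntilRule(String rule) {
        this.tokens = tokenize(rule);
    }

    /** Evaluates a rule such as "(time and speed) or proximity" against per-condition results. */
    static boolean evaluate(String rule, Map<String, Boolean> conditions) {
        SkipUntilRule parser = new SkipUntilRule(rule);
        boolean result = parser.parseOr(conditions);
        if (parser.pos != parser.tokens.size()) {
            throw new IllegalArgumentException("Unexpected token: " + parser.tokens.get(parser.pos));
        }
        return result;
    }

    private static List<String> tokenize(String rule) {
        List<String> out = new ArrayList<>();
        // Surround parentheses with spaces, then split on whitespace.
        for (String t : rule.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // orExpr := andExpr ("or" andExpr)*
    private boolean parseOr(Map<String, Boolean> c) {
        boolean value = parseAnd(c);
        while (peekIs("or")) {
            pos++;
            boolean rhs = parseAnd(c); // always parse the right side so every token is consumed
            value = value || rhs;
        }
        return value;
    }

    // andExpr := term ("and" term)*
    private boolean parseAnd(Map<String, Boolean> c) {
        boolean value = parseTerm(c);
        while (peekIs("and")) {
            pos++;
            boolean rhs = parseTerm(c);
            value = value && rhs;
        }
        return value;
    }

    // term := "(" orExpr ")" | conditionName
    private boolean parseTerm(Map<String, Boolean> c) {
        if (pos >= tokens.size()) {
            throw new IllegalArgumentException("Unexpected end of rule");
        }
        String token = tokens.get(pos++);
        if (token.equals("(")) {
            boolean value = parseOr(c);
            if (!peekIs(")")) {
                throw new IllegalArgumentException("Missing ')'");
            }
            pos++;
            return value;
        }
        Boolean value = c.get(token);
        if (value == null) {
            throw new IllegalArgumentException("Unknown condition: " + token);
        }
        return value;
    }

    private boolean peekIs(String expected) {
        return pos < tokens.size() && tokens.get(pos).equals(expected);
    }
}
```

For example, with time satisfied, speed not satisfied, and proximity satisfied, the rule (time and speed) or proximity evaluates to true because the proximity branch holds.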

Anonymization Strategy Configuration Format

This is a list of the anonymization strategy configuration properties:

# Type of anonymization algorithm [Required]
pipeline.config.anonymization.type=SplitAndGap
# Min size of anonymized trajectories [Required]
pipeline.config.anonymization.subTrajectorySize.min=
# Max size of anonymized trajectories [Required]
pipeline.config.anonymization.subTrajectorySize.max=
# Unit of measurement for "subTrajectorySize". Supported unit is "seconds" [Required]
pipeline.config.anonymization.subTrajectorySize.unit=seconds
# Min size of gaps between anonymized trajectories [Required]
pipeline.config.anonymization.gapSize.min=
# Max size of gaps between anonymized trajectories [Required]
pipeline.config.anonymization.gapSize.max=
# Unit of measurement for values "min" and "max". Supported unit is "seconds" [Required]
pipeline.config.anonymization.gapSize.unit=seconds
# Min duration to be removed at the start of the raw trajectory [Optional]
pipeline.config.anonymization.skipFirst.time.min=
# Max duration to be removed at the start of the raw trajectory [Optional]
pipeline.config.anonymization.skipFirst.time.max=
# Unit of measurement for values "min" and "max" duration. Supported unit is "seconds" [Optional]
pipeline.config.anonymization.skipFirst.time.unit=seconds
# Min speed of positions to be removed at the start of the raw trajectory [Optional]
pipeline.config.anonymization.skipFirst.speed.min=
# Max speed of positions to be removed at the start of the raw trajectory [Optional]
pipeline.config.anonymization.skipFirst.speed.max=
# Unit of measurement for values "min" and "max" speed. Supported unit is "km/h" [Optional]
pipeline.config.anonymization.skipFirst.speed.unit=km/h
# Min distance to be removed at the start of the raw trajectory [Optional]
pipeline.config.anonymization.skipFirst.proximity.min=
# Max distance to be removed at the start of the raw trajectory [Optional]
pipeline.config.anonymization.skipFirst.proximity.max=
# Unit of measurement for values "min" and "max" proximity. Supported unit is "meters" [Optional]
pipeline.config.anonymization.skipFirst.proximity.unit=meters
# 'skipUntil' condition is required when multiple 'skipFirst' conditions are provided (conditions include: proximity, speed or time). Operators supported include 'and' and 'or'. Example 'skipUntil' = `(proximity or speed) and time`. Optional for single conditions. [Optional]
pipeline.config.anonymization.skipFirst.skipUntil=
# Min distance between adjacent points in anonymized trajectories [Optional]
pipeline.config.anonymization.samplingRate.min=
# Max distance between adjacent points in anonymized trajectories [Optional]
pipeline.config.anonymization.samplingRate.max=
# Unit of measurement for sampling rate "min" and "max" values. Supported unit is "seconds" [Optional]
pipeline.config.anonymization.samplingRate.unit=seconds
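
To make the interaction of skipFirst.time, subTrajectorySize, and gapSize concrete, here is an illustrative Java sketch that applies them to a list of point timestamps in seconds. It is a simplified interpretation, not the Real-Time Anonymizer's implementation: the class name SplitAndGapSketch is hypothetical, only the time-based skipFirst condition is modeled, a sub-trajectory is assumed to follow the skipped start, each window size is drawn afresh from its range, and samplingRate and the point-count limits are ignored.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Illustrative Split and Gap sketch over point timestamps (seconds); NOT the anonymizer's code.
// Assumptions: only skipFirst.time is applied, a sub-trajectory follows the skipped start,
// and each window size is drawn afresh from its min/max range.
final class SplitAndGapSketch {

    /** Constant when min == max, otherwise uniformly random in [min, max]. */
    static int draw(Random random, int min, int max) {
        return min == max ? min : min + random.nextInt(max - min + 1);
    }

    /** Splits sorted timestamps into kept sub-trajectories; skipped and gap positions are dropped. */
    static List<List<Long>> anonymize(List<Long> timestamps,
                                      int skipMin, int skipMax,
                                      int subMin, int subMax,
                                      int gapMin, int gapMax,
                                      Random random) {
        if (subMin < 1 || gapMin < 0 || skipMin < 0
                || skipMin > skipMax || subMin > subMax || gapMin > gapMax) {
            throw new IllegalArgumentException("invalid ranges");
        }
        List<List<Long>> result = new ArrayList<>();
        if (timestamps.isEmpty()) {
            return result;
        }
        long skipEnd = timestamps.get(0) + draw(random, skipMin, skipMax);
        long windowEnd = skipEnd + draw(random, subMin, subMax);
        boolean keeping = true;              // true inside a sub-trajectory, false inside a gap
        List<Long> current = new ArrayList<>();
        for (long t : timestamps) {
            if (t < skipEnd) {
                continue;                    // skipFirst.time: drop the start of the trajectory
            }
            while (t >= windowEnd) {         // advance to the window that contains t
                if (keeping && !current.isEmpty()) {
                    result.add(current);
                    current = new ArrayList<>();
                }
                keeping = !keeping;
                windowEnd += keeping ? draw(random, subMin, subMax) : draw(random, gapMin, gapMax);
            }
            if (keeping) {
                current.add(t);
            }
        }
        if (!current.isEmpty()) {
            result.add(current);
        }
        return result;
    }
}
```

With constant ranges (skip 60, sub-trajectory 120, gap 60 seconds) and one point per second for 400 seconds, the sketch keeps the points from 60 to 179 and from 240 to 359 and drops all others.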

Example Real-Time Anonymizer Configuration

You can adjust this example Real-Time Anonymizer configuration to your anonymization requirements.

Note: Anonymization Strategy Values

Carefully choose the anonymization strategy values and review the output data to ensure that you have achieved an acceptable level of anonymization. The following example shows sample anonymization method values.

# Input/Output Data Parameters

# Raw input data Streaming layer (Layer must be included in specified Input Catalog) [Required]
# ************** Layer ID Must be Added **************
pipeline.input.layer.id=
# Anonymized output data Streaming Layer (Layer must be included in specified Output Catalog) [Required]
# ************** Layer ID Must be Added **************
pipeline.output.layer.id=

# Use Case Parameters:

# Use case type to be used in anonymization algorithm [Required]
pipeline.config.useCase.type=TrafficInformation
# Data type of input and output data for anonymization [Required]
pipeline.config.useCase.dataType=NearRealTime
# Data Format of input and output data. Supported data formats are `SENSORIS` and `SDII` [Required]
pipeline.config.useCase.dataFormat=SENSORIS
# Minimum number of points required in input trajectory chunk, for anonymization to be applied. Value must be at least 2. Default value is "2" [Optional]
pipeline.config.useCase.minInputPointsCount=2
# Minimum number of points required in output trajectory chunk, for anonymization to be applied. Value must be at least 2. Default value is "2" [Optional]
pipeline.config.useCase.minOutputPointsCount=2
# Retention time defines how long information about a trajectory is preserved after a trajectory chunk is anonymized. Default value is 10 minutes [Optional]
pipeline.config.useCase.retentionTimeMinutes=10

# Anonymization Strategy Parameters:

# Type of anonymization algorithm [Required]
pipeline.config.anonymization.type=SplitAndGap
# Min size of anonymized trajectories [Required]
pipeline.config.anonymization.subTrajectorySize.min=120
# Max size of anonymized trajectories [Required]
pipeline.config.anonymization.subTrajectorySize.max=120
# Unit of measurement for "subTrajectorySize". Supported unit is "seconds" [Required]
pipeline.config.anonymization.subTrajectorySize.unit=seconds
# Min size of gaps between anonymized trajectories [Required]
pipeline.config.anonymization.gapSize.min=40
# Max size of gaps between anonymized trajectories [Required]
pipeline.config.anonymization.gapSize.max=80
# Unit of measurement for values "min" and "max". Supported unit is "seconds" [Required]
pipeline.config.anonymization.gapSize.unit=seconds
# Min amount of data to be removed at the start of the raw trajectory [Optional]
pipeline.config.anonymization.skipFirst.time.min=60
# Max amount of data to be removed at the start of the raw trajectory [Optional]
pipeline.config.anonymization.skipFirst.time.max=70
# Unit of measurement for values "min" and "max". Supported unit is "seconds" [Optional]
pipeline.config.anonymization.skipFirst.time.unit=seconds
# All data with speed value missing or less than configured value will be removed [Optional]
pipeline.config.anonymization.skipFirst.speed.min=10
# At the start of raw trajectory, all data with speed value missing or less than configured value will be removed [Optional]
pipeline.config.anonymization.skipFirst.speed.max=12
# Unit of measurement for values "min" and "max" speed. Supported unit is "km/h" [Optional]
pipeline.config.anonymization.skipFirst.speed.unit=km/h
# Min distance to be removed at the start of the raw trajectory [Optional]
pipeline.config.anonymization.skipFirst.proximity.min=20
# Max distance to be removed at the start of the raw trajectory [Optional]
pipeline.config.anonymization.skipFirst.proximity.max=40
# Unit of measurement for values "min" and "max". Supported unit is "meters" [Optional]
pipeline.config.anonymization.skipFirst.proximity.unit=meters
# 'skipUntil' condition is required when multiple 'skipFirst' conditions are provided (conditions include: proximity, speed or time). Operators supported include 'and' and 'or'. Optional for single conditions. [Optional]
pipeline.config.anonymization.skipFirst.skipUntil=(time and speed) or proximity
# Min distance between adjacent points in anonymized trajectories [Optional]
pipeline.config.anonymization.samplingRate.min=2
# Max distance between adjacent points in anonymized trajectories [Optional]
pipeline.config.anonymization.samplingRate.max=5
# Unit of measurement for values "min" and "max". Supported unit is "seconds" [Optional]
pipeline.config.anonymization.samplingRate.unit=seconds
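
Before deploying values like those in the example, you may want to check that every range is consistent. The following hypothetical Java helper (StrategyRangeCheck, not a HERE tool) reads the configuration in properties format and confirms that the required subTrajectorySize and gapSize ranges are set, that min does not exceed max, and that each range uses the supported unit from the parameter list. Optional parameters whose min and max are both empty are skipped.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical pre-flight check for the anonymization strategy ranges; not a HERE tool.
// Supported units are taken from the parameter list; assumes properties format.
final class StrategyRangeCheck {
    private static final String PREFIX = "pipeline.config.anonymization.";
    private static final Map<String, String> SUPPORTED_UNITS = new TreeMap<>(Map.of(
            "subTrajectorySize", "seconds",
            "gapSize", "seconds",
            "skipFirst.time", "seconds",
            "skipFirst.speed", "km/h",
            "skipFirst.proximity", "meters",
            "samplingRate", "seconds"));
    private static final Set<String> REQUIRED = Set.of("subTrajectorySize", "gapSize");

    /** Returns a list of problems, checked in alphabetical order of parameter name. */
    static List<String> check(String configText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(configText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, String> entry : SUPPORTED_UNITS.entrySet()) {
            String param = entry.getKey();
            String min = props.getProperty(PREFIX + param + ".min", "").trim();
            String max = props.getProperty(PREFIX + param + ".max", "").trim();
            if (min.isEmpty() && max.isEmpty()) {
                if (REQUIRED.contains(param)) {
                    problems.add(param + ": min and max are required");
                }
                continue; // optional parameter left unset
            }
            if (min.isEmpty() || max.isEmpty()) {
                problems.add(param + ": set both min and max");
                continue;
            }
            try {
                if (Double.parseDouble(min) > Double.parseDouble(max)) {
                    problems.add(param + ": min exceeds max");
                }
            } catch (NumberFormatException e) {
                problems.add(param + ": min and max must be numeric");
            }
            String unit = props.getProperty(PREFIX + param + ".unit", "").trim();
            if (!unit.equals(entry.getValue())) {
                problems.add(param + ": unit must be " + entry.getValue());
            }
        }
        return problems;
    }
}
```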

Change Configuration

For the Real-Time Anonymizer, your current configuration might need to be changed for the following reasons:

  • input or output data layers have changed
  • the use case or data formats to be handled have changed
  • the anonymization strategy needs to change to achieve the expected user anonymity or data utility

To update parts of the pipeline version configuration, the simplest approach is to copy the existing pipeline version and update only the required fields.

The relevant configuration should be changed, as detailed in Configure Real-Time Anonymizer Version.

Start Anonymization

To start anonymizing location data, create and start a Real-Time Anonymizer pipeline version. Before starting a new pipeline version, stop all running versions of the same pipeline. You can view the pipeline versions created for the Real-Time Anonymizer in the portal.

Note: Single Running Pipeline Version per Pipeline

You can create multiple versions of a single Real-Time Anonymizer pipeline with different input data and configurations. However, only one pipeline version can run at a time for a single pipeline. To start running a new pipeline version, first stop any other running pipeline version.

The HERE platform portal is one option for activating the Real-Time Anonymizer version. For the required steps, see the Pipelines Developer Guide. When activating a pipeline version, you can specify runtime credentials for running this pipeline version. These can be your user credentials or user-generated credentials (apps). For more information, see the Identity & Access Management Guide.

After you activate a pipeline version, the pipeline status switches to Run Pending. When the pipeline has started successfully, the status changes to Operation: Run Succeeded. The pipeline is now running, and only data added from this point forward is processed by the Real-Time Anonymizer. A running pipeline version offers additional operations, including Pause and Cancel.

If there is a problem when activating a pipeline version and the pipeline version fails to run, a red alert message appears in the portal.

Stop Anonymization

A running pipeline version can be Paused or Cancelled. Cancelling a running pipeline version normally requires submitting the pipeline ID and pipeline version ID with the Cancel request. Cancelling a pipeline version immediately interrupts and cancels the relevant job, with no option to restart it. After a cancellation is started, an Operation: Cancel Pending message appears until the cancellation is complete. The pipeline version then becomes inactive and returns to its Ready state.
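
The version lifecycle described in Start Anonymization and Stop Anonymization can be summarized as a small state model. The following Java class is a hypothetical, in-memory illustration only: it does not call the Pipeline API or the CLI, its method names are invented, and it assumes that any version not in the Ready state (including a paused one) blocks activation of another version of the same pipeline.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of one pipeline's version lifecycle; it does not call
// the Pipeline API or the CLI. State names mirror the portal wording.
// Assumption: any version that is not Ready (including a paused one) blocks activation.
final class PipelineLifecycleModel {
    enum State { READY, RUN_PENDING, RUNNING, PAUSED, CANCEL_PENDING }

    private final Map<String, State> versions = new HashMap<>();

    void createVersion(String versionId) {
        versions.put(versionId, State.READY);
    }

    State state(String versionId) {
        return versions.get(versionId);
    }

    /** Activate: Ready -> Run Pending, only if no other version of this pipeline is active. */
    void activate(String versionId) {
        for (Map.Entry<String, State> e : versions.entrySet()) {
            if (!e.getKey().equals(versionId) && e.getValue() != State.READY) {
                throw new IllegalStateException("Stop version " + e.getKey() + " first");
            }
        }
        transition(versionId, State.READY, State.RUN_PENDING);
    }

    /** Operation: Run Succeeded. */
    void runSucceeded(String versionId) {
        transition(versionId, State.RUN_PENDING, State.RUNNING);
    }

    void pause(String versionId) {
        transition(versionId, State.RUNNING, State.PAUSED);
    }

    /** Cancel: Running -> Cancel Pending; the cancelled job cannot be restarted. */
    void cancel(String versionId) {
        transition(versionId, State.RUNNING, State.CANCEL_PENDING);
    }

    /** Cancellation complete: the version is inactive and back in Ready. */
    void cancelCompleted(String versionId) {
        transition(versionId, State.CANCEL_PENDING, State.READY);
    }

    private void transition(String versionId, State expected, State next) {
        State current = versions.get(versionId);
        if (current != expected) {
            throw new IllegalStateException(versionId + " is " + current + ", expected " + expected);
        }
        versions.put(versionId, next);
    }
}
```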
