Configuring Real-Time Anonymization Pipeline Version
A new Real-Time Anonymization Pipeline Version requires configurations for:
- Streaming Layers in input / output Catalogs (not provided when using Wizard Deployer).
- Use Case information, including the use cases of the location data, the data format and the minimum use case data requirements.
- Anonymization Strategy, including the anonymization method and the parameters for this method.
This configuration information is provided as runtime parameters when starting the Pipeline. Depending on whether the Wizard Deployer or CLI / HERE Platform Web Portal is used for setting up and starting this pipeline, the runtime parameters will need to be provided as either a .config file or list of key-value pairs. A template anonymization-pipeline.config file is available in the Pipeline Template zip archive.
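As an illustration of how the two forms relate, the following minimal Python sketch reads .config-style text into key-value pairs. It assumes one `key=value` pair per line and `#` comment lines, as in the property listings on this page. The function name `parse_pipeline_config` is hypothetical and is not part of any HERE tooling.

```python
def parse_pipeline_config(text: str) -> dict[str, str]:
    """Parse .config-style text into a dict of runtime parameters.

    Assumed format: one key=value pair per line; lines starting
    with '#' are comments; blank lines are ignored.
    """
    params: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key=value line: {raw_line!r}")
        params[key.strip()] = value.strip()
    return params


sample = """\
# Use case type [Required]
pipeline.config.useCase.type=TrafficInformation
pipeline.config.useCase.dataType=NearRealTime
"""
params = parse_pipeline_config(sample)
```

Here `params` holds the same information as the file, in a form that could be passed on as a list of key-value pairs.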
Data Configuration
The Data Configuration allows the input and output layers to be specified (when using the HERE Platform Web Portal or the CLI only).
Note: No Data Configuration for Wizard Deployer
When using the Wizard Deployer, the input and output Layers should not be included in the anonymization-pipeline.config file. These details are requested by the Wizard.
Property Name | Property Requirement | Description |
Input Layer ID | Required | Raw input data Streaming layer ID (Layer must exist in specified Input Catalog) |
Output Layer ID | Required | Anonymized output data Streaming Layer ID (Layer must exist in specified Output Catalog) |
Here is the format for the input / output data configuration properties:
# Raw input data Streaming layer (Layer must exist in specified Input Catalog) [Required]
pipeline.input.layer.id=
# Anonymized output data Streaming Layer (Layer must exist in specified Output Catalog) [Required]
pipeline.output.layer.id=
Use Case Configuration
The Use Case Configuration allows the anonymization use case, data (format and type) and minimum data requirements to be specified for the Anonymization Pipeline.
Property Name | Property Requirement | Description |
Use Case Type | Required | Use case type that anonymization is to be applied for. Supported use case type is TrafficInformation. |
Data Type | Required | Data Type of input and output data for anonymization. Supported data type is NearRealTime. |
Data Format | Required | Data Format of input and output data. Supported data format is SENSORIS. |
Min Input Points Count | Optional | Minimum number of points required in input trajectory chunk, for anonymization to be applied. Value must be at least 2. Default value is "2". |
Min Output Points Count | Optional | Minimum number of points required in output trajectory chunk, for anonymization to be applied. Value must be at least 2. Default value is "2". |
Data Retention Time | Optional | Retention time defines how long information about an anonymized sub-trajectory is preserved after it has been anonymized. Default value is 10 mins. |
Here is the format for use case configuration properties:
# Use case type that anonymization is to be applied for. Supported use case type is `TrafficInformation` [Required]
pipeline.config.useCase.type=TrafficInformation
# Data Type of input and output data for anonymization. Supported data type is `NearRealTime` [Required]
pipeline.config.useCase.dataType=NearRealTime
# Data Format of input and output data. Supported data format is `SENSORIS` [Required]
pipeline.config.useCase.dataFormat=SENSORIS
# Minimum number of points required in input trajectory chunk, for anonymization to be applied. Value must be at least 2. Default value is "2" [Optional]
pipeline.config.useCase.minInputPointsCount=2
# Minimum number of points required in output trajectory chunk, for anonymization to be applied. Value must be at least 2. Default value is "2" [Optional]
pipeline.config.useCase.minOutputPointsCount=2
# Retention time defines how long information about an anonymized sub-trajectory is preserved after it has been anonymized. Default value is 10 mins [Optional]
pipeline.config.useCase.retentionTimeMinutes=10
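The use case rules above can be checked before the Pipeline is started. The following is a minimal Python sketch, not the Pipeline's own validation. The function name `check_use_case` is hypothetical, and the sketch assumes that 2 is the lowest accepted points count, since 2 is the documented default.

```python
# Supported values for the required use case properties, as listed above.
SUPPORTED = {
    "pipeline.config.useCase.type": "TrafficInformation",
    "pipeline.config.useCase.dataType": "NearRealTime",
    "pipeline.config.useCase.dataFormat": "SENSORIS",
}
# Optional points-count properties and their documented default.
POINT_COUNT_DEFAULTS = {
    "pipeline.config.useCase.minInputPointsCount": 2,
    "pipeline.config.useCase.minOutputPointsCount": 2,
}


def check_use_case(params: dict[str, str]) -> list[str]:
    """Return a list of problems found in the use case properties."""
    problems = []
    for key, supported in SUPPORTED.items():
        value = params.get(key)
        if value is None:
            problems.append(f"{key} is required")
        elif value != supported:
            problems.append(f"{key}={value} is not supported (expected {supported})")
    for key, default in POINT_COUNT_DEFAULTS.items():
        count = int(params.get(key, default))
        if count < 2:  # assumption: 2 (the default) is the lowest valid value
            problems.append(f"{key}={count} is too small (minimum is 2)")
    return problems


problems = check_use_case({
    "pipeline.config.useCase.type": "TrafficInformation",
    "pipeline.config.useCase.dataType": "NearRealTime",
    "pipeline.config.useCase.dataFormat": "SENSORIS",
})
```

An empty `problems` list means that the required properties are present with supported values, and that any optional points counts are within the assumed bound.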
Anonymization Strategy Configuration
Anonymization Strategy Value Types
In this configuration, most parameters are set as a range (i.e. min, max and units) rather than as a single value. This approach allows either:
- a constant value to be set, by defining the same min and max value.
- a random value to be used, chosen between the min and max values.
Anonymizing data using random values reduces the privacy risk of the anonymized data. The rationale is that the exact anonymization pattern is not constant, which makes it harder for an attacker to predict.
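The difference between the two approaches can be sketched in a few lines of Python. This is an illustration only: the function name `pick_value` is hypothetical, and the uniform distribution is an assumption, as this page does not state how the Pipeline chooses a value within the range.

```python
import random


def pick_value(min_value: float, max_value: float, rng: random.Random) -> float:
    """Return a value for a parameter configured as a (min, max) range."""
    if min_value > max_value:
        raise ValueError("min must not exceed max")
    if min_value == max_value:
        return min_value  # constant value: same min and max
    # Assumption: values are drawn uniformly between min and max.
    return rng.uniform(min_value, max_value)


rng = random.Random()
constant = pick_value(120, 120, rng)  # always 120
varied = pick_value(40, 80, rng)      # somewhere between 40 and 80
```

With min and max both set to 120, every call returns the same value. With 40 and 80, each call can return a different value, so the resulting pattern varies.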
Anonymization Strategy Parameters
The Anonymization Strategy Configuration allows the anonymization algorithm and parameters for this algorithm to be defined.
Note: Anonymization Strategy Values
Carefully choose the anonymization strategy values and review the output data to ensure that you have achieved an acceptable level of anonymization. The example configuration below illustrates the anonymization method values.
Anonymization Strategy Property | Property Description | Property Type | Requirement | Description |
Anonymization Type | Anonymization algorithm (i.e. SplitAndGap) to be applied to the raw input data. | Type | Required | |
Sub Trajectory Size | Sub-trajectories are sets of positions output from anonymization. | Min | Required | Min size of anonymized sub-trajectories |
Sub Trajectory Size | | Max | Required | Max size of anonymized sub-trajectories |
Sub Trajectory Size | | Units | Required | Unit of measurement for subTrajectorySize. Supported unit is "seconds" |
Gap Size | Gaps are spaces between sub-trajectories where positions are removed. | Min | Required | Min size of gaps between anonymized sub-trajectories |
Gap Size | | Max | Required | Max size of gaps between anonymized sub-trajectories |
Gap Size | | Units | Required | Unit of measurement for gapSize. Supported unit is "seconds" |
Skip First - Time | skipFirst.time is the removal of positions at the start of a journey considering travel time. | Min | Optional | Min duration to be removed at the start of the raw trajectory |
Skip First - Time | | Max | Optional | Max duration to be removed at the start of the raw trajectory |
Skip First - Time | | Units | Optional | Unit of measurement for values "min" and "max" duration. Supported unit is "seconds" |
Skip First - Speed | skipFirst.speed is the removal of positions at the start of a journey considering speed driven. | Min | Optional | Min speed of positions to be removed at the start of the raw trajectory |
Skip First - Speed | | Max | Optional | Max speed of positions to be removed at the start of the raw trajectory |
Skip First - Speed | | Units | Optional | Unit of measurement for values "min" and "max" speed. Supported unit is "km/h" |
Skip First - Proximity | skipFirst.proximity is the removal of positions at the start of a journey considering distance from start point. | Min | Optional | Min distance to be removed at the start of the raw trajectory |
Skip First - Proximity | | Max | Optional | Max distance to be removed at the start of the raw trajectory |
Skip First - Proximity | | Units | Optional | Unit of measurement for values "min" and "max" proximity. Supported unit is "meters" |
Skip Until | The skipUntil condition allows multiple skipFirst conditions (proximity, speed or time) to be combined into a complex skipUntil rule. Supported operators are 'and' and 'or'. Optional for single conditions. Example for multiple conditions: skipUntil = (time and speed) or proximity | Rule | Optional | |
Sampling Rate | Distance between adjacent points in anonymized output sub-trajectories. Default value is 0 seconds. | Min | Optional | Min distance between adjacent points in anonymized trajectories |
Sampling Rate | | Max | Optional | Max distance between adjacent points in anonymized trajectories |
Sampling Rate | | Units | Optional | Unit of measurement for sampling rate "min" and "max" values. Supported unit is "seconds" |
Here is the format for the anonymization strategy configuration properties:
# Type of anonymization algorithm [Required]
pipeline.config.anonymization.type=SplitAndGap
# Min size of anonymized trajectories [Required]
pipeline.config.anonymization.subTrajectorySize.min=
# Max size of anonymized trajectories [Required]
pipeline.config.anonymization.subTrajectorySize.max=
# Unit of measurement for "subTrajectorySize". Supported unit is "seconds" [Required]
pipeline.config.anonymization.subTrajectorySize.unit=seconds
# Min size of gaps between anonymized trajectories [Required]
pipeline.config.anonymization.gapSize.min=
# Max size of gaps between anonymized trajectories [Required]
pipeline.config.anonymization.gapSize.max=
# Unit of measurement for values "min" and "max". Supported unit is "seconds" [Required]
pipeline.config.anonymization.gapSize.unit=seconds
# Min duration to be removed at the start of the raw trajectory [Optional]
pipeline.config.anonymization.skipFirst.time.min=
# Max duration to be removed at the start of the raw trajectory [Optional]
pipeline.config.anonymization.skipFirst.time.max=
# Unit of measurement for values "min" and "max" duration. Supported unit is "seconds" [Optional]
pipeline.config.anonymization.skipFirst.time.unit=seconds
# Min speed of positions to be removed at the start of the raw trajectory [Optional]
pipeline.config.anonymization.skipFirst.speed.min=
# Max speed of positions to be removed at the start of the raw trajectory [Optional]
pipeline.config.anonymization.skipFirst.speed.max=
# Unit of measurement for values "min" and "max" speed. Supported unit is "km/h" [Optional]
pipeline.config.anonymization.skipFirst.speed.unit=km/h
# Min distance to be removed at the start of the raw trajectory [Optional]
pipeline.config.anonymization.skipFirst.proximity.min=
# Max distance to be removed at the start of the raw trajectory [Optional]
pipeline.config.anonymization.skipFirst.proximity.max=
# Unit of measurement for values "min" and "max" proximity. Supported unit is "meters" [Optional]
pipeline.config.anonymization.skipFirst.proximity.unit=meters
# 'skipUntil' condition is required when multiple 'skipFirst' conditions are provided (conditions include: proximity, speed or time). Operators supported include 'and' and 'or'. Example 'skipUntil' = `(proximity or speed) and time`. Optional for single conditions. [Optional]
pipeline.config.anonymization.skipFirst.skipUntil=
# Min distance between adjacent points in anonymized trajectories [Optional]
pipeline.config.anonymization.samplingRate.min=
# Max distance between adjacent points in anonymized trajectories [Optional]
pipeline.config.anonymization.samplingRate.max=
# Unit of measurement for sampling rate "min" and "max" values. Supported unit is "seconds" [Optional]
pipeline.config.anonymization.samplingRate.unit=seconds
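To make the structure of a skipUntil rule concrete, the following Python sketch evaluates a rule such as `(time and speed) or proximity` against a set of condition results. It is an illustration only. The function name `evaluate_skip_until` is hypothetical, and the sketch assumes that each condition name stands for a boolean meaning "this skipFirst condition has been satisfied"; how the Pipeline itself interprets the rule is not specified here. The rule syntax matches Python's boolean expressions, so the sketch reuses Python's `ast` parser and accepts only names, `and`, `or` and parentheses.

```python
import ast


def evaluate_skip_until(rule: str, met: dict[str, bool]) -> bool:
    """Evaluate a skipUntil rule given which conditions are satisfied."""
    tree = ast.parse(rule, mode="eval")

    def walk(node: ast.AST) -> bool:
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.BoolOp):
            results = [walk(value) for value in node.values]
            return all(results) if isinstance(node.op, ast.And) else any(results)
        if isinstance(node, ast.Name) and node.id in met:
            return met[node.id]
        # Anything else (calls, numbers, 'not', unknown names) is rejected.
        raise ValueError(f"unsupported element in rule: {rule!r}")

    return walk(tree)


rule = "(time and speed) or proximity"
result = evaluate_skip_until(rule, {"time": True, "speed": False, "proximity": True})
```

In this call the time condition is satisfied but the speed condition is not, so `(time and speed)` is false. The rule as a whole is still true, because the proximity condition is satisfied.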
Example Real-Time Anonymization Pipeline Configuration
An example Real-Time Anonymization Pipeline Configuration is provided for adaptation to your anonymization requirements.
Note: Anonymization Strategy Values
Carefully choose the anonymization strategy values and review the output data to ensure that you have achieved an acceptable level of anonymization. The example configuration below illustrates the anonymization method values.
# Input / Output Data Parameters
# Raw input data Streaming layer (Layer must be included in specified Input Catalog) [Required]
# ************** Layer ID Must be Added **************
pipeline.input.layer.id=
# Anonymized output data Streaming Layer (Layer must be included in specified Output Catalog) [Required]
# ************** Layer ID Must be Added **************
pipeline.output.layer.id=
# Use Case Parameters:
# Use case type to be used in anonymization algorithm [Required]
pipeline.config.useCase.type=TrafficInformation
# Data Type of input and output data for anonymization [Required]
pipeline.config.useCase.dataType=NearRealTime
# Data Format of input and output data [Required]
pipeline.config.useCase.dataFormat=SENSORIS
# Minimum number of points required in input trajectory chunk, for anonymization to be applied. Value must be at least 2. Default value is "2" [Optional]
pipeline.config.useCase.minInputPointsCount=2
# Minimum number of points required in output trajectory chunk, for anonymization to be applied. Value must be at least 2. Default value is "2" [Optional]
pipeline.config.useCase.minOutputPointsCount=2
# Retention time defines how long information about trajectory is preserved after trajectory's chunk is anonymized. Default value is 10 mins [Optional]
pipeline.config.useCase.retentionTimeMinutes=10
# Anonymization Strategy Parameters:
# Type of anonymization algorithm [Required]
pipeline.config.anonymization.type=SplitAndGap
# Min size of anonymized trajectories [Required]
pipeline.config.anonymization.subTrajectorySize.min=120
# Max size of anonymized trajectories [Required]
pipeline.config.anonymization.subTrajectorySize.max=120
# Unit of measurement for "subTrajectorySize". Supported unit is "seconds" [Required]
pipeline.config.anonymization.subTrajectorySize.unit=seconds
# Min size of gaps between anonymized trajectories [Required]
pipeline.config.anonymization.gapSize.min=40
# Max size of gaps between anonymized trajectories [Required]
pipeline.config.anonymization.gapSize.max=80
# Unit of measurement for values "min" and "max". Supported unit is "seconds" [Required]
pipeline.config.anonymization.gapSize.unit=seconds
# Min amount of data to be removed at the start of the raw trajectory [Optional]
pipeline.config.anonymization.skipFirst.time.min=60
# Max amount of data to be removed at the start of the raw trajectory [Optional]
pipeline.config.anonymization.skipFirst.time.max=70
# Unit of measurement for values "min" and "max". Supported unit is "seconds" [Optional]
pipeline.config.anonymization.skipFirst.time.unit=seconds
# All data with speed value missing or less than configured value will be removed [Optional]
pipeline.config.anonymization.skipFirst.speed.min=10
# At the start of raw trajectory, all data with speed value missing or less than configured value will be removed [Optional]
pipeline.config.anonymization.skipFirst.speed.max=12
# Unit of measurement for values "min" and "max" speed. Supported unit is "km/h" [Optional]
pipeline.config.anonymization.skipFirst.speed.unit=km/h
# Min distance to be removed at the start of the raw trajectory [Optional]
pipeline.config.anonymization.skipFirst.proximity.min=20
# Max distance to be removed at the start of the raw trajectory [Optional]
pipeline.config.anonymization.skipFirst.proximity.max=40
# Unit of measurement for values "min" and "max". Supported unit is "meters" [Optional]
pipeline.config.anonymization.skipFirst.proximity.unit=meters
# 'skipUntil' condition is required when multiple 'skipFirst' conditions are provided (conditions include: proximity, speed or time). Operators supported include 'and' and 'or'. Optional for single conditions. [Optional]
pipeline.config.anonymization.skipFirst.skipUntil=(time and speed) or proximity
# Min distance between adjacent points in anonymized trajectories [Optional]
pipeline.config.anonymization.samplingRate.min=2
# Max distance between adjacent points in anonymized trajectories [Optional]
pipeline.config.anonymization.samplingRate.max=5
# Unit of measurement for values "min" and "max". Supported unit is "seconds" [Optional]
pipeline.config.anonymization.samplingRate.unit=seconds
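To give a feel for what the time-based values in this example mean, the following Python sketch applies them to a list of timestamps. It is one plausible reading of the sub-trajectory, gap and skipFirst.time descriptions above, not the Pipeline's actual algorithm. The function name `split_and_gap` is hypothetical, durations are assumed to be drawn uniformly from their ranges, and the speed, proximity, skipUntil and sampling rate settings are ignored.

```python
import random


def split_and_gap(timestamps, sub_size, gap_size, skip_first, rng):
    """Split sorted timestamps (seconds) into kept chunks separated by gaps.

    sub_size, gap_size and skip_first are (min, max) tuples in seconds.
    Assumes sub_size is positive, so that every window advances in time.
    """
    if not timestamps:
        return []

    def draw(bounds):
        # Assumption: each duration is drawn uniformly from its range.
        return rng.uniform(*bounds)

    # The first kept window starts after the skipped lead-in.
    window_start = timestamps[0] + draw(skip_first)
    window_end = window_start + draw(sub_size)
    chunks, current = [], []
    for t in timestamps:
        # Close any windows that ended before this point and open the
        # next one after a freshly drawn gap.
        while t >= window_end:
            if current:
                chunks.append(current)
                current = []
            window_start = window_end + draw(gap_size)
            window_end = window_start + draw(sub_size)
        if t >= window_start:
            current.append(t)  # inside a kept window
    if current:
        chunks.append(current)
    return chunks


# One point per second for a ten-minute journey (made-up data).
journey = list(range(600))
chunks = split_and_gap(journey, (120, 120), (40, 80), (60, 70), random.Random())
```

Each entry in `chunks` is one sub-trajectory of up to 120 seconds. Points in the first 60 to 70 seconds, and in the 40 to 80 second gaps between chunks, do not appear in the output.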