Use Cases
Real-Time Anonymizer runs on pipelines, which are sequences of processes and functions in which the output of one process is the input of the next. There are three main ways to complete pipeline tasks:
- HERE platform portal
- Command Line Interface (CLI)
- Pipeline API
For further details on the Pipeline API, see the API Reference.
Using any of these approaches, you can accomplish the following use cases:
Create Catalogs
Real-Time Anonymizer requires two streaming layers for the input and output data:
- Input streaming layer - This layer will be used as a queue of raw input, non-anonymized, real-time location data.
- Output streaming layer - This layer will be used as a queue of output, anonymized, real-time location data.
Normally, these streaming layers should be in separate catalogs.
Note: Separate Catalogs Recommended
HERE recommends using separate catalogs for the raw input data to Real-Time Anonymizer and for the anonymized output data. Keeping the input and output data in different catalogs reduces the likelihood of raw input data being shared externally (on HERE Marketplace).
Steps
- Create two HERE platform catalogs (recommended approach) - one for input and one for output data.
- Share catalogs - with the necessary HERE platform group, giving read, write, and manage permission.
- In each catalog (both input and output), create a new streaming layer with the following configuration options:
  - Layer Type = Stream
  - Content Type = application/x-protobuf
  - Schema = Sensoris Specification 1.0.0 or SDII v3 Schema, Message (supported version is 4.1.0)
Create Pipelines
The process for preparing a new pipeline version for a new pipeline involves the following steps:
- Create a pipeline
- Create a template
- Create a pipeline version
Depending on the tools used, these steps are performed either separately or as a single step. In the portal, you can create a pipeline, pipeline template, and first pipeline version in a single workflow.
Before creating a pipeline, download the Real-Time Anonymizer template zip archive and extract all of its contents. The zip archive contains the following:
- .JAR file (used to create the pipeline template)
- README.md file (provides an overview of the pipeline template)
- Config (configuration) folder - contains pipeline-config.conf, which details runtime parameters for the pipeline
Create New Pipeline
A new pipeline is created with the following properties:
- Pipeline in a project - This pipeline can be created within or outside of a project. A project is an access-controlled collection of resources (catalogs, pipelines and schemas).
- Shared with a group - The HERE platform group from which members are able to access the pipeline. In order for members to be able to run this pipeline, they will additionally need access to the input and output catalogs.
- Pipeline name and description - details for this specific pipeline, for which pipeline versions can be created.
- Notification Email - Used to distribute information on outages and service incidents.
Create New Template
A new pipeline template is created with the following properties:
- Pipeline template name - name for the pipeline template created.
- Runtime environment - stream 5.0.
- Pipeline template - new template created by uploading a JAR file (provided in zip archive) or by using an existing pipeline template (previously created with the template JAR file).
- Pipeline template group - the HERE platform group from which members are able to access the pipeline template.
- Multi-region support - secondary region available in case the primary region fails.
- Entry point class name - com.here.platform.extensions.anonymization.stream.AnonymizationStreamingApp
- Input and output catalogs - the source and output catalogs for the location data streams.
Create New Version
A new pipeline version is created with the following properties:
- Version name - unique name of this specific pipeline version.
- Pipeline template - specific pipeline template to be used.
- Pipeline ID - specific pipeline for which version will be created.
- Input and output catalogs - the catalogs specified here will override the catalogs defined for the pipeline template.
- Cluster configuration - Flink job manager and task manager size must be configured.
- Runtime parameters - specific configuration to be used for this Real-Time Anonymizer version, including streaming layers and anonymization method. For more details, see Configure Real-Time Anonymizer Version.
- Cost allocation tag - used for allocating costs of this pipeline version.
A new Real-Time Anonymizer version requires configurations for the following:
- Streaming layers in input and output catalogs (when using Wizard Deployer, catalog HRN and layer ID are required for access).
- Use case information with the minimum use case data requirements.
- Anonymization strategy including anonymization methods and parameters for this method.
This configuration information is provided as runtime parameters when starting the pipeline. Depending on whether you use the Wizard Deployer or the HERE platform portal and CLI to set up and start this pipeline, you need to provide the runtime parameters as either a .config file or a list of key-value pairs. An anonymization-pipeline.config template file is available in the Pipeline Template zip archive.
Pipeline Configuration
The pipeline configuration allows a specific Kafka Group ID to be given to a specific pipeline version.
Property Name | Property Requirement | Description |
Pipeline Parallelism | Optional | Number of working nodes for pipeline version. Should be equal to "Number of TaskManagers" when pipeline version is created. Default value is "1" |
Note: Pipeline Parallelism
The pipeline parallelism should be configured with the same value in both the runtime parameters and the pipeline's Task Managers (set on the Pipeline page in the HERE platform portal).
The format of the Pipeline Configuration properties is shown below:
pipeline.env.parallelism=1
Data Configuration
The data configuration allows the input and output layers to be specified (when using the HERE platform portal and CLI only).
Note: No Data Configuration for Wizard Deployer
When using the Wizard Deployer, the input and output layers should not be included in the anonymization-pipeline.config file. These details are requested by the Wizard.
Property Name | Property Requirement | Description |
Input layer ID | Required | Raw input data streaming layer ID (layer must exist in the specified input catalog) |
Output layer ID | Required | Anonymized output data streaming layer ID (layer must exist in the specified output catalog) |
Altitude output enabled | Optional | Enable optional altitude information being included for anonymized output positions. The default value is "false" (altitude of anonymized points will be empty) |
Keep extended attributes | Optional | Most formats, including SENSORIS and SDII, can have extended ("free" or "purpose-specific") attributes for each position. The "true" value will keep these attributes and copy them into anonymized output positions. The default value is "false" (all extended attributes will be ignored) |
Chunk Publishing Delay | Optional | Temporal delay range for publishing anonymized sub-trajectories (reducing risk that a trajectory can be reconstructed). Default values are "0". |
Invalid positions excluded | Optional | Exclude invalid positions (that is, positions whose timestamp is significantly in the past or in the future) from the input positions provided for anonymization. The default value is "false": all positions, including invalid ones, are provided to the anonymization process. The same applies if the parameter is omitted. |
This is the format for the input and output data configuration properties:
pipeline.input.layer.id=
pipeline.output.layer.id=
pipeline.converter.altitudeOutputEnabled=false
pipeline.converter.keepExtendedAttributes=false
pipeline.converter.outputProviderName=RTA_PIPELINE
pipeline.output.delay.min.seconds=0
pipeline.output.delay.max.seconds=0
pipeline.cleaners.invalidPositionsExcluded=false
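Before submitting properties in this format, it can help to check that the required layer IDs are filled in. The following Python sketch is one way to do that, not part of the product: the helper names `parse_properties` and `missing_required` and the layer ID `anonymized-stream` are hypothetical, and the parser assumes plain key=value lines.

```python
REQUIRED_KEYS = ("pipeline.input.layer.id", "pipeline.output.layer.id")


def parse_properties(text):
    """Parse key=value lines into a dict, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key=value line: {raw!r}")
        props[key.strip()] = value.strip()
    return props


def missing_required(props, required=REQUIRED_KEYS):
    """Return the required keys that are absent or have an empty value."""
    return [key for key in required if not props.get(key)]


# Hypothetical sample: the input layer ID has been left empty.
sample = """\
pipeline.input.layer.id=
pipeline.output.layer.id=anonymized-stream
pipeline.converter.altitudeOutputEnabled=false
"""
print(missing_required(parse_properties(sample)))  # ['pipeline.input.layer.id']
```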
Use Case Configuration
The use case configuration allows the anonymization use case, data (format and type) and minimum data requirements to be specified for Real-Time Anonymizer.
Property Name | Property Requirement | Description |
Use case type | Required | Use case type that anonymization is to be applied for. Supported use case type is TrafficInformation. |
Data type | Required | Data type of input and output data for anonymization. Supported data type is NearRealTime |
Min. input points count | Optional | Minimum number of points required in the input trajectory chunk for anonymization to be applied. Value must be 2 or greater. Default value is "2". |
Min. output points count | Optional | Minimum number of points required in the output trajectory chunk for anonymization to be applied. Value must be 2 or greater. Default value is "2". |
Min. input chunk - time | Optional | Minimum length of a complete trajectory chunk in seconds. Shorter trajectory chunks will be removed as there is a high probability that they are the last chunk of a trajectory (revealing part of the trajectory). Value must be greater than or equal to zero. Default value is "0" |
Data retention time | Optional | Retention time defines how long information about an anonymized sub-trajectory is preserved after anonymization. Default value is 10 minutes. |
This is the format for the use case configuration properties:
pipeline.config.useCase.type=TrafficInformation
pipeline.config.useCase.dataType=NearRealTime
pipeline.config.useCase.minInputPointsCount=2
pipeline.config.useCase.minOutputPointsCount=2
pipeline.config.useCase.minInputChunkSeconds=0
pipeline.config.useCase.retentionTimeMinutes=10
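The constraints in the table above can be checked before a pipeline version is created. The sketch below is an illustration only: the function `validate_use_case` is hypothetical, and it assumes that 2 is the lowest accepted point count, since 2 is the documented default.

```python
SUPPORTED_USE_CASE_TYPES = {"TrafficInformation"}
SUPPORTED_DATA_TYPES = {"NearRealTime"}
PREFIX = "pipeline.config.useCase."

# (property suffix, lowest accepted value, documented default)
INTEGER_RULES = (
    ("minInputPointsCount", 2, 2),
    ("minOutputPointsCount", 2, 2),
    ("minInputChunkSeconds", 0, 0),
)


def validate_use_case(props):
    """Return a list of problems found; an empty list means the values look valid."""
    errors = []
    if props.get(PREFIX + "type") not in SUPPORTED_USE_CASE_TYPES:
        errors.append("type must be TrafficInformation")
    if props.get(PREFIX + "dataType") not in SUPPORTED_DATA_TYPES:
        errors.append("dataType must be NearRealTime")
    for suffix, lowest, default in INTEGER_RULES:
        # Optional properties fall back to their documented default.
        raw = props.get(PREFIX + suffix, str(default))
        try:
            value = int(raw)
        except ValueError:
            errors.append(f"{suffix} is not an integer: {raw!r}")
            continue
        if value < lowest:
            errors.append(f"{suffix} must be at least {lowest}, got {value}")
    return errors
```

Calling `validate_use_case` with only the two required properties set to their supported values returns an empty list, because the optional properties default to valid values.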
Anonymization Strategy Configuration
The anonymization strategy configuration allows the anonymization method and the parameters for this method to be set. For the Split and Gap anonymization method, this configuration includes start cutting, sub-trajectory length and gap length. The anonymized output data will be in accordance with the configured strategy.
Anonymization Strategy Value Types
In this configuration, ranges are generally set as min, max and unit values for a single parameter. This approach allows for one of the following:
- Constant values to be set - with the same min and max values defined
- Random value (within set range) to be used - random value chosen within min and max values
Using random values in the anonymization strategy reduces the privacy risk of the anonymized data. This makes an attack harder, as the exact anonymization pattern is not constant.
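The difference between a constant and a random value can be sketched in a few lines of Python. This is only an illustration of the min/max convention: the function `pick_value` is hypothetical, and the use of a uniform distribution is an assumption, as the distribution used by Real-Time Anonymizer is not stated here.

```python
import random


def pick_value(minimum, maximum, rng):
    """Return the constant when min equals max, otherwise a random value in range."""
    if minimum > maximum:
        raise ValueError("min must not be greater than max")
    if minimum == maximum:
        return minimum  # constant value: same min and max defined
    return rng.uniform(minimum, maximum)  # assumed uniform choice within the range


rng = random.Random()
print(pick_value(120, 120, rng))  # always 120
print(pick_value(40, 80, rng))    # some value between 40 and 80
```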
Anonymization Strategy Parameters
The anonymization strategy configuration allows you to define the anonymization algorithm and parameters for this algorithm.
Note: Anonymization Strategy Values
Carefully choose the anonymization strategy values and review the output data to ensure that you have achieved an acceptable level of anonymization.
The table below shows the anonymization method values.
Anonymization Strategy Property | Property Description | Property Type | Requirement | Description |
Anonymization type | Anonymization algorithm (SplitAndGap) to be applied to the raw input data. | Type | Required | |
Sub-trajectory size | subTrajectorySize is the size of the set of positions that are output from anonymization. | Min | Required | Min. size of anonymized sub-trajectories |
Sub-trajectory size | | Max | Required | Max. size of anonymized sub-trajectories |
Sub-trajectory size | | Units | Required | Unit of measurement for subTrajectorySize. Supported unit is "seconds" |
Gap size | gapSize is the size of the spaces between sub-trajectories where positions are removed. | Min | Required | Min. size of gaps between anonymized sub-trajectories |
Gap size | | Max | Required | Max. size of gaps between anonymized sub-trajectories |
Gap size | | Units | Required | Unit of measurement for gapSize. Supported unit is "seconds" |
Skip first - time | skipFirst.time is the removal of positions at the start of a journey considering travel time. | Min | Optional | Min. duration to be removed at the start of the raw trajectory |
Skip first - time | | Max | Optional | Max. duration to be removed at the start of the raw trajectory |
Skip first - time | | Units | Optional | Unit of measurement for values "min" and "max" duration. Supported unit is "seconds" |
Skip first - speed | skipFirst.speed is the removal of positions at the start of a journey considering speed driven. | Min | Optional | Min. speed of positions to be removed at the start of the raw trajectory |
Skip first - speed | | Max | Optional | Max. speed of positions to be removed at the start of the raw trajectory |
Skip first - speed | | Units | Optional | Unit of measurement for values "min" and "max" speed. Supported unit is "km/h" |
Skip first - proximity | skipFirst.proximity is the removal of positions at the start of a journey considering distance from start point. | Min | Optional | Min. distance to be removed at the start of the raw trajectory |
Skip first - proximity | | Max | Optional | Max. distance to be removed at the start of the raw trajectory |
Skip first - proximity | | Units | Optional | Unit of measurement for values "min" and "max" proximity. Supported unit is "meters" |
Skip until | skipUntil condition allows multiple skipFirst conditions to be used together as a complex skipUntil rule (conditions include: proximity, speed or time), with supported operators including 'and' and 'or'. Optional for single conditions. Example for multiple conditions: skipUntil = (time and speed) or proximity | Rule | Optional | |
Sampling rate | samplingRate is the time between adjacent points in anonymized output sub-trajectories. Default value is 0 seconds. | Min | Optional | Min. time between adjacent points in anonymized trajectories |
Sampling rate | | Max | Optional | Max. time between adjacent points in anonymized trajectories |
Sampling rate | | Units | Optional | Unit of measurement for sampling rate "min" and "max" values. Supported unit is "seconds" |
Enable stay point obfuscation | stayPoint.obfuscation.enabled enables predicted stay points (by ML algorithm) to be obfuscated from output (anonymized) sub-trajectories. | Boolean (true/false) | Optional | Default value is 'false' |
Stay point obfuscation - time | stayPoint.obfuscation.gapBeforeSize.seconds removes N seconds of the current incoming trajectory chunk before the predicted stay point zone | Seconds | Optional | Default value is '0' |
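To make the skipUntil rule syntax more concrete, the following Python sketch evaluates a rule such as (time and speed) or proximity against a set of condition results. It is a hypothetical reading of the syntax, not the product's parser: the function `evaluate_skip_until` is invented for illustration, and it assumes that 'and' binds more tightly than 'or' when no parentheses are given.

```python
import re


def evaluate_skip_until(rule, satisfied):
    """Evaluate a rule built from condition names, 'and', 'or' and parentheses.

    `satisfied` maps each condition name (time, speed, proximity) to True/False.
    """
    tokens = re.findall(r"\(|\)|[A-Za-z]+", rule)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = peek()
        if token is None:
            raise ValueError("unexpected end of rule")
        pos += 1
        return token

    def factor():
        token = take()
        if token == "(":
            value = expr()
            if take() != ")":
                raise ValueError("expected ')'")
            return value
        if token not in satisfied:
            raise ValueError(f"unknown condition: {token}")
        return satisfied[token]

    def term():
        value = factor()
        while peek() == "and":
            take()
            right = factor()
            value = value and right
        return value

    def expr():
        value = term()
        while peek() == "or":
            take()
            right = term()
            value = value or right
        return value

    result = expr()
    if peek() is not None:
        raise ValueError(f"unexpected token: {peek()}")
    return result


rule = "(time and speed) or proximity"
print(evaluate_skip_until(rule, {"time": True, "speed": False, "proximity": False}))  # False
print(evaluate_skip_until(rule, {"time": True, "speed": False, "proximity": True}))   # True
```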
This is a list of the anonymization strategy configuration properties:
pipeline.config.anonymization.type=SplitAndGap
pipeline.config.anonymization.subTrajectorySize.min=
pipeline.config.anonymization.subTrajectorySize.max=
pipeline.config.anonymization.subTrajectorySize.unit=seconds
pipeline.config.anonymization.gapSize.min=
pipeline.config.anonymization.gapSize.max=
pipeline.config.anonymization.gapSize.unit=seconds
pipeline.config.anonymization.skipFirst.time.min=
pipeline.config.anonymization.skipFirst.time.max=
pipeline.config.anonymization.skipFirst.time.unit=seconds
pipeline.config.anonymization.skipFirst.speed.min=
pipeline.config.anonymization.skipFirst.speed.max=
pipeline.config.anonymization.skipFirst.speed.unit=km/h
pipeline.config.anonymization.skipFirst.proximity.min=
pipeline.config.anonymization.skipFirst.proximity.max=
pipeline.config.anonymization.skipFirst.proximity.unit=meters
pipeline.config.anonymization.skipFirst.skipUntil=
pipeline.config.anonymization.samplingRate.min=0
pipeline.config.anonymization.samplingRate.max=0
pipeline.config.anonymization.samplingRate.unit=seconds
pipeline.config.anonymization.stayPoint.obfuscation.enabled=false
pipeline.config.anonymization.stayPoint.obfuscation.gapBeforeSize.seconds=0
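As a rough illustration of how the sub-trajectory, gap and skip-first time ranges interact, the following Python sketch applies them to a list of timestamps. It is a simplified offline model, not the pipeline's implementation: the function `split_and_gap` and its arguments are hypothetical, values are drawn uniformly from each range as an assumption, and speed, proximity, sampling rate and stay point handling are left out.

```python
import random


def split_and_gap(timestamps, sub_range, gap_range, skip_first_range=(0, 0), rng=None):
    """Split sorted timestamps (seconds) into kept sub-trajectories separated by gaps."""
    if sub_range[0] <= 0:
        raise ValueError("sub-trajectory size must be positive")
    rng = rng or random.Random()

    def draw(bounds):
        return rng.uniform(bounds[0], bounds[1])

    if not timestamps:
        return []
    # Start cutting: nothing is kept before the first keep window opens.
    keep_start = timestamps[0] + draw(skip_first_range)
    keep_end = keep_start + draw(sub_range)
    chunks, current = [], []
    for t in timestamps:
        # Move forward to the keep window that could contain this point.
        while t >= keep_end:
            if current:
                chunks.append(current)
                current = []
            keep_start = keep_end + draw(gap_range)
            keep_end = keep_start + draw(sub_range)
        if t >= keep_start:
            current.append(t)
        # Otherwise the point falls into the start cut or a gap and is dropped.
    if current:
        chunks.append(current)
    return chunks


# One point every 10 seconds, constant ranges so the result is deterministic.
points = list(range(0, 300, 10))
for chunk in split_and_gap(points, (60, 60), (30, 30), skip_first_range=(20, 20)):
    print(chunk)
# [20, 30, 40, 50, 60, 70]
# [110, 120, 130, 140, 150, 160]
# [200, 210, 220, 230, 240, 250]
# [290]
```

With different min and max values in a range, each window length is drawn separately, so the pattern of kept and removed positions varies along the trajectory.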
Example Real-Time Anonymizer Configuration
You can adjust this example Real-Time Anonymizer configuration to your anonymization requirements.
Note: Anonymization Strategy Values
Carefully choose the anonymization strategy values and review the output data to ensure that you have achieved an acceptable level of anonymization. The configuration below illustrates the anonymization method values.
pipeline.env.parallelism=1
pipeline.input.layer.id=
pipeline.input.override.dataFormat=
pipeline.output.layer.id=
pipeline.output.override.dataFormat=
pipeline.converter.altitudeOutputEnabled=false
pipeline.converter.keepExtendedAttributes=false
pipeline.converter.outputProviderName=RTA_PIPELINE
pipeline.cleaners.invalidPositionsExcluded=false
pipeline.output.delay.min.seconds=5
pipeline.output.delay.max.seconds=10
pipeline.config.useCase.type=TrafficInformation
pipeline.config.useCase.dataType=NearRealTime
pipeline.config.useCase.minInputPointsCount=2
pipeline.config.useCase.minOutputPointsCount=2
pipeline.config.useCase.minInputChunkSeconds=0
pipeline.config.useCase.retentionTimeMinutes=10
pipeline.config.anonymization.type=SplitAndGap
pipeline.config.anonymization.subTrajectorySize.min=120
pipeline.config.anonymization.subTrajectorySize.max=120
pipeline.config.anonymization.subTrajectorySize.unit=seconds
pipeline.config.anonymization.gapSize.min=40
pipeline.config.anonymization.gapSize.max=80
pipeline.config.anonymization.gapSize.unit=seconds
pipeline.config.anonymization.skipFirst.time.min=60
pipeline.config.anonymization.skipFirst.time.max=70
pipeline.config.anonymization.skipFirst.time.unit=seconds
pipeline.config.anonymization.skipFirst.speed.min=10
pipeline.config.anonymization.skipFirst.speed.max=12
pipeline.config.anonymization.skipFirst.speed.unit=km/h
pipeline.config.anonymization.skipFirst.proximity.min=20
pipeline.config.anonymization.skipFirst.proximity.max=40
pipeline.config.anonymization.skipFirst.proximity.unit=meters
pipeline.config.anonymization.skipFirst.skipUntil=(time and speed) or proximity
pipeline.config.anonymization.samplingRate.min=0
pipeline.config.anonymization.samplingRate.max=0
pipeline.config.anonymization.samplingRate.unit=seconds
pipeline.config.anonymization.stayPoint.obfuscation.enabled=true
pipeline.config.anonymization.stayPoint.obfuscation.gapBeforeSize.seconds=30
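When adjusting the example, a quick consistency check is that every min value has a matching max value and does not exceed it. The Python sketch below shows one way to check this on properties already loaded into a dictionary. The function `check_ranges` is hypothetical, and treating an empty min/max pair as "not configured" is an assumption.

```python
import re


def check_ranges(props):
    """Return a list of problems for min/max property pairs in a properties dict."""
    problems = []
    for key, raw_min in props.items():
        # Matches both "...gapSize.min" and "...delay.min.seconds".
        match = re.fullmatch(r"(.*)\.min(\..*)?", key)
        if not match:
            continue
        max_key = f"{match.group(1)}.max{match.group(2) or ''}"
        raw_max = props.get(max_key)
        if raw_max is None:
            problems.append(f"{key} has no matching {max_key}")
            continue
        if raw_min == "" and raw_max == "":
            continue  # assumed to mean the optional range is not configured
        try:
            low, high = float(raw_min), float(raw_max)
        except ValueError:
            problems.append(f"{key} / {max_key} are not both numeric")
            continue
        if low > high:
            problems.append(f"{key}={raw_min} is greater than {max_key}={raw_max}")
    return problems


example = {
    "pipeline.output.delay.min.seconds": "5",
    "pipeline.output.delay.max.seconds": "10",
    "pipeline.config.anonymization.gapSize.min": "40",
    "pipeline.config.anonymization.gapSize.max": "80",
}
print(check_ranges(example))  # []
```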
Change Configuration
For Real-Time Anonymizer, your current configuration might need to be changed for the following reasons:
- Input or output data layers have changed
- Use case has changed
- Anonymization strategy needs to change to achieve expected user anonymity or data utility
To update parts of a pipeline version configuration, the simplest way is to copy the pipeline version and update only the required fields.
Change the relevant configuration as detailed in Configure Real-Time Anonymizer Version.
Start Anonymization
To start anonymizing location data, create and start a Real-Time Anonymizer version. Before starting a new pipeline version, stop all running pipeline versions (for the same Real-Time Anonymizer pipeline). You can easily view the pipeline versions created for Real-Time Anonymizer in the portal.
Note: Single Running Pipeline Version per Pipeline
You can create multiple versions of a single Real-Time Anonymizer with different input data and configurations. However, there can only be one pipeline version running for a single pipeline. Therefore, to start running a new pipeline version, first stop any other running pipeline versions.
The HERE platform portal is one option for activating the Real-Time Anonymizer version. For the required steps, see Pipelines Developer Guide. When activating a pipeline version, you can specify run-time credentials for running this pipeline version. These may be your user credentials or user-generated credentials (apps). For more information, see the Identity & Access Management Guide.
After activating a pipeline version, the pipeline status will switch to Run Pending. When your pipeline has started successfully, the state will change to Operation: Run Succeeded. At this point, your pipeline is now running and any data added (from this point forward only) will start to be processed by Real-Time Anonymizer. A running pipeline version will have new operational choices including the option to Pause the running pipeline version or to Cancel.
If there is a problem when activating a pipeline version and a pipeline version fails to run, a red alert message appears on the portal.
Stop Anonymization
A running pipeline version can be Paused or Cancelled. Cancelling a running pipeline version normally requires submitting the pipeline ID and pipeline version ID along with the Cancel request. Cancelling a pipeline version immediately interrupts and cancels the relevant job with no chance of restarting. After starting a cancellation operation, an Operation: Cancel Pending message appears until cancellation is completed. When cancellation is complete, this pipeline version is inactive and returns to its Ready state.
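The activation and cancellation flow described above can be summarized as a small state table. The Python sketch below is a simplified model for illustration only: the state label "Running" and the event names are hypothetical, and pausing is left out because its resulting state is not described here.

```python
# (current state, event) -> next state; labels and events are illustrative.
TRANSITIONS = {
    ("Ready", "activate"): "Run Pending",
    ("Run Pending", "started"): "Running",
    ("Running", "cancel"): "Cancel Pending",
    ("Cancel Pending", "cancelled"): "Ready",
}


def next_state(state, event):
    """Return the state that follows `event`, or raise if it is not allowed."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not allowed in state {state!r}") from None


state = "Ready"
for event in ("activate", "started", "cancel", "cancelled"):
    state = next_state(state, event)
    print(event, "->", state)
```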
Monitoring Pipeline
The Real-Time Anonymizer Pipeline Template outputs a range of Metrics that can be visualized in Grafana, an open source data visualization and monitoring tool provided with HERE Workspace.
For each HERE platform pipeline, standard Grafana dashboards and accompanying Ingestion Metrics and Stream Pipeline Metrics are available. In addition, the HERE Real-Time Anonymizer Pipeline Template outputs specialist metrics that apply to this Pipeline Template only. You can use these metrics by creating a custom Grafana dashboard.
Real-Time Anonymizer Pipeline Template Metrics
To better understand HERE Real-Time Anonymizer Metrics, please review the following concepts:
- Data Processing Step
- Data Object
- Action
- Filter
Data Processing Step
The data processing steps (in the order that they are applied by the Pipeline) are:
- Decoding - reading input data from the stream layer
- Cleaning - analyzing and processing input data before anonymization is applied
- Anonymization - applying configured anonymization strategy to data output from the cleaning step
- Output - preparing and publishing data to the output stream layer
Data Object
Metrics are calculated on the following data objects:
- Point - a single position with lat / lon / timestamp
- Chunk - a set of one or more points belonging to the same trajectory
- Message - a single data message, which can contain one or more chunks that could belong to different trajectories
- Trajectory - a single journey with a consistent ID. A journey consists of one or more chunks.
Action
Metrics are captured for the following actions:
- Dropped - the data object is discarded and does not proceed to the next processing step
- Info - counts the total quantity of data objects at the defined data processing step
- Notify - the data object is preserved, and the quantity of data objects matching specific criteria (e.g. 'too few positions') is counted at the defined data processing step
Filter
Metrics can be applied for the following types of filter:
- All - no filter is applied; the total count for all data objects
- <specific filter> - count for all data objects meeting a human-readable filter (e.g. has_too_few_points)
Available Real-Time Anonymizer Metrics
Data Processing Step | Metric Name | Data Object | Description |
Decoding | RTA_decoding_message_dropped_corrupted | Message | Number of received messages that could not be decoded |
Decoding | RTA_decoding_point_info_all | Point | Number of points received and successfully decoded |
Decoding | RTA_decoding_chunk_info_all | Chunk | Number of chunks received and successfully decoded |
Cleaning | RTA_cleaning_point_notify_chunk_too_short | Point | Number of points where chunk duration is less than the minimum duration |
Cleaning | RTA_cleaning_point_notify_chunk_has_too_few_points | Point | Number of points where chunk has fewer points than the configured threshold |
Cleaning | RTA_cleaning_point_dropped_invalid_timestamp | Point | Number of points dropped as timestamps are not recent or are in the future |
Cleaning | RTA_cleaning_point_notify_speed_too_high | Point | Number of points having a speed greater than normal driving speeds |
Anonymization | RTA_point_anonymization_dropped_invalid_timestamp_trajectory_state | Point | Number of points that were dropped because chunks were received in wrong temporal order |
Anonymization | RTA_point_anonymization_dropped_sampling_rate | Point | Number of points dropped to adjust the sampling rate to the configured value |
Anonymization | RTA_point_anonymization_dropped_start | Point | Number of points dropped by start cutting strategy |
Anonymization | RTA_point_anonymization_dropped_gap | Point | Number of points dropped to create a gap in the data |
Anonymization | RTA_point_anonymization_dropped_stay_point | Point | Number of points dropped to obfuscate a stay point |
Anonymization | RTA_point_anonymization_dropped_chunk_has_too_few_points | Point | Number of points dropped because the number of points in the output chunk did not meet the threshold |
Anonymization | RTA_anonymization_trajectory_info_all | Trajectory | Number of trajectories reaching the anonymization step |
Output | RTA_output_point_info_all | Point | Number of points output by the pipeline |
Output | RTA_output_chunk_info_all | Chunk | Number of chunks output by the pipeline |
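The metric names in the table combine the four concepts introduced earlier: data processing step, data object, action and filter. The following Python sketch splits a metric name into these parts, for example to group panels in a custom dashboard. The function `describe_metric` is hypothetical. It accepts the step and the data object in either order, because both orders appear in the table.

```python
STEPS = {"decoding", "cleaning", "anonymization", "output"}
OBJECTS = {"point", "chunk", "message", "trajectory"}
ACTIONS = {"dropped", "info", "notify"}


def describe_metric(name):
    """Split a metric name from the table into step, object, action and filter."""
    prefix, _, rest = name.partition("_")
    tokens = rest.split("_")
    if prefix != "RTA" or len(tokens) < 4:
        raise ValueError(f"unexpected metric name: {name}")
    first, second, action = tokens[:3]
    if first in STEPS and second in OBJECTS:
        step, data_object = first, second
    elif first in OBJECTS and second in STEPS:
        data_object, step = first, second
    else:
        raise ValueError(f"cannot find step and data object in: {name}")
    if action not in ACTIONS:
        raise ValueError(f"unknown action in: {name}")
    return {
        "step": step,
        "object": data_object,
        "action": action,
        "filter": "_".join(tokens[3:]),
    }


print(describe_metric("RTA_decoding_point_info_all"))
# {'step': 'decoding', 'object': 'point', 'action': 'info', 'filter': 'all'}
```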