Use Cases

Real-Time Anonymizer runs on pipelines, which are sequenced processes and functions in which the output of one process is the input of the next. There are three main ways to complete pipeline tasks:

  1. HERE platform portal
  2. Command Line Interface (CLI)
  3. Pipeline API

For further details on the Pipeline API, see the API Reference.

Using any of these approaches, you can accomplish the following use cases:

Create Catalogs

Real-Time Anonymizer requires two streaming layers for the input and output data:

  1. Input streaming layer - This layer will be used as a queue of raw input, non-anonymized, real-time location data.
  2. Output streaming layer - This layer will be used as a queue of output, anonymized, real-time location data.

Normally, these streaming layers should be in separate catalogs.

HERE recommends that separate catalogs are used for raw, input data to Real-Time Anonymizer and anonymized, output data. Separating the input and output data in different catalogs reduces the likelihood of raw, input data being shared externally (on HERE Marketplace).

Steps

  1. Create two HERE platform catalogs (recommended approach) - one for input and one for output data.
  2. Share the catalogs with the necessary HERE platform group, granting read, write, and manage permissions.
  3. In each catalog (both input and output), create a new streaming layer with the following configuration options:

     • Layer Type = Stream
     • Content Type = application/x-protobuf
     • Schema = Sensoris Specification 1.0.0 or SDII v3 Schema, Message (supported version is 4.1.0)

Create Pipelines

The process for preparing a new pipeline version for a new pipeline involves the following steps:

  • Create a pipeline
  • Create a template
  • Create a pipeline version

Depending upon the tools used, these steps come either separately or as a single step. In the portal, you can create a pipeline, pipeline template, and first pipeline version all in one single workflow.

Before creating a pipeline, download the Real-Time Anonymizer template zip archive and extract all of its contents. The zip archive contains the following:

  • .JAR file (used to create the pipeline template)
  • README.md file (provides an overview of the pipeline template)
  • Config (configuration) folder - contains pipeline-config.conf, which details runtime parameters for the pipeline

Create New Pipeline

A new pipeline is created with the following properties:

  • Pipeline in a project - This pipeline can be created within or outside of a project. A project is an access-controlled collection of resources (catalogs, pipelines and schemas).
  • Shared with a group - The HERE platform group from which members are able to access the pipeline. In order for members to be able to run this pipeline, they will additionally need access to the input and output catalogs.
  • Pipeline name and description - details for this specific pipeline, for which pipeline versions can be created.
  • Notification Email - Used to distribute information on outages and service incidents.

Create New Template

A new pipeline template is created with the following properties:

  • Pipeline template name - name for the pipeline template created.
  • Runtime environment - stream 5.0.
  • Pipeline template - new template created by uploading a JAR file (provided in zip archive) or by using an existing pipeline template (previously created with the template JAR file).
  • Pipeline template group - the HERE platform group from which members are able to access the pipeline template.
  • Multi-region support - secondary region available in case the primary region fails.
  • Entry point class name - com.here.platform.extensions.anonymization.stream.AnonymizationStreamingApp.
  • Input and output catalogs - the source and output catalogs for the location data streams.

Create New Version

A new pipeline version is created with the following properties:

  • Version name - unique name of this specific pipeline version.
  • Pipeline template - specific pipeline template to be used.
  • Pipeline ID - specific pipeline for which version will be created.
  • Input and output catalogs - the catalogs specified here will override the catalogs defined for the pipeline template.
  • Cluster configuration - Flink job manager and task manager size must be configured.
  • Runtime parameters - specific configuration to be used for this Real-Time Anonymizer version, including streaming layers and anonymization method. For more details, see Configure Real-Time Anonymizer Version.
  • Cost allocation tag - used for allocating costs of this pipeline version.

Configure Pipelines

A new Real-Time Anonymizer version requires configurations for the following:

  • Streaming layers in input and output catalogs (when using Wizard Deployer, catalog HRN and layer ID are required for access).
  • Use case information including one or more use cases of location data, data format and minimum use case data requirements.
  • Anonymization strategy including anonymization methods and parameters for this method.

This configuration information is provided as runtime parameters when starting the pipeline. Depending on whether you use the Wizard Deployer or the HERE platform portal and CLI to set up and start this pipeline, you need to provide the runtime parameters as either a .config file or a list of key-value pairs. An anonymization-pipeline.config template file is available in the Pipeline Template zip archive.
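
As a minimal sketch of how the two forms relate, the Python below parses properties-style text (the key=value format used in the listings on this page) into a dictionary of key-value pairs. The function name `parse_properties` is hypothetical and not part of any HERE tool; the sketch assumes one key=value pair per line and `#` for comments.

```python
def parse_properties(text: str) -> dict:
    """Parse 'key=value' lines into a dict, skipping blank lines and '#' comments."""
    params = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, separator, value = line.partition("=")
        if not separator:
            raise ValueError(f"Expected key=value, got: {line!r}")
        params[key.strip()] = value.strip()
    return params


# Sample text using property names that appear on this page.
sample = """
# Number of worker nodes [Optional]
pipeline.env.parallelism=1
pipeline.config.useCase.type=TrafficInformation
"""

runtime_parameters = parse_properties(sample)
for key, value in runtime_parameters.items():
    print(f"{key} = {value}")
```

The resulting dictionary can be reviewed or checked before the values are entered as runtime parameters.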

Pipeline Configuration

The pipeline configuration allows a specific Kafka Group ID to be given to a specific pipeline version.

Property Name | Property Requirement | Description
Pipeline Parallelism | Optional | Number of worker nodes for the pipeline version. Should be equal to "Number of TaskManagers" set when the pipeline version is created. Default value is "1".

Note: Pipeline Parallelism

The pipeline parallelism should be configured with the same value in both the runtime parameters and the pipeline's TaskManagers (set on the Pipeline page in the HERE platform portal).

The format of the Pipeline Configuration properties is shown below:

# Number of performing worker nodes. Should be equal to "Number of TaskManagers" set when a pipeline version is created. 
# Default value is "1" [Optional]
pipeline.env.parallelism=1
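
The consistency rule above can be checked before deployment. The following is a small sketch only: `parallelism_matches` is a hypothetical helper, and the TaskManager count is assumed to be known from the pipeline version settings.

```python
def parallelism_matches(runtime_params: dict, task_managers: int) -> bool:
    """Return True when pipeline.env.parallelism equals the TaskManager count."""
    # The documented default is "1" when the property is omitted.
    parallelism = int(runtime_params.get("pipeline.env.parallelism", "1"))
    return parallelism == task_managers


print(parallelism_matches({"pipeline.env.parallelism": "2"}, task_managers=2))
print(parallelism_matches({}, task_managers=4))
```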

Data Configuration

The data configuration allows the input and output layers to be specified (when using the HERE platform portal and CLI only).

Note: No Data Configuration for Wizard Deployer

When using the Wizard Deployer, the input and output layers should not be included in the anonymization-pipeline.config file. These details are requested by the Wizard.

Property Name | Property Requirement | Description
Input layer ID | Required | Raw input data streaming layer ID (layer must exist in the specified input catalog)
Output layer ID | Required | Anonymized output data streaming layer ID (layer must exist in the specified output catalog)
Altitude output enabled | Optional | Enable optional altitude information being included for anonymized output positions. The default value is "false" (altitude of anonymized points will be empty)
Keep extended attributes | Optional | Most formats, including SENSORIS and SDII, can have extended ("free" or "purpose-specific") attributes for each position. The "true" value will keep these attributes and copy them into anonymized output positions. The default value is "false" (all extended attributes will be ignored)
Chunk Publishing Delay | Optional | Temporal delay range for publishing anonymized sub-trajectories (reducing risk that a trajectory can be reconstructed). Default values are "0".
Invalid positions excluded | Optional | Exclude invalid positions (i.e. where position has a timestamp which is significantly in the past or in the future) from the input positions provided for anonymization. The default value is "false" (all positions by default provided to the anonymization process). If the parameter is omitted, invalid positions are provided for anonymization.

This is the format for the input and output data configuration properties:

# Raw input data streaming layer (layer must exist in specified input catalog) [Required]
pipeline.input.layer.id=
# Anonymized output data streaming layer (layer must exist in specified output catalog) [Required]
pipeline.output.layer.id=
# Enables altitude attribute for anonymized output positions. Default value is "false" [Optional]
pipeline.converter.altitudeOutputEnabled=false
# Publish all extended attributes of incoming positions with anonymized output positions. Default value is "false" [Optional] 
pipeline.converter.keepExtendedAttributes=false
# Temporal Delay range for publishing anonymized sub-trajectories (reducing risk that a trajectory can be reconstructed). Default values are "0" [Optional]
pipeline.output.delay.min.seconds=0
pipeline.output.delay.max.seconds=0
# Exclude all positions with invalid timestamp from input data for anonymization. Default value is "false" [Optional] 
pipeline.cleaners.invalidPositionsExcluded=false
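
A hedged sketch of how these properties might be checked before use is shown below. The helper `validate_data_config` and the layer IDs in the sample are hypothetical, and the requirement that the delay range satisfies 0 <= min <= max is an assumption rather than a documented rule.

```python
REQUIRED_KEYS = ("pipeline.input.layer.id", "pipeline.output.layer.id")
BOOLEAN_KEYS = (
    "pipeline.converter.altitudeOutputEnabled",
    "pipeline.converter.keepExtendedAttributes",
    "pipeline.cleaners.invalidPositionsExcluded",
)


def validate_data_config(params: dict) -> list:
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    for key in REQUIRED_KEYS:
        if not params.get(key, "").strip():
            problems.append(f"{key} is required")
    for key in BOOLEAN_KEYS:
        value = params.get(key, "false")  # documented default is "false"
        if value not in ("true", "false"):
            problems.append(f"{key} must be true or false, got {value!r}")
    delay_min = int(params.get("pipeline.output.delay.min.seconds", "0"))
    delay_max = int(params.get("pipeline.output.delay.max.seconds", "0"))
    # Assumption: a valid delay range is non-negative and ordered.
    if delay_min < 0 or delay_max < delay_min:
        problems.append("output delay range must satisfy 0 <= min <= max")
    return problems


# Hypothetical layer IDs, for illustration only.
config = {
    "pipeline.input.layer.id": "raw-input",
    "pipeline.output.layer.id": "",
    "pipeline.output.delay.min.seconds": "5",
    "pipeline.output.delay.max.seconds": "10",
}
for problem in validate_data_config(config):
    print(problem)
```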

Use Case Configuration

The use case configuration allows the anonymization use case, data (format and type) and minimum data requirements to be specified for Real-Time Anonymizer.

Property Name | Property Requirement | Description
Use case type | Required | Use case type that anonymization is to be applied for. Supported use case type is TrafficInformation.
Data type | Required | Data type of input and output data for anonymization. Supported data type is NearRealTime.
Data format | Required | Data format of input and output data. Supported data formats are SENSORIS and SDII.
Min. input points count | Optional | Minimum number of points required in the input trajectory chunk for anonymization to be applied. Value must be greater than or equal to 2. Default value is "2".
Min. output points count | Optional | Minimum number of points required in the output trajectory chunk for anonymization to be applied. Value must be greater than or equal to 2. Default value is "2".
Min. input chunk - time | Optional | Minimum length of a complete trajectory chunk in seconds. Shorter trajectory chunks will be removed as there is a high probability that they are the last chunk of a trajectory (revealing part of trajectory). Value must be greater than or equal to zero. Default value is "0".
Data retention time | Optional | Retention time defines how long information about an anonymized sub-trajectory is preserved after it is anonymized. Default value is 10 minutes.

This is the format for the use case configuration properties:

# Use case type that anonymization is to be applied for. Supported use case type is `TrafficInformation` [Required]
pipeline.config.useCase.type=TrafficInformation
# Data type of input and output data for anonymization. Supported data type is `NearRealTime` [Required]
pipeline.config.useCase.dataType=NearRealTime
# Data format of input and output data. Supported data formats are `SENSORIS` and `SDII` [Required]
pipeline.config.useCase.dataFormat=
# Minimum number of points required in an input trajectory chunk for anonymization to be applied. Value must be greater than or equal to 2. Default value is "2" [Optional]
pipeline.config.useCase.minInputPointsCount=2
# Minimum number of points required in an output trajectory chunk for anonymization to be applied. Value must be greater than or equal to 2. Default value is "2" [Optional]
pipeline.config.useCase.minOutputPointsCount=2
# Minimum length of a complete trajectory chunk in seconds; shorter trajectory chunks will be removed as there is a high probability that they are the last chunk of a trajectory (revealing part of trajectory). Value must be greater than or equal to zero. Default value is "0" [Optional]
pipeline.config.useCase.minInputChunkSeconds=0
# Retention time defines how long information about an anonymized sub-trajectory is preserved after it is anonymized. Default value is 10 minutes [Optional]
pipeline.config.useCase.retentionTimeMinutes=10
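
The supported values listed for the use case configuration can be checked with a short script. This is a sketch under stated assumptions: `validate_use_case` is a hypothetical helper, and it treats 2 as the smallest accepted point count because 2 is the documented default.

```python
SUPPORTED_VALUES = {
    "pipeline.config.useCase.type": {"TrafficInformation"},
    "pipeline.config.useCase.dataType": {"NearRealTime"},
    "pipeline.config.useCase.dataFormat": {"SENSORIS", "SDII"},
}
POINT_COUNT_KEYS = (
    "pipeline.config.useCase.minInputPointsCount",
    "pipeline.config.useCase.minOutputPointsCount",
)


def validate_use_case(params: dict) -> list:
    """Return a list of problems found in the use case properties."""
    problems = []
    for key, allowed in SUPPORTED_VALUES.items():
        value = params.get(key)
        if value not in allowed:
            problems.append(f"{key}={value!r} is not one of {sorted(allowed)}")
    for key in POINT_COUNT_KEYS:
        # Assumption: the default "2" is also the minimum accepted value.
        if int(params.get(key, "2")) < 2:
            problems.append(f"{key} must be at least 2")
    if int(params.get("pipeline.config.useCase.minInputChunkSeconds", "0")) < 0:
        problems.append("pipeline.config.useCase.minInputChunkSeconds must be >= 0")
    return problems


use_case = {
    "pipeline.config.useCase.type": "TrafficInformation",
    "pipeline.config.useCase.dataType": "NearRealTime",
    "pipeline.config.useCase.dataFormat": "SENSORIS",
    "pipeline.config.useCase.minInputPointsCount": "2",
}
print(validate_use_case(use_case))
```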

Anonymization Strategy Configuration

The anonymization strategy configuration allows the anonymization method and parameters for this particular method to be set. In the case of Split and Gap anonymization method, this configuration includes start cutting, sub-trajectory length and gap length. The anonymized output data will be in accordance with the configured strategy.
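
To make the idea concrete, here is an illustrative sketch of splitting a trajectory into sub-trajectories separated by gaps after an initial skip. It is not the Real-Time Anonymizer implementation: the function `split_and_gap` is hypothetical, it uses fixed (non-random) durations, and it only considers timestamps in seconds.

```python
def split_and_gap(timestamps, skip_first_s, sub_trajectory_s, gap_s):
    """Group timestamps into sub-trajectories, dropping the start and the gaps.

    timestamps: sorted position timestamps in seconds for one trajectory.
    Returns a list of sub-trajectories, each a list of retained timestamps.
    """
    if not timestamps:
        return []
    start = timestamps[0] + skip_first_s   # positions before this are cut
    period = sub_trajectory_s + gap_s      # one kept window plus one gap
    kept = {}
    for t in timestamps:
        if t < start:
            continue
        index, position = divmod(t - start, period)
        if position < sub_trajectory_s:    # inside the kept window
            kept.setdefault(index, []).append(t)
    return [kept[i] for i in sorted(kept)]


# One position every 10 seconds for 10 minutes.
trajectory = list(range(0, 600, 10))
chunks = split_and_gap(trajectory, skip_first_s=60, sub_trajectory_s=120, gap_s=60)
for chunk in chunks:
    print(chunk[0], "to", chunk[-1], f"({len(chunk)} points)")
```

With these sample values, the first 60 seconds are removed and each 120-second window is followed by a 60-second gap in which positions are dropped.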

Anonymization Strategy Value Types

In this configuration, range values are widely set as min, max and units for a single parameter. This approach allows for one of the following:

  • Constant values to be set - with the same min and max values defined
  • Random value (within set range) to be used - random value chosen within min and max values

Anonymization of data using random values in the anonymization strategy reduces the privacy risk of this anonymized data. This method makes it harder for an attacker as the exact anonymization pattern is not constant.
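
The difference between a constant and a random range value can be sketched as follows. The helper `pick_value` is hypothetical, and drawing from a uniform distribution is an assumption; this page does not state how the random value is chosen.

```python
import random


def pick_value(minimum: float, maximum: float, rng: random.Random) -> float:
    """Return the constant when min == max, otherwise a random value in [min, max]."""
    if minimum > maximum:
        raise ValueError("min must not exceed max")
    if minimum == maximum:
        return minimum                      # constant value
    return rng.uniform(minimum, maximum)    # assumption: uniform distribution


rng = random.Random()
print(pick_value(120, 120, rng))  # same min and max: constant sub-trajectory size
print(pick_value(40, 80, rng))    # different min and max: random gap size
```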

Anonymization Strategy Parameters

The anonymization strategy configuration allows you to define the anonymization algorithm and parameters for this algorithm.

Note: Anonymization Strategy Values

Carefully choose the anonymization strategy values and review the output data to ensure that you have achieved an acceptable level of anonymization.

The table below shows the anonymization method values.

Anonymization Strategy Property | Property Description | Property Type | Requirement | Description
Anonymization type | Anonymization algorithm (SplitAndGap) to be applied to the input, raw data. | Type | Required |
Sub-trajectory size | subTrajectorySize is the size of the set of positions that are output from anonymization. | Min | Required | Min. size of anonymized sub-trajectories
Sub-trajectory size | | Max | Required | Max. size of anonymized sub-trajectories
Sub-trajectory size | | Units | Required | Unit of measurement for subTrajectorySize. Supported unit is "seconds"
Gap size | gapSize is the size of the spaces between sub-trajectories where positions are removed. | Min | Required | Min. size of gaps between anonymized sub-trajectories
Gap size | | Max | Required | Max. size of gaps between anonymized sub-trajectories
Gap size | | Units | Required | Unit of measurement for gapSize. Supported unit is "seconds"
Skip first - time | skipFirst.time is the removal of positions at the start of a journey considering travel time. | Min | Optional | Min. duration to be removed at the start of the raw trajectory
Skip first - time | | Max | Optional | Max. duration to be removed at the start of the raw trajectory
Skip first - time | | Units | Optional | Unit of measurement for values "min" and "max" duration. Supported unit is "seconds"
Skip first - speed | skipFirst.speed is the removal of positions at the start of a journey considering speed driven. | Min | Optional | Min. speed of positions to be removed at the start of the raw trajectory
Skip first - speed | | Max | Optional | Max. speed of positions to be removed at the start of the raw trajectory
Skip first - speed | | Units | Optional | Unit of measurement for values "min" and "max" speed. Supported unit is "km/h"
Skip first - proximity | skipFirst.proximity is the removal of positions at the start of a journey considering distance from start point. | Min | Optional | Min. distance to be removed at the start of the raw trajectory
Skip first - proximity | | Max | Optional | Max. distance to be removed at the start of the raw trajectory
Skip first - proximity | | Units | Optional | Unit of measurement for values "min" and "max" proximity. Supported unit is "meters"
Skip until | The skipUntil condition allows multiple skipFirst conditions to be used together as a complex skipUntil rule (conditions include: proximity, speed or time), with supported operators including 'and' and 'or'. Optional for single conditions. Example for multiple conditions: skipUntil = (time and speed) or proximity | Rule | Optional |
Sampling rate | samplingRate is the time between adjacent points in anonymized output sub-trajectories. Default value is 0 seconds. | Min | Optional | Min. time between adjacent points in anonymized trajectories
Sampling rate | | Max | Optional | Max. time between adjacent points in anonymized trajectories
Sampling rate | | Units | Optional | Unit of measurement for sampling rate "min" and "max" values. Supported unit is "seconds"
Enable stay point obfuscation | stayPoint.obfuscation.enabled enables predicted stay points (by ML algorithm) to be obfuscated from output (anonymized) sub-trajectories. | Boolean (true/false) | Optional | Default value is 'false'
Stay point obfuscation - time | stayPoint.obfuscation.gapBeforeSize.seconds removes N seconds of the current incoming trajectory chunk before the predicted stay point zone. | Seconds | Optional | Default value is '0'

Anonymization Strategy Configuration Format

This is a list of the anonymization strategy configuration properties:

# Type of anonymization algorithm [Required]
pipeline.config.anonymization.type=SplitAndGap
# Min size of anonymized trajectories [Required]
pipeline.config.anonymization.subTrajectorySize.min=
# Max size of anonymized trajectories [Required]
pipeline.config.anonymization.subTrajectorySize.max=
# Unit of measurement for "subTrajectorySize". Supported unit is "seconds" [Required]
pipeline.config.anonymization.subTrajectorySize.unit=seconds
# Min size of gaps between anonymized trajectories [Required]
pipeline.config.anonymization.gapSize.min=
# Max size of gaps between anonymized trajectories [Required]
pipeline.config.anonymization.gapSize.max=
# Unit of measurement for values "min" and "max". Supported unit is "seconds" [Required]
pipeline.config.anonymization.gapSize.unit=seconds
# Min duration to be removed at the start of the raw trajectory [Optional]
pipeline.config.anonymization.skipFirst.time.min=
# Max duration to be removed at the start of the raw trajectory [Optional]
pipeline.config.anonymization.skipFirst.time.max=
# Unit of measurement for values "min" and "max" duration. Supported unit is "seconds" [Optional]
pipeline.config.anonymization.skipFirst.time.unit=seconds
# Min speed of positions to be removed at the start of the raw trajectory [Optional]
pipeline.config.anonymization.skipFirst.speed.min=
# Max speed of positions to be removed at the start of the raw trajectory [Optional]
pipeline.config.anonymization.skipFirst.speed.max=
# Unit of measurement for values "min" and "max" speed. Supported unit is "km/h" [Optional]
pipeline.config.anonymization.skipFirst.speed.unit=km/h
# Min distance to be removed at the start of the raw trajectory [Optional]
pipeline.config.anonymization.skipFirst.proximity.min=
# Max distance to be removed at the start of the raw trajectory [Optional]
pipeline.config.anonymization.skipFirst.proximity.max=
# Unit of measurement for values "min" and "max" proximity. Supported unit is "meters" [Optional]
pipeline.config.anonymization.skipFirst.proximity.unit=meters
# 'skipUntil' condition is required when multiple 'skipFirst' conditions are provided (conditions include: proximity, speed or time). Operators supported include 'and' and 'or'. Example 'skipUntil' = `(proximity or speed) and time`. Optional for single conditions. [Optional]
pipeline.config.anonymization.skipFirst.skipUntil=
# Min time between adjacent points in anonymized trajectories [Optional]
pipeline.config.anonymization.samplingRate.min=0
# Max time between adjacent points in anonymized trajectories [Optional]
pipeline.config.anonymization.samplingRate.max=0
# Unit of measurement for sampling rate "min" and "max" values. Supported unit is "seconds" [Optional]
pipeline.config.anonymization.samplingRate.unit=seconds
# Enable predicted stay points (by ML algorithm) to be obfuscated from output (anonymized) sub-trajectories. Default value is "false" [Optional]
pipeline.config.anonymization.stayPoint.obfuscation.enabled=false
# If stay point obfuscation is enabled and some stay point zone is detected, 
# the N seconds of current chunk before the first point of that stay point zone will be obfuscated as well. 
# Default value is "0" (means no preliminary gap) [Optional]
pipeline.config.anonymization.stayPoint.obfuscation.gapBeforeSize.seconds=0
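
To illustrate how a combined skipUntil rule such as `(time and speed) or proximity` could be read, the sketch below evaluates a rule against the state of each condition. The function `evaluate_skip_until` is hypothetical, and the grammar is an assumption: parentheses group, and 'and' binds more tightly than 'or' where no parentheses are given.

```python
import re


def evaluate_skip_until(rule: str, satisfied: dict) -> bool:
    """Evaluate a rule of condition names joined by 'and'/'or' with parentheses.

    satisfied maps each condition name (time, speed, proximity) to True/False.
    """
    tokens = re.findall(r"\(|\)|\w+", rule)
    pos = 0

    def parse_or():
        nonlocal pos
        result = parse_and()
        while pos < len(tokens) and tokens[pos] == "or":
            pos += 1
            right = parse_and()   # always parse the right side to consume tokens
            result = result or right
        return result

    def parse_and():
        nonlocal pos
        result = parse_atom()
        while pos < len(tokens) and tokens[pos] == "and":
            pos += 1
            right = parse_atom()
            result = result and right
        return result

    def parse_atom():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("Unexpected end of rule")
        token = tokens[pos]
        pos += 1
        if token == "(":
            result = parse_or()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("Missing closing parenthesis")
            pos += 1
            return result
        if token not in satisfied:
            raise ValueError(f"Unknown condition: {token}")
        return satisfied[token]

    result = parse_or()
    if pos != len(tokens):
        raise ValueError(f"Unexpected token: {tokens[pos]}")
    return result


rule = "(time and speed) or proximity"
print(evaluate_skip_until(rule, {"time": True, "speed": False, "proximity": False}))
print(evaluate_skip_until(rule, {"time": True, "speed": True, "proximity": False}))
```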

Example Real-Time Anonymizer Configuration

You can adjust this example Real-Time Anonymizer configuration to your anonymization requirements.

Note: Anonymization Strategy Values

Carefully choose the anonymization strategy values and review the output data to ensure that you have achieved an acceptable level of anonymization. The configuration below illustrates the anonymization method values.

# Pipeline Parameters:

# Number of performing worker nodes. Should be equal to "Number of TaskManagers" set when the pipeline version was created. 
# Default value is "1" [Optional]
pipeline.env.parallelism=1

# Input/Output Data Parameters

# Raw input data Streaming layer (Layer must be included in specified input catalog) [Required]
# ************** Layer ID Must be Added **************
pipeline.input.layer.id=
# Anonymized output data Streaming Layer (Layer must be included in specified output catalog) [Required]
# ************** Layer ID Must be Added **************
pipeline.output.layer.id=
# Enable optional altitude information being included for anonymized output positions. Default value is "false" [Optional]
pipeline.converter.altitudeOutputEnabled=false
# Publish all extended attributes of incoming positions with anonymized output positions. Default value is "false" [Optional] 
pipeline.converter.keepExtendedAttributes=false
# Exclude all positions with invalid timestamp from input data for anonymization. Default value is "false" [Optional] 
pipeline.cleaners.invalidPositionsExcluded=false
# Temporal Delay range for publishing anonymized sub-trajectories (reducing risk that a trajectory can be reconstructed). Default values are "0" [Optional]
pipeline.output.delay.min.seconds=5
pipeline.output.delay.max.seconds=10

# Use Case Parameters:

# Use case type to be used in anonymization algorithm [Required]
pipeline.config.useCase.type=TrafficInformation
# Data type of input and output data for anonymization [Required]
pipeline.config.useCase.dataType=NearRealTime
# Data format of input and output data. Supported data formats are `SENSORIS` and `SDII` [Required]
pipeline.config.useCase.dataFormat=SENSORIS
# Minimum number of points required in the input trajectory chunk for anonymization to be applied. Value must be greater than or equal to 2. Default value is "2" [Optional]
pipeline.config.useCase.minInputPointsCount=2
# Minimum number of points required in the output trajectory chunk for anonymization to be applied. Value must be greater than or equal to 2. Default value is "2" [Optional]
pipeline.config.useCase.minOutputPointsCount=2
# Minimum length of a complete trajectory chunk in seconds. Shorter trajectory chunks will be removed as there is a high probability that they are the last chunk of a trajectory (revealing part of trajectory). Value must be greater or equal to zero. Default value is "0" [Optional]
pipeline.config.useCase.minInputChunkSeconds=0
# Retention time defines how long information about a trajectory is preserved after the trajectory's chunk is anonymized. Default value is 10 minutes [Optional]
pipeline.config.useCase.retentionTimeMinutes=10

# Anonymization Strategy Parameters:

# Type of anonymization algorithm [Required]
pipeline.config.anonymization.type=SplitAndGap
# Min size of anonymized trajectories [Required]
pipeline.config.anonymization.subTrajectorySize.min=120
# Max size of anonymized trajectories [Required]
pipeline.config.anonymization.subTrajectorySize.max=120
# Unit of measurement for "subTrajectorySize". Supported unit is "seconds" [Required]
pipeline.config.anonymization.subTrajectorySize.unit=seconds
# Min size of gaps between anonymized trajectories [Required]
pipeline.config.anonymization.gapSize.min=40
# Max size of gaps between anonymized trajectories [Required]
pipeline.config.anonymization.gapSize.max=80
# Unit of measurement for values "min" and "max". Supported unit is "seconds" [Required]
pipeline.config.anonymization.gapSize.unit=seconds
# Min amount of data to be removed at the start of the raw trajectory [Optional]
pipeline.config.anonymization.skipFirst.time.min=60
# Max amount of data to be removed at the start of the raw trajectory [Optional]
pipeline.config.anonymization.skipFirst.time.max=70
# Unit of measurement for values "min" and "max". Supported unit is "seconds" [Optional]
pipeline.config.anonymization.skipFirst.time.unit=seconds
# All data with speed value missing or less than the configured value will be removed [Optional]
pipeline.config.anonymization.skipFirst.speed.min=10
# At the start of a raw trajectory, all data with speed value missing or less than the configured value will be removed [Optional]
pipeline.config.anonymization.skipFirst.speed.max=12
# Unit of measurement for values "min" and "max" speed. Supported unit is "km/h" [Optional]
pipeline.config.anonymization.skipFirst.speed.unit=km/h
# Min distance to be removed at the start of the raw trajectory [Optional]
pipeline.config.anonymization.skipFirst.proximity.min=20
# Max distance to be removed at the start of the raw trajectory [Optional]
pipeline.config.anonymization.skipFirst.proximity.max=40
# Unit of measurement for values "min" and "max". Supported unit is "meters" [Optional]
pipeline.config.anonymization.skipFirst.proximity.unit=meters
# 'skipUntil' condition is required when multiple 'skipFirst' conditions are provided (conditions include: proximity, speed or time). Supported operators include 'and' and 'or'. Optional for single conditions. [Optional]
pipeline.config.anonymization.skipFirst.skipUntil=(time and speed) or proximity
# Min time between adjacent points in anonymized trajectories [Optional]
pipeline.config.anonymization.samplingRate.min=0
# Max time between adjacent points in anonymized trajectories [Optional]
pipeline.config.anonymization.samplingRate.max=0
# Unit of measurement for values "min" and "max". Supported unit is "seconds" [Optional]
pipeline.config.anonymization.samplingRate.unit=seconds
# Enable predicted stay points (by ML algorithm) to be obfuscated from output (anonymized) sub-trajectories. Default value is "false" [Optional]
pipeline.config.anonymization.stayPoint.obfuscation.enabled=true
# If stay point obfuscation is enabled and some stay point zone is detected, 
# the N seconds of current chunk before the first point of that stay point zone will be obfuscated as well. 
# Default value is "0" (means no preliminary gap) [Optional]
pipeline.config.anonymization.stayPoint.obfuscation.gapBeforeSize.seconds=30
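
When adjusting the example, it can help to confirm that no range has its min above its max. The sketch below does this for keys ending in `.min` and `.max`; `find_inverted_ranges` is a hypothetical helper, and the rule that min must not exceed max is an assumption. The output delay keys (`delay.min.seconds`, `delay.max.seconds`) use a different naming pattern and are not covered.

```python
def find_inverted_ranges(params: dict) -> list:
    """Return the prefixes of ranges whose .min value is greater than .max."""
    inverted = []
    for key, value in params.items():
        if not key.endswith(".min") or value == "":
            continue
        prefix = key[: -len(".min")]
        max_value = params.get(prefix + ".max", "")
        if max_value != "" and float(value) > float(max_value):
            inverted.append(prefix)
    return inverted


# Values copied from the example configuration on this page.
example = {
    "pipeline.config.anonymization.subTrajectorySize.min": "120",
    "pipeline.config.anonymization.subTrajectorySize.max": "120",
    "pipeline.config.anonymization.gapSize.min": "40",
    "pipeline.config.anonymization.gapSize.max": "80",
    "pipeline.config.anonymization.skipFirst.time.min": "60",
    "pipeline.config.anonymization.skipFirst.time.max": "70",
}
print(find_inverted_ranges(example))

# A deliberately broken variant: gap min raised above gap max.
swapped = {**example, "pipeline.config.anonymization.gapSize.min": "90"}
print(find_inverted_ranges(swapped))
```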

Change Configuration

For Real-Time Anonymizer, your current configuration might need to be changed for the following reasons:

  • Input or output data layers have changed
  • Use case or data formats needing to be handled have changed
  • Anonymization strategy needs to change to achieve expected user anonymity or data utility

To update some parts of the pipeline version configuration, the simplest way is to copy a pipeline version and update only required fields.

The relevant configuration should be changed, as detailed in Configure Real-Time Anonymizer Version.

Start Anonymization

To start anonymizing location data, create and start a Real-Time Anonymizer version. Before starting a new pipeline version, stop any other running pipeline versions of the same Real-Time Anonymizer pipeline. You can view the pipeline versions created for Real-Time Anonymizer in the portal.

Note: Single Running Pipeline Version per Pipeline

You can create multiple versions of a single Real-Time Anonymizer with different input data and configurations. However, there can only be one pipeline version running for a single pipeline. Therefore, to start running a new pipeline version, first stop any other running pipeline versions.

The HERE platform portal is one option for activating the Real-Time Anonymizer version. For the required steps, see Pipelines Developer Guide. When activating a pipeline version, you can specify run-time credentials for running this pipeline version. These may be your user credentials or user-generated credentials (apps). For more information, see the Identity & Access Management Guide.

After activating a pipeline version, the pipeline status will switch to Run Pending. When your pipeline has started successfully, the state will change to Operation: Run Succeeded. At this point, your pipeline is now running and any data added (from this point forward only) will start to be processed by Real-Time Anonymizer. A running pipeline version will have new operational choices including the option to Pause the running pipeline version or to Cancel.

If there is a problem when activating a pipeline version and a pipeline version fails to run, a red alert message appears on the portal.

Stop Anonymization

A running pipeline version can be Paused or Cancelled. Cancelling a running pipeline version normally requires submitting the pipeline ID and pipeline version ID along with the Cancel request. Cancelling a pipeline version immediately interrupts and cancels the relevant job with no chance of restarting. After starting a cancellation operation, an Operation: Cancel Pending message appears until cancellation is completed. When cancellation is complete, the pipeline version is inactive and returns to its Ready state.
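
The start and cancel lifecycle described in this and the previous section can be summarized as a small state table. This is a simplified sketch, not the platform's state model: the state name "Running" and the event names are assumptions made for illustration, and pausing is left out because its resulting state is not described here.

```python
# (current state, event) -> next state; names partly assumed, see lead-in.
TRANSITIONS = {
    ("Ready", "activate"): "Run Pending",
    ("Run Pending", "run_succeeded"): "Running",
    ("Running", "cancel"): "Cancel Pending",
    ("Cancel Pending", "cancel_completed"): "Ready",
}


def next_state(state: str, event: str) -> str:
    """Return the state reached from `state` on `event`, or raise ValueError."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"Event {event!r} is not allowed in state {state!r}") from None


state = "Ready"
for event in ("activate", "run_succeeded", "cancel", "cancel_completed"):
    state = next_state(state, event)
    print(event, "->", state)
```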

Monitoring Pipeline

The Real-Time Anonymizer Pipeline Template outputs a range of Metrics that can be visualized in Grafana, an open source data visualization and monitoring tool provided with HERE Workspace.

For each HERE Platform Pipeline, standard Grafana dashboards and accompanying Ingestion Metrics and Stream Pipeline Metrics are available. In addition to this, the HERE Real-Time Anonymizer Pipeline Template outputs specialist metrics for this Pipeline Template only. These metrics can be utilized through the creation of a custom Grafana Dashboard.

Real-Time Anonymizer Pipeline Template Metrics

To better understand HERE Real-Time Anonymizer Metrics, please review the following concepts:

  • Data Processing Step
  • Data Object
  • Action
  • Filter

Data Processing Step

The data processing steps (in the order that they are applied by the Pipeline) are:

  1. Decoding – reading input data from the stream layer
  2. Cleaning – analyzing and processing input data before anonymization is applied
  3. Anonymization – applying configured anonymization strategy to data output from the cleaning step
  4. Output – preparing and publishing data to the output stream layer

Data Object

Metrics are calculated on the following data objects:

  • Point – a single position with lat / lon / timestamp
  • Chunk – a set of one or more points belonging to the same trajectory.
  • Message – a single data message, which can contain one or more chunks that could belong to different trajectories
  • Trajectory – a single journey with a consistent ID. A journey consists of one or more chunks.

Action

Metrics are captured for the following actions:

  • Dropped – the data object is discarded and does not proceed to the next processing step
  • Info – counts the total quantity of data objects at the defined data processing step
  • Notify – the data object is preserved, and the data objects matching specific criteria (e.g. 'too few positions') are counted at the defined data processing step

Filter

Metrics can be applied for the following types of filter:

  • All – no filter is applied; the metric is the total count of all data objects
  • <specific filter> - count for all data objects meeting a human-readable filter (e.g. has_too_few_points)

Available Real-Time Anonymizer Metrics

Data Processing Step | Metric Name | Data Object | Description
Decoding | RTA_decoding_message_dropped_corrupted | Message | Number of received messages that could not be decoded
Decoding | RTA_decoding_point_info_all | Point | Number of points received and successfully decoded
Decoding | RTA_decoding_chunk_info_all | Chunk | Number of chunks received and successfully decoded
Cleaning | RTA_cleaning_point_notify_chunk_too_short | Point | Number of points where chunk duration is less than the minimum duration
Cleaning | RTA_cleaning_point_notify_chunk_has_too_few_points | Point | Number of points where chunk has fewer points than the configured threshold
Cleaning | RTA_cleaning_point_dropped_invalid_timestamp | Point | Number of points dropped as timestamps are not recent or are in the future
Cleaning | RTA_cleaning_point_notify_speed_too_high | Point | Number of points having a speed greater than normal driving speeds
Anonymization | RTA_point_anonymization_dropped_invalid_timestamp_trajectory_state | Point | Number of points that were dropped because chunks were received in wrong temporal order
Anonymization | RTA_point_anonymization_dropped_sampling_rate | Point | Number of points dropped to adjust the sampling rate to the configured value
Anonymization | RTA_point_anonymization_dropped_start | Point | Number of points dropped by start cutting strategy
Anonymization | RTA_point_anonymization_dropped_gap | Point | Number of points dropped to create a gap in the data
Anonymization | RTA_point_anonymization_dropped_stay_point | Point | Number of points dropped to obfuscate a stay point
Anonymization | RTA_point_anonymization_dropped_chunk_has_too_few_points | Point | Number of points dropped as number of points in output chunk not meeting threshold
Anonymization | RTA_anonymization_trajectory_info_all | Trajectory | Number of trajectories reaching the anonymization step
Output | RTA_output_point_info_all | Point | Number of points output by the pipeline
Output | RTA_output_chunk_info_all | Chunk | Number of chunks output by the pipeline
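
As a rough sketch of how these metrics might be combined, the code below derives the share of decoded points that reach the output and lists the point drop counters by reason. The counter values are made up for illustration, `summarize_point_flow` is a hypothetical helper, and how the counters are exported from Grafana is outside the scope of this sketch.

```python
def summarize_point_flow(counters: dict) -> dict:
    """Summarize how many decoded points reach the output and why others were dropped."""
    decoded = counters.get("RTA_decoding_point_info_all", 0)
    output = counters.get("RTA_output_point_info_all", 0)
    # Point-level drop counters are recognized by name, using the names in the table.
    dropped = {
        name: value
        for name, value in counters.items()
        if "point" in name and "_dropped_" in name
    }
    return {
        "decoded_points": decoded,
        "output_points": output,
        "retained_share": output / decoded if decoded else 0.0,
        "dropped_by_reason": dropped,
    }


# Hypothetical counter values, for illustration only.
sample_counters = {
    "RTA_decoding_point_info_all": 1000,
    "RTA_point_anonymization_dropped_start": 100,
    "RTA_point_anonymization_dropped_gap": 300,
    "RTA_point_anonymization_dropped_sampling_rate": 100,
    "RTA_point_anonymization_dropped_stay_point": 50,
    "RTA_output_point_info_all": 450,
}
summary = summarize_point_flow(sample_counters)
print(f"Retained share: {summary['retained_share']:.0%}")
for reason, count in summary["dropped_by_reason"].items():
    print(reason, count)
```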
