Real-Time Anonymizer anonymizes location data for one or more use cases. Different use cases require different output data, so you can use and optimize the available anonymization methods to provide the maximum utility for a specific use case while still protecting privacy. For example, the TrafficInformation use case requires sets of continuous probe points with no major gaps within each set. For this use case, the SplitAndGap anonymization method is suitable, and the method is optimized to provide as many probe points as possible within all anonymized sets of probe points.
Raw location data is the input for anonymization. Different use cases have different data latency requirements. Real-time use cases (such as real-time traffic information) require that anonymized data is provided with a maximum latency of a couple of minutes. In this case, latency is the length of time (in minutes) from when data was collected by the vehicle to when data is processed and provided as a service (that is, traffic information served to vehicles). The latency requirements of a use case determine whether a complete journey (represented as a single trajectory), or only a subset of the most recent points from a journey (a chunk of a trajectory), can be anonymized in one go. Real-Time Anonymizer works with real-time data only.
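The latency definition above can be illustrated with a minimal sketch. The function name, the timestamps, and the 2-minute budget are assumptions made for illustration; the text only states "a couple of minutes".

```python
from datetime import datetime, timezone


def latency_minutes(collected_at: datetime, served_at: datetime) -> float:
    """Minutes between collection by the vehicle and delivery as a service."""
    return (served_at - collected_at).total_seconds() / 60.0


# Hypothetical latency budget, not a documented product value.
MAX_LATENCY_MINUTES = 2.0

collected = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
served = datetime(2024, 1, 1, 12, 1, 30, tzinfo=timezone.utc)

latency = latency_minutes(collected, served)
within_budget = latency <= MAX_LATENCY_MINUTES
```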
Anonymization is applied to real-time data when a short latency is required. Real-time location data is a set of the most recent probe points and/or events from a single user journey covering a specific time period (for example, between t=2 minutes and t=3 minutes). This set of probe points and events from a specific time period of a journey is a trajectory chunk. Trajectory chunks can cover different time periods for different journeys and for different chunks of the same journey. The significant point is that with real-time data, additional chunks of a single journey are added over time. Normally, data for a whole journey is not input at a single point in time; it is instead input as an ordered set of chunks. Chunks of a given trajectory should share the same identifier (hereafter called the "trajectory identifier") so that the pipeline can treat them as a single sequence. It is acceptable to use the same identifier for multiple trajectories of the same vehicle. The vehicle identifier itself can be used as the trajectory identifier; in this case, the anonymization pipeline protects the privacy of the vehicle owner. Real-time use cases include hazard warning and traffic information.
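As a rough illustration of how a journey becomes an ordered set of chunks that share one trajectory identifier, consider the sketch below. The `ProbePoint` and `TrajectoryChunk` types, the `chunk_journey` helper, and the 60-second window are all hypothetical; they are not part of the product or of the SENSORIS or SDII formats.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ProbePoint:
    timestamp_s: float  # seconds since the start of the journey
    lat: float
    lon: float


@dataclass
class TrajectoryChunk:
    trajectory_id: str  # identical for every chunk of the same trajectory
    points: List[ProbePoint]


def chunk_journey(trajectory_id: str, points: List[ProbePoint],
                  window_s: float = 60.0) -> List[TrajectoryChunk]:
    """Group probe points into consecutive time windows, in temporal order."""
    buckets: Dict[int, List[ProbePoint]] = {}
    for point in sorted(points, key=lambda p: p.timestamp_s):
        buckets.setdefault(int(point.timestamp_s // window_s), []).append(point)
    return [TrajectoryChunk(trajectory_id, buckets[k]) for k in sorted(buckets)]


journey = [ProbePoint(t, 52.5, 13.4) for t in (0, 30, 70, 110, 125, 170)]
# Assumption: the vehicle identifier doubles as the trajectory identifier.
chunks = chunk_journey("vehicle-42", journey)
```

Here the six probe points fall into three one-minute chunks, each carrying the same trajectory identifier so that a consumer can treat them as one sequence.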
When real-time location data is provided for anonymization, trajectory chunk data must have a consistent ID across all chunks belonging to the same trajectory. When this rule is not followed, each trajectory chunk is anonymized as a new trajectory and could end up with either no probe points anonymized or all probe points removed by start cutting. Remember to verify that trajectory IDs are used consistently across trajectory chunks.
When data is ingested into an input streaming layer, a couple of additional points should be considered, in order to reduce the amount of data excluded from anonymization:
Chunks of a trajectory that are not provided in the correct temporal order are excluded from anonymization, which reduces the volume of data output by the anonymization pipeline.
When chunks of the same trajectory are not ingested into the same partition, some messages will be lost upon pipeline restart.
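Both ingestion points can be checked on the producer side before data reaches the input streaming layer. The sketch below is an assumption-laden illustration: `partition_for` and `out_of_order_chunks` are hypothetical helpers, the CRC32-based partitioning stands in for whatever partitioner the streaming layer actually uses, and the ordering rule (a chunk must not start before the previous chunk of the same trajectory ended) is a simplification, not the pipeline's documented check.

```python
import zlib
from typing import Dict, List, Tuple

# A chunk is described here as (trajectory_id, start_s, end_s).
Chunk = Tuple[str, float, float]


def partition_for(trajectory_id: str, num_partitions: int) -> int:
    """Map a trajectory ID to a partition deterministically, so that all
    chunks of the same trajectory land in the same partition."""
    return zlib.crc32(trajectory_id.encode("utf-8")) % num_partitions


def out_of_order_chunks(chunks: List[Chunk]) -> List[int]:
    """Return the indexes of chunks that start before the previously
    accepted chunk of the same trajectory ended."""
    last_end: Dict[str, float] = {}
    rejected: List[int] = []
    for index, (trajectory_id, start_s, end_s) in enumerate(chunks):
        if trajectory_id in last_end and start_s < last_end[trajectory_id]:
            rejected.append(index)
        else:
            last_end[trajectory_id] = end_s
    return rejected


incoming = [("a", 0, 60), ("b", 0, 60), ("a", 120, 180), ("a", 60, 120)]
late = out_of_order_chunks(incoming)  # the last chunk of "a" arrives too late
```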
Raw location data trajectories can reveal sensitive information about persons. Anonymization methods are used to reduce this risk by removing or editing the information. The output data then differs from the input data in various ways, such as:
The split and gap anonymization method is an anonymization approach that works on real-time chunks, applying an anonymization strategy across all chunks of a single journey. The input trajectory chunk is split into zero or more anonymized sub-trajectories depending on the configuration of the anonymization strategy. Supported parameters include the following:
The output of this anonymization method is zero or more sub-trajectories, which are output as SENSORIS or SDII data messages to the output streaming layer. Each anonymized sub-trajectory has a new, random identifier not linked to the original trajectory ID.
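The identifier handling described above can be sketched as follows. This is only an illustration of giving each sub-trajectory a fresh random identifier; `assign_random_ids` and the dictionary-based point representation are hypothetical, and neither the splitting logic nor the SENSORIS or SDII message encoding is shown.

```python
import uuid
from typing import List, Tuple

Point = dict  # placeholder for an anonymized probe point


def assign_random_ids(sub_trajectories: List[List[Point]]) -> List[Tuple[str, List[Point]]]:
    """Pair every sub-trajectory with a new random UUID. The original
    trajectory ID is never passed in, so the output cannot be derived from it."""
    return [(str(uuid.uuid4()), points) for points in sub_trajectories]


original_id = "vehicle-42"  # stays on the input side only
sub_trajectories = [[{"t": 0}, {"t": 10}], [{"t": 90}]]
published = assign_random_ids(sub_trajectories)
```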