Change Stats Calculator

Overview

The Change Stats Calculator compares two versions of the same GeoJSON layer, or two different GeoJSON layers, and generates a “one-time” report of Added, Removed, and Modified Features.

Additionally, a single layer can be monitored for changes. In that case, the pipeline runs whenever partitions are changed and generates “on-version-update” reports.

Structure

Figure 1. Application Flow Diagram, with legend

The State Layer is present in the output catalog and is used by the Change Stats Calculator to store information from previous runs. This layer must exist in the output catalog with the layer ID state for this pipeline template to work properly. If you use the Wizard to create the output catalog and layers, the State Layer is created for you.

Prerequisites

While running the Wizard Deployer, you will be asked to provide the following information. These values are needed in order to properly configure and deploy the pipeline. You should have this information on hand prior to executing the Wizard.

  • Group ID of the group you would like the pipeline to be shared with. All input and output catalogs required for this pipeline template must also be shared with this group.

  • Pipeline prefix you would like your deployed pipeline to begin with

  • Compare1 catalog HRN, layer ID and version number to be used as reference

  • Compare2 catalog HRN, layer ID and version number to be used as comparison

  • Output catalog HRN or ID and layer ID to provide change stats.
    NOTE: This catalog may be created manually prior to pipeline template deployment, or during deployment via the Wizard. Refer to the Execution – Output Catalogs section.

  • Features are a combination of properties, geometry and id. When defining the identity of a feature, you will be asked if geometry and/or id will be used in determining identity, and also what properties (if any) will be used.

    As a database analogy, you can think of the identity definition as the columns that are included in a compound primary key.

    NOTE:

    1. Geometry will be compared exactly. Any change to position coordinates or type will indicate a different identity.
    2. If any property you specify as part of the identity is not present in provided data, features missing this property will be dropped altogether and will not contribute to add/change/remove counts.
    3. If the geometry field in the GeoJSON Feature will be used to determine a changed Feature, geometry will be compared exactly. Any change to position or type will indicate a change to the Feature.
  • Property names of the Feature that will be used to determine whether a Feature has changed in the compared version.

    NOTE:

    1. Properties used for identity CANNOT be used to determine change.
    2. If no properties or geometry are used to indicate changes, only additions and deletions will be reported.
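
The identity rules above can be illustrated with a short Python sketch. It is not the pipeline's implementation: the helper name identity_key, its parameters, and the use of json.dumps to make values exactly comparable are all assumptions made for illustration.

```python
import json
from typing import List, Optional, Tuple


def identity_key(feature: dict,
                 identity_props: List[str],
                 use_geometry: bool = False,
                 use_id: bool = False) -> Optional[Tuple]:
    """Build a compound identity key for a GeoJSON Feature (hypothetical helper).

    Returns None when an identity property is missing, mirroring the rule
    that such features are dropped and not counted.
    """
    props = feature.get("properties") or {}
    parts = []
    for name in identity_props:
        if name not in props:
            return None  # dropped: contributes to no add/change/remove count
        # Serialize so values are compared exactly and the key is hashable.
        parts.append((name, json.dumps(props[name], sort_keys=True)))
    if use_id:
        parts.append(("id", json.dumps(feature.get("id"))))
    if use_geometry:
        # Exact comparison: any change to coordinates or type changes the key.
        parts.append(("geometry",
                      json.dumps(feature.get("geometry"), sort_keys=True)))
    return tuple(parts)


feature = {
    "type": "Feature",
    "id": "poi-1",
    "properties": {"name": "Cafe", "PH_NUMBER": "1847-5882800"},
    "geometry": {"type": "Point", "coordinates": [13.4, 52.5]},
}
moved = dict(feature, geometry={"type": "Point", "coordinates": [13.4, 52.6]})

# Same identity when only "name" is part of the key.
print(identity_key(feature, ["name"]) == identity_key(moved, ["name"]))  # True
# Different identity once geometry is part of the key.
print(identity_key(feature, ["name"], use_geometry=True)
      == identity_key(moved, ["name"], use_geometry=True))  # False
# A feature missing an identity property is dropped.
print(identity_key(feature, ["missing"]))  # None
```

As with a compound primary key, two features count as the same feature only if every component of the key matches.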

Execution

Output Catalogs

This pipeline template requires a dedicated output catalog with two versioned layers, which can be created by the Wizard or manually. If you are creating the output catalog manually, one layer should have content type text/plain and generic partitioning. The other layer is used for state and should have the layer ID state, content type application/x-protobuf, and generic partitioning.

Also, make sure the configuration parameters you select correspond to the answers you give the Wizard.

Comparisons

This pipeline template will ask for Compare1 catalog/layer/version and Compare2 catalog/layer/version. Compare1 will be used as the reference for all Additions/Deletions/Modifications in Compare2.

  • Addition:
    A Feature not present in Compare1 but present in Compare2

  • Deletion:
    A Feature present in Compare1 but not in Compare2

  • Modification:
    A Feature present in both Compare1 and Compare2 for which ANY provided Change Property has a different value
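
The three definitions above amount to set operations on feature identities. The following Python sketch shows one way to count them, under stated assumptions: the function name diff_counts is hypothetical, identity is built from properties only (geometry and id are left out for brevity), and when two features in one layer share an identity the later one wins.

```python
import json
from typing import Dict, Iterable, List


def diff_counts(compare1: Iterable[dict],
                compare2: Iterable[dict],
                identity_props: List[str],
                change_props: List[str]) -> Dict[str, int]:
    """Count additions, deletions, and modifications (illustrative only)."""

    def index(features: Iterable[dict]) -> Dict[str, dict]:
        by_key = {}
        for f in features:
            props = f.get("properties") or {}
            if any(p not in props for p in identity_props):
                continue  # features missing an identity property are dropped
            key = json.dumps([props[p] for p in identity_props], sort_keys=True)
            by_key[key] = props  # assumption: later duplicates overwrite earlier
        return by_key

    old, new = index(compare1), index(compare2)
    added = new.keys() - old.keys()      # in Compare2 only
    removed = old.keys() - new.keys()    # in Compare1 only
    modified = [k for k in old.keys() & new.keys()
                if any(old[k].get(p) != new[k].get(p) for p in change_props)]
    return {"added": len(added), "removed": len(removed),
            "modified": len(modified)}


def feat(uid: int, name: str) -> dict:
    return {"type": "Feature", "geometry": None,
            "properties": {"uid": uid, "name": name}}


v1 = [feat(1, "a"), feat(2, "b"), feat(3, "c")]
v2 = [feat(1, "a"), feat(2, "B"), feat(4, "d")]
print(diff_counts(v1, v2, identity_props=["uid"], change_props=["name"]))
# {'added': 1, 'removed': 1, 'modified': 1}
```

With an empty change_props list, the modified count is always zero, so only additions and deletions are reported.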

Properties used for Identity or Comparison are supplied to the Wizard in a comma-separated format. Their values are compared exactly for the purposes of tracking a Feature or determining change. For example, "PH_NUMBER": "1847-5882800" and "PH_NUMBER": "1-847-5882800" are different values, so the Feature will be considered:

  • A different feature, if used as an identity property. This will be tracked as an addition and deletion.
  • A changed feature, if used as a comparison property. This will be tracked as a modification.
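
The phone number example can be traced in a few lines of Python. This is a minimal sketch with a hypothetical outcome helper and an assumed NAME property. It only demonstrates that plain string equality, with no normalization, yields the two outcomes listed above.

```python
from typing import List

old = {"NAME": "Cafe", "PH_NUMBER": "1847-5882800"}
new = {"NAME": "Cafe", "PH_NUMBER": "1-847-5882800"}


def outcome(old_props: dict, new_props: dict,
            identity_props: List[str], change_props: List[str]) -> str:
    """Classify a pair of property sets using exact comparison (hypothetical)."""
    if any(old_props.get(p) != new_props.get(p) for p in identity_props):
        return "addition and deletion"  # treated as two different features
    if any(old_props.get(p) != new_props.get(p) for p in change_props):
        return "modification"           # same feature, changed value
    return "unchanged"


# PH_NUMBER used as an identity property:
print(outcome(old, new, identity_props=["PH_NUMBER"], change_props=[]))
# addition and deletion

# PH_NUMBER used as a comparison property, NAME as identity:
print(outcome(old, new, identity_props=["NAME"], change_props=["PH_NUMBER"]))
# modification
```

If formatting differences like this should not count as changes, the values would need to be normalized in the source data before the comparison runs.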

Monitored Changes

This pipeline template can be set to monitor a catalog and layer for changes by choosing the on-version-update option while deploying the pipeline template via the Wizard.
To compare the changed data in each version, set the pipeline’s activation option to Schedule on Data Change via the portal or CLI after deployment and the initial run.

NOTE: For the initial run of a monitored catalog, the report will state that no stats will be calculated.

Verification

In the Platform Portal, select the Pipelines tab, where you will see your Pipeline deployed and running. After your Pipeline finishes and your data is published, you can find your output catalog under the Data tab and query/retrieve your data programmatically using one of the following options:

Cost Estimation

Running this pipeline template will incur Platform usage costs associated with:

  • Storage
  • Data IO
  • Compute Core
  • Compute RAM
  • Log Search IO

The options you choose when configuring the pipeline, and your usage patterns, will determine the overall expense. Below are some tips for cost-effective usage:

  • Review the Data User Guide information regarding storage costs to make the appropriate choices for your input/output layers.
  • Monitor your Usage Metrics and adjust (re-deploy) your pipeline if warranted.

Support

If you need support with this pipeline template, please contact us.
