Check Data Integrity

The HERE platform offers standard ways to ensure data integrity at the partition level. The layer configuration for all layer types supports two optional fields: Digest (checksum) and CRC.

Note

Digest and CRC serve different purposes. Digest is a security measure that detects deliberate tampering with the data. CRC is a safety measure that detects accidental corruption, such as bit flips caused by computer hardware or network transport. You can use both fields at the same time.

Digest

When retrieving a partition, your application can check whether the partition data has been tampered with. To do that, your application should calculate the checksum of the retrieved data and compare it with the checksum (Digest) stored with the partition. If the two checksum values match, the retrieved data is consistent with the partition that was uploaded. The Data Client Library has the Digest algorithms built in, so your application should use these.

Note that, by default, a partition does not contain a checksum. You need to explicitly choose the checksum algorithm in the layer configuration. If the checksum setting in the layer configuration is not "Undefined", the Data Client Library automatically calculates the checksum with the selected algorithm before it uploads the partition.

When choosing a Digest algorithm, consider the following:

  • SHA-256 is recommended for applications where strong data security is required.
  • MD5 and SHA-1 are acceptable when the only purpose of the hash is to verify data integrity during transit.
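
To illustrate what a Digest comparison involves, the following is a minimal sketch that uses only the Java standard library (java.security.MessageDigest), not the Data Client Library's built-in calculation. The class and method names (DigestSketch, hexDigest, matches) are hypothetical, and the lowercase hexadecimal encoding of the digest is an assumption made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;

// Hypothetical helper, not part of the Data Client Library.
final class DigestSketch {

  /** Returns the digest of data as lowercase hex; algorithm is a JCA name such as "SHA-256". */
  static String hexDigest(String algorithm, byte[] data) {
    try {
      byte[] hash = MessageDigest.getInstance(algorithm).digest(data);
      StringBuilder hex = new StringBuilder(hash.length * 2);
      for (byte b : hash) {
        hex.append(String.format("%02x", b)); // two hex characters per byte
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalArgumentException("Unsupported digest algorithm: " + algorithm, e);
    }
  }

  /** Recomputes the digest and compares it with the stored value (assumed to be hex). */
  static boolean matches(String algorithm, byte[] data, String storedHex) {
    byte[] expected = storedHex.toLowerCase(Locale.ROOT).getBytes(StandardCharsets.US_ASCII);
    byte[] actual = hexDigest(algorithm, data).getBytes(StandardCharsets.US_ASCII);
    return MessageDigest.isEqual(expected, actual);
  }

  public static void main(String[] args) {
    byte[] payload = "abc".getBytes(StandardCharsets.US_ASCII);
    System.out.println(hexDigest("SHA-256", payload));
  }
}
```

In an application that uses the Data Client Library, prefer the library's own checksum calculation so that the algorithm and encoding follow the layer configuration.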

CRC

When retrieving a partition, your application can check whether one or more bits of the partition data were flipped by computer hardware or during network transport. To do that, your application should calculate the CRC of the retrieved data and compare it with the CRC stored with the partition. If the two CRC values match, the retrieved data is consistent with the partition that was uploaded. The Data Client Library has the CRC algorithms built in, so your application should use these.
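
The following sketch shows why such a comparison catches a flipped bit. It uses java.util.zip.CRC32C from the Java standard library (Java 9 or later) rather than the Data Client Library, and the names BitFlipSketch, crc32c, and flipBit are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32C;

// Hypothetical helper, not part of the Data Client Library.
final class BitFlipSketch {

  /** Computes the CRC-32C (Castagnoli) of data as an unsigned 32-bit value held in a long. */
  static long crc32c(byte[] data) {
    CRC32C crc = new CRC32C();
    crc.update(data, 0, data.length);
    return crc.getValue();
  }

  /** Returns a copy of data with the bit at bitIndex inverted, simulating corruption. */
  static byte[] flipBit(byte[] data, int bitIndex) {
    byte[] copy = data.clone();
    copy[bitIndex / 8] ^= (byte) (1 << (bitIndex % 8));
    return copy;
  }

  public static void main(String[] args) {
    byte[] original = "partition payload".getBytes(StandardCharsets.US_ASCII);
    byte[] corrupted = flipBit(original, 5);
    // A single flipped bit always changes the CRC, so the comparison fails.
    System.out.println(crc32c(original) == crc32c(corrupted)); // prints "false"
  }
}
```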

Note that, by default, a partition does not contain a CRC. You need to explicitly choose the CRC algorithm in the layer configuration. If the CRC setting in the layer configuration is not "None", the Data Client Library automatically calculates the CRC with the selected algorithm before it uploads the partition.

Currently only one CRC algorithm is supported:

  • CRC-32C (Castagnoli), see e.g. CRC algorithms.

    Note that the CRC is stored as a string, padded with leading zeros to a fixed length of 8 hexadecimal characters. For example, if the calculated CRC is the uint32 value 0x1234af, the CRC actually stored in the partition metadata is the string "001234af".
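
The string form described above can be reproduced with a short sketch. It relies on java.util.zip.CRC32C (Java 9 or later) and String.format; the names CrcStringSketch, toStoredForm, and crc32cOf are hypothetical, and the use of lowercase hexadecimal digits is inferred from the "001234af" example rather than stated by the platform documentation.

```java
import java.util.zip.CRC32C;

// Hypothetical helper, not part of the Data Client Library.
final class CrcStringSketch {

  /** Formats an unsigned 32-bit CRC as 8 hex characters with leading zeros. */
  static String toStoredForm(long crcValue) {
    return String.format("%08x", crcValue & 0xFFFFFFFFL);
  }

  /** Computes CRC-32C over data and returns it in the zero-padded string form. */
  static String crc32cOf(byte[] data) {
    CRC32C crc = new CRC32C();
    crc.update(data, 0, data.length);
    return toStoredForm(crc.getValue());
  }

  public static void main(String[] args) {
    System.out.println(toStoredForm(0x1234afL)); // prints "001234af"
  }
}
```

Comparing the strings rather than the numeric values avoids mistakes with leading zeros, because "1234af" and "001234af" denote the same number but are different strings.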

Further Information

For more information about the available Digest and CRC algorithms, see the Update Catalog operation in the Data API Reference. There you will also find configuration information for other layer types.

Examples

You can compute the checksum of a blob and attach it to a new partition as follows:

Scala

val bufferedBlob: BufferedBlob = ByteArrayData(Array.emptyByteArray)

val writeEngine = DataEngine().writeEngine(hrn)

val blobChecksum: Future[Option[String]] =
  writeEngine.blobChecksum(layerName, bufferedBlob)

val singlePartition = Source
  .fromFuture(blobChecksum)
  .map(checksum => NewPartition("12345", layerName, bufferedBlob, checksum = checksum))

writeEngine.publish(singlePartition)

Java

BufferedBlob bufferedBlob = new ByteArrayData(new byte[] {});

WriteEngine writeEngine = DataEngine.get(myActorSystem).writeEngine(hrn);

CompletionStage<Optional<String>> blobChecksum =
    writeEngine.blobChecksum(layerName, bufferedBlob);

Source<PendingPartition, NotUsed> singlePartition =
    Source.fromCompletionStage(blobChecksum)
        .map(
            checksum ->
                new NewPartition.Builder()
                    .withPartition("12345")
                    .withLayer(layerName)
                    .withData(bufferedBlob)
                    .withChecksum(checksum)
                    .build());

writeEngine.publish(singlePartition);

You can retrieve a partition and verify its checksum with the following code:

Scala

val dataEngine = DataEngine()
val readEngine = dataEngine.readEngine(hrn)
val writeEngine = dataEngine.writeEngine(hrn)

val blobChecksum: Future[Option[String]] = readEngine
  .getDataAsBytes(partition)
  .flatMap(bytes => writeEngine.blobChecksum(layerName, ByteArrayData(bytes)))

blobChecksum.map(checksum => checksum == partition.checksum)

Java

DataEngine dataEngine = DataEngine.get(myActorSystem);
WriteEngine writeEngine = dataEngine.writeEngine(hrn);
ReadEngine readEngine = dataEngine.readEngine(hrn);

CompletionStage<Optional<String>> blobChecksum =
    readEngine
        .getDataAsBytes(partition)
        .thenCompose(bytes -> writeEngine.blobChecksum(layerName, new ByteArrayData(bytes)));

blobChecksum.toCompletableFuture().get().equals(partition.getChecksum());
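
When the layer has no Digest configured, there may be no checksum to compare, and two absent values compare as equal even though nothing was verified. The sketch below separates that case from a real match. It assumes that both the stored and the recomputed checksum are available as Optional<String>, as the Java comparison above suggests; the names ChecksumCheckSketch, Result, and compare are hypothetical and not part of the Data Client Library.

```java
import java.util.Optional;

// Hypothetical helper, not part of the Data Client Library.
final class ChecksumCheckSketch {

  enum Result { MATCH, MISMATCH, NOT_VERIFIABLE }

  /** Compares the stored and recomputed checksums, treating a missing value as unverifiable. */
  static Result compare(Optional<String> stored, Optional<String> computed) {
    if (!stored.isPresent() || !computed.isPresent()) {
      // No checksum on one side: the data cannot be verified either way.
      return Result.NOT_VERIFIABLE;
    }
    return stored.get().equals(computed.get()) ? Result.MATCH : Result.MISMATCH;
  }

  public static void main(String[] args) {
    System.out.println(compare(Optional.empty(), Optional.empty())); // prints "NOT_VERIFIABLE"
  }
}
```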
