Configuration

The Index Compaction Library is configured with Typesafe Config. Provide an application.conf file that contains the required settings and any optional settings whose values differ from the defaults in the library's reference.conf file.
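To illustrate how application.conf is layered over reference.conf, the following minimal sketch models both files as nested Python dictionaries and merges them key by key. It is only an illustration of the override behaviour, not the Typesafe Config implementation. The merge_config helper and the 512 MB override value are hypothetical.

```python
from copy import deepcopy


def merge_config(defaults, overrides):
    """Return defaults with overrides applied; nested dicts merge key by key."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Defaults as listed in the library's reference.conf.
reference_conf = {
    "file-size": {"min": 134217728, "max": 268435456},
    "failure-rate-threshold": 1.0,
}

# Hypothetical application.conf that changes a single value (512 MB).
application_conf = {"file-size": {"max": 536870912}}

effective = merge_config(reference_conf, application_conf)
print(effective["file-size"])  # {'min': 134217728, 'max': 536870912}
```

Only file-size.max changes. The other defaults stay in effect because the override file does not mention them.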

Note

HERE Workspace does not allow the same catalog to be used as both input and output for batch pipelines. For the Index Compaction Library, however, the input and output catalogs are the same, because the library compacts an index layer and writes the result back to that same layer. Specify the catalog to be compacted under the input-catalogs setting. The output-catalog setting still requires a valid catalog, but it can be a catalog with zero layers. You can keep this output catalog and reuse it for future compaction jobs.

Note

The property here.platform.index-compaction.failure-rate-threshold determines whether your compaction job is reported as completed or failed. The failure rate is the number of failed compactions divided by the total number of compactions, and valid threshold values range from 0.0 to 1.0 inclusive. If the threshold is left at the default value of 1.0 and all compaction events fail, the pipeline fails. If the user-provided query returns no indexes, the pipeline passes.
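The threshold check can be sketched as below. This is a minimal illustration, not the library's code. It assumes the job fails once the failure rate reaches or exceeds the threshold, which is consistent with the default behaviour described above but is not confirmed for other threshold values. The job_fails function name is hypothetical.

```python
def job_fails(failed, total, threshold=1.0):
    """Decide the job status from compaction counts (illustrative only)."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0 inclusive")
    if total == 0:
        # The query returned no indexes, so the pipeline passes.
        return False
    failure_rate = failed / total
    # Assumption: the job fails once the rate reaches the threshold.
    return failure_rate >= threshold


print(job_fails(10, 10))       # True: every compaction failed
print(job_fails(9, 10))        # False: rate 0.9 is below the default 1.0
print(job_fails(0, 0))         # False: nothing was queried
print(job_fails(3, 10, 0.25))  # True: rate 0.3 reaches the 0.25 threshold
```

Under this assumption, lowering the threshold makes the job stricter, since a smaller share of failed compactions is enough to mark the run as failed.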

The Index Compaction Library relies on the Data Client Library to access the HERE platform, so you can also specify Data Client Library settings such as the proxy setup and the retry policy, and add further configuration settings as needed. For more information, see Data Client Configuration.

The configuration is shown below in three parts, in this order: user-defined properties, Index Compaction Library default properties, and pipeline input and output properties.

######################################################
## Index Compaction Library Application Config File ##
######################################################

###########################################
## Required settings in application.conf ##
###########################################

# These settings are for Index Compaction Library's Compaction Job.
here.platform.index-compaction {

  # Fully Qualified Class Name of 'CompactionUDF' interface implementation provided by the user.
  # The class must be public and have a public constructor.
  udf = "<UPDATE_ME>"    # E.g. "com.here.platform.index.compaction.batch.ParquetCompactionExample"

  # Index Layer ID whose data is to be compacted.
  layer = "<UPDATE_ME>"    # E.g. "index"

  query {
    # Compaction can query an entire index layer or a slice of it, based on timewindow, heretile, etc.
    # This property expects a query constraint in RSQL format.
    # Note that the compaction pipeline must be configured with enough resources to handle the amount of data queried from the index layer.
    constraint = "<UPDATE_ME>"    # E.g. "tileId==78499;eventType==SignRecognition;ingestionTime==1588800000"
  }
}

###########################################
## Optional settings in application.conf ##
###########################################

# These settings are for Index Compaction Library's Compaction Job.
here.platform.index-compaction {

  # These settings are for configuring the size of indexed files and compacted files.
  file-size {
    # The minimum acceptable byte size of an indexed file.
    # If there are multiple indexed files having the same indexing attributes with size smaller than this value,
    # then those files will be re-indexed in a single file.
    min = 134217728     # 128 MB

    # The maximum acceptable byte size of a compacted file.
    # If the total size of multiple indexed files having the same indexing attributes exceeds this max file-size parameter,
    # then the files will be compacted into more than one file with max file-size being honored.
    max = 268435456    # 256 MB
  }
}

# These settings are for Index Compaction Library Spark Job's performance tuning.
# Options set here are automatically propagated to the Hadoop configuration during I/O.
spark.session.runtime.config {

  # This property can be used to tune the performance of Spark SQL query execution.
  # It configures the number of partitions to use when shuffling data for joins or aggregations.
  spark.sql.shuffle.partitions = 200
}
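
The effect of the file-size settings above can be estimated with a short calculation. The sketch below is a rough model, not the library's algorithm. It assumes that only files smaller than file-size.min are candidates, that at least two such files with the same indexing attributes are needed, and that their combined size is split into the fewest files that respect file-size.max. The plan_compaction name is hypothetical.

```python
import math

MIN_FILE_SIZE = 134217728  # 128 MB, default file-size.min
MAX_FILE_SIZE = 268435456  # 256 MB, default file-size.max


def plan_compaction(file_sizes, min_size=MIN_FILE_SIZE, max_size=MAX_FILE_SIZE):
    """Estimate how many compacted files result for one set of indexing attributes."""
    # Assumption: only files below the minimum size are compaction candidates.
    small = [size for size in file_sizes if size < min_size]
    if len(small) < 2:
        return 0  # a single small file has nothing to be merged with
    total = sum(small)
    # Assumption: the merged data is split so that no output exceeds max_size.
    return math.ceil(total / max_size)


MB = 1024 * 1024
print(plan_compaction([100 * MB] * 5))  # 2: 500 MB does not fit in one 256 MB file
print(plan_compaction([10 * MB] * 3))   # 1: 30 MB fits in a single file
print(plan_compaction([200 * MB] * 4))  # 0: every file is already above the minimum
```
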
####################################################
## Index Compaction Library Reference Config File ##
####################################################

###############################################################################################################################
## This is the reference config file that contains default settings for Index Compaction Library.                            ##
## Any application-specific settings that differ from the default ones provided here should be set in your application.conf. ##
###############################################################################################################################

# These settings are for Index Compaction Library's Compaction Job.
here.platform.index-compaction {

  # These settings are for configuring the size of indexed files and compacted files.
  file-size {
    # The minimum acceptable byte size of an indexed file.
    # If there are multiple indexed files having the same indexing attributes with size smaller than this value,
    # then those files will be re-indexed in a single file.
    min = 134217728     # 128 MB

    # The maximum acceptable byte size of a compacted file.
    # If the total size of multiple indexed files having the same indexing attributes exceeds this max file-size parameter,
    # then the files will be compacted into more than one file with max file-size being honored.
    max = 268435456    # 256 MB
  }

  # This property determines whether your compaction job is reported as completed or failed.
  # The failure rate is the number of failed compactions divided by the total number of compactions.
  # Valid values are in the range 0.0 to 1.0 inclusive.
  # If left at the default value of 1.0 and all compaction events fail, the pipeline fails.
  # If the user-provided query returns no indexes, the pipeline passes.
  failure-rate-threshold = 1.0
}
########################################################################################################################
# The HERE platform does not allow the same catalog to be used as both input and output for batch pipelines.           #
#                                                                                                                      #
# For Index Compaction Library, input & output catalog are the same as the library compacts the same index layer.      #
# You should specify the desired catalog to be compacted under the `input-catalogs` setting.                           #
#                                                                                                                      #
# For the `output-catalog` setting, you still need to pass a valid catalog. You can use a catalog with zero layers.    #
# You can choose to maintain the output catalog for future compaction jobs.                                            #
########################################################################################################################
pipeline.config {
  output-catalog {
    hrn = "YOUR_OUTPUT_CATALOG_HRN"   # E.g. "hrn:here:data::olp-here:index-compaction-library-empty-catalog"
  }
  input-catalogs {
    source {
      hrn = "YOUR_INPUT_CATALOG_HRN"  # E.g. "hrn:here:data::olp-here:index-compaction-library-input-catalog"
    }
  }
}
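
Before deploying, the two catalog settings above can be checked with a small helper. The sketch below is a hypothetical pre-flight check and not part of the library or the platform tooling. It assumes that a valid HRN starts with "hrn:", as in the examples above, and that comparing the two strings is enough to detect a reused catalog.

```python
def check_pipeline_catalogs(input_hrn, output_hrn):
    """Return a list of problems found in the catalog HRNs (empty if none)."""
    problems = []
    for name, hrn in (("input", input_hrn), ("output", output_hrn)):
        # Assumption: HRNs start with "hrn:"; this also catches unfilled placeholders.
        if not hrn.startswith("hrn:"):
            problems.append(f"{name} catalog is not an HRN: {hrn!r}")
    if input_hrn == output_hrn:
        problems.append("input and output catalogs must differ")
    return problems


print(check_pipeline_catalogs(
    "hrn:here:data::olp-here:index-compaction-library-input-catalog",
    "hrn:here:data::olp-here:index-compaction-library-empty-catalog",
))  # []
```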
