Configuration File Reference

Overview

This topic summarizes the various configuration files used in HERE platform pipelines. Most are used only in certain places, such as with the CLI, and are identified accordingly. In some instances, a configuration file applies only to specific use cases, which are also described here.

Note

With rare exceptions, there are no configuration files needed when directly using the Pipeline API. Most of the parameters encountered in these configuration files are parameters required by various API functions. Both the CLI and the platform portal simplify working with pipelines, but the data is still required by the Pipeline API. So, we use configuration files with the SDK and CLI, and we use forms with the platform portal.

pipeline-job.conf

CLI only

This file only applies to activating batch Pipeline Versions without the use of the scheduler (that is, in a Run Now mode).

Example content of pipeline-job.conf:

    pipeline.job.catalog-versions {
        output-catalog { base-version = 42 }
        input-catalogs {
            test-input-1 {
                processing-type = "no_changes"
                version = 19
            }
            test-input-2 {
                processing-type = "changes"
                since-version = 70
                version = 75
            }
            test-input-3 {
                processing-type = "reprocess"
                version = 314159
            }
        }
    }

Where:

  • base-version of the output catalog indicates the existing version of the catalog on top of which new data should be published. This parameter will not be necessary in the future and will be removed; it is currently required to commit updated data to the output catalog.
  • input-catalogs specifies, for each input source, the most up-to-date version, which is the version that should be processed. It also includes information about what has changed since the last time the job ran (see the example above). Catalogs are identified by the same identifiers used in the pipeline configuration file.
  • processing-type describes what has changed in each input since the last successful run. Possible values include no_changes, changes, and reprocess.

    • no_changes indicates that an input catalog has not changed since the last run.
    • changes indicates that an input catalog has changed. A second parameter since-version is included to indicate which version of that catalog was processed during the last run.
    • reprocess does not specify whether an input catalog has changed or not. The pipeline is requested to reprocess that whole catalog instead of attempting any kind of incremental processing. This may be due to an explicit user request or to a system condition, such as the first time a pipeline runs.
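
The file is plain HOCON, so its contents can be inspected with any HOCON-aware library. The following Scala sketch uses Typesafe Config to branch on the processing type of each input. The catalog identifiers come from the example above; reading the file from the working directory is an assumption made for illustration only.

    import com.typesafe.config.{Config, ConfigFactory}
    import java.io.File
    import scala.jdk.CollectionConverters._

    object JobConfigSketch {
      def main(args: Array[String]): Unit = {
        // Assumption: pipeline-job.conf is readable from the working directory.
        val job: Config = ConfigFactory
          .parseFile(new File("pipeline-job.conf"))
          .getConfig("pipeline.job.catalog-versions")

        val outputBaseVersion = job.getLong("output-catalog.base-version")

        // Iterate over the catalog identifiers (test-input-1, test-input-2, ...).
        val inputs = job.getConfig("input-catalogs")
        inputs.root().keySet().asScala.foreach { id =>
          val catalog = inputs.getConfig(id)
          val version = catalog.getLong("version")
          catalog.getString("processing-type") match {
            case "no_changes" =>
              println(s"$id: unchanged at version $version")
            case "changes" =>
              val since = catalog.getLong("since-version")
              println(s"$id: changed between version $since and version $version")
            case "reprocess" =>
              println(s"$id: reprocess the whole catalog at version $version")
          }
        }
        println(s"Publishing on top of output base version $outputBaseVersion")
      }
    }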


pipeline-config.conf

CLI only

When you create a Pipeline Version using the CLI, the configuration of the new Pipeline Version is specified by the pipeline-config.conf file, the pipeline template ID, and the pipeline ID. pipeline-config.conf is added to the classpath of the main user process.

Note: Input Catalog Considerations

Because the pipeline template (see discussion below) includes information about the input catalog(s) to be used with this Pipeline Version, the information in the pipeline-config.conf file should agree with the information in the pipeline template, as follows:

  • In the pipeline template, the input catalog ID values are identified.
  • In the pipeline-config.conf file, you use the same input catalog ID values from the template to associate the HRN value for each catalog.
  • The pipeline-job.conf file should only be used when a batch Pipeline Version needs to be executed with specific versions of the input catalogs or a specific type of processing. It contains the version information for the catalogs defined in the pipeline-config.conf file. However, if you run the batch Pipeline Version using the scheduler, the scheduler determines the versions and the new data to be processed. If the batch Pipeline Version is run on demand without these details, the service picks the latest catalog versions by default and reprocesses them.

Example content of pipeline-config.conf:

    pipeline.config {
        billing-tag = "test-billing-tag"
        output-catalog { hrn = "hrn:here-cn:data:::example-output" }
        input-catalogs {
            test-input-1 { hrn = "hrn:here-cn:data:::example1" }
            test-input-2 { hrn = "hrn:here-cn:data:::example2" }
            test-input-3 { hrn = "hrn:here-cn:data:::example3" }
        }
    }

Where:

  • billing-tag specifies an optional tag to group billing entries for the pipeline.
  • output-catalog specifies the HRN that identifies the output catalog of the pipeline.
  • input-catalogs specifies one or more input catalogs of the pipeline: for each input catalog, its fixed identifier is provided together with the HRN of the actual catalog.
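
Because pipeline-config.conf is added to the classpath of the main user process, a pipeline can resolve the catalog HRNs at runtime. Below is a minimal Scala sketch using Typesafe Config; the catalog identifiers match the example above, and reading the file as a classpath resource under its default name is an assumption made for illustration.

    import com.typesafe.config.{Config, ConfigFactory}

    object PipelineConfigSketch {
      def main(args: Array[String]): Unit = {
        // Assumption: pipeline-config.conf is visible on the classpath under this name.
        val config: Config = ConfigFactory
          .parseResources("pipeline-config.conf")
          .getConfig("pipeline.config")

        val outputHrn = config.getString("output-catalog.hrn")
        val inputHrn  = config.getString("input-catalogs.test-input-1.hrn")

        // billing-tag is optional, so check for it before reading.
        val billingTag =
          if (config.hasPath("billing-tag")) Some(config.getString("billing-tag")) else None

        println(s"output=$outputHrn, test-input-1=$inputHrn, billingTag=$billingTag")
      }
    }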

Note: Using the same input and output catalogs

As of HERE platform Release 2.1, a stream Pipeline Version can use the same catalog for input and output. This does not apply to batch Pipeline Versions, which must use a different output catalog.


pipeline template

The pipeline template is an entity required by the Pipeline API for creating a Pipeline Version. From the platform portal, it is created by answering the questions in the UI. It can also be created using the pipeline template create CLI command. The parameters used in both cases include the following.

Parameters:

  • name (required): A meaningful name for the pipeline template. Format: string.
  • cluster type (required): Distributed processing framework used to execute the pipeline template. Format: string. Possible values: stream-2.0.0, stream-3.0.0, batch-2.0.0, batch-2.1.0.
  • JAR file (required): The location of the pipeline JAR file to upload, including its path on the local file system. Format: string.
  • class name (required): Name of the pipeline template main class. Format: string.
  • Group ID (required): ID of the group allowed to access the template. Format: string.
  • --input-catalog-ids <catalog IDs...> (required): IDs of the input catalogs expected by the pipeline; multiple IDs are separated by a space. This list must match the catalog IDs used in the pipeline-config.conf file. Note: --input-catalog-ids may also accept the path of a configuration file that contains the catalog IDs. Format: string.
  • --description (optional): Description of the pipeline template. Format: string.
  • --supervisor-units (optional): Default size of a supervisor node (1-15 units). Format: integer.
  • --supervisor-units-profile (optional): ID of the resource profile requested for the supervisor units. Format: string.
  • --worker-units (optional): Default size of a worker node (1-15 units). Format: integer.
  • --workers (optional): Default number of workers to allocate. Format: integer.
  • --worker-units-profile (optional): ID of the resource profile requested for the worker units. Format: string.
  • --default-runtime-config (optional): Map of default configuration values for the pipeline application, given in the key1=value1\nkey2=value2... form. The Pipeline API passes them as the application.properties file to the pipeline by adding it to the classpath of the main JVM process. The maximum property name (key) size is 256 and the maximum property value size is 1024. Format: key/value pairs.
  • --credentials (optional): The credentials file to use with the command, as downloaded from the portal.
  • --profile (optional): The name of the credentials profile to be used from the olpcli.ini file. Format: string.
  • --json (optional): Display the created pipeline template contents in JSON format. Format: string.

However, if you are using the API to create a template, the parameters are different. This is because the platform portal and the CLI take care of uploading the pipeline JAR file for you, while with the API this is a separate step that creates a package based on the uploaded pipeline JAR file and assigns it a package ID. This is how all pipeline JAR files are identified and tracked.

  • name (required): User-provided name for the Pipeline Template. Format: string, 3 to 64 characters.
  • runtimeEnvironment (required): The runtime environment type. Format: string. Possible values: "stream-2.0.0", "stream-3.0.0", "stream-4.0", "stream-5.0", "batch-2.0.0", "batch-2.1.0", "batch-3.0".
  • packageId (required): The Pipeline API generated identifier (UUID) for the Package (that is, the pipeline JAR file) that, when combined with this Pipeline Template, is used to create a Pipeline Version. Note: The package ID is returned by the package upload API function. Format: string.
  • entryPointClassName (required): The fully qualified class name of the entry point of this Pipeline Template. Format: string.
  • groupId (required): Group ID that has ownership of the Pipeline Template. Used to restrict access to the Pipeline Version. Format: string.
  • defaultClusterConfiguration (required): Configuration of the cluster for running the Pipeline Version; see the example below. Format: ClusterConfiguration.
  • inputCatalogIds (required): List of input catalog identifiers for the Pipeline Template. This list of identifiers can be used by a client to describe a list of HRNs needed for a valid Pipeline Version. Each identifier is an alphanumeric string (_ and - are also allowed) providing a label for a catalog HRN. Format: string.
  • description (optional): Additional text describing the Pipeline Template. Format: string, 0 to 512 characters.
  • defaultRuntimeConfiguration (required): Default runtime configuration in Java Properties format, as a string. Any runtime configuration values supplied in this Pipeline Template provide default values for Pipeline Versions that use this Template. If a Pipeline Version has its own runtime configuration values, they are added to any defaults available from the parent Pipeline Template, and Pipeline Version values with the same key override the Template's defaults. The application of these default values occurs dynamically whenever the Pipeline Version is run. The Pipeline API passes them as the application.properties file to the pipeline by adding it to the classpath of the main JVM process. The maximum property name (key) size is 256 and the maximum property value size is 1024. Format: string (key/value pairs).
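
Both --default-runtime-config and defaultRuntimeConfiguration carry plain Java Properties content as a single string. The following Scala sketch, with property names invented purely for illustration, shows one way such a string can be produced and parsed back with the standard java.util.Properties API:

    import java.io.{StringReader, StringWriter}
    import java.util.Properties

    object RuntimeConfigSketch {
      def main(args: Array[String]): Unit = {
        // Hypothetical property names, used only to illustrate the key=value format.
        val defaults = new Properties()
        defaults.setProperty("myexample.threads", "3")
        defaults.setProperty("myexample.language", "en_US")

        // Serialize to newline-separated key=value lines
        // (Properties.store also prepends a date comment, which is still valid Properties syntax).
        val writer = new StringWriter()
        defaults.store(writer, null)
        val defaultRuntimeConfiguration: String = writer.toString

        // Parsing the string back yields the same key/value pairs.
        val parsed = new Properties()
        parsed.load(new StringReader(defaultRuntimeConfiguration))
        println(parsed.getProperty("myexample.threads")) // prints 3
      }
    }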

Example

Results from the pipeline template list CLI command:

{
    "pipelineTemplates": [
        {
            "created": "2018-03-01T15:16:53.796Z",
            "groupId": "GROUP-9479863e-a13b-4d35-9eb1-5a054669046e",
            "defaultClusterConfiguration": {
                "workerResourceProfileId": "HS1B",
                "supervisorResourceProfileId": "HS1B",
                "supervisorUnits": 1,
                "workerUnits": 1,
                "workers": 1
            },
            "name": "sparktestcompiler",
            "packageId": "68d723f6-2ae7-40e4-8c24-61512d511852",
            "entryPointClassName": "com.example.Main",
            "description": "",
            "id": "5c0660a3-0fb4-4f35-bcd0-be6ce25075f6",
            "state": "created",
            "defaultRuntimeConfiguration": "",
            "updated": "2018-03-01T15:16:53.796Z",
            "runtimeEnvironment": "batch-3.0"
        }
    ]
}


credentials.properties

Every application (such as a pipeline) that runs within the HERE platform must be registered with the platform. This is done under the Apps and keys function in the platform portal. Upon registration, a credentials.properties file is created; it must be used whenever loading that specific pipeline into the HERE platform. The following is an example of the credentials.properties file.

here.user.id = HERE-01966c94-aaf1-4ae2-a1y6-6516b3f9b6c1
here.client.id = mzLcb1rL8nskvDQpCAAO
here.access.key.id = BELUTk45QdaYGgZ9A_IMTA
here.access.key.secret = 108lI7w9m8G_6sIw9kng-PXGoeHQQ-cv6xByNOuMcRYixZZp...
here.token.endpoint.url = https://elb.cn-northwest-1.account.hereapi.cn/oauth2/token
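
The file uses standard Java Properties syntax, so it can also be loaded programmatically. A minimal Scala sketch, assuming the downloaded file was saved to the conventional ~/.here/credentials.properties location (adjust the path if you keep it elsewhere):

    import java.io.FileInputStream
    import java.nio.file.Paths
    import java.util.Properties

    object CredentialsSketch {
      def main(args: Array[String]): Unit = {
        // Assumption: credentials.properties was saved under the user's home directory.
        val path = Paths.get(System.getProperty("user.home"), ".here", "credentials.properties")

        val props = new Properties()
        val in = new FileInputStream(path.toFile)
        try props.load(in) finally in.close()

        // Do not log the access key secret; the key ID and endpoint suffice for a sanity check.
        println(props.getProperty("here.access.key.id"))
        println(props.getProperty("here.token.endpoint.url"))
      }
    }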

For more information on how to use this file, see how to Get Your Credentials or the Identity & Access Management Guide.


olpcli.ini

CLI only

This file is created from the credentials.properties file. For additional information, see HERE Workspace for Java & Scala Developers.


application.properties

A custom configuration file (in Java Properties file format) made available on the running pipeline’s classpath. This file is constructed from the value of the Pipeline Template’s defaultRuntimeConfig property (API & CLI only) overridden on a key-by-key basis with the value of the Pipeline Version’s customRuntimeConfig property. The values of defaultRuntimeConfig and customRuntimeConfig are strings whose content represents a valid Java Properties file.

Note

For Stream runtimes, if the uber JAR contains an application.properties file, it takes precedence on the classpath over the application.properties provided by the runtime.

Example

    # Value of Pipeline Template's "defaultRuntimeConfig" property
    "myexample.threads = 3\nmyexample.language = \"en_US\"\nmyexample.processing.window=300\nmyexample.processing.mode=stateless"

    # Value of Pipeline Version’s "customRuntimeConfig" property
    "myexample.threads=5\n\n myexample.processing.mode=    \"stateful\"\nmyexample.processing.filterInvalid = true"

    # The resulting application.properties file on the pipeline classpath
    # (for the given values of "defaultRuntimeConfig" and "customRuntimeConfig")
    myexample.threads = 5
    myexample.language = "en_US"
    myexample.processing.window = 300
    myexample.processing.mode = "stateful"
    myexample.processing.filterInvalid = true
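
At runtime, the merged file can be read like any other classpath resource. A minimal Scala sketch using the standard java.util.Properties API, with the property names taken from the example above:

    import java.util.Properties

    object ApplicationPropertiesSketch {
      // Load application.properties from the running pipeline's classpath.
      def load(): Properties = {
        val props = new Properties()
        val in = getClass.getClassLoader.getResourceAsStream("application.properties")
        if (in != null) {
          try props.load(in) finally in.close()
        }
        props
      }

      def main(args: Array[String]): Unit = {
        val props = load()
        // Property names from the merge example above; defaults guard against a missing file.
        val threads = props.getProperty("myexample.threads", "1").toInt
        val mode    = props.getProperty("myexample.processing.mode", "stateless")
        println(s"threads=$threads, mode=$mode")
      }
    }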

