Migrate Pipeline to New Run-Time Environment

Contents

Migrate Batch-2.0.0 Pipeline to Batch-2.1.0 Environment (Apache Spark 2.4.2 with Support for History Server)

To migrate a Batch pipeline to Batch-2.1.0 environment (Apache Spark 2.4.2 with support for History Server), follow these steps:

  1. Download the HERE platform SDK 2.12 or newer that contains the new sdk-batch-bom.pom environment file. For more information on using the new sdk-batch-bom.pom file, see the SDK Dependency Management.
  2. Re-compile your pipeline code using the new sdk-batch-bom.pom file to create a new JAR file. For more information, see Batch Pipeline.
  3. For an existing Batch pipeline, use the CLI or Web Portal to create a new Pipeline Template and a Pipeline Version with the batch-2.1.0 run-time environment and the new JAR file that was created in Step #2 above. From the Web Portal, you can save time by creating a copy of the existing Pipeline Version that uses a previous version of the batch environment and changing the run-time environment to batch-2.1.0, and using the new JAR file for the new Pipeline Version. For more information, see Pipeline Deployment.
  4. Upgrade to the new Pipeline Version. For more information, see Upgrade via Portal and Upgrade via CLI.

Caution: Compatibility issues due to AWS SDK versions

History Server support in batch-2.1.0 pipelines currently requires the AWS SDK 1.7.4 in the (Spark 2.4.2) runtime environment. This version is not compatible with later versions of the AWS SDK (1.10.x and up). To use the AWS SDK in a pipeline with the batch-2.1.0 run-time environment, it is recommended to build the pipeline JAR with AWS SDK 1.7.4.

For more information about pipelines general support for Apache Spark, see Batch Pipelines - Apache Spark Support FAQ.

↑ Top

Migrate Batch-2.1.0 Pipeline to Batch-3.0 Environment (Apache Spark 2.4.7)

There are no changes required from Pipeline runtime perspective. Please check the HERE Data SDK for Java and Scala documentation for changes with Batch-3.0

↑ Top

To migrate a Stream-2.0.0 pipeline to Stream-3.0.0 environment (Apache Flink 1.10.1), follow these steps:

Caution: Memory changes to consider before migration

Check the Stream-3.0.0 memory changes below before migrating the pipeline.

  1. Use the 2.17 version or newer of HERE Data SDK for Java and Scala that contains the new sdk-stream-bom.pom environment file. For more information on using the new sdk-stream-bom.pom file, see the SDK Dependency Management.
  2. Re-compile your pipeline code using the new sdk-stream-bom.pom file to create a new JAR file. For more information, see Stream Pipeline.
  3. For an existing Stream pipeline, use the CLI or Web Portal to create a new Pipeline Template and a Pipeline Version with the stream-3.0.0 run-time environment and the new JAR file that was created in Step #2 above. From the Web Portal, you can save time by creating a copy of the existing Pipeline Version that uses the stream-2.0.0 run-time environment and changing the run-time environment to stream-3.0.0, and using the new JAR file for the new Pipeline Version. For more information, see Pipeline Deployment.
  4. Upgrade to the new Pipeline Version. For more information, see Upgrade via Portal and Upgrade via CLI.

Apache Flink 1.10 introduced improvements in the memory setup of TaskManagers. Stream-3.0.0 includes these improvements and provides pipeline users with granular control on the memory configuration of the TaskManagers. There are two categories of these changes in Stream-3.0.0 compared to Stream-2.0.0 as documented below:

  • In Stream-2.0.0, the memory in a Worker unit is translated to be set as the taskmanager.heap.size. However, it includes not only the JVM heap but also other "off-heap" memory components. In Stream-3.0.0, the memory in a Worker unit is translated to taskmanager.memory.process.size, which is assigned to the TaskManager JVM process. Flink adjusts the rest of the memory components based on default values or additionally configured options (more on this below).
  • In Stream-3.0.0, pipeline users can use all the memory configuration options via the Stream Configuration option. The memory configurations available to pipeline users include all the configurations listed within the official Flink documentation. It is important to ensure that the values set for these memory components are tuned in accordance with the Worker units configured for the TaskManager.

For more information on the memory improvements, configuration options and defaults, see the official Flink documentation below:

For more information about pipelines general support for Apache Flink, please see Stream Pipelines - Apache Flink Support FAQ.

Known Issues

Users that had a flink-table dependency before, need to update their dependencies to flink-table-planner and the correct dependency of flink-table-api-*, depending on whether Java or Scala is used: one of flink-table-api-java-bridge or flink-table-api-scala-bridge.

stream-2.0.0 contained the flink-table dependency while stream-3.0.0 no longer contains it.

Users that migrated to stream-3.0.0 and have the warning-related scalac options enabled (e.g. -Xfatal-warnings, -Ywarn-unused:*) might face the compilation errors with the following messages: local val in method createSerializer is never used, *.scala:1: Unused import. These messages are actually a result of the macro-expanded code inspection.

To prevent the compilation errors for Scala 2.11, you can choose one of the following options:

  1. Using sbt, add the scalacOptions ~= (_.filterNot(_.startsWith("-Ywarn-unused"))) line to your build.sbt settings for the project.
  2. Provided you use Maven, use the com.github.ghik:silencer-plugin_${scala.version} plugin as following:
<configuration>
  <compilerPlugins>
    <compilerPlugin>
      <groupId>com.github.ghik</groupId>
      <artifactId>silencer-plugin_${scala.version}</artifactId>
      <version>1.7.0</version>
    </compilerPlugin>
  </compilerPlugins>
  <args>
    <!-- Unused val in a Flink macro instantiation -->
    <arg>
      -P:silencer:globalFilters=local val in method createSerializer is never used
    </arg>
    <!-- Unused import in a Flink macro instantiation, that is reported on line '1' of the source file -->
    <arg>
      -P:silencer:lineContentFilters=^/\*$;^package;^//
    </arg>
  </args>
</configuration>

Note

The purpose of the -P:silencer:lineContentFilters=^/\*$;^package;^// argument is to ignore the unused import coming from the Flink macro, which is reported to be located at the first line of the source code. You might want to change this line according to your needs.

Environment Comparison

Group ID Artifact ID stream-4.0.0 stream-3.0.0
com.fasterxml.jackson.core jackson-annotations 2.7.9 Removed
com.fasterxml.jackson.core jackson-core 2.7.9 Removed
com.fasterxml.jackson.core jackson-databind 2.7.9 Removed
com.fasterxml.jackson.dataformat jackson-dataformat-csv 2.7.9 Removed
com.fasterxml.jackson.dataformat jackson-dataformat-yaml 2.7.9 Removed
com.github.scopt scopt_2.11 3.5.0 Removed
com.google.code.findbugs jsr305 1.3.9 Removed
com.google.protobuf protobuf-java 2.6.1 Removed
com.twitter chill-java 0.7.6 Removed
com.twitter chill_2.11 0.7.6 Removed
com.typesafe config 1.3.0 Removed
com.typesafe ssl-config-core_2.11 0.2.1 Removed
com.typesafe.akka akka-actor_2.11 2.4.20 Removed
com.typesafe.akka akka-camel_2.11 2.4.20 Removed
com.typesafe.akka akka-protobuf_2.11 2.4.20 Removed
com.typesafe.akka akka-slf4j_2.11 2.4.20 Removed
com.typesafe.akka akka-stream_2.11 2.4.20 Removed
commons-cli commons-cli 1.3.1 Removed
commons-collections commons-collections 3.2.2 Removed
commons-io commons-io 2.4 Removed
io.netty netty-all 4.1.24.Final Removed
jline jline 2.14.3 Removed
org.apache.camel camel-core 2.17.7 Removed
org.apache.curator curator-client 2.12.0 Removed
org.apache.curator curator-framework 2.12.0 Removed
org.apache.curator curator-recipes 2.12.0 Removed
org.apache.flink flink-python_2.11 1.7.1 Removed
org.apache.flink flink-queryable-state-client-java_2.11 1.7.1 Removed
org.apache.flink flink-s3-fs-hadoop 1.7.1 Removed
org.apache.flink flink-shaded-asm 5.0.4-5.0 Removed
org.apache.flink flink-shaded-asm-6 6.2.1-5.0 Removed
org.apache.flink flink-table_2.11 1.7.1 Removed
org.clapper grizzled-slf4j_2.11 1.3.2 Removed
org.javassist javassist 3.19.0-GA Removed
org.objenesis objenesis 2.1 Removed
org.reactivestreams reactive-streams 1.0.0 Removed
org.rocksdb rocksdbjni 5.7.5 Removed
org.scala-lang scala-compiler 2.11.12 Removed
org.scala-lang scala-library 2.11.12 Removed
org.scala-lang scala-reflect 2.11.12 Removed
org.scala-lang.modules scala-java8-compat_2.11 0.7.0 Removed
org.scala-lang.modules scala-parser-combinators_2.11 1.0.4 Removed
org.scala-lang.modules scala-xml_2.11 1.0.5 Removed
org.tukaani xz 1.5 Removed
org.xerial.snappy snappy-java 1.1.4 Removed
org.yaml snakeyaml 1.15 Removed
ch.qos.logback logback-classic 1.2.3 1.2.3
ch.qos.logback logback-core 1.2.3 1.2.3
com.esotericsoftware.kryo kryo 2.24.0 2.24.0
com.esotericsoftware.minlog minlog 1.2 1.2
com.esotericsoftware.reflectasm reflectasm 1.09 1.09
org.apache.calcite calcite-core 1.17.0 1.21.0
org.apache.calcite calcite-linq4j 1.17.0 1.21.0
org.apache.calcite.avatica avatica-core 1.12.0 1.15.0
org.apache.commons commons-compress 1.4.1 1.18
org.apache.commons commons-lang3 3.3.2 3.3.2
org.apache.commons commons-math3 3.5 3.5
org.apache.flink flink-annotations 1.7.1 1.10.1
org.apache.flink flink-clients_2.11 1.7.1 1.10.1
org.apache.flink flink-container_2.11 1.7.1 1.10.1
org.apache.flink flink-core 1.7.1 1.10.1
org.apache.flink flink-dist_2.11 1.7.1 1.10.1
org.apache.flink flink-hadoop-fs 1.7.1 1.10.1
org.apache.flink flink-java 1.7.1 1.10.1
org.apache.flink flink-mapr-fs 1.7.1 1.10.1
org.apache.flink flink-mesos_2.11 1.7.1 1.10.1
org.apache.flink flink-metrics-core 1.7.1 1.10.1
org.apache.flink flink-metrics-jmx_2.11 1.7.1 1.10.1
org.apache.flink flink-optimizer_2.11 1.7.1 1.10.1
org.apache.flink flink-runtime-web_2.11 1.7.1 1.10.1
org.apache.flink flink-runtime_2.11 1.7.1 1.10.1
org.apache.flink flink-scala-shell_2.11 1.7.1 1.10.1
org.apache.flink flink-scala_2.11 1.7.1 1.10.1
org.apache.flink flink-shaded-curator 1.7.1 1.10.1
org.apache.flink flink-shaded-guava 18.0-5.0 18.0-9.0
org.apache.flink flink-shaded-jackson 2.7.9-5.0 2.10.1-9.0
org.apache.flink flink-shaded-netty 4.1.24.Final-5.0 4.1.39.Final-9.0
org.apache.flink flink-statebackend-rocksdb_2.11 1.7.1 1.10.1
org.apache.flink flink-streaming-java_2.11 1.7.1 1.10.1
org.apache.flink flink-streaming-scala_2.11 1.7.1 1.10.1
org.apache.flink flink-table-common 1.7.1 1.10.1
org.apache.flink flink-yarn_2.11 1.7.1 1.10.1
org.apache.flink force-shading 1.7.1 1.10.1
org.apache.mesos mesos 1.0.1 1.0.1
org.codehaus.janino commons-compiler 3.0.7 3.0.9
org.codehaus.janino janino 3.0.7 3.0.9
org.slf4j log4j-over-slf4j 1.7.25 1.7.25
org.slf4j slf4j-api 1.7.15 1.7.15
org.apache.flink flink-cep_2.11 - 1.10.1
org.apache.flink flink-queryable-state-client-java - 1.10.1
org.apache.flink flink-shaded-asm-7 - 7.1-9.0
org.apache.flink flink-sql-parser - 1.10.1
org.apache.flink flink-table-api-java - 1.10.1
org.apache.flink flink-table-api-java-bridge_2.11 - 1.10.1
org.apache.flink flink-table-api-scala-bridge_2.11 - 1.10.1
org.apache.flink flink-table-api-scala_2.11 - 1.10.1
org.apache.flink flink-table-planner-blink_2.11 - 1.10.1
org.apache.flink flink-table-planner_2.11 - 1.10.1
org.apache.flink flink-table-runtime-blink_2.11 - 1.10.1
org.apache.flink flink-table-uber-blink_2.11 - 1.10.1
org.apache.flink flink-table-uber_2.11 - 1.10.1

↑ Top

There are no changes required from Pipeline runtime perspective. Please check the HERE Data SDK for Java and Scala documentation for changes with Stream-4.0

↑ Top

To migrate a Stream-4.0 pipeline to Stream-5.0 environment (Apache Flink 1.13.5), follow these steps:

Caution: Memory changes to consider before migration

Check the Stream-5.0 memory changes below before migrating the pipeline.

  1. Use the 2.34 version or newer of HERE Data SDK for Java and Scala that contains the new sdk-stream-bom.pom environment file. For more information on using the new sdk-stream-bom.pom file, see the SDK Dependency Management.
  2. Re-compile your pipeline code using the new sdk-stream-bom.pom file to create a new JAR file. For more information, see Stream Pipeline.
  3. For an existing Stream pipeline, use the CLI or Web Portal to create a new Pipeline Template and a Pipeline Version with the stream-5.0 run-time environment and the new JAR file that was created in Step #2 above. From the Web Portal, you can save time by creating a copy of the existing Pipeline Version that uses the stream-4.0 run-time environment and changing the run-time environment to stream-5.0, and using the new JAR file for the new Pipeline Version. For more information, see Pipeline Deployment.
  4. Upgrade to the new Pipeline Version. For more information, see Upgrade via Portal and Upgrade via CLI.

Apache Flink 1.11 introduced improvements in the memory setup of the JobManagers. By moving from Flink 1.10 to 1.13, Stream-5.0 now includes these improvements and provides pipeline users with granular control on the memory configuration of the JobManager. There are two categories of these changes in Stream-5.0 compared to Stream-4.0 as documented below:

  • In Stream-4.0, the memory in the Supervisor unit is translated to be set as the jobmanager.heap.size. However, it includes not only the JVM heap but also other "off-heap" memory components. In Stream-5.0, the memory in a Supervisor unit is translated to jobmanager.memory.process.size, which is assigned to the JobManager JVM process. Flink adjusts the rest of the memory components based on default values or additionally configured options (more on this below).
  • In Stream-5.0, pipeline users can use all the memory configuration options via the Stream Configuration option. The memory configurations available to pipeline users include all the configurations listed within the official Flink documentation. It is important to ensure that the values set for these memory components are tuned in accordance with the Supervisor units configured for the JobManager.

For more information on the memory improvements, configuration options and defaults, see the official Flink documentation below:

For more information about pipelines general support for Apache Flink, please see Stream Pipelines - Apache Flink Support FAQ.

Known Issues

None

Environment Comparison

Group ID Artifact ID stream-4.0 stream-5.0
org.apache.calcite.avatica avatica-core 1.15.0 1.17.0
org.apache.calcite calcite-core 1.21.0 1.26.0
org.apache.calcite calcite-linq4j 1.21.0 1.26.0
org.apache.commons commons-compress 1.18 1.21
org.apache.flink flink-annotations 1.10.3 1.13.5
org.apache.flink flink-cep_2.12 1.10.3 1.13.5
org.apache.flink flink-clients_2.12 1.10.3 1.13.5
org.apache.flink flink-connector-base Unavailable 1.13.5
org.apache.flink flink-connector-files Unavailable 1.13.5
org.apache.flink flink-container_2.12 1.10.3 1.13.5
org.apache.flink flink-core 1.10.3 1.13.5
org.apache.flink flink-csv Unavailable 1.13.5
org.apache.flink flink-file-sink-common Unavailable 1.13.5
org.apache.flink flink-hadoop-fs 1.10.3 1.13.5
org.apache.flink flink-java 1.10.3 1.13.5
org.apache.flink flink-json Unavailable 1.13.5
org.apache.flink flink-mapr-fs 1.10.3 1.13.5
org.apache.flink flink-metrics-core 1.10.3 1.13.5
org.apache.flink flink-metrics-jmx_2.12 1.10.3 Removed
org.apache.flink flink-optimizer_2.12 1.10.3 1.13.5
org.apache.flink flink-queryable-state-client-java 1.10.3 1.13.5
org.apache.flink flink-runtime_2.12 1.10.3 1.13.5
org.apache.flink flink-runtime-web_2.12 1.10.3 1.13.5
org.apache.flink flink-scala_2.12 1.10.3 1.13.5
org.apache.flink flink-shaded-asm-7 7.1-9.0 7.1-13.0
org.apache.flink flink-shaded-curator 1.10.3 Removed
org.apache.flink flink-shaded-guava 18.0-9.0 18.0-13.0
org.apache.flink flink-shaded-jackson 2.10.1-9.0 2.12.1-13.0
org.apache.flink flink-shaded-netty 4.1.39.Final-9.0 4.1.49.Final-13.0
org.apache.flink flink-shaded-zookeeper-3 Unavailable 3.4.14-13.0
org.apache.flink flink-sql-parser 1.10.3 1.13.5
org.apache.flink flink-sql-parser-hive Unavailable 1.13.5
org.apache.flink flink-statebackend-rocksdb_2.12 1.10.3 1.13.5
org.apache.flink flink-streaming-java_2.12 1.10.3 1.13.5
org.apache.flink flink-streaming-scala_2.12 1.10.3 1.13.5
org.apache.flink flink-table-api-java 1.10.3 1.13.5
org.apache.flink flink-table-api-java-bridge_2.12 1.10.3 1.13.5
org.apache.flink flink-table-api-scala_2.12 1.10.3 1.13.5
org.apache.flink flink-table-api-scala-bridge_2.12 1.10.3 1.13.5
org.apache.flink flink-table-common 1.10.3 1.13.5
org.apache.flink flink-table-planner_2.12 1.10.3 1.13.5
org.apache.flink flink-table-planner-blink_2.12 1.10.3 1.13.5
org.apache.flink flink-table-runtime-blink_2.12 1.10.3 1.13.5
org.apache.flink flink-table-uber_2.12 1.10.3 Removed
org.apache.flink flink-table-uber-blink_2.12 1.10.3 Removed
org.apache.flink flink-yarn_2.12 1.10.3 1.13.5
org.apache.flink force-shading 1.10.3 1.13.5
org.codehaus.janino commons-compiler 3.0.9 3.0.11
org.codehaus.janino janino 3.0.9 3.0.11

↑ Top

See Also

results matching ""

    No results matching ""