Create Batch Pipelines
For more information on your options for developing batch pipelines, see SDK Workflows.
Note: Batch archetypes are supported by the Maven build system only.
The HERE Data SDK offers the following batch pipeline archetypes:
- batch-direct1ton-java-archetype and batch-direct1ton-scala-archetype for Direct1toN compilation in Java and Scala
- batch-directmton-java-archetype and batch-directmton-scala-archetype for DirectMtoN compilation in Java and Scala
- batch-reftree-java-archetype and batch-reftree-scala-archetype for RefTree compilation in Java and Scala
- batch-mapgroup-java-archetype and batch-mapgroup-scala-archetype for MapGroup compilation in Java and Scala
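All of these archetype names share one pattern: batch-, the compilation pattern in lowercase, the language, and -archetype. The Java sketch below derives an archetypeArtifactId value from that pattern. The BatchArchetypes class and its enums are hypothetical helpers for illustration, not part of the Data SDK, and the naming rule is inferred from the list of archetypes rather than from a published specification.

```java
import java.util.Locale;

/** Hypothetical helper: derives a batch archetype artifactId from the naming pattern of the Data SDK archetypes. */
public final class BatchArchetypes {

    /** Compilation patterns that have batch archetypes. */
    public enum Pattern {
        DIRECT_1_TO_N("direct1ton"),
        DIRECT_M_TO_N("directmton"),
        REF_TREE("reftree"),
        MAP_GROUP("mapgroup");

        // Token used inside the archetype artifactId.
        private final String token;

        Pattern(String token) {
            this.token = token;
        }
    }

    /** Languages supported by the batch archetypes. */
    public enum Language { JAVA, SCALA }

    private BatchArchetypes() {}

    /** Returns, for example, "batch-directmton-java-archetype" for DIRECT_M_TO_N and JAVA. */
    public static String artifactId(Pattern pattern, Language language) {
        return "batch-" + pattern.token + "-"
                + language.name().toLowerCase(Locale.ROOT) + "-archetype";
    }

    public static void main(String[] args) {
        // Print every combination of language and compilation pattern.
        for (Language language : Language.values()) {
            for (Pattern pattern : Pattern.values()) {
                System.out.println(artifactId(pattern, language));
            }
        }
    }
}
```

Running main prints the eight archetype names, the four Java archetypes first.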
For more information on batch pipeline design patterns, see the Data Processing Library.
Generate New Batch Pipeline Projects
You can use Maven Archetypes either from the command line or in an IDE such as Eclipse or IntelliJ IDEA. The examples below use the command line.
Follow the steps below to generate a new batch pipeline project.
- Create a wrapper project.
- Add a schema definition module.
- Add one or more batch pipelines.
- Build your project.
After you build your project, you can start development.
Create a Wrapper Project
To create a wrapper for your project from the public Maven pom-root archetype, run mvn archetype:generate with the following parameters.
On Linux or macOS:
mvn archetype:generate -DarchetypeGroupId=org.codehaus.mojo.archetypes \
-DarchetypeArtifactId=pom-root \
-DarchetypeVersion=1.1 \
-DgroupId=com.example \
-DartifactId=myproject \
-Dversion=1.0-SNAPSHOT \
-Dpackage=com.example.myproject
On Windows:
mvn archetype:generate -DarchetypeGroupId=org.codehaus.mojo.archetypes ^
-DarchetypeArtifactId=pom-root ^
-DarchetypeVersion=1.1 ^
-DgroupId=com.example ^
-DartifactId=myproject ^
-Dversion=1.0-SNAPSHOT ^
-Dpackage=com.example.myproject
The example above uses the following values:
- archetypeGroupId: org.codehaus.mojo.archetypes (do not change)
- archetypeArtifactId: pom-root (do not change)
- archetypeVersion: 1.1 (do not change)
- groupId: com.example
- artifactId: myproject
- version: 1.0-SNAPSHOT
- package: com.example.myproject
Replace the values of groupId, artifactId, version, and package with your own.
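The two forms of the command differ only in the line-continuation character: a backslash (\) for Linux and macOS shells and a caret (^) for the Windows command prompt. If you script project generation, a small helper can render both forms from one set of parameters. The ArchetypeCommand class below is a hypothetical sketch; it only builds the command text and does not invoke Maven.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper: renders an "mvn archetype:generate" command for a POSIX shell or the Windows command prompt. */
public final class ArchetypeCommand {

    public enum Shell {
        POSIX(" \\"),   // a trailing backslash continues the line in bash or zsh
        WINDOWS(" ^");  // a trailing caret continues the line in cmd.exe

        private final String continuation;

        Shell(String continuation) {
            this.continuation = continuation;
        }
    }

    private ArchetypeCommand() {}

    /** Puts each -Dkey=value parameter on its own line, in the map's iteration order. */
    public static String render(Map<String, String> params, Shell shell) {
        StringJoiner lines = new StringJoiner(shell.continuation + "\n");
        lines.add("mvn archetype:generate");
        params.forEach((key, value) -> lines.add("-D" + key + "=" + value));
        return lines.toString();
    }

    public static void main(String[] args) {
        // Parameters of the wrapper project example; replace groupId, artifactId, version, and package.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("archetypeGroupId", "org.codehaus.mojo.archetypes");
        params.put("archetypeArtifactId", "pom-root");
        params.put("archetypeVersion", "1.1");
        params.put("groupId", "com.example");
        params.put("artifactId", "myproject");
        params.put("version", "1.0-SNAPSHOT");
        params.put("package", "com.example.myproject");
        System.out.println(render(params, Shell.POSIX));
    }
}
```

Passing Shell.WINDOWS instead of Shell.POSIX produces the caret-continued form. A LinkedHashMap keeps the parameters in the order you add them, so the output stays easy to compare with the commands in this guide.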
Add a Data Schema Definition Module
Batch pipelines require one or more input layers and a single output layer, and the structure of the data in each layer is defined by a data schema. For input data, use the schema defined in the layer configuration of your input catalog. For your output layer, you can either use an existing schema from the platform or create your own. If you use an existing schema for your output layer, you can include the schema artifact in your pipeline project.
When you create a new layer, you can specify a schema that defines the structure of the data in the layer. You can select an existing schema or define your own.
To include an existing schema artifact in your pipeline project, follow the steps below.
- Go to the HERE platform portal.
- Open the Data tab and search for the target layer.
- Go to the Schema tab for the layer.
- Copy the Maven dependencies and paste them into your pipeline project POM file.
For more information, see the Artifact Service section in Dependency Management.
Create Your Own Schema
The Data SDK provides a Maven Archetype for creating new schemas and extending existing data schemas. For more information, see Create and Extend Schemas.
To add a data schema definition module to your project, enter the following command in your project folder.
On Linux or macOS:
mvn archetype:generate \
-DarchetypeGroupId=com.here.platform.schema \
-DarchetypeArtifactId=project_archetype \
-DarchetypeVersion=2.0.0 \
-DgroupId=com.example.myproject \
-DartifactId=model1 \
-Dversion=1.0.0 \
-Dpackage=com.example.myproject.model1 \
-DmajorVersion=0
On Windows:
mvn archetype:generate -DarchetypeGroupId=com.here.platform.schema ^
-DarchetypeArtifactId=project_archetype ^
-DarchetypeVersion=2.0.0 ^
-DgroupId=com.example.myproject ^
-DartifactId=model1 ^
-Dversion=1.0.0 ^
-Dpackage=com.example.myproject.model1 ^
-DmajorVersion=0
The example above uses the following values:
- archetypeGroupId: com.here.platform.schema (do not change)
- archetypeArtifactId: project_archetype (do not change)
- archetypeVersion: 2.0.0 (do not change)
- groupId: com.example.myproject
- artifactId: model1
- version: 1.0.0
- package: com.example.myproject.model1
- majorVersion: 0
The last parameter, majorVersion, defines the major version of your data schema and is included in the package name. This allows you to use multiple major versions of a schema at the same time.
Replace the values of groupId, artifactId, version, package, and majorVersion with your own.
You may create one or more data schema definition modules for your project.
To use the schema with your batch pipeline, publish the schema project and add the resulting schema artifact as a dependency. For Java bindings, use the artifact from the schema project with _java in its name. For Scala bindings, use the artifact from the schema project with _scala in its name.
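Choosing between the Java and Scala bindings is a name match on the artifact ID. The Java sketch below picks the bindings artifact for a pipeline language from the artifact IDs of a schema project. The SchemaBindings class and the sample artifact IDs (model1_v0_java and so on) are hypothetical; check the actual artifact IDs in the POM files of your schema project before adding the dependency.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical helper: selects the bindings artifact for a pipeline language by the marker in its name. */
public final class SchemaBindings {

    private SchemaBindings() {}

    /**
     * Returns the first artifact ID that contains "_java" or "_scala".
     *
     * @param artifactIds artifact IDs produced by the schema project
     * @param language    "java" or "scala"
     */
    public static Optional<String> bindingsFor(List<String> artifactIds, String language) {
        if (!"java".equals(language) && !"scala".equals(language)) {
            throw new IllegalArgumentException("Unsupported language: " + language);
        }
        String marker = "_" + language;
        return artifactIds.stream().filter(id -> id.contains(marker)).findFirst();
    }

    public static void main(String[] args) {
        // Hypothetical artifact IDs; your schema project may name its modules differently.
        List<String> ids = List.of("model1_v0_proto", "model1_v0_java", "model1_v0_scala");
        System.out.println(bindingsFor(ids, "java").orElse("none"));   // model1_v0_java
        System.out.println(bindingsFor(ids, "scala").orElse("none"));  // model1_v0_scala
    }
}
```

Returning an Optional makes a missing bindings artifact explicit, for example when the schema project has not been built with Java or Scala bindings.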
To publish schema artifacts to the local repository, enter the following command in your schema project folder.
mvn install
For more information about publishing schemas, see Create and Extend Schemas.
Add a Java Batch Pipeline
If your project uses Java, add a Java batch pipeline to your project by instantiating the Java batch pipeline archetype.
To add a Java batch pipeline to your project, enter the following command in your project folder.
On Linux or macOS:
mvn archetype:generate -DarchetypeGroupId=com.here.platform \
-DarchetypeArtifactId=batch-directmton-java-archetype \
-DarchetypeVersion=1.0.655 \
-DgroupId=com.example.myproject \
-DartifactId=batch1 \
-Dversion=1.0-SNAPSHOT \
-Dpackage=com.example.myproject.batch1
On Windows:
mvn archetype:generate -DarchetypeGroupId=com.here.platform ^
-DarchetypeArtifactId=batch-directmton-java-archetype ^
-DarchetypeVersion=1.0.655 ^
-DgroupId=com.example.myproject ^
-DartifactId=batch1 ^
-Dversion=1.0-SNAPSHOT ^
-Dpackage=com.example.myproject.batch1
The example above uses the following values:
- archetypeGroupId: com.here.platform (do not change)
- archetypeArtifactId: batch-directmton-java-archetype
- archetypeVersion: 1.0.655 (do not change)
- groupId: com.example.myproject
- artifactId: batch1
- version: 1.0-SNAPSHOT
- package: com.example.myproject.batch1
The archetypeArtifactId parameter defines the compiler type of the batch pipeline. In the example, the selected compiler is the default DirectMtoN compiler.
To switch to a different pipeline archetype, specify one of the following options:
- batch-direct1ton-java-archetype for a Direct1toN compiler
- batch-directmton-java-archetype for a DirectMtoN compiler
- batch-reftree-java-archetype for a RefTree compiler
- batch-mapgroup-java-archetype for a MapGroup compiler
For more information on batch pipeline design patterns, see the Data Processing Library.
To use your schema in a pipeline project, add the schema artifact as a dependency in the project POM file. For Java bindings, use the artifact from the schema project with _java in its name.
You may create one or more batch pipelines in your project.
Add a Scala Batch Pipeline
If your project uses Scala, add a Scala batch pipeline to your project by instantiating the Scala batch pipeline archetype.
To add a Scala batch pipeline to your project, enter the following command in your project folder.
On Linux or macOS:
mvn archetype:generate -DarchetypeGroupId=com.here.platform \
-DarchetypeArtifactId=batch-directmton-scala-archetype \
-DarchetypeVersion=1.0.655 \
-DgroupId=com.example.myproject \
-DartifactId=batch2 \
-Dversion=1.0-SNAPSHOT \
-Dpackage=com.example.myproject.batch2
On Windows:
mvn archetype:generate -DarchetypeGroupId=com.here.platform ^
-DarchetypeArtifactId=batch-directmton-scala-archetype ^
-DarchetypeVersion=1.0.655 ^
-DgroupId=com.example.myproject ^
-DartifactId=batch2 ^
-Dversion=1.0-SNAPSHOT ^
-Dpackage=com.example.myproject.batch2
The example above uses the following values:
- archetypeGroupId: com.here.platform (do not change)
- archetypeArtifactId: batch-directmton-scala-archetype
- archetypeVersion: 1.0.655 (do not change)
- groupId: com.example.myproject
- artifactId: batch2
- version: 1.0-SNAPSHOT
- package: com.example.myproject.batch2
The archetypeArtifactId parameter defines the compiler type of the batch pipeline. In the example, the selected compiler is the default DirectMtoN compiler.
To switch to a different pipeline archetype, specify one of the following options:
- batch-direct1ton-scala-archetype for a Direct1toN compiler
- batch-directmton-scala-archetype for a DirectMtoN compiler
- batch-reftree-scala-archetype for a RefTree compiler
- batch-mapgroup-scala-archetype for a MapGroup compiler
For more information on batch pipeline design patterns, see the Data Processing Library.
To use your schema in a pipeline project, add the schema artifact as a dependency in the project POM file. For Scala bindings, use the artifact from the schema project with _scala in its name.
You may create one or more batch pipelines in your project.
Build Your Project to Run Locally
To build your project, enter the following command in your project folder.
mvn install
To run your pipeline on the platform, you first need to build a fat jar, which bundles your pipeline together with its dependencies. To do so, run the following command, which activates the platform Maven profile.
mvn install -Pplatform
For more information on building a fat jar, see the Data Processing Library.
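Because a fat jar packages the classes of your dependencies alongside your own, one quick sanity check after mvn install -Pplatform is to look for the entries of a package that only a dependency provides. The FatJarCheck class below is a hypothetical sketch using java.util.jar; the package prefix you pass (for example, com/google/protobuf/ if your schema bindings depend on Protocol Buffers) and the path of the built jar depend on your project.

```java
import java.io.IOException;
import java.util.jar.JarFile;

/** Hypothetical check: does a jar contain entries under a given package path? */
public final class FatJarCheck {

    private FatJarCheck() {}

    /**
     * Returns true if any entry name starts with packagePrefix,
     * for example "com/google/protobuf/" for a bundled dependency.
     */
    public static boolean containsPackage(String jarPath, String packagePrefix) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.stream().anyMatch(entry -> entry.getName().startsWith(packagePrefix));
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: FatJarCheck <jar> <package/prefix/>");
            System.exit(2);
        }
        boolean bundled = containsPackage(args[0], args[1]);
        System.out.println((bundled ? "bundled: " : "not bundled: ") + args[1]);
    }
}
```

Pass the path of the jar built under target and a package prefix as arguments. A "not bundled" result for a dependency you expect suggests the jar was built without the platform profile.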