HERE platform pipelines are designed to accommodate specific usage patterns. The available patterns are illustrated below, starting with the simplest pattern and progressing to more complex use cases. Additional information is provided throughout the Developer Guide.
This is the general pattern for using a pipeline.
Note: Data sources and sinks
A specific catalog layer can serve as a source or a sink, but never both at once. The type of catalog layer you can use depends on the type of pipeline. For example, a stream layer cannot be used with a batch pipeline.
A pipeline can have multiple inputs but only one output.
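The rules in this note can be expressed as a small validation check. The following is a minimal Python sketch, not HERE platform SDK code: the `Layer` and `PipelineSpec` classes, the `validate` function, and the layer type names are all hypothetical, and only the one incompatibility stated above (a stream layer with a batch pipeline) is encoded.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Layer:
    """Hypothetical description of one catalog layer."""
    catalog: str      # catalog identifier (illustrative value, not a real HRN)
    layer_id: str
    layer_type: str   # "stream", "versioned", "volatile", or "index"


@dataclass
class PipelineSpec:
    """Hypothetical pipeline description: many inputs, exactly one output."""
    pipeline_type: str   # "stream" or "batch"
    inputs: List[Layer]
    output: Layer


def validate(spec: PipelineSpec) -> List[str]:
    """Return a list of rule violations; an empty list means the spec passes."""
    problems = []
    if not spec.inputs:
        problems.append("at least one input layer is required")
    # A specific layer cannot be both a source and the sink.
    for layer in spec.inputs:
        if (layer.catalog, layer.layer_id) == (spec.output.catalog, spec.output.layer_id):
            problems.append(f"layer {layer.layer_id!r} is used as both source and sink")
    # Only the incompatibility named in the text is checked here.
    if spec.pipeline_type == "batch":
        for layer in spec.inputs + [spec.output]:
            if layer.layer_type == "stream":
                problems.append(
                    f"stream layer {layer.layer_id!r} cannot be used with a batch pipeline"
                )
    return problems
```

A spec with several versioned inputs and one versioned output passes this check for a batch pipeline, while adding a stream layer to the inputs, or reusing an input layer as the output, produces a violation message.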
You can use the pipeline to process continuous data streams using Apache Flink. [stream-to-stream]
- The data catalog is defined in the configuration files.
- The layer used is defined in the code.
You may use the same data catalog for a stream pipeline's input and output as long as separate layers are being used for the data source and data sink.
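As an illustration of the two bullets and the rule above, the sketch below keeps catalog identifiers in a configuration mapping and layer IDs in code, then checks that the source and sink resolve to different layers. All names (`CONFIG`, `INPUT_LAYER`, `OUTPUT_LAYER`, `resolve`, and the catalog identifier) are hypothetical. This is not the platform's configuration format or SDK.

```python
# Catalogs come from configuration (here a plain dict standing in for
# the pipeline's configuration files; the identifier is a placeholder).
CONFIG = {
    "input-catalog": "example-catalog",
    "output-catalog": "example-catalog",   # same catalog for input and output
}

# Layers are chosen in the pipeline code.
INPUT_LAYER = "raw-events"
OUTPUT_LAYER = "processed-events"


def resolve(config, input_layer, output_layer):
    """Return (source, sink) as (catalog, layer) pairs, rejecting a shared layer."""
    source = (config["input-catalog"], input_layer)
    sink = (config["output-catalog"], output_layer)
    if source == sink:
        raise ValueError("source and sink must be separate layers")
    return source, sink


source, sink = resolve(CONFIG, INPUT_LAYER, OUTPUT_LAYER)
```

Sharing the catalog is accepted here because the two layer IDs differ. Passing the same layer ID for both roles raises `ValueError`.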
This is a typical batch processing pattern using Apache Spark. [versioned-to-versioned]
This is a typical pattern using volatile layers.
These are typical patterns using index layers.
Note: Index Layer Limits of Use
A more advanced pattern uses a catalog's volatile layer as reference data.
In this case, the output catalog uses a stream layer.
The stream layer here typically uses a windowing function.
In this case, however, the output catalog only needs a "data snapshot," so the volatile layer is used.
Alternatively, you can use the output catalog's versioned layer, perhaps for aggregating data over a window of time. This approach can also be useful for archiving data, with or without additional processing, and for historical analysis in a notebook.
Or, you can use the output catalog's index layer, perhaps for organizing historical data by event time.
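To make the windowing and event-time ideas above concrete, here is a minimal sketch in plain Python of a tumbling-window aggregation keyed by event time. It does not use Flink or any HERE platform API. The `tumbling_window_counts` function, the record shape, and the 60-second window size are assumptions for illustration only.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds=60):
    """Count events per sensor in fixed, non-overlapping event-time windows.

    `events` is an iterable of (event_time_seconds, sensor_id) tuples.
    Returns {window_start_seconds: {sensor_id: count}}.
    """
    windows = defaultdict(lambda: defaultdict(int))
    for event_time, sensor_id in events:
        # Align the event time to the start of its window.
        window_start = (event_time // window_seconds) * window_seconds
        windows[window_start][sensor_id] += 1
    # Convert to plain dicts so the result is easy to inspect or serialize.
    return {start: dict(counts) for start, counts in sorted(windows.items())}


# Hypothetical sample records: (event time in seconds, sensor ID).
sample = [(5, "a"), (42, "b"), (59, "a"), (60, "a"), (125, "b")]
result = tumbling_window_counts(sample)
```

Each window's aggregate is the kind of unit that could be written to a versioned layer, and the window start is the kind of event-time value by which an index layer could organize records.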
Another pattern combines input data from a versioned data set with data from an index layer.