HERE platform pipelines are stateful. A deployed pipeline always has a well-defined state, and it can only be in one state at any given time.
The only deployable component of a pipeline is a Pipeline Version. Its run-time configuration is defined by the Pipeline Template. Some run-time parameters can be overridden by optional parameters.
There are 4 possible states for a Pipeline Version:
- READY – when a Pipeline Version is just created by the user or is ready to be initiated
- SCHEDULED – when a Pipeline Version is scheduled and waiting for a trigger to start the processing of input data
- RUNNING – when a Pipeline Version has an active Job processing the input data
- PAUSED – when a Pipeline Version is paused by the user and is not actively processing the input data
The following operations can be performed on a Pipeline Version, either by the user or by the platform.
Most of these operations are asynchronous and the platform will run them in the background. You will be able to view or list these operations, their state, and related message or error information.
These operations may have any of the following states:
- ACCEPTED – The platform has accepted the operation but has not yet started processing it.
- BEING_PROCESSED – The platform is processing the operation request.
- SUCCEEDED – The operation is successfully completed.
- FAILED – The operation failed to complete.
When an operation on a pipeline version is being processed, a new operation may not be requested on the same pipeline version until the current operation succeeds or fails.
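The rule above can be sketched as a small guard. This is a minimal illustration, not the platform's API: `OperationState` mirrors the four operation states listed earlier, while `can_request_operation` is a hypothetical helper. It also assumes that an ACCEPTED operation counts as still in flight, which the text does not state explicitly.

```python
from enum import Enum


class OperationState(Enum):
    ACCEPTED = "ACCEPTED"
    BEING_PROCESSED = "BEING_PROCESSED"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"


# An operation is finished once it has succeeded or failed.
FINISHED = {OperationState.SUCCEEDED, OperationState.FAILED}


def can_request_operation(operation_states):
    """Return True if every earlier operation on the pipeline version has finished.

    Assumption: ACCEPTED is treated as in flight, just like BEING_PROCESSED.
    """
    return all(state in FINISHED for state in operation_states)
```

A client could call such a check before submitting a new operation and otherwise wait for the current one to finish.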
A Job corresponds to an execution instance of a Pipeline Version. A Pipeline Version may have 0 or more Jobs, but only one can be Running at a time.
For Batch processing, a Job terminates by either succeeding or failing.
For Stream processing, a Job is not supposed to terminate by itself, but if this happens the platform may automatically restart the processing with a new Job.
There are 5 possible states for a Job:
- STARTING – the job is starting and resources are being allocated.
- RUNNING – the job is running.
- COMPLETED – the job terminated successfully.
- FAILED – the job terminated due to a failure.
- CANCELED – the job is terminated before completion.
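As an illustration, the Job states and the "only one Running Job" rule could be modeled as follows. The enum values come from the list above. The helper names `is_terminal` and `has_single_running_job` are hypothetical and not part of any HERE SDK.

```python
from enum import Enum


class JobState(Enum):
    STARTING = "STARTING"
    RUNNING = "RUNNING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"
    CANCELED = "CANCELED"


# States in which the Job has terminated and will not run again.
TERMINAL_STATES = {JobState.COMPLETED, JobState.FAILED, JobState.CANCELED}


def is_terminal(state):
    """Return True if the Job has terminated (successfully or not)."""
    return state in TERMINAL_STATES


def has_single_running_job(job_states):
    """Check that at most one Job of a Pipeline Version is RUNNING."""
    running = sum(1 for state in job_states if state is JobState.RUNNING)
    return running <= 1
```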
Valid transitions between states are independent of the processing model of a Pipeline Version (that is, Batch or Stream). However, some transitions may have a different internal behavior based on the processing model being used. The figure below is a representation of the various Pipeline Version States and Operations.
A new Pipeline Version with a UUID is created and is ready to be used.
A Pipeline Version is activated and gets scheduled to be run.
When a Batch pipeline version is activated to be run on-demand, it enters the Scheduled state and immediately changes to the Running state to attempt to run once and then returns to the Ready state. No further processing is done even if the input catalogs have new data. It will have to be run again manually.
|3||Run (internal)||Scheduled||Running||This is an internal operation. A new Job is created for the Pipeline Version and the state changes to Running. For Batch processing, the platform waits for the input catalogs to change to trigger a new job. For Stream processing, the platform starts running the job after a delay of a few minutes and keeps it running.|
|4||Terminate (internal)||Running||Scheduled or Ready||This is an internal operation. The current Job terminates with either success or failure. If the Pipeline Version is configured to run again, it is set to the Scheduled state; otherwise, it is set to the Ready state.|
|5||Deactivate||Scheduled||Ready||A Pipeline Version is deactivated and returns to the Ready state. |
Note: The platform will automatically deactivate a Pipeline Version if it fails 12 times consecutively.
For Batch processing, the current Job is completed and future Jobs are paused.
For Stream processing, the current state is saved and the Job is gracefully terminated.
For Batch processing, the Pipeline Version is scheduled so that a new Job is created the next time an input catalog changes.
For Stream processing, a new Job is started to resume the Pipeline Version from the previously saved state. This saved state is then discarded.
|8||Cancel||Running||Ready||The running Job is immediately terminated without saving state and the Pipeline Version moves to Ready state. For Batch processing, all the future Jobs are also canceled.|
For Batch processing, all the future jobs will be canceled and the Pipeline Version moves to Ready state.
For Stream processing, the saved state of the paused Job is discarded and the Pipeline Version moves to Ready state.
|10||Restart (Stream Pipeline only)||Running||Running||When a Stream pipeline must be interrupted and the user cannot perform the requested action, the platform attempts to save the state of the Stream pipeline and restart it from that saved state.|
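The operations table can be read as a small state machine. The sketch below encodes only the rows whose entry and exit states are spelled out above, plus Activate, whose Ready to Scheduled transition is an assumption inferred from its description. The lowercase operation names, `TRANSITIONS`, `apply_operation`, and `state_after_terminate` are hypothetical and only illustrate the idea.

```python
from enum import Enum


class State(Enum):
    READY = "READY"
    SCHEDULED = "SCHEDULED"
    RUNNING = "RUNNING"
    PAUSED = "PAUSED"


# (operation, entry state) -> allowed exit states.
TRANSITIONS = {
    ("run", State.SCHEDULED): {State.RUNNING},
    ("terminate", State.RUNNING): {State.SCHEDULED, State.READY},
    ("deactivate", State.SCHEDULED): {State.READY},
    ("cancel", State.RUNNING): {State.READY},
    ("restart", State.RUNNING): {State.RUNNING},
    # Assumption: inferred from the Activate description, not from table columns.
    ("activate", State.READY): {State.SCHEDULED},
}


def apply_operation(operation, current, target):
    """Return the target state if the transition is allowed, else raise ValueError."""
    allowed = TRANSITIONS.get((operation, current), set())
    if target not in allowed:
        raise ValueError(
            f"{operation!r} cannot move a Pipeline Version "
            f"from {current.value} to {target.value}"
        )
    return target


def state_after_terminate(configured_to_run_again):
    """Terminate leads to Scheduled if the version runs again, otherwise to Ready."""
    return State.SCHEDULED if configured_to_run_again else State.READY
```

For example, `apply_operation("run", State.SCHEDULED, State.RUNNING)` returns `State.RUNNING`, while requesting `"run"` from `State.READY` raises a `ValueError`. Pause and Resume are left out because their entry and exit states are not listed explicitly here.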