The article Deploying a Pipeline describes how to deploy a pipeline and create a Pipeline Version. The options available to run this new Pipeline Version depend on the type of pipeline you create (that is, Batch or Stream). Once you have created a new Pipeline Version, the typical next step is to activate it so that it starts processing the data from the input catalogs. Several other actions can also be performed on a Pipeline Version. The actions covered in this article are activating, pausing, resuming, canceling, deleting, and deactivating a Pipeline Version.
Once you have created the Pipeline Version, it is displayed on a Portal screen similar to Figure 1. If there were multiple versions of this pipeline available, they would be displayed in additional rows below Version 1.
Figure 1. Pipeline Activation
From the information on this sample screen (Figure 1), you can see that this is a batch pipeline, along with a brief description of its target output catalog and its execution mode (On-demand). Its unique Pipeline ID (UUID) and Group assignment (if a Group was selected when deploying the pipeline) are also displayed so that you can confirm that the correct pipeline is being used.
Hint
Click on the button at the end of the Pipeline ID to copy the ID to your clipboard.
Stream Pipeline Activation Options
Click Activate for a Stream Pipeline Version in Figure 1. The Activate Options dialog box opens (see Figure 2).
You can specify the run-time credentials for running this pipeline version. These may be your user credentials or user-generated credentials (apps). For more information, see the Identity & Access Management Guide.
The dialog also contains a switch to run the Stream (Flink) pipeline version's Job Manager in the High Availability mode. When enabled, another Job Manager is deployed as a standby for the pipeline. These multiple Job Managers are managed via Zookeeper which coordinates leader election and the pipeline's state. This second Job Manager is deployed in a different Availability Zone than the first one. If the primary Job Manager crashes, the standby Job Manager quickly takes over and the pipeline continues to run. Also, the failed primary Job Manager is restarted and it becomes the new standby Job Manager to re-establish high availability to protect against future failures.
The option to enable High Availability is available during Activate, Resume, and Upgrade operations.
Caution: Additional Cost
The Flink Job Manager High Availability option increases the cost of running a Flink pipeline. The additional resources are listed below.
There are additional resources required to run a Flink pipeline's Job Manager with high availability:
Resources for the second Job Manager (same size as the first one)
Resources for the Zookeeper: 1.5 CPU and 1.5 GB of RAM
The cost for these extra resources is added to the pipeline's original cost.
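The resource overhead described above can be estimated with a short Python sketch. It assumes only what the list states: a standby Job Manager of the same size as the first one, plus 1.5 CPU and 1.5 GB of RAM for Zookeeper. The `Resources` type and the `ha_overhead` function are illustrative names, not part of any platform API, and the sketch does not convert resources into a price.

```python
from dataclasses import dataclass

# Zookeeper overhead as stated in the list above.
ZOOKEEPER_CPU = 1.5
ZOOKEEPER_RAM_GB = 1.5


@dataclass(frozen=True)
class Resources:
    """Illustrative container for a CPU and RAM allocation."""
    cpu: float
    ram_gb: float


def ha_overhead(job_manager: Resources) -> Resources:
    """Extra resources needed for High Availability.

    The overhead is one standby Job Manager (same size as the
    primary one) plus the Zookeeper allocation.
    """
    return Resources(
        cpu=job_manager.cpu + ZOOKEEPER_CPU,
        ram_gb=job_manager.ram_gb + ZOOKEEPER_RAM_GB,
    )


# Example: a Job Manager sized at 1 CPU and 4 GB of RAM (assumed values).
extra = ha_overhead(Resources(cpu=1.0, ram_gb=4.0))
print(extra)
```

The returned value is the overhead only; the pipeline's original resources are billed in addition to it.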
Batch Pipeline Activation Options
Click Activate for a Batch Pipeline Version in Figure 1. The Activate Options dialog box opens (see Figure 3). It offers two options for activating the Batch Pipeline Version: Run now (on-demand) or Schedule. The default is Run now (on-demand).
You can specify the run-time credentials for running this pipeline version. These may be your user credentials or user-generated credentials (apps). For more information, see the Identity & Access Management Guide.
Select the Run now radio button to display additional options for the input catalogs to process: (a) Reprocess latest catalog version, or (b) Reprocess specific catalog version. Option (a) is the default. With this option, the system identifies and reprocesses the latest versions of the input catalogs (see Figure 3). Click Activate to accept the default.
Figure 3. Run Now option and reprocess latest catalog version
If you select the Reprocess specific catalog version option, the dialog again changes to request more information. Figure 4 shows the dialog asking for the specific version of the input catalog to be processed. Enter the Catalog version number and click Activate to start processing.
Figure 4. Run Now option and reprocess specified catalog version
If you select the Schedule radio button, the dialog displays two options: Data change or Time schedule. Select the Data change option and click Activate to place the new Pipeline Version in the Scheduled state, where it waits until the input catalogs are updated with new data.
Figure 5. Data change option
Select the Time schedule radio button to expand the dialog and enter a CRON schedule. The schedule must be a valid Unix CRON expression, and it is evaluated in the UTC time zone. The interval between consecutive attempts to run the Pipeline Version cannot be less than one hour. For example, the CRON expression "30 * * * *" results in an attempt to execute the Pipeline Version at 30 minutes past the hour, every hour of the UTC clock. An attempt is skipped if the Pipeline Version is still running at the time of that attempt. By default, a job runs only if there are pending changes to be processed in the input catalogs. The CLI and Pipeline API provide more options for scheduling batch pipeline versions by time.
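To make the timing concrete, the following Python sketch lists the next UTC attempt times for a schedule that fires at a fixed minute of every hour, like the 30-minutes-past-the-hour example above. It is not a CRON parser and is not part of any platform tool. The function name `next_attempts` and its parameters are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import List


def next_attempts(minute: int, after: datetime, count: int = 3) -> List[datetime]:
    """Return the next `count` attempt times, in UTC, for an hourly
    schedule that fires at `minute` past every hour.

    `after` must be a timezone-aware datetime; attempts are strictly
    later than `after`.
    """
    if not 0 <= minute <= 59:
        raise ValueError("minute must be in the range 0..59")
    if after.tzinfo is None:
        raise ValueError("after must be timezone-aware")

    after_utc = after.astimezone(timezone.utc)
    # First candidate: the given minute within the current UTC hour.
    candidate = after_utc.replace(minute=minute, second=0, microsecond=0)
    if candidate <= after_utc:
        candidate += timedelta(hours=1)
    return [candidate + timedelta(hours=i) for i in range(count)]


# Example: starting at 10:45 UTC, the next attempts fall at 11:30 and 12:30 UTC.
start = datetime(2024, 1, 1, 10, 45, tzinfo=timezone.utc)
for attempt in next_attempts(30, start, count=2):
    print(attempt.isoformat())
```

Consecutive attempts in this sketch are exactly one hour apart, which is the minimum interval stated above. The sketch does not model skipped attempts or the check for pending catalog changes.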
Figure 6. Time schedule option
Info: Run Latency
There are always a few moments of latency before the pipeline actually begins processing, during which the pipeline waits in the Scheduled state. This is true even with the Run now (on-demand) option. Scheduled operations can be delayed further because they are triggered by the availability of data and of the system resources needed to start processing.
The CLI and Pipeline API allow more types of processing to be used for On-demand Batch pipelines.
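The choices offered by the batch Activate Options dialog can be summarized as a small decision structure. The Python sketch below models them for illustration only: the class `BatchActivation`, its field names, and its string values are hypothetical and do not correspond to the Portal, CLI, or Pipeline API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BatchActivation:
    """Hypothetical model of the batch Activate Options dialog."""
    mode: str = "run_now"                  # "run_now" (default) or "schedule"
    catalog_version: Optional[int] = None  # run_now only; None means latest
    trigger: Optional[str] = None          # schedule only: "data_change" or "time"
    cron: Optional[str] = None             # required when trigger is "time"

    def validate(self) -> None:
        """Raise ValueError if the combination of choices is inconsistent."""
        if self.mode == "run_now":
            if self.trigger is not None or self.cron is not None:
                raise ValueError("Run now takes no schedule settings")
        elif self.mode == "schedule":
            if self.catalog_version is not None:
                raise ValueError("a catalog version applies to Run now only")
            if self.trigger not in ("data_change", "time"):
                raise ValueError("trigger must be 'data_change' or 'time'")
            if self.trigger == "time" and not self.cron:
                raise ValueError("a Time schedule needs a CRON expression")
            if self.trigger == "data_change" and self.cron:
                raise ValueError("Data change takes no CRON expression")
        else:
            raise ValueError("mode must be 'run_now' or 'schedule'")


# The dialog defaults: Run now, reprocessing the latest catalog version.
BatchActivation().validate()
# A time schedule at 30 minutes past every hour (UTC).
BatchActivation(mode="schedule", trigger="time", cron="30 * * * *").validate()
```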
Managing an Activated Pipeline Version
Figure 7. Running a Pipeline Version
When the state has changed to Running, the operational choices on the screen also change. The option to Pause the running Pipeline Version or to Cancel it is now available.
Info: Metrics Button
This button takes you to the companion application (for example, Splunk or Grafana) that monitors metrics from the running Pipeline Version. For additional information about Metrics, see the Logs, Monitoring and Alerts User Guide.
In the case of a Scheduled Pipeline Version, the listing will look something like this:
Figure 8. Scheduled Pipeline Version
In the case of a Run now Pipeline Version, the listing will look something like this:
Figure 9. Run Now Pipeline
Hint
The More menu in the top right-hand corner of the page provides additional functions.
Figure 10. Annotated More menu
Pipeline Details
Click on any listed Pipeline Version name to access details about that pipeline. Below is an example of the details of a sample pipeline.
Figure 11. Pipeline Version Details Screen
Under the Details tab, you can see all of the run-time information available about the Pipeline Version. This is also the screen where you can view and edit the Logging Level of this Pipeline Version. To change the Logging Level, click the Edit button next to "Logging configuration" and enter the new or additional logging level in the dialog. Saving your changes stores them for that specific Pipeline Version, but they do not take effect until a new job is started with that Pipeline Version.
Operations
The Operations tab provides a summary of operations on the pipeline version. Available information is illustrated below.
Figure 12. Pipeline Version Operations Screen
Running Jobs
Click on the Jobs tab to examine the current and past jobs for this Pipeline Version.
Figure 13. Pipeline Version Jobs Screen
Pausing a Running Pipeline Version
Pausing a running Pipeline Version requires special consideration because the result of the "pause" depends on the kind of pipeline job being performed.
If a Batch pipeline version that runs in Schedule mode is paused, it does not stop processing immediately. Instead, the current job runs to completion, and only then does the Batch Pipeline Version change to the Paused state. A Batch pipeline version running on-demand cannot be paused; it can only be canceled.
However, if you pause a running Stream pipeline job, the current state of the job is saved and the job is gracefully stopped at that point. When the resume command is issued, a new job is started to restart the Pipeline Version from the previously saved state. There is a time limit on how long the job can stay paused: the default data retention setting is 1 hour. This information is displayed as a reminder when you pause the pipeline job, as shown below. If the paused job is canceled, its saved state is discarded and the Pipeline Version moves to the Ready state.
Figure 14. Pause a Streaming Pipeline Version
Figure 15. Paused Pipeline with Alert
Resume a Running Pipeline
The resume operation restarts data processing after the pipeline has been paused. When a paused Pipeline Version is resumed, the typical delay is 30-90 seconds, although it can last several minutes if resources are limited. In some cases, the resume operation has no effect, such as when:
the Pipeline Version is not in a paused state because it has been paused beyond the 1-hour time limit on a paused state, after which resume is unavailable
the Pipeline Version is not in a paused state for some other reason
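The two conditions above can be expressed as a simple check. This Python sketch assumes the default 1-hour limit mentioned in this article and treats the limit as inclusive, which is an assumption. The function `can_resume` and the state string "paused" are illustrative and not taken from the Pipeline API.

```python
from datetime import datetime, timedelta, timezone

# Default time limit on a paused state, as described in this article.
DEFAULT_PAUSE_LIMIT = timedelta(hours=1)


def can_resume(
    state: str,
    paused_at: datetime,
    now: datetime,
    limit: timedelta = DEFAULT_PAUSE_LIMIT,
) -> bool:
    """Return True if a resume request is expected to have an effect.

    Resume only applies to a paused Pipeline Version, and only while
    the pause has not exceeded the time limit.
    """
    if state != "paused":
        return False
    return now - paused_at <= limit


paused_at = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
print(can_resume("paused", paused_at, paused_at + timedelta(minutes=45)))
print(can_resume("paused", paused_at, paused_at + timedelta(minutes=75)))
```

The first call falls inside the 1-hour window and the second falls outside it.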
When a Stream Pipeline Version is resumed, a new job begins again from the point where the pause occurred. However, Batch Pipeline Versions cannot be resumed the same way. This is because the batch job is allowed to run to completion before the "pause" goes into effect. No further jobs will be processed until the Pipeline Version is resumed and a new job is available.
Hint
When on this listing page, you can begin the process of creating a new Pipeline Version by clicking on this button:
Cancel a Running Pipeline Version
Cancelling a running Pipeline Version normally requires submitting the Pipeline ID and Pipeline Version ID along with the Cancel Request. The service cancels the running job without saving the state and returns the new pipeline status. The Portal simplifies this process as shown here.
Click on the Cancel button for the running pipeline job.
Figure 16. Cancel a Running Job
You will be asked for confirmation.
Figure 17. Verify Cancellation
The following screen shows the job as Cancelled and the Pipeline Version returned to the Ready state. Because the Pipeline Version has been canceled and not deleted, it can be reactivated for a new job. In the case of a Batch job, the job runs to completion before any further scheduled jobs are canceled. In the case of a Stream job, the job is interrupted and canceled immediately, with no chance of restarting the interrupted job. The Stream Pipeline Version is returned to its Ready state.
Figure 18. Pipeline Job Cancelled
Delete a Pipeline Version
You can delete a Pipeline Version from the pipeline service without having any effect on other Pipeline Versions using the same pipeline JAR file. To delete a Pipeline Version, you must select the Pipeline Version to be deleted from the list page. Then, click the ellipsis menu as shown in the illustration below. Click the "Delete pipeline" selection to initiate the deletion process.
Figure 19. Delete Pipeline Button
The UI shows an alert dialog asking you to confirm the deletion, because deleting a Pipeline Version is a destructive action that cannot be undone. Click Delete Version to delete the Pipeline Version, or Cancel to abandon the deletion.
Finally, the pipeline list page refreshes without listing the deleted Pipeline Version. However, there is an alert message at the top of the page to confirm the deletion action.
Figure 20. Pipeline List Page after Deletion
Deactivate a Pipeline Version
If a Pipeline Version is still in a scheduled state after activation, deactivate it by clicking on the Deactivate button as shown below. After deactivation, the Pipeline Version returns to a Ready state where it is again available for activation.
Figure 21. Deactivate a Selected Pipeline Version
Once the pipeline has been deactivated, it will be listed in a ready state, waiting for activation.
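The operations described in this article can be summarized as transitions between Pipeline Version states. The Python sketch below is a simplified model for orientation only: the lowercase state and operation names, the `start` event, and the `apply_operation` function are illustrative assumptions, and the real service may pass through intermediate states (for example, while a resumed job is being scheduled).

```python
# (current state, operation) -> next state, as described in this article.
TRANSITIONS = {
    ("ready", "activate"): "scheduled",
    ("scheduled", "start"): "running",      # job begins; not a user action
    ("scheduled", "deactivate"): "ready",
    ("running", "pause"): "paused",         # scheduled Batch: after the current job ends
    ("running", "cancel"): "ready",
    ("paused", "resume"): "running",
    ("paused", "cancel"): "ready",          # saved state is discarded
}


def apply_operation(state: str, operation: str, on_demand_batch: bool = False) -> str:
    """Return the next state, or raise ValueError if the operation
    is not available in the current state."""
    if operation == "pause" and on_demand_batch:
        # An on-demand Batch pipeline version can only be canceled.
        raise ValueError("an on-demand Batch pipeline version cannot be paused")
    try:
        return TRANSITIONS[(state, operation)]
    except KeyError:
        raise ValueError(f"cannot {operation} a pipeline version in state {state}") from None


# Walk a Stream Pipeline Version through a typical life cycle.
state = "ready"
for op in ("activate", "start", "pause", "resume", "cancel"):
    state = apply_operation(state, op)
    print(op, "->", state)
```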