Running Pipelines

Overview

The article Deploying a Pipeline describes how to deploy a pipeline and create a Pipeline Version. The options available to run this new Pipeline Version depend on the type of pipeline you create (that is, Batch or Stream). Once you have created a new Pipeline Version, the typical next step is to activate it to start processing the data from the input catalogs. Several other actions can also be performed on a Pipeline Version, including pausing, resuming, canceling, deleting, and deactivating it, as described in the sections below.

Note

For a full list of available commands, see the Command Line Interface Developer Guide or the API Reference. Some actions may not be available through the platform portal GUI.

Activate a Pipeline Version

Once you have created the Pipeline Version, it is displayed on a Portal screen similar to Figure 1. If there were multiple versions of this pipeline available, they would be displayed in additional rows below Version 1.

A screen capture of the screen where the new Pipeline Version can be activated
Figure 1. Pipeline Activation

From the information on this sample screen (Figure 1), you can see that this is a batch pipeline with a brief description of its target output catalog and its execution mode (On-demand). Its unique Pipeline ID (UUID) and Group assignment (if a Group was selected when Deploying a Pipeline) are also displayed so that you can confirm that the correct pipeline is being used.

Hint

Click on the button at the end of the Pipeline ID to copy the ID to your clipboard.

Stream Pipeline Activation Options

Click Activate for a Stream Pipeline Version in Figure 1. The Activate Options dialog box opens (see Figure 2).

You can specify the run-time credentials for running this pipeline version. These may be your user credentials or user-generated credentials (apps). For more information, see the Identity & Access Management Guide.

The dialog also contains a switch to run the Stream (Flink) pipeline version's Job Manager in High Availability mode. When enabled, a second Job Manager is deployed as a standby for the pipeline, in a different Availability Zone than the first one. The Job Managers are managed via ZooKeeper, which coordinates leader election and the pipeline's state. If the primary Job Manager crashes, the standby Job Manager quickly takes over and the pipeline continues to run. The failed primary Job Manager is then restarted and becomes the new standby, re-establishing high availability to protect against future failures.

The option to enable High Availability is available during Activate, Resume, and Upgrade operations.

Caution: Additional Cost

The Flink Job Manager High Availability option increases the cost of running a Flink pipeline. The additional resources are listed below.

There are additional resources required to run a Flink pipeline's Job Manager with high availability:

  • Resources for the second Job Manager (same size as the first one)
  • Resources for ZooKeeper: 1.5 CPU and 1.5 GB of RAM

The cost for these extra resources is added to the pipeline's original cost.
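The resource arithmetic above can be sketched as follows. This is a minimal illustration only: the `Resources` type, the `ha_overhead` function, and the example Job Manager size are assumptions made for the sketch and are not part of the platform. Only the ZooKeeper figures (1.5 CPU, 1.5 GB) and the "same size as the first one" rule come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """Hypothetical container for a resource footprint."""
    cpu: float     # CPU units
    ram_gb: float  # memory in GB

    def __add__(self, other: "Resources") -> "Resources":
        return Resources(self.cpu + other.cpu, self.ram_gb + other.ram_gb)


# From the text: ZooKeeper needs 1.5 CPU and 1.5 GB of RAM.
ZOOKEEPER = Resources(cpu=1.5, ram_gb=1.5)


def ha_overhead(job_manager: Resources) -> Resources:
    """Extra resources added by High Availability: a standby Job Manager
    of the same size as the primary, plus ZooKeeper."""
    return job_manager + ZOOKEEPER


# Hypothetical Job Manager size, chosen only for illustration.
primary = Resources(cpu=1.0, ram_gb=7.0)
print(ha_overhead(primary))  # Resources(cpu=2.5, ram_gb=8.5)
```

The result is the overhead only; it is added on top of what the pipeline already uses without High Availability.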

Screen capture of Pipeline Flink HA Activation Option
Figure 2. Pipeline Flink HA Activation Option

For more information on Flink Job Manager High Availability, see Stream Processing Best Practices.

Batch Pipeline Activation Options

Click Activate for a Batch Pipeline Version in Figure 1. The Activate Options dialog box opens (see Figure 3). This allows two options for activating the Batch Pipeline Version: Run now (on-demand) or Schedule. The default is Run now (on-demand).

You can specify the run-time credentials for running this pipeline version. These may be your user credentials or user-generated credentials (apps). For more information, see the Identity & Access Management Guide.

Select the Run now radio button to display a dialog asking which input catalog versions to process. The options are (a) Reprocess latest catalog version or (b) Reprocess specific catalog version. With option (a), which is the default, the system identifies and reprocesses the latest catalog versions (see Figure 3). Click Activate to accept the default.

Screen capture of Activate Option dialog with Run Now option and reprocess latest input catalog selected
Figure 3. Run Now option and reprocess latest catalog version

If you select the Reprocess specific catalog version option, the dialog again changes to request more information. Figure 4 shows the dialog asking for the specific version of the input catalog to be processed. Enter the Catalog version number and click Activate to start processing.

Screen capture of Activate Option dialog with Run Now option and process the catalog version specified
Figure 4. Run Now option and reprocess specified catalog version

If you select the Schedule radio button, the dialog displays two options: Data change or Time schedule. Select the Data change option and click Activate to place the new Pipeline Version in the Scheduled state, where it waits until the input catalogs are updated with new data.

Screen capture of Activate Option dialog with schedule and Data change option selected
Figure 5. Data change option

Select the Time schedule radio button to expand the dialog and enter a CRON schedule. The schedule must be a valid Unix CRON expression, and the interval between consecutive attempts to run the Pipeline Version cannot be less than one hour. The expression is evaluated in the UTC time zone. For example, the CRON expression "30 * * * *" results in attempts to execute the Pipeline Version at 30 minutes past the hour, every hour of the UTC clock. An attempt is skipped if the Pipeline Version is still running at the time of the next attempt. By default, a job runs only if there are pending changes to be processed in the input catalogs. The CLI and Pipeline API provide more options for scheduling batch pipeline versions by time.

Screen capture of Activate Option dialog with schedule and Time schedule option selected
Figure 6. Time schedule option
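To make the UTC evaluation of a Time schedule concrete, here is a minimal sketch that lists the next attempt times for an hourly expression such as `30 * * * *`, which in standard Unix cron means minute 30 of every hour. The function name `next_attempts` is hypothetical, and the sketch handles only this "fixed minute, every hour" form rather than general cron syntax; it does not reflect how the platform itself evaluates schedules.

```python
from datetime import datetime, timedelta, timezone


def next_attempts(minute: int, after: datetime, count: int) -> list[datetime]:
    """Return the next `count` UTC times matching '<minute> * * * *'
    that fall strictly after `after` (a timezone-aware datetime)."""
    if not 0 <= minute <= 59:
        raise ValueError("minute must be in 0..59")
    after = after.astimezone(timezone.utc)
    candidate = after.replace(minute=minute, second=0, microsecond=0)
    if candidate <= after:
        candidate += timedelta(hours=1)
    # Consecutive attempts are exactly one hour apart, which satisfies
    # the minimum interval stated in the text.
    return [candidate + timedelta(hours=i) for i in range(count)]


start = datetime(2024, 1, 1, 10, 45, tzinfo=timezone.utc)
for attempt in next_attempts(30, start, 3):
    print(attempt.isoformat())
# 2024-01-01T11:30:00+00:00
# 2024-01-01T12:30:00+00:00
# 2024-01-01T13:30:00+00:00
```

Each listed time is only an attempt: as described above, an attempt is skipped if the previous run is still in progress, and by default a job runs only when the input catalogs have pending changes.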

Info: Run Latency

There are always a few moments of latency before the pipeline actually begins processing, during which the pipeline waits in the Scheduled state. This is true even with the Run now (on-demand) option. Scheduled operations can be delayed further because they are triggered by the availability of data and of system resources to start processing.

The CLI and Pipeline API allow more types of processing to be used for On-demand Batch pipelines.
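The choices offered by the Batch activation dialog form a small decision tree, which can be summarized as a data structure. The sketch below is illustrative only: the `BatchActivation` class, its field names, and its string values are hypothetical and do not correspond to any CLI or API schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BatchActivation:
    """Hypothetical model of the Batch 'Activate Options' dialog."""
    mode: str = "run_now"                  # "run_now" (default) or "schedule"
    catalog_version: Optional[int] = None  # run_now only; None means latest
    trigger: Optional[str] = None          # schedule only: "data_change" or "time"
    cron: Optional[str] = None             # required only when trigger == "time"

    def validate(self) -> None:
        if self.mode == "run_now":
            if self.trigger is not None or self.cron is not None:
                raise ValueError("Run now takes no schedule settings")
        elif self.mode == "schedule":
            if self.catalog_version is not None:
                raise ValueError("Schedule takes no catalog version")
            if self.trigger not in ("data_change", "time"):
                raise ValueError("Schedule needs Data change or Time schedule")
            if (self.trigger == "time") != (self.cron is not None):
                raise ValueError("A CRON expression goes with Time schedule only")
        else:
            raise ValueError(f"unknown mode: {self.mode!r}")


# The dialog defaults: Run now, reprocess the latest catalog version.
BatchActivation().validate()
# Run now against a specific catalog version (version number is made up).
BatchActivation(catalog_version=42).validate()
# Scheduled on a time basis, 30 minutes past every hour (UTC).
BatchActivation(mode="schedule", trigger="time", cron="30 * * * *").validate()
```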

Managing an Activated Pipeline Version

A screen capture of the screen that now shows the activated Pipeline Version in a running state
Figure 7. Running a Pipeline Version

When the state has changed to Running, the operational choices on the screen also change. The option to Pause the running Pipeline Version or to Cancel it is now available.

Info: Metrics Button

This button takes you to the companion application (for example, Splunk or Grafana) that monitors metrics from the running Pipeline Version. For additional information about Metrics, see the Logs, Monitoring and Alerts User Guide.

In the case of a Scheduled Pipeline Version, the listing will look something like this:

An example of a scheduled Pipeline Version listing
Figure 8. Scheduled Pipeline Version

In the case of a Run now Pipeline Version, the listing will look something like this:

Screen capture of the screen that shows the details of the running pipeline
Figure 9. Run Now Pipeline

Hint

The More menu in the top right-hand corner of the page provides additional functions.

Annotated more menu
Figure 10. Annotated More menu

Pipeline Details

Click on any listed Pipeline Version name to access details about that pipeline. Below is an example of the details of a sample pipeline.

Screen capture of a pipeline details screen
Figure 11. Pipeline Version Details Screen

Under the Details tab, you can see all of the run time information available about the Pipeline Version. This is also the screen where you can access and edit the Logging Level of this Pipeline Version. To change the Logging Level, click on the Edit button next to "Logging configuration" and enter the new or additional logging level in the dialog. Saving your changes stores them for that specific Pipeline Version, but they will not go into effect until a new job is started with that Pipeline Version.

Operations

The Operations tab provides a summary of operations on the pipeline version. Available information is illustrated below.

screen capture of run now operation tab data
Figure 12. Pipeline Version Operations Screen

Running Jobs

Click on the Jobs tab to examine the current and past jobs for this Pipeline Version.

Screen capture of Pipeline Version job display
Figure 13. Pipeline Version Jobs Screen

Pausing a Running Pipeline Version

Pausing a running Pipeline Version requires special consideration because the result of the "pause" depends on the kind of pipeline job being performed.

If a Batch pipeline version running in Schedule mode is paused, it does not stop processing immediately. Instead, the current job runs to completion, and only then does the Batch Pipeline Version change to the Paused state. A Batch pipeline version running on-demand cannot be paused; it can only be canceled.

However, if you pause a running Stream pipeline job, the current state of the job is saved and the job is gracefully stopped at that point. When the resume command is issued, a new job is started to restart the Pipeline Version from the previously saved state. There is a time limit on how long the job can remain paused: the default data retention setting is 1 hour. This information is displayed as a reminder when you pause the pipeline job, as shown below. If the paused job is canceled, its saved state is discarded and the Pipeline Version moves to the Ready state.

A screen capture of the GUI display for pausing a streaming pipeline job
Figure 14. Pause a Streaming Pipeline Version
A screen capture of the GUI display for a paused streaming pipeline job
Figure 15. Paused Pipeline with Alert

Resume a Running Pipeline

The resume operation restarts data processing after the pipeline has been paused. When a paused Pipeline Version is resumed, the typical delay is 30-90 seconds, although it can last several minutes if resources are limited. In some cases, the resume operation has no effect, such as when:

  • the Pipeline Version is no longer in a paused state because it exceeded the 1-hour time limit on pausing, after which resume is unavailable
  • the Pipeline Version is not in a paused state for some other reason
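The two conditions above amount to a simple eligibility check, sketched here under stated assumptions. The function name `resume_has_effect`, the state string `"paused"`, and the treatment of the exact 1-hour boundary as still resumable are all assumptions of this sketch; only the 1-hour default comes from the text.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Default limit on the paused state, as stated in the text.
PAUSE_LIMIT = timedelta(hours=1)


def resume_has_effect(state: str,
                      paused_at: Optional[datetime],
                      now: datetime,
                      limit: timedelta = PAUSE_LIMIT) -> bool:
    """Return True if a resume request could take effect."""
    # Resume only applies to a Pipeline Version that is currently paused.
    if state != "paused" or paused_at is None:
        return False
    # Assumption: a pause lasting exactly `limit` is still resumable.
    return now - paused_at <= limit


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(resume_has_effect("paused", now - timedelta(minutes=30), now))  # True
print(resume_has_effect("paused", now - timedelta(hours=2), now))     # False
print(resume_has_effect("running", None, now))                        # False
```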

When a Stream Pipeline Version is resumed, a new job begins again from the point where the pause occurred. However, Batch Pipeline Versions cannot be resumed the same way. This is because the batch job is allowed to run to completion before the "pause" goes into effect. No further jobs will be processed until the Pipeline Version is resumed and a new job is available.

Hint

From this listing page, you can begin creating a new Pipeline Version by clicking the button shown here:

A screenshot of the create new version button on the pipeline listing page

Cancel a Running Pipeline Version

Cancelling a running Pipeline Version normally requires submitting the Pipeline ID and Pipeline Version ID along with the Cancel Request. The service cancels the running job without saving the state and returns the new pipeline status. The Portal simplifies this process as shown here.

Click on the Cancel button for the running pipeline job.

Screenshot of a running pipeline job
Figure 16. Cancel a Running Job

You will be asked for confirmation.

Screenshot of the cancelation confirmation dialog
Figure 17. Verify Cancellation

The following screen shows the job as Cancelled and the Pipeline Version returned to the Ready state. Because the Pipeline Version has been Cancelled and not Deleted, it can be reactivated for a new job. In the case of a Batch job, the job will run to completion before canceling any further scheduled jobs. In the case of a Stream job, the job would be immediately interrupted and canceled with no chance of restarting the interrupted job. The Stream Pipeline Version is returned to its Ready state.

Screenshot of closed job
Figure 18. Pipeline Job Cancelled

Delete a Pipeline Version

You can delete a Pipeline Version from the pipeline service without having any effect on other Pipeline Versions using the same pipeline JAR file. To delete a Pipeline Version, you must select the Pipeline Version to be deleted from the list page. Then, click the ellipsis menu as shown in the illustration below. Click the "Delete pipeline" selection to initiate the deletion process.

Screen capture of drop-down more menu showing Delete button
Figure 19. Delete Pipeline Button

The UI shows an alert dialog warning that deleting a Pipeline Version is a destructive action that cannot be undone, and asks for confirmation. Click Delete Version to complete the deletion, or Cancel to stop it.

Finally, the pipeline list page refreshes and no longer lists the deleted Pipeline Version. An alert message at the top of the page confirms the deletion.

Screenshot of pipeline deletion alert
Figure 20. Pipeline List Page after Deletion

Deactivate a Pipeline Version

If a Pipeline Version is still in a scheduled state after activation, deactivate it by clicking on the Deactivate button as shown below. After deactivation, the Pipeline Version returns to a Ready state where it is again available for activation.

Screenshot of the deactivate button for a recently activated and scheduled Pipeline Version
Figure 21. Deactivate a Selected Pipeline Version

Once the pipeline has been deactivated, it will be listed in a ready state, waiting for activation.

Screenshot of GUI for a pipeline being deactivated
Figure 22. Deactivating a Pipeline Version
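The lifecycle described throughout this article can be summarized as a small state machine. The sketch below is a simplification for illustration: the state and action names are lower-cased labels chosen here, `start_job` is a hypothetical name for the moment a scheduled Pipeline Version begins processing, and the table ignores the differences between Batch and Stream (for example, an on-demand Batch Pipeline Version cannot be paused, and a scheduled Batch job finishes before a pause takes effect).

```python
# (current state, action) -> next state, as described in this article.
TRANSITIONS = {
    ("ready", "activate"): "scheduled",    # activation waits in Scheduled first
    ("scheduled", "start_job"): "running",
    ("scheduled", "deactivate"): "ready",
    ("running", "pause"): "paused",
    ("running", "cancel"): "ready",
    ("paused", "resume"): "running",       # after a short delay
    ("paused", "cancel"): "ready",         # saved state is discarded
}


def apply(state: str, action: str) -> str:
    """Return the next state, or raise if the action is not available."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(
            f"{action!r} is not available in state {state!r}"
        ) from None


state = "ready"
for action in ("activate", "start_job", "pause", "resume", "cancel"):
    state = apply(state, action)
    print(f"{action} -> {state}")
```

Deleting a Pipeline Version is left out of the table because it removes the version rather than moving it to another state.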
