Pipeline Monitoring

Overview

HERE platform pipelines generate a set of standard metrics that can be used to track pipeline status over time. The standard metrics are listed in the Logs, Monitoring, and Alerts User Guide. Custom metrics can also be added in the pipeline's code. All of these metrics are displayed in the Pipeline Status dashboard and are used to generate alerts for specific events associated with a pipeline job, such as a failed job.
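
For example, a Flink pipeline can register a custom counter through Flink's standard metric system. The following is a minimal sketch, assuming a Java Flink pipeline; the class name CountingMapper and the metric name recordsProcessed are illustrative only and not part of any platform API.

```java
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.metrics.Counter;

// Illustrative mapper that counts how many records it has processed.
public class CountingMapper extends RichMapFunction<String, String> {

    private transient Counter recordsProcessed;

    @Override
    public void open(Configuration parameters) throws Exception {
        // Register the counter with Flink's metric system when the task starts.
        this.recordsProcessed = getRuntimeContext()
                .getMetricGroup()
                .counter("recordsProcessed");
    }

    @Override
    public String map(String value) {
        // Increment the custom metric for every record that passes through.
        recordsProcessed.inc();
        return value;
    }
}
```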

Monitor Pipeline Status

To monitor the status of a pipeline, a Pipeline Status dashboard is available in Grafana. From the platform portal, open the Launcher menu and select Primary monitoring and alerts (No. 1 in Figure 1). This takes you to the Grafana home page.

Monitoring and Alerts screen capture
Figure 1. Primary Monitoring and Alerts

Note

Launcher menu item 2 is also Grafana, but only for high availability catalogs.
Menu item 3 is the link to Splunk for reviewing event and error logs.

The home page looks something like this:

Grafana home page screen capture
Figure 2. Grafana Home Page

The home page provides access to several dashboards. The default dashboards are listed on the left side of the page, and the available User Defined Alerts are listed on the right side.

From the list of default dashboards, locate the Pipeline Status dashboard. Click on the dashboard name to open it.

Choose Pipeline Status dashboard screen capture.
Figure 3. Choose Pipeline Status dashboard.

The Pipeline Status dashboard displays the following statuses of pipeline jobs:

  • Failed
  • Completed
  • Canceled
  • Submitted
  • Running

Each Pipeline Status is color-coded to allow quick identification. The dashboard can also be filtered by Pipeline Status and Pipeline Type (Flink or Spark). For more details, see Pipeline Status definitions.

Pipeline status dashboard screen capture
Figure 4. Pipeline Status Dashboard

Note: Default Dashboard Settings

Default Time Period: Last 24 hours
Default Refresh Interval: 30 minutes

Configure Pipeline Job Failure Alerts

  1. Click on the Grafana logo in the top left corner of the screen. This opens the side menu bar.

  2. Locate the Alerting item on the menu bar. Then, locate Notification Channels on the submenu, as shown here.

    Open Alerts screen capture
    Figure 5. Open Alerts
  3. Click on “Notification Channels” and the screen will change to show something like the image below.

    Alerting menu item screen capture
    Figure 6. List of Notification channels
  4. Locate the Notification Channel named "Pipeline Failure Notification." Click the channel's Edit button to change the configuration of the notification channel.

    Screen capture of edit button for selecting a notification channel.
    Figure 7. Edit Pipeline Failure Notification Channel.
  5. Specify the list of email addresses that will receive failure alerts as shown here.

    Screen capture of specifying email addresses for alert notification.
    Figure 8. Specify Email Addresses.
  6. To test your alert changes, click the Send Test button. This will send a test message to each email on the alert list.

  7. Click the Save button to save your notification changes.

With these changes, the listed email addresses will start receiving alerts when Pipeline Jobs fail. You can also see the alerts on the Pipeline Status dashboard.

Note: Default Alert Settings

Default Alert Interval: Last 1 minute
Default Alert frequency for Failed jobs: Every 60 seconds

Dashboard and Failure Alert Limitations

Caution: Dashboard Sampling

When a larger time period is selected, Grafana uses a sampling mechanism that displays fewer data points than are actually available. This allows for quicker responses, but to see more accurate data, shorten the time period being investigated.

Failure Email Alert Behavior

Failure emails are sent only when the alert's state changes. For example, if a pipeline job fails, the alert goes to the Alerting state and a failure email is sent to the specified recipients. If another pipeline job fails within the default alert interval of 1 minute, a second email is not sent. The alert must first transition to the “No Data” state, at the end of the 1 minute interval, before any subsequent failure can trigger another alert email. This behavior results in the following two emails being sent:

  • [Alerting] - For pipeline jobs that failed within the last 1 minute period, including details about the failed jobs. Sent when the alert is first reported.
  • [No Data] - For pipeline jobs that failed within the last 1 minute period, with an empty email body. Sent at the end of the 1 minute interval.

This is an inherent behavior of Grafana and not a limitation of the HERE platform. Figure 9 illustrates what is happening and how Fault 2 is not processed.

Sequence diagram of Grafana alert handling.
Figure 9. Grafana Alert Handling

Note: Splunk Dashboard

Click on the Logs menu item to get to the Splunk Dashboard. This will not focus on any one specific job; see the Error Logs section below for how to access the logs for a specific job.

Error Logs

There are four levels of logging available for platform pipelines: Debug, Info, Warn, and Error. The logging level can be set using the platform portal, the CLI, or the API; if it is not set, the default logging level of Warn is used.

To examine the logs for running pipeline jobs, click View Jobs for a Pipeline Version to display the job history.

Then, click on the Logging URL button for the job you wish to troubleshoot. This will open the Splunk dashboard where the logs for the selected Pipeline Version can be viewed.

screenshot of logging URL button
Figure 10. Logging URL Button

For more information, see Pipeline Logging.

Set Logging Levels from the platform portal

Different levels of logging are available for different purposes. HERE platform pipelines support the following levels of logging:

  • Debug — Includes fine-grained informational events that are most useful to troubleshoot a pipeline.
  • Info — Includes informational messages that highlight the progress of the pipeline at a coarse-grained level.
  • Warn — Includes information about potentially harmful situations, including run-time situations that are undesirable or unexpected but not necessarily wrong. This is the default logging level.
  • Error — Includes run-time errors or unexpected conditions, such as error events that might still allow the pipeline to continue running.

The logging level can be set from the platform portal on the Pipeline Version Details page. An example is shown in Figure 11.

screenshot of pipeline version detail page
Figure 11. Change Pipeline Version logging level

The Logging Configuration panel is outlined in red here. The current logging level for this pipeline is displayed. To change the level, click the Edit button. This displays the dialog box shown in Figure 12.

screenshot of logging level edit dialog box
Figure 12. Edit logging level dialog box

Info: Loggers and Levels

A logging level is set by a logger for a specific Pipeline Version and all of the jobs it executes. The default logger is set at the root level for the entire pipeline, but a logger can also be set for a specific pipeline class. Because multiple loggers are allowed, different loggers can be set to different logging levels. This makes it possible to monitor different parts of the executing pipeline code at different logging levels, if set up correctly. A sketch of how a class-level logger is typically named is shown below.
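
The link between a logger and a pipeline class is made through the logger name. Below is a minimal sketch, assuming the pipeline code uses SLF4J (common for both Flink and Spark pipelines); the class com.example.MyProcessor is purely illustrative. A logger added in the portal with the name com.example.MyProcessor would then control log output from this class independently of the root level.

```java
package com.example;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class MyProcessor {

    // The logger name is the fully qualified class name, "com.example.MyProcessor".
    private static final Logger LOG = LoggerFactory.getLogger(MyProcessor.class);

    public void process(String record) {
        LOG.debug("Processing record: {}", record); // visible only if this logger is set to Debug
        LOG.warn("Unexpected input: {}", record);   // visible at the default Warn level
    }
}
```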

To change the root logging level, use the drop-down list at the top of the dialog box to select the new logging level. Additional loggers can be added or deleted using the controls shown in Figure 13. To change the logging level of one of these loggers, click the indicated control and select the new level from the drop-down list.

screenshot of logging level dialog box controls
Figure 13. Control identification

When adding a new logger, the dialog box changes to provide a place to enter the information for the new logger, as shown in Figure 14. The logger name is normally the name of the class in the pipeline code to which it should be linked. The logging level can be set as needed and does not have to match the root logging level.

screenshot of a logging level dialog box adding a new logger
Figure 14. Add a new logger

Click Add to close the add function. Then, click Save to save the addition.

When adding a new logger, if you specify a logger that already exists, you will get an error message like the one shown in Figure 15.

screen capture of add a logger error message
Figure 15. Error: Logger already exists

Figure 16 shows the result of adding a new logger and how it is displayed on the Pipeline Version Details page.

screenshot of Pipeline Version detail page with added logger listed
Figure 16. Added logger displayed

Caution

If you create a logger that cannot be linked to a class in the pipeline code, there will be no logging entries from that logger.
