Spark Metrics
Spark Metrics is a default Grafana dashboard that shows the standard metrics described below. You can enable custom metrics using Spark accumulators.
Spark Accumulators
Spark lets you create custom numerical metrics using accumulators. Batch Pipelines using Apache Spark support numerical accumulators. Once created, these accumulators become available as named metrics that Grafana can query and add to dashboards. The metric names are commonly prefixed with spark_accumulators_.
For more information on using accumulators, see Custom Metrics and the documentation on Spark accumulators.
Spark Metrics for Pipelines
METRIC | DESCRIPTION |
driver_DAGScheduler_job_allJobs | Number of Pipeline Jobs |
driver_DAGScheduler_job_activeJobs | Number of Running Pipeline Jobs |
executor_threadpool_activeTasks | Number of Workers per Running Job |
executor_threadpool_completeTasks | Number of Completed Spark Tasks per Running Job |
driver_DAGScheduler_job_allJobs | Number of Spark Jobs per Pipeline Job |
driver_DAGScheduler_stage_failedStages | Number of Failed Stages per Pipeline Job |
driver_accumulators_.* | Accumulator Values |
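In the table above, `driver_accumulators_.*` is a regular expression rather than a single metric name. The sketch below shows one way to turn such a pattern into a PromQL selector on the special `__name__` label, assuming a Prometheus-compatible data source behind Grafana. It also illustrates that PromQL regex matchers are anchored at both ends. The helper names and the accumulator names (`rowsProcessed`, `badRecords`) are hypothetical and used for illustration only.

```python
import re


def name_regex_selector(pattern: str) -> str:
    """Build a PromQL selector matching every metric whose name fits `pattern`."""
    # PromQL string literals are double-quoted; escape backslashes and quotes.
    escaped = pattern.replace("\\", "\\\\").replace('"', '\\"')
    return '{__name__=~"' + escaped + '"}'


def matches(pattern: str, metric_name: str) -> bool:
    # PromQL regex matchers are fully anchored, so emulate that with fullmatch.
    return re.fullmatch(pattern, metric_name) is not None


pattern = "driver_accumulators_.*"
print(name_regex_selector(pattern))  # {__name__=~"driver_accumulators_.*"}

# Hypothetical accumulator metric names, for illustration only.
for name in [
    "driver_accumulators_rowsProcessed",
    "driver_accumulators_badRecords",
    "spark_driver_accumulators_rowsProcessed",  # not matched: pattern is anchored
]:
    print(name, matches(pattern, name))
```

Because matching is anchored, a pattern has to cover the whole metric name. A name with an extra prefix is not matched unless the pattern starts with `.*`.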
Additional Spark Metrics for Pipelines
The following metrics are not displayed in the default dashboard but are available for use in custom dashboards.
Container Metrics
METRIC | UNIT | DESCRIPTION |
container_cpu_usage_seconds_total | Seconds | Total CPU time used by the container |
container_memory_working_set_bytes | Bytes | Working-set memory used by the container |
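`container_cpu_usage_seconds_total` is a cumulative counter, so its raw value only grows over the container's lifetime. CPU utilization is the counter's per-second rate: CPU seconds consumed divided by wall-clock seconds elapsed. The result is the average number of cores in use. In PromQL this is roughly what `rate()` computes, although `rate()` also extrapolates to the edges of the window. The sketch below works through the arithmetic from two samples. The sample values, the pod name, and the 5-minute window are made up for illustration.

```python
def cpu_cores_used(cpu_seconds_start: float, cpu_seconds_end: float, window_seconds: float) -> float:
    """Average cores used over a window, from two samples of a cumulative CPU-seconds counter."""
    if window_seconds <= 0:
        raise ValueError("window must be positive")
    delta = cpu_seconds_end - cpu_seconds_start
    if delta < 0:
        # A drop means the counter was reset (for example, the container restarted).
        # Like PromQL, assume it restarted from zero and count only the post-reset value.
        delta = cpu_seconds_end
    return delta / window_seconds


# Hypothetical samples taken 300 s apart: 60 CPU-seconds were consumed.
print(cpu_cores_used(1200.0, 1260.0, 300))  # 0.2, i.e. about one fifth of a core

# Roughly equivalent PromQL (hypothetical pod name):
query = 'rate(container_cpu_usage_seconds_total{pod_name="job-example-worker-0"}[5m])'
print(query)
```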
Spark Driver Metrics
METRIC | UNIT | DESCRIPTION |
driver_jvm_total_committed | Bytes | Amount of memory committed to (guaranteed available for) the driver JVM. |
driver_jvm_total_init | Bytes | Amount of memory the driver JVM requested at initialization. |
driver_jvm_total_max | Bytes | Maximum amount of memory available to the driver JVM. |
driver_jvm_total_used | Bytes | Total memory (heap and non-heap) currently used by the driver. |
driver_jvm_heap_used | Bytes | Amount of heap memory currently used by the driver. |
driver_jvm_non_heap_used | Bytes | Amount of non-heap memory currently used by the driver. |
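The driver memory gauges are related to one another. This sketch assumes they follow the semantics of Java's `java.lang.management.MemoryUsage`, which is an assumption about how they are collected. Under that contract, `used` never exceeds `committed`, `committed` never exceeds `max` when a maximum is defined, and `max` is reported as -1 when it is undefined. Two derived values are often more useful than raw bytes: utilization (used / max) and committed headroom (committed minus used). The byte values below are made up. In Grafana, a comparable ratio could be plotted as `driver_jvm_total_used / driver_jvm_total_max`.

```python
MIB = 1024 * 1024


def memory_summary(used: int, committed: int, max_bytes: int) -> dict:
    """Derive utilization and headroom from JVM memory gauges (all values in bytes)."""
    if used > committed:
        raise ValueError("used should never exceed committed")
    # MemoryUsage reports max as -1 when no limit is defined; utilization is undefined then.
    utilization = used / max_bytes if max_bytes > 0 else None
    return {
        "used_mib": used / MIB,
        "committed_headroom_mib": (committed - used) / MIB,
        "utilization": utilization,
    }


# Hypothetical driver readings: 768 MiB used, 1024 MiB committed, 4096 MiB max.
summary = memory_summary(768 * MIB, 1024 * MIB, 4096 * MIB)
print(summary)  # {'used_mib': 768.0, 'committed_headroom_mib': 256.0, 'utilization': 0.1875}
```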
Spark Executor Metrics
METRIC | UNIT | DESCRIPTION |
executor_threadpool_activeTasks | Count | Number of active executor tasks |
executor_threadpool_completeTasks | Count | Number of completed executor tasks |
jvm_G1_Young_Generation_time | Seconds | G1 young generation garbage collection time |
jvm_G1_Old_Generation_time | Seconds | G1 old generation garbage collection time |
jvm_G1_Young_Generation_count | Count | G1 young generation garbage collection count |
jvm_G1_Old_Generation_count | Count | G1 old generation garbage collection count |
jvm_heap_usage | Bytes | Amount of heap memory currently used by the executor. |
jvm_non_heap_usage | Bytes | Amount of non-heap memory currently being used by the executor. |
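The GC `_time` and `_count` series accumulate over the executor's lifetime, so a single reading says little. Divide the increase in collection time by the increase in collection count over the same window to get the average duration of one collection, expressed in the unit the `_time` series uses. The sketch below uses hypothetical samples. In PromQL, a roughly equivalent expression would be `increase(jvm_G1_Young_Generation_time[5m]) / increase(jvm_G1_Young_Generation_count[5m])`. This assumes both series only reset when the executor restarts.

```python
from typing import Optional


def average_gc_pause(time_start: float, time_end: float,
                     count_start: int, count_end: int) -> Optional[float]:
    """Average duration of one collection between two samples of cumulative GC gauges."""
    collections = count_end - count_start
    if collections <= 0:
        # No collections in the window (or the series was reset): no meaningful average.
        return None
    return (time_end - time_start) / collections


# Hypothetical executor samples: 40 young collections took 2.0 time units in total.
print(average_gc_pause(10.0, 12.0, 100, 140))  # 0.05
```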
Filtering Pipeline Metrics
You can filter pipeline metrics using these Prometheus label filters:
FILTER BY | KEY | EXAMPLE |
Pipeline Id | PipeLineId | PipeLineId="00112233-4455-6677-8899-aabbccddeeff" |
Job Id | DeploymentId | DeploymentId="00112233-4455-6677-8899-aabbccddeeff" |
Pod Name | pod_name | pod_name="job-00112233-4455-6677-8899-aabbccddeeff-worker-0" |
Executor Id | executorId | executorId="0" |
For example, to get the G1 young generation garbage collection time for all executors of a given pipeline ID and job ID, use this query:
jvm_G1_Young_Generation_time{DeploymentId="ffeeddcc-bbaa-9988-7766-554433221100",PipeLineId="00112233-4455-6677-8899-aabbccddeeff"}
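The filtered query above can also be assembled programmatically, for example when generating dashboards. The sketch below supports only exact-match (`=`) label filters. The helper name `selector` is hypothetical and is not part of Grafana or Prometheus. Label keys are sorted so that the same filters always produce the same query text.

```python
def selector(metric: str, **labels: str) -> str:
    """Build a PromQL instant-vector selector with exact-match label filters."""

    def quote(value: str) -> str:
        # Escape characters that are special inside a PromQL double-quoted string.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        return f'"{escaped}"'

    # Sort keys so the generated query text is deterministic.
    matchers = ",".join(f"{key}={quote(val)}" for key, val in sorted(labels.items()))
    return f"{metric}{{{matchers}}}" if matchers else metric


query = selector(
    "jvm_G1_Young_Generation_time",
    PipeLineId="00112233-4455-6677-8899-aabbccddeeff",
    DeploymentId="ffeeddcc-bbaa-9988-7766-554433221100",
)
print(query)
```

To narrow the result to a single executor, pass an additional filter such as `executorId="0"`.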
Spark Metrics for Notebooks
METRIC | DESCRIPTION |
Average Memory per Executor | Average memory per executor and Spark driver |
Average and Total Spark Memory Usage for All Units | Average memory usage across all executors and the driver, plus the total memory usage of the cluster |
Active Cores | Number of active cores |
Stages | Number of stages by state, such as running, pending, and failed |
Tasks by All Executors | Tasks across all executors, shown as active tasks and thread-pool size. This is another way to observe active and available cores |
Message Processing Time | Average message processing time |
Completed Tasks by Each Executor | Completed tasks per executor, shown as counters |
File System Reads/Writes by Executors | File system reads and writes in bytes (only when a file system is used within jobs) |