Spark Metrics

Spark Metrics is a default dashboard available to you in Grafana that shows the standard metrics described below. Custom metrics can be enabled using Spark Accumulators.

Spark Accumulators

Spark allows the creation of custom numerical metrics using accumulators. Batch pipelines that use Apache Spark support one type of accumulator: numerical. Once created, these accumulators become available as named metrics that Grafana can query and add to dashboards. The metric names are commonly prefixed with spark_accumulators_.

For more information on using accumulators, see Custom Metrics and the documentation on Spark accumulators.

Spark Metrics for Pipelines

Metric                                  Dashboard panel
driver_DAGScheduler_job_allJobs         Number of Pipeline Jobs
driver_DAGScheduler_job_activeJobs      Number of Running Pipeline Jobs
executor_threadpool_activeTasks         Number of Workers per Running Job
executor_threadpool_completeTasks       Number of Completed Spark Tasks per Running Job
driver_DAGScheduler_job_allJobs         Number of Spark Jobs per Pipeline Job
driver_DAGScheduler_stage_failedStages  Number of Failed Stages per Pipeline Job
driver_accumulators_.*                  Accumulator Values
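
The last entry in the list above is a regular expression rather than a single metric name. The following minimal sketch illustrates which names such a pattern would select. The metric names in the sample list are hypothetical and used for illustration only. Prometheus anchors regular expressions at both ends, so `re.fullmatch` is used here as the closest Python equivalent.

```python
import re

# Pattern taken from the metric list above. Prometheus regex matches are
# fully anchored, so fullmatch mirrors that behavior.
ACCUMULATOR_PATTERN = re.compile(r"driver_accumulators_.*")


def accumulator_metrics(metric_names):
    """Return the names that the accumulator pattern would select."""
    return [name for name in metric_names if ACCUMULATOR_PATTERN.fullmatch(name)]


# Hypothetical metric names, for illustration only.
names = [
    "driver_accumulators_recordsProcessed",
    "driver_DAGScheduler_job_allJobs",
    "executor_threadpool_activeTasks",
]
selected = accumulator_metrics(names)
print(selected)
```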

Additional Spark Metrics for Pipelines

The following metrics are not displayed in the default dashboard but are available for use in custom dashboards.

Container Metrics

Metric                              Unit     Description
container_cpu_usage_seconds_total   Seconds  Total CPU used by the container
container_memory_working_set_bytes  Bytes    Memory used by the container
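
Because container_cpu_usage_seconds_total is a cumulative counter of CPU seconds, it is usually more useful as a rate than as a raw value. The sketch below works through the arithmetic for two samples. The sample values are hypothetical, and the helper name `average_cpu_cores` is an assumption made for this illustration. In a Grafana panel, the PromQL `rate()` function performs a similar calculation and also handles counter resets, which this sketch does not.

```python
def average_cpu_cores(first_sample, second_sample):
    """Average CPU cores used between two (timestamp_seconds, counter_value) samples."""
    (t0, v0), (t1, v1) = first_sample, second_sample
    if t1 <= t0:
        raise ValueError("second sample must be later than the first")
    # Counter delta (CPU seconds) divided by elapsed wall-clock seconds.
    return (v1 - v0) / (t1 - t0)


# Hypothetical samples: the counter grew by 90 CPU seconds over 60 seconds.
cores = average_cpu_cores((1000.0, 450.0), (1060.0, 540.0))
print(cores)  # 1.5
```

A result of 1.5 means the container kept one and a half cores busy on average over the interval.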

Spark Driver Metrics

Metric                      Unit   Description
driver_jvm_total_committed  Bytes  Memory available for use by the JVM for the driver.
driver_jvm_total_init       Bytes  Amount of memory available for use by the JVM at initialization for the driver.
driver_jvm_total_max        Bytes  Maximum amount of memory available to the JVM for the driver.
driver_jvm_total_used       Bytes  Amount of memory currently being used by the driver.
driver_jvm_heap_used        Bytes  Amount of heap memory currently being used by the driver.
driver_jvm_non_heap_used    Bytes  Amount of non-heap memory currently being used by the driver.

Spark Executor Metrics

Metric                             Unit     Description
executor_threadpool_activeTasks    Count    Number of active executor tasks
executor_threadpool_completeTasks  Count    Number of completed executor tasks
jvm_G1_Young_Generation_time       Seconds  G1 young generation garbage collection time
jvm_G1_Old_Generation_time         Seconds  G1 old generation garbage collection time
jvm_G1_Young_Generation_count      Count    G1 young generation garbage collection count
jvm_G1_Old_Generation_count        Count    G1 old generation garbage collection count
jvm_heap_usage                     Bytes    Amount of heap memory currently being used by the executor.
jvm_non_heap_usage                 Bytes    Amount of non-heap memory currently being used by the executor.

Filtering Pipeline Metrics

You can filter pipeline metrics using these Prometheus filters:

Filter       Label                    Example
Pipeline ID  PipeLineId               PipeLineId="00112233-4455-6677-8899-aabbccddeeff"
Job ID       DeploymentId             DeploymentId="00112233-4455-6677-8899-aabbccddeeff"
Pod Name     pod_name="job--worker-"  pod_name="job-00112233-4455-6677-8899-aabbccddeeff-worker-0"
Executor ID  executorId               executorId="0"

For example, to get the G1 young generation garbage collection for all executors for a given pipeline ID and job ID you would use this filter:
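
A filter of this kind can be assembled from a metric name and the label filters listed above. The following is a minimal sketch, not a definitive form: the helper `build_selector` is hypothetical, the IDs are the placeholder values from the filter list, and jvm_G1_Young_Generation_time is assumed as the metric (jvm_G1_Young_Generation_count could be used in the same way). Leaving out executorId means the selector is not restricted to a single executor.

```python
def build_selector(metric, **labels):
    """Build a Prometheus selector with exact-match label filters.

    Label values are inserted as-is; this sketch does not escape quotes
    or backslashes inside values.
    """
    matchers = ", ".join(f'{name}="{value}"' for name, value in labels.items())
    return f"{metric}{{{matchers}}}"


# Placeholder IDs copied from the filter examples above.
query = build_selector(
    "jvm_G1_Young_Generation_time",
    PipeLineId="00112233-4455-6677-8899-aabbccddeeff",
    DeploymentId="00112233-4455-6677-8899-aabbccddeeff",
)
print(query)
```

The printed string consists of the metric name followed by the two label filters in braces, and can be pasted into a Grafana query field.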


Spark Metrics for Notebooks

Panel                                               Description
Average Memory per Executor                         Average memory per executor and Spark driver
Average and Total Spark Memory Usage for All Units  Aggregate of the average memory per executor and driver, plus the total memory of the cluster
Active Cores                                        Number of active cores
Stages                                              Stages by state, such as running, pending, and failed
Tasks by All Executors                              Tasks by executors, active, and pool. This is another way to observe the active and available cores
Message Processing Time                             Average message processing time
Completed Tasks by Each Executor                    Completed tasks by executors and counters
File System Reads/Writes by Executors               File system reads and writes in bytes (only when the file system is used within jobs)
