Kepler Metrics

This document describes the metrics exported by Kepler for monitoring energy consumption at the node, container, process, virtual machine, and pod levels.

Overview

Kepler exports metrics in Prometheus format that can be scraped by Prometheus or other compatible monitoring systems.
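As a rough illustration of consuming this output without a Prometheus server, the sketch below parses a few lines of the Prometheus text exposition format into (name, labels, value) tuples. The sample payload, including its label values and numbers, is invented for illustration and is not real Kepler output. The parser is deliberately minimal: it does not unescape label values and ignores optional timestamps.

```python
import re

# Hypothetical scrape payload. The metric name comes from this document;
# the label values and numbers are made up for illustration.
SAMPLE = '''\
# HELP kepler_node_cpu_watts Power consumption of cpu at node level in watts
# TYPE kepler_node_cpu_watts gauge
kepler_node_cpu_watts{node_name="node-a",path="/example/zone0",zone="package"} 42.5
kepler_node_cpu_watts{node_name="node-a",path="/example/zone1",zone="dram"} 3.25
'''

# metric_name, optional {labels}, then the sample value.
LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# label="value", allowing backslash-escaped characters inside the value.
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Return a list of (name, labels, value) tuples, skipping comment lines."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and HELP/TYPE comments carry no sample
        match = LINE_RE.match(line)
        if match is None:
            continue  # not a sample line this sketch understands
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


if __name__ == "__main__":
    for name, labels, value in parse_samples(SAMPLE):
        print(name, labels.get("zone"), value)
```

For production use, an existing Prometheus client or parser library is likely a better choice than a hand-written regex.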

Metric Types

  • COUNTER: A cumulative metric that only increases over time; as with any Prometheus counter, it starts again from zero when the exporter restarts
  • GAUGE: A metric that can increase and decrease
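The practical difference between the two types shows up when deriving power from energy. The sketch below, under the assumption that a consumer has two successive readings of a `*_joules_total` counter, computes average power over the interval and treats a decrease as a counter reset, similar in spirit to how Prometheus's `rate()` handles resets. The function name `average_watts` is hypothetical and not part of Kepler.

```python
def average_watts(prev_joules, curr_joules, interval_seconds):
    """Average power (watts) between two readings of a joules counter.

    A reading lower than the previous one is treated as a counter reset:
    the counter is assumed to have restarted from zero, so the current
    value is taken as the energy accumulated since the reset.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta = curr_joules - prev_joules
    if delta < 0:
        delta = curr_joules  # counter reset between the two readings
    return delta / interval_seconds


if __name__ == "__main__":
    print(average_watts(1000.0, 1300.0, 30.0))  # 300 J over 30 s
    print(average_watts(1000.0, 150.0, 30.0))   # reset: 150 J over 30 s
```

A `*_watts` gauge, by contrast, can be read directly as the current power and needs no differencing.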

Metrics Reference

Node Metrics

These metrics provide energy and power information at the node level.

kepler_node_cpu_active_joules_total

  • Type: COUNTER
  • Description: Energy consumption of cpu in active state at node level in joules
  • Labels:
    • zone
    • path
  • Constant Labels:
    • node_name

kepler_node_cpu_active_watts

  • Type: GAUGE
  • Description: Power consumption of cpu in active state at node level in watts
  • Labels:
    • zone
    • path
  • Constant Labels:
    • node_name

kepler_node_cpu_idle_joules_total

  • Type: COUNTER
  • Description: Energy consumption of cpu in idle state at node level in joules
  • Labels:
    • zone
    • path
  • Constant Labels:
    • node_name

kepler_node_cpu_idle_watts

  • Type: GAUGE
  • Description: Power consumption of cpu in idle state at node level in watts
  • Labels:
    • zone
    • path
  • Constant Labels:
    • node_name

kepler_node_cpu_info

  • Type: GAUGE
  • Description: CPU information from procfs
  • Labels:
    • processor
    • vendor_id
    • model_name
    • physical_id
    • core_id
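Because the labels carry the useful information here, one possible use is to summarise CPU topology from the label sets. The sketch below assumes one series per logical processor (which the `processor` label suggests but this document does not state), and the label values in the example are invented. The function name `cpu_topology` is hypothetical.

```python
def cpu_topology(label_sets):
    """Summarise topology from kepler_node_cpu_info label sets.

    Assumes each dict in label_sets describes one logical processor and
    contains at least the physical_id and core_id labels.
    """
    sockets = {labels["physical_id"] for labels in label_sets}
    # A physical core is identified by its socket plus its core_id.
    cores = {(labels["physical_id"], labels["core_id"]) for labels in label_sets}
    return {
        "logical_cpus": len(label_sets),
        "physical_cores": len(cores),
        "sockets": len(sockets),
    }


if __name__ == "__main__":
    # Invented example: one socket, two cores, two hardware threads per core.
    example = [
        {"processor": "0", "physical_id": "0", "core_id": "0"},
        {"processor": "1", "physical_id": "0", "core_id": "1"},
        {"processor": "2", "physical_id": "0", "core_id": "0"},
        {"processor": "3", "physical_id": "0", "core_id": "1"},
    ]
    print(cpu_topology(example))
```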

kepler_node_cpu_joules_total

  • Type: COUNTER
  • Description: Energy consumption of cpu at node level in joules
  • Labels:
    • zone
    • path
  • Constant Labels:
    • node_name

kepler_node_cpu_usage_ratio

  • Type: GAUGE
  • Description: CPU usage ratio of a node (value between 0.0 and 1.0)
  • Constant Labels:
    • node_name

kepler_node_cpu_watts

  • Type: GAUGE
  • Description: Power consumption of cpu at node level in watts
  • Labels:
    • zone
    • path
  • Constant Labels:
    • node_name

kepler_node_gpu_active_joules_total

  • Type: COUNTER
  • Description: Energy consumption of gpu in active state at node level in joules
  • Labels:
    • gpu
    • gpu_uuid
    • gpu_name
    • vendor
  • Constant Labels:
    • node_name

kepler_node_gpu_active_watts

  • Type: GAUGE
  • Description: GPU active power (total - idle) in watts
  • Labels:
    • gpu
    • gpu_uuid
    • gpu_name
    • vendor
  • Constant Labels:
    • node_name

kepler_node_gpu_idle_joules_total

  • Type: COUNTER
  • Description: Energy consumption of gpu in idle state at node level in joules
  • Labels:
    • gpu
    • gpu_uuid
    • gpu_name
    • vendor
  • Constant Labels:
    • node_name

kepler_node_gpu_idle_watts

  • Type: GAUGE
  • Description: GPU idle power (auto-detected minimum) in watts
  • Labels:
    • gpu
    • gpu_uuid
    • gpu_name
    • vendor
  • Constant Labels:
    • node_name

kepler_node_gpu_joules_total

  • Type: COUNTER
  • Description: Energy consumption of gpu at node level in joules
  • Labels:
    • gpu
    • gpu_uuid
    • gpu_name
    • vendor
  • Constant Labels:
    • node_name

kepler_node_gpu_watts

  • Type: GAUGE
  • Description: Total GPU power consumption in watts
  • Labels:
    • gpu
    • gpu_uuid
    • gpu_name
    • vendor
  • Constant Labels:
    • node_name
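The GPU descriptions above state that active power is total power minus idle power. As a small sketch of that relationship, the function below recomputes active power per GPU from total and idle readings keyed by `gpu_uuid`. The function name and the UUID and wattage values are hypothetical, chosen only for illustration.

```python
def gpu_active_watts(total_by_uuid, idle_by_uuid):
    """Compute active power per GPU as total minus idle.

    Both arguments map a gpu_uuid label value to a reading in watts.
    GPUs without an idle reading are skipped rather than guessed at.
    """
    active = {}
    for uuid, total in total_by_uuid.items():
        idle = idle_by_uuid.get(uuid)
        if idle is None:
            continue
        active[uuid] = total - idle
    return active


if __name__ == "__main__":
    total = {"GPU-aaaa": 60.0, "GPU-bbbb": 20.0}  # as from kepler_node_gpu_watts
    idle = {"GPU-aaaa": 15.0}                     # as from kepler_node_gpu_idle_watts
    print(gpu_active_watts(total, idle))
```

A check of this kind can help confirm that a dashboard combines the three GPU power gauges consistently.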

Container Metrics

These metrics provide energy and power information for containers.

kepler_container_cpu_joules_total

  • Type: COUNTER
  • Description: Energy consumption of cpu at container level in joules
  • Labels:
    • container_id
    • container_name
    • runtime
    • state
    • zone
    • pod_id
  • Constant Labels:
    • node_name

kepler_container_cpu_watts

  • Type: GAUGE
  • Description: Power consumption of cpu at container level in watts
  • Labels:
    • container_id
    • container_name
    • runtime
    • state
    • zone
    • pod_id
  • Constant Labels:
    • node_name

kepler_container_gpu_joules_total

  • Type: COUNTER
  • Description: Energy consumption of gpu at container level in joules
  • Labels:
    • container_id
    • container_name
    • runtime
    • state
    • pod_id
  • Constant Labels:
    • node_name

kepler_container_gpu_watts

  • Type: GAUGE
  • Description: Power consumption of gpu at container level in watts
  • Labels:
    • container_id
    • container_name
    • runtime
    • state
    • pod_id
  • Constant Labels:
    • node_name
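Since the container metrics carry a `pod_id` label, container readings can be rolled up per pod on the consumer side. The sketch below sums container CPU power for a single `zone` value; it restricts itself to one zone on the assumption that zones may overlap on some hardware, so adding them together could double count. The function name, zone name, and sample values are hypothetical.

```python
from collections import defaultdict


def watts_by_pod(samples, zone):
    """Sum container power per pod_id for one zone.

    samples is an iterable of (labels, watts) pairs, where labels is a
    dict of label name to value as found on kepler_container_cpu_watts.
    """
    totals = defaultdict(float)
    for labels, watts in samples:
        if labels.get("zone") != zone:
            continue  # only aggregate within the requested zone
        totals[labels.get("pod_id", "")] += watts
    return dict(totals)


if __name__ == "__main__":
    # Invented readings for illustration.
    readings = [
        ({"container_id": "c1", "pod_id": "p1", "zone": "package"}, 1.5),
        ({"container_id": "c2", "pod_id": "p1", "zone": "package"}, 2.25),
        ({"container_id": "c3", "pod_id": "p2", "zone": "package"}, 4.0),
        ({"container_id": "c1", "pod_id": "p1", "zone": "dram"}, 0.5),
    ]
    print(watts_by_pod(readings, "package"))
```

Containers that do not belong to a pod would end up under whatever `pod_id` value they report; this sketch groups a missing label under the empty string.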

Process Metrics

These metrics provide energy and power information for individual processes.

kepler_process_cpu_joules_total

  • Type: COUNTER
  • Description: Energy consumption of cpu at process level in joules
  • Labels:
    • pid
    • comm
    • exe
    • type
    • state
    • container_id
    • vm_id
    • zone
  • Constant Labels:
    • node_name

kepler_process_cpu_seconds_total

  • Type: COUNTER
  • Description: Total user and system time of cpu at process level in seconds
  • Labels:
    • pid
    • comm
    • exe
    • type
    • container_id
    • vm_id
  • Constant Labels:
    • node_name

kepler_process_cpu_watts

  • Type: GAUGE
  • Description: Power consumption of cpu at process level in watts
  • Labels:
    • pid
    • comm
    • exe
    • type
    • state
    • container_id
    • vm_id
    • zone
  • Constant Labels:
    • node_name

kepler_process_gpu_joules_total

  • Type: COUNTER
  • Description: Energy consumption of gpu at process level in joules
  • Labels:
    • pid
    • comm
    • exe
    • type
    • state
    • container_id
    • vm_id
  • Constant Labels:
    • node_name

kepler_process_gpu_watts

  • Type: GAUGE
  • Description: Power consumption of gpu at process level in watts
  • Labels:
    • pid
    • comm
    • exe
    • type
    • state
    • container_id
    • vm_id
  • Constant Labels:
    • node_name

Virtual Machine Metrics

These metrics provide energy and power information for virtual machines.

kepler_vm_cpu_joules_total

  • Type: COUNTER
  • Description: Energy consumption of cpu at vm level in joules
  • Labels:
    • vm_id
    • vm_name
    • hypervisor
    • state
    • zone
  • Constant Labels:
    • node_name

kepler_vm_cpu_watts

  • Type: GAUGE
  • Description: Power consumption of cpu at vm level in watts
  • Labels:
    • vm_id
    • vm_name
    • hypervisor
    • state
    • zone
  • Constant Labels:
    • node_name

Pod Metrics

These metrics provide energy and power information for pods.

kepler_pod_cpu_joules_total

  • Type: COUNTER
  • Description: Energy consumption of cpu at pod level in joules
  • Labels:
    • pod_id
    • pod_name
    • pod_namespace
    • state
    • zone
  • Constant Labels:
    • node_name

kepler_pod_cpu_watts

  • Type: GAUGE
  • Description: Power consumption of cpu at pod level in watts
  • Labels:
    • pod_id
    • pod_name
    • pod_namespace
    • state
    • zone
  • Constant Labels:
    • node_name

kepler_pod_gpu_joules_total

  • Type: COUNTER
  • Description: Energy consumption of gpu at pod level in joules
  • Labels:
    • pod_id
    • pod_name
    • pod_namespace
    • state
  • Constant Labels:
    • node_name

kepler_pod_gpu_watts

  • Type: GAUGE
  • Description: Power consumption of gpu at pod level in watts
  • Labels:
    • pod_id
    • pod_name
    • pod_namespace
    • state
  • Constant Labels:
    • node_name

Other Metrics

Additional metrics provided by Kepler.

kepler_build_info

  • Type: GAUGE
  • Description: A metric with a constant '1' value labeled with version information
  • Labels:
    • arch
    • branch
    • revision
    • version
    • goversion

Experimental Metrics

⚠️ Warning: The following metrics are experimental and may change or be removed in future versions. They are provided for early testing and feedback purposes.

Platform Power Metrics

These experimental metrics provide platform-level power information from BMC sources (e.g., Redfish). Enable the experimental Redfish feature to collect these metrics.

kepler_platform_watts

  • Type: GAUGE
  • Description: Current platform power in watts from BMC (PowerSubsystem or deprecated Power API)
  • Labels:
    • source
    • node_name
    • bmc_id
    • chassis_id
    • source_id
    • source_name
    • source_type

This documentation was automatically generated by the gen-metric-docs tool.