Introduction
Welcome to the Scaphandre documentation.
Scaphandre is a monitoring agent, dedicated to energy consumption metrics. Its purpose is to help measure, and thus understand, the energy consumption patterns of tech services. This is key, in our opinion, to enable the tech industry to shift towards more sustainability. 💚
If at this point you think "why bother?", or if you want more details about this project's motivations, please have a look at the why section.
If not and you want to proceed, jump directly to the tutorials section.
If you need more in-depth, use-case oriented instructions, the how-to guides are here for you.
The explanations section covers the theoretical concepts behind scaphandre and the reasons for the technical choices that have been made so far.
If you are already using, hacking or exploring scaphandre and need precise information about one of its components, go to the references section. (The code documentation itself is here).
Quickstart
To quickly run scaphandre in your terminal you may use docker:
docker run -v /sys/class/powercap:/sys/class/powercap -v /proc:/proc -ti hubblo/scaphandre stdout -t 15
Or if you downloaded or built a binary, you'd run:
scaphandre stdout -t 15
Here we are using the stdout exporter to print the current power consumption in the terminal for 15 seconds.
You should get an output like:
Host: 9.391334 W Core Uncore DRAM
Socket0 9.392 W 1.497082 W
Top 5 consumers:
Power PID Exe
4.808363 W 642 "/usr/sbin/dockerd"
4.808363 W 703 "/usr/bin/docker-containerd"
4.808363 W 1028 "/usr/local/bin/redis-server"
0 W 1 "/usr/lib/systemd/systemd"
0 W 2 ""
------------------------------------------------------------
Let's briefly describe what you see here. The first line is the power consumption of the machine (between the last two measurements). The second line is the power consumption of the first CPU socket, plus the detail by RAPL domain. If you have more than one CPU socket, you'll have multiple SocketX lines. Then come the 5 processes that consumed the most power between the last two measurements.
If you don't get this output and get an error, jump to the Troubleshooting section of the documentation.
At that point, you're ready to use scaphandre. The Stdout exporter is very basic and other exporters should allow you to use and send those metrics the way you like.
The prometheus exporter, for example, allows you to expose power consumption metrics as an HTTP endpoint that can be scraped by a prometheus instance:
docker run -v /sys/class/powercap:/sys/class/powercap -v /proc:/proc -p 8080:8080 -ti hubblo/scaphandre prometheus
Here is the same command with a simple binary:
scaphandre prometheus
To validate that the metrics are available, send an HTTP request from another terminal:
curl -s http://localhost:8080/metrics
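The same check can be scripted. Here is a minimal Python sketch that fetches the endpoint and picks out metrics. The helper names parse_metrics and fetch_metrics are illustrative assumptions, not part of scaphandre, and the parser only handles simple `name{labels} value` lines rather than the full Prometheus text format.

```python
import re
from urllib.request import urlopen

# Matches lines such as: metric_name{label="value"} 12.5
# This is a simplification, not a complete exposition-format parser.
LINE_RE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples, skipping comment lines."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # HELP/TYPE comments and blank lines
        match = LINE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def fetch_metrics(url="http://localhost:8080/metrics"):
    """Fetch and parse the endpoint exposed by `scaphandre prometheus`."""
    with urlopen(url, timeout=5) as response:
        return parse_metrics(response.read().decode("utf-8"))


if __name__ == "__main__":
    for name, labels, value in fetch_metrics():
        if name == "scaph_host_power_microwatts":
            print(f"host power: {value / 1_000_000:.2f} W")
```

Running the script while the exporter is up should print the host power in watts. It needs nothing beyond the Python standard library.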
Here you can see examples of graphs you can get thanks to scaphandre, the prometheus exporter, prometheus and grafana.
Installation & compilation
Compile scaphandre from source
We recommend using this version of the rust toolchain or later:
cargo --version
cargo 1.48.0 (65cbdd2dc 2020-10-14)
rustc --version
rustc 1.48.0 (7eac88abb 2020-11-16)
To be sure to be up to date, you may install rust from the official website instead of your package manager.
To hack scaph, or simply be up to date with latest developments, you can download scaphandre from the main branch:
git clone https://github.com/hubblo-org/scaphandre.git
cd scaphandre
cargo build # binary path is target/debug/scaphandre
To use the latest code for a true use case, build for release instead of debug:
cargo build --release
Binary path is target/release/scaphandre.
More to come
More tutorials to come, for a proper installation, like:
- install scaphandre as a proper systemd service
- scaphandre in your favorite GNU/Linux distribution (package creators)
- run scaphandre in a container
- run scaphandre on kubernetes
- scaphandre on MacOSX
- and more...
Kubernetes
This tutorial uses Helm to install Scaphandre, Prometheus and Grafana.
Install Scaphandre
First we install Scaphandre, which runs as a daemon set that creates a pod on each node to collect the metrics.
helm install scaphandre helm/scaphandre
Install Prometheus
Next we will install Prometheus which will scrape the metrics generated by Scaphandre.
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo add kube-state-metrics https://kubernetes.github.io/kube-state-metrics
helm repo update
helm install prometheus prometheus-community/prometheus \
--set alertmanager.persistentVolume.enabled=false \
--set server.persistentVolume.enabled=false
This setup should only be used for testing as the Prometheus data is not persisted if the pods are deleted.
You can access the Prometheus web UI by creating a port forwarding connection.
kubectl port-forward deploy/prometheus-server 9090:9090
Install Grafana
Create a configmap to store the Grafana dashboard.
kubectl create configmap scaphandre-dashboard \
--from-file=scaphandre-dashboard.json=docs_src/tutorials/grafana-kubernetes-dashboard.json
Install Grafana.
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
helm install grafana grafana/grafana --values docs_src/tutorials/grafana-helm-values.yaml
Get the Grafana web UI password which is randomly generated.
kubectl get secret grafana -o jsonpath="{.data.admin-password}" | base64 --decode
Create a port forwarding connection to the Grafana pod.
kubectl port-forward deploy/grafana 3000:3000
Open Grafana in your browser at http://localhost:3000. The username is admin.
Cleaning up
Deleting the Helm releases will remove all the resources we created.
helm delete grafana prometheus scaphandre
Propagate power consumption metrics from hypervisor to virtual machines (Qemu/KVM)
Introduction
A major pain point in measuring power consumption is doing so inside a virtual machine. A virtual machine usually doesn't have access to power metrics.
Scaphandre aims to solve that by enabling communication between a scaphandre instance on the hypervisor/bare metal machine and another one running on the virtual machine. The scaphandre agent on the hypervisor computes the metrics meaningful for that virtual machine, and the one on the VM accesses those metrics to allow its user/administrator to use the data as if they had access to power metrics in the first place (as if they were on a bare metal machine).
This helps break opacity in a virtualization context, if you have access to the hypervisor, or in a public cloud context if the provider uses scaphandre on its hypervisors.

How to
This works on Qemu/KVM hypervisors only.
The idea is to run the agent on the hypervisor, with the qemu exporter:
scaphandre qemu
More examples for a production ready setup will be added soon (systemd service, docker container, ...). If you think the documentation needs a refresh now, please contribute :)
For each virtual machine you want to give access to its metrics, create a tmpfs mountpoint:
mount -t tmpfs tmpfs_DOMAIN_NAME /var/lib/libvirt/scaphandre/DOMAIN_NAME -o size=5m
In the definition of the virtual machine (here we are using libvirt), ensure you have a filesystem configuration to give access to the mountpoint:
virsh edit DOMAIN_NAME
Then add:
<filesystem type='mount' accessmode='passthrough'>
<driver type='virtiofs'/>
<source dir='/var/lib/libvirt/scaphandre/DOMAIN_NAME'/>
<target dir='scaphandre'/>
<readonly />
</filesystem>
Save and (re)start the virtual machine.
Then connect to the virtual machine and mount the filesystem:
mount -t 9p -o trans=virtio scaphandre /var/scaphandre
You can now run scaphandre to export the metrics with the exporter of your choice (here prometheus):
scaphandre --vm prometheus
Please refer to the qemu exporter reference for more details.
Note: This how to is only suitable for a "manual" use case. For all automated systems like openstack or proxmox, some more work needs to be done to make the integration of those steps easier.
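As a first step towards automating these manual steps, here is a minimal Python sketch that renders the tmpfs mount command and the filesystem element for a given domain name. The function names are hypothetical. The paths and attributes are copied from the steps above, and nothing is executed on the host.

```python
import xml.etree.ElementTree as ET

BASE_DIR = "/var/lib/libvirt/scaphandre"  # path used in the steps above


def tmpfs_mount_command(domain_name, size="5m"):
    """Build the tmpfs mount command for one libvirt domain."""
    return (f"mount -t tmpfs tmpfs_{domain_name} "
            f"{BASE_DIR}/{domain_name} -o size={size}")


def filesystem_xml(domain_name):
    """Build the <filesystem> element to paste into `virsh edit`."""
    fs = ET.Element("filesystem", type="mount", accessmode="passthrough")
    ET.SubElement(fs, "driver", type="virtiofs")
    ET.SubElement(fs, "source", dir=f"{BASE_DIR}/{domain_name}")
    ET.SubElement(fs, "target", dir="scaphandre")
    ET.SubElement(fs, "readonly")
    return ET.tostring(fs, encoding="unicode")


if __name__ == "__main__":
    print(tmpfs_mount_command("DOMAIN_NAME"))
    print(filesystem_xml("DOMAIN_NAME"))
```

Generating the snippet per domain avoids copy-paste mistakes in the source dir when many virtual machines are configured by hand.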
Get process-level power consumption in my grafana dashboard
Now we'll see how to get valuable data in a dashboard. Let's say you want to track the power consumption of a given process or application in a dashboard and eventually set thresholds on it. What do you need to visualize that subset of the host's power consumption?
You need basically 3 components for that:
- scaphandre running with the prometheus exporter
- prometheus
- grafana
We'll say that you already have a running prometheus server and an available grafana instance and that you have added prometheus as a datasource in grafana.
How do you get metrics per process, as you may see here?
The metric needed from the prometheus exporter to do that is scaph_process_power_consumption_microwatts. This metric holds the power consumption of every process running on the host at a given time.
This is a prometheus metric, so you have labels to filter on the processes you are interested in. Currently the available labels are: instance, exe, job, pid and cmdline.
If I want to get power consumption (in Watts) for all processes related to nginx running on a host with ip 10.0.0.9 I may use that query, in grafana, based on the prometheus datasource:
scaph_process_power_consumption_microwatts{cmdline=~".*nginx.*", instance="10.0.0.9:8080"} / 1000000
Here we assume that scaphandre/the prometheus exporter is running on port number 8080.
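To make the semantics of that query explicit, here is a minimal Python sketch that applies the same filter and unit conversion to a list of samples. The function name and the sample values are hypothetical. The sketch assumes each sample is a (labels, value in microwatts) pair.

```python
import re

MICROWATTS_PER_WATT = 1_000_000


def process_power_watts(samples, cmdline_pattern, instance):
    """Return {pid: watts} for samples matching the cmdline regex and instance.

    Like the Prometheus =~ operator, the pattern must match the whole label
    value, hence re.fullmatch and the leading/trailing ".*" in the query.
    """
    regex = re.compile(cmdline_pattern)
    return {
        labels["pid"]: microwatts / MICROWATTS_PER_WATT
        for labels, microwatts in samples
        if labels.get("instance") == instance
        and regex.fullmatch(labels.get("cmdline", ""))
    }


# Hypothetical samples: (labels, value in microwatts)
samples = [
    ({"instance": "10.0.0.9:8080", "pid": "101",
      "cmdline": "nginx: master process"}, 1_500_000.0),
    ({"instance": "10.0.0.9:8080", "pid": "102",
      "cmdline": "/usr/bin/redis-server"}, 2_000_000.0),
    ({"instance": "10.0.0.7:8080", "pid": "55",
      "cmdline": "nginx: worker process"}, 3_000_000.0),
]

nginx_watts = process_power_watts(samples, ".*nginx.*", "10.0.0.9:8080")
print(nginx_watts)
```

Only the first sample passes both filters. The second is not an nginx process and the third comes from another instance.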
Here is how it looks, creating a panel in grafana:
Those labels are explained in much more detail here.
How scaphandre computes per process power consumption
As you can see with the prometheus exporter reference, scaphandre exporters can provide process level power consumption metrics. This section will explain how it is done and how it may be improved in the future.
Some details about RAPL
We'll talk here about the case where scaphandre is able to effectively measure the power consumption of the host (see the compatibility section for more on sensors and their prerequisites) and specifically about the PowercapRAPL sensor.
Let's clarify what's happening when you collect metrics with scaphandre and this sensor. RAPL stands for Running Average Power Limit. It's a technology embedded in most Intel and AMD x86 CPUs produced after 2012. Thanks to this technology it is possible to get the total energy consumption of the CPU, the consumption per CPU socket, plus, in some cases, the consumption of the DRAM controller. In most cases it represents the vast majority of the energy consumption of the machine (except when running GPU-intensive workloads, for example). Further improvements shall be made in scaphandre to fully measure the consumption when GPUs are involved (or a lot of hard drives on the same host...).
Between scaphandre and those data is the powercap kernel module, which writes the energy consumption to files. Scaphandre reads those files, stores the data in a buffer and then allows for more processing through the exporters.
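To illustrate how an energy counter becomes a power figure, here is a minimal Python sketch. It is not scaphandre's implementation. It assumes the first CPU package is exposed as /sys/class/powercap/intel-rapl:0, that the counter file is readable by the current user, and that at most one wraparound occurs between two readings.

```python
import time
from pathlib import Path

# Assumed location of the first CPU package's powercap zone on Linux.
ZONE = Path("/sys/class/powercap/intel-rapl:0")


def read_counter(zone=ZONE):
    """Read the cumulative energy counter of a zone, in microjoules."""
    return int((zone / "energy_uj").read_text())


def average_power_watts(energy_before_uj, energy_after_uj, elapsed_s,
                        max_range_uj=None):
    """Average power between two counter readings.

    The counter wraps around at max_energy_range_uj, so a smaller second
    reading is treated as a single wraparound when max_range_uj is given.
    """
    delta_uj = energy_after_uj - energy_before_uj
    if delta_uj < 0 and max_range_uj is not None:
        delta_uj += max_range_uj
    return delta_uj / 1_000_000 / elapsed_s


if __name__ == "__main__":
    max_range = int((ZONE / "max_energy_range_uj").read_text())
    first, start = read_counter(), time.monotonic()
    time.sleep(2)
    second, end = read_counter(), time.monotonic()
    print(f"{average_power_watts(first, second, end - start, max_range):.2f} W")
```

Power is simply the energy difference divided by the elapsed time, which is why two readings are always needed before a value can be shown.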
How to get the consumption of one process?
The PowercapRAPL sensor actually does more than just collect those energy consumption metrics (and convert them into power consumption metrics). Every time the exporter asks for a measurement (either periodically, as in the Stdout exporter, or every time a request comes in, as for the Prometheus exporter) the sensor reads the values of the energy counters from powercap, stores those values and does the same for the usage statistics of the CPU (the ones you can see in /proc/stat) and of each process running on the machine at that time (see /proc/PID/stat). With those data it is possible to compute the ratio of the CPU time actively spent on a given PID to the CPU time actively spent doing something. With this ratio we can then get the subset of power consumption that is related to that PID over a given timeframe (between two measurement requests).
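The ratio described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not scaphandre's code: "active" CPU time is taken here as everything except idle and iowait, process time as utime + stime, and all function names are hypothetical.

```python
def active_jiffies(cpu_line):
    """Active CPU time from the first line of /proc/stat, in clock ticks."""
    fields = [int(x) for x in cpu_line.split()[1:]]
    user, nice, system, idle, iowait, irq, softirq, steal = fields[:8]
    # idle and iowait are deliberately left out: they are not active time.
    return user + nice + system + irq + softirq + steal


def process_jiffies(stat_line):
    """utime + stime of one process from /proc/PID/stat, in clock ticks."""
    # The command name (field 2) may contain spaces, so split after its ')'.
    after_comm = stat_line.rsplit(")", 1)[1].split()
    # Field 3 (state) is index 0, so utime (14) and stime (15) are 11 and 12.
    return int(after_comm[11]) + int(after_comm[12])


def attribute_power(host_power_w, proc_before, proc_after,
                    cpu_before, cpu_after):
    """Share of host power attributed to a process between two measurements."""
    cpu_delta = cpu_after - cpu_before
    if cpu_delta <= 0:
        return 0.0  # no active CPU time elapsed, nothing to attribute
    return host_power_w * (proc_after - proc_before) / cpu_delta


# Two hypothetical snapshots taken a few seconds apart.
cpu_1 = "cpu  100 0 50 800 50 0 0 0 0 0"
cpu_2 = "cpu  160 0 90 850 50 0 0 0 0 0"
proc_1 = "642 (docker d) S 1 642 642 0 -1 4194560 100 0 0 0 30 10 0 0 20 0 1 0"
proc_2 = "642 (docker d) S 1 642 642 0 -1 4194560 100 0 0 0 60 30 0 0 20 0 1 0"

watts = attribute_power(10.0, process_jiffies(proc_1), process_jiffies(proc_2),
                        active_jiffies(cpu_1), active_jiffies(cpu_2))
print(watts)
```

In this example the process used 50 of the 100 active clock ticks, so it is attributed half of the 10 W measured for the host.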
How to get the consumption of an application/a service?
Services and programs often run as more than one PID. The consumption of all related PIDs has to be aggregated to know what the service is actually consuming.
To do that, in the current state of scaphandre development, you can use the Prometheus exporter, and then use Prometheus TSDB and query language capabilities. You'll find examples looking at the graphs and queries here. In the near future, more advanced features may be implemented in scaphandre to allow such classification even if you don't have access to a proper TSDB.
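Without a TSDB, the same aggregation can be done in a few lines. The Python sketch below sums per-process values by the exe label, roughly what the PromQL expression sum by (exe) (scaph_process_power_consumption_microwatts) would return. The function name and the sample data are hypothetical.

```python
from collections import defaultdict


def power_by_exe(samples):
    """Sum per-process power (microwatts) by executable name."""
    totals = defaultdict(float)
    for labels, microwatts in samples:
        totals[labels.get("exe", "")] += microwatts
    return dict(totals)


# Hypothetical samples: (labels, value in microwatts)
samples = [
    ({"exe": "nginx", "pid": "101"}, 1_000_000.0),
    ({"exe": "nginx", "pid": "102"}, 2_000_000.0),
    ({"exe": "redis-server", "pid": "200"}, 500_000.0),
]

print(power_by_exe(samples))
```

Grouping on exe works when a service maps to one executable name. For services made of several executables, grouping on a cmdline pattern would be needed instead.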
Internal structure
Scaphandre is designed to be extensible. As it performs basically two tasks (collecting/pre-computing the power consumption metrics, and shipping them), it is composed of two main components: a sensor and an exporter. Each can be implemented in different ways, to match a certain use case. When you run scaphandre from the command line, -s allows you to choose the sensor you want to use, and the next subcommand is the name of the exporter.
Sensors
Sensors are meant to:
- get the power consumption metrics of the host
- make them available to the exporter
The PowercapRAPL sensor, for instance, gets and transforms metrics coming from the powercap Linux kernel module, which serves as an interface to get the data from the RAPL feature of x86 CPUs. Because this feature is only accessible when you are running on a bare metal machine, this sensor will not work in a virtual machine, unless you first run scaphandre on the hypervisor and make the VM metrics available, with the qemu exporter, to scaphandre running inside the virtual machine.
When you don't have access to the hypervisor/bare-metal machine (i.e. when you run on public cloud instances and your provider doesn't run scaphandre) you still have the option to estimate the power consumption, based on both the resources (cpu/gpu/ram/io...) consumed by the virtual machine at a given time, and the characteristics of the underlying hardware. This is the way we are designing the future estimation-based sensor, to match that use case.
Looking at the code, you'll find that the interface between metrics and the exporters is in fact the Topology object. This is intended to be asked by the exporter through the get_topology method of the sensor.
Exporters
An exporter is expected to:
- ask the sensors to get new metrics and store them for later, potential usage
- export the current metrics
The Stdout exporter exposes the metrics on the standard output (in your terminal). The prometheus exporter exposes the metrics on an HTTP endpoint, to be scraped by a prometheus instance. An exporter should be created for each monitoring scenario (do you want to feed your favorite monitoring/data analysis tool with scaphandre metrics? Feel free to open a PR to create a new exporter!).
As introduced in the sensors section, the Qemu exporter is very specific. It is only intended to collect metrics related to running virtual machines on a Qemu/KVM hypervisor. Those metrics can then be made available to each virtual machine and its own scaphandre instance, running the PowercapRAPL sensor (with the --vm flag on). The qemu exporter puts the VMs' metrics in files the same way the powercap kernel module does. It mimics this behavior, so the sensor can act the same way it would on a bare metal machine.
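The sensor/exporter split can be pictured with a small Python sketch. Scaphandre itself is written in Rust, and only the names Topology and get_topology come from the text above. The fields, the fake sensor and the exporter class are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class Topology:
    """Illustrative stand-in for scaphandre's Topology object."""
    host_power_microwatts: float = 0.0
    process_power_microwatts: dict = field(default_factory=dict)


class Sensor(Protocol):
    """Anything that can hand a Topology to an exporter."""
    def get_topology(self) -> Topology: ...


class FakeSensor:
    """Hypothetical sensor returning fixed values, for demonstration only."""
    def get_topology(self) -> Topology:
        return Topology(9_391_334.0, {642: 4_808_363.0})


class StdoutLikeExporter:
    """Asks the sensor for metrics, then formats them as text."""
    def __init__(self, sensor: Sensor):
        self.sensor = sensor

    def export(self) -> str:
        topology = self.sensor.get_topology()
        lines = [f"Host: {topology.host_power_microwatts / 1e6:.6f} W"]
        for pid, power in topology.process_power_microwatts.items():
            lines.append(f"{power / 1e6:.6f} W {pid}")
        return "\n".join(lines)


if __name__ == "__main__":
    print(StdoutLikeExporter(FakeSensor()).export())
```

Because the exporter only depends on get_topology, swapping the sensor (or writing a new exporter) does not require touching the other side.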
About containers
There are several ways scaphandre can interact with containers.
You may run scaphandre in a container, to avoid managing the dependencies, then measure the power consumption of the bare metal host. This is described in the quickstart tutorial. Note that you need to expose /sys/class/powercap and /proc as volumes in the container to allow scaphandre to get the relevant metrics from the bare metal host.
Scaphandre may help you measure the power consumption of containers running on a given host. You can already reach that goal using the tips provided in the how-to section called "Get process level power consumption". It may still require some tweaking and inventiveness from you in making the appropriate queries to your favorite TSDB. This should be made easier by upcoming scaphandre features.
Another use case scenario would be to measure the power consumption of a container orchestrator (like kubernetes), its nodes and the containers and applications running on it. This is a feature we are currently working on (you may try yourself the helm chart that is proposed in that thread, before it is officially supported).
As described here, scaphandre provides several ways (sensors) to collect the power consumption metrics. Depending on your use case, one sensor may be more suitable than another. Each of them comes with strengths and weaknesses. This is basically always a tradeoff between precision and simplicity. This is especially true if you run container-based workloads on public cloud instances. We are working to provide a solution for that as well.
Prometheus exporter

Usage
You can launch the prometheus exporter this way (running the default powercap_rapl sensor):
scaphandre prometheus
As always, the exporter's options can be displayed with -h:
scaphandre prometheus -h
scaphandre-prometheus
Prometheus exporter exposes power consumption metrics on an http endpoint (/metrics is default) in prometheus accepted
format
USAGE:
scaphandre prometheus [FLAGS] [OPTIONS]
FLAGS:
-h, --help Prints help information
-q, --qemu Instruct that scaphandre is running on an hypervisor
-V, --version Prints version information
OPTIONS:
-a, --address <address> ipv6 or ipv4 address to expose the service to [default: ::]
-p, --port <port> TCP port number to expose the service [default: 8080]
-s, --suffix <suffix> url suffix to access metrics [default: metrics]
With default option values, the metrics are exposed on http://localhost:8080/metrics.
Use the -q or --qemu option if you are running scaphandre on a hypervisor. In that case a label with the VM name will be added to all qemu-system* processes.
This makes it easy to chart the consumption of each VM and to identify which one is the top contributor.
Metrics exposed
All metrics have a HELP section provided on /metrics (or whatever suffix you chose to expose them).
Here are some key metrics that you will most probably be interested in:
- scaph_host_power_microwatts: Power measurement on the whole host, in microwatts (GAUGE)
- scaph_process_power_consumption_microwatts{exe="$PROCESS_EXE",pid="$PROCESS_PID",cmdline="path/to/exe --and-maybe-options"}: Power consumption due to the process, measured at the topology level, in microwatts. PROCESS_EXE is the name of the executable and PROCESS_PID is the pid of the process. (GAUGE)
For more details on that metric's labels, see this section.
And some deeper metrics that you may want if you need to make more complex calculations and data processing:
- scaph_host_energy_microjoules: Energy measurement for the whole host, as extracted from the sensor, in microjoules. (COUNTER)
- scaph_host_energy_timestamp_seconds: Timestamp in seconds when scaph_host_energy_microjoules has been computed. (COUNTER)
- scaph_socket_power_microwatts{socket_id="$SOCKET_ID"}: Power measurement relative to a CPU socket, in microwatts. SOCKET_ID is the numerical id of the socket. (GAUGE)
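As an example of such a calculation, the energy counter and its timestamp can be combined to get an average power over any interval you choose. Here is a minimal Python sketch, assuming you have collected two values of scaph_host_energy_microjoules together with the matching scaph_host_energy_timestamp_seconds values. The function name is hypothetical.

```python
def average_power_microwatts(energy_start_uj, energy_end_uj,
                             timestamp_start_s, timestamp_end_s):
    """Average power between two scrapes: microjoules / seconds = microwatts."""
    elapsed_s = timestamp_end_s - timestamp_start_s
    if elapsed_s <= 0:
        raise ValueError("timestamps must be strictly increasing")
    return (energy_end_uj - energy_start_uj) / elapsed_s


# Hypothetical values from two scrapes taken 3 seconds apart.
power = average_power_microwatts(1_000_000, 31_000_000, 100, 103)
print(power / 1_000_000, "W")
```

In PromQL, applying rate() to scaph_host_energy_microjoules gives a comparable result, since a per-second rate of microjoules is a value in microwatts.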
If you hack scaph or just want to investigate its behavior, you may be interested in some internal metrics:
- scaph_self_mem_total_program_size: Total program size, measured in pages
- scaph_self_mem_resident_set_size: Resident set size, measured in pages
- scaph_self_mem_shared_resident_size: Number of resident shared pages (i.e., backed by a file)
- scaph_self_topo_stats_nb: Number of CPUStat traces stored for the host
- scaph_self_topo_records_nb: Number of energy consumption Records stored for the host
- scaph_self_topo_procs_nb: Number of processes monitored by scaph
- scaph_self_socket_stats_nb{socket_id="SOCKET_ID"}: Number of CPUStat traces stored for each socket
- scaph_self_socket_records_nb{socket_id="SOCKET_ID"}: Number of energy consumption Records stored for each socket, with SOCKET_ID being the id of the socket measured
- scaph_self_domain_records_nb{socket_id="SOCKET_ID",rapl_domain_name="RAPL_DOMAIN_NAME"}: Number of energy consumption Records stored for a Domain, where SOCKET_ID identifies the socket and RAPL_DOMAIN_NAME identifies the rapl domain measured on that socket
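The memory metrics above are expressed in pages. To read them in bytes, multiply by the page size of the host running scaphandre. Here is a minimal Python sketch. The function name is hypothetical, and os.sysconf is only available on Unix-like systems.

```python
import os


def pages_to_bytes(pages, page_size=None):
    """Convert a page count to bytes, using the local page size by default."""
    if page_size is None:
        page_size = os.sysconf("SC_PAGE_SIZE")
    return pages * page_size


# With a 4096-byte page size, 2560 resident pages are 10 MiB.
print(pages_to_bytes(2560, page_size=4096) / (1024 * 1024), "MiB")
```

The page size must be the one of the monitored host, so pass it explicitly when the conversion runs on another machine.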
scaph_process_power_consumption_microwatts
Here are the available labels for the scaph_process_power_consumption_microwatts metric, which you may need to extract the data you want:
- exe: the name of the executable at the origin of that process. This is a good choice when your application runs one or only a few processes.
- cmdline: the whole command line, with the executable path and its parameters (concatenated). You can filter on this label with the prometheus =~ operator, which matches a regular expression pattern. This is very practical in many situations.
- instance: a prometheus-generated label that lets you filter the metrics by originating host. This is very useful when you monitor distributed services, so that you can not only sum the metrics for the same service on the different hosts but also see which instance of that service is consuming the most, or notice differences between hosts that may not have the same hardware, and so on...
- pid: the process id, which is useful if you want to track a specific process and keep an eye on what's happening on the host, but not so practical in a more general use case
Qemu exporter
Computes energy consumption metrics for each Qemu/KVM virtual machine found on the host. Exposes those metrics as filetrees compatible with the powercap_rapl sensor.
Note that this is still experimental. Metrics are already considered trustworthy, but there are discussions and tests to be performed about the acceptable ways to share the data with the guests/vms. Any feedback or thoughts about this are welcome. Please refer to the contributing section.
Usage
- Run scaphandre with the qemu exporter on your bare metal hypervisor machine:
scaphandre qemu # this is suitable for a test, please run it as a systemd service for a production setup
- The default is to expose virtual machine metrics in /var/lib/libvirt/scaphandre/${DOMAIN_NAME}, with DOMAIN_NAME being the libvirt domain name of the virtual machine. First create a tmpfs mount point to isolate metrics for that virtual machine:
mount -t tmpfs tmpfs_DOMAIN_NAME /var/lib/libvirt/scaphandre/DOMAIN_NAME -o size=10m
- Ensure you expose the content of this folder to the virtual machine by having this configuration in the XML configuration of the domain:
<filesystem type='mount' accessmode='passthrough'>
<driver type='virtiofs'/>
<source dir='/var/lib/libvirt/scaphandre/DOMAIN_NAME'/>
<target dir='scaphandre'/>
<readonly />
</filesystem>
You can edit the VM properties using sudo virsh edit <DOMAIN_NAME> with your usual editor. It is more convenient, however, to use virtual-manager, as explained in the following screenshots. It also helps to get the correct syntax, which probably depends on the qemu version. You can check that the above configuration is slightly different from the one below.
a. Right click in the hardware menu:
b. Enter the following parameters:
c. XML generated as a result:
- Ensure the VM has been started once the configuration is applied, then mount the filesystem on the VM/guest:
mount -t 9p -o trans=virtio scaphandre /var/scaphandre
- Still in the guest, run scaphandre in VM mode with the default sensor:
scaphandre --vm prometheus
- Collect your virtual machine specific power usage metrics (requesting http://VM_IP:8080/metrics in this example, using the prometheus exporter).
Riemann exporter
Usage
You can launch the Riemann exporter this way (running the default powercap_rapl sensor):
scaphandre riemann
As always, the exporter's options can be displayed with -h:
scaphandre riemann -h
scaphandre-riemann
Riemann exporter sends power consumption metrics to a Riemann server
USAGE:
scaphandre riemann [FLAGS] [OPTIONS]
FLAGS:
-h, --help Prints help information
-q, --qemu Instruct that scaphandre is running on an hypervisor
-V, --version Prints version information
OPTIONS:
-a, --address <address> Riemann ipv6 or ipv4 address [default: localhost]
-d, --dispatch <dispatch_duration> Duration between metrics dispatch [default: 5]
-p, --port <port> Riemann TCP port number [default: 5555]
With default option values, the metrics are sent to http://localhost:5555 every 5 seconds.
Use the -q or --qemu option if you are running scaphandre on a hypervisor. In that case a label with the VM name will be added to all qemu-system* processes.
This makes it easy to chart the consumption of each VM and to identify which one is the top contributor.
Metrics exposed
The Riemann exporter typically works the same way as the prometheus exporter regarding metrics. Please look at the details in the Prometheus exporter documentation.
There is only one exception, about process_power_consumption_microwatts: each process has its own service name, of the form process_power_consumption_microwatts_pid_exe.
As an example, process consumption can be retrieved using the following Riemann query:
(service =~ "process_power_consumption_microwatts_%_firefox") or (service =~ "process_power_consumption_microwatts_%_scaphandre")
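When post-processing Riemann events, the pid and executable can be recovered from the service name. Here is a minimal Python sketch, assuming the name is built as the metric name, the pid and the exe joined with underscores, as described above. Both function names are hypothetical.

```python
PREFIX = "process_power_consumption_microwatts"


def service_name(pid, exe):
    """Build the per-process service name described above."""
    return f"{PREFIX}_{pid}_{exe}"


def parse_service_name(service):
    """Split a service name back into (pid, exe).

    Only the first underscore after the prefix separates the pid, so
    executable names containing underscores are preserved.
    """
    if not service.startswith(PREFIX + "_"):
        raise ValueError(f"not a process power service: {service!r}")
    pid, exe = service[len(PREFIX) + 1:].split("_", 1)
    return int(pid), exe


print(parse_service_name(service_name(1234, "firefox")))
```

In the Riemann query above, the % wildcard stands for the pid part of the name.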
Stdout exporter
Usage
You can launch the stdout exporter this way (running the default powercap_rapl sensor):
scaphandre stdout
The default behavior is to measure and show metrics periodically for 10 seconds. You can change that timeout with -t. Here is how to display metrics for one minute:
scaphandre stdout -t 60
You can also change the measurement step duration with -s. Here is how to display metrics for one minute with a 5s step:
scaphandre stdout -t 60 -s 5
As always, the exporter's options can be displayed with -h:
$ scaphandre stdout -h
scaphandre-stdout
Stdout exporter allows you to output the power consumption data in the terminal.
USAGE:
scaphandre stdout [OPTIONS]
FLAGS:
-h, --help Prints help information
-V, --version Prints version information
OPTIONS:
-s, --step <step_duration> Set measurement step duration in seconds. [default: 2]
-t, --timeout <timeout> Maximum time spent measuring, in seconds. [default: 10]
JSON exporter
Usage
You can launch the JSON exporter this way (running the default powercap_rapl sensor):
scaphandre json
The default behavior is to measure and show metrics periodically for 10 seconds. You can change that timeout with -t. Here is how to display metrics for one minute:
scaphandre json -t 60
You can also change the measurement step duration with -s. Here is how to display metrics for one minute with a 5s step:
scaphandre json -t 60 -s 5
If you want a shorter interval you can use the -n option (for nanoseconds). Here is how to display metrics for 10s with a 100ms step:
scaphandre json -t 10 -s 0 -n 100000000
By default, JSON is printed in the terminal. To write the result to a file, provide a path with the -f option:
scaphandre json -t 10 -s 0 -n 100000000 -f report.json
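Nanosecond values are easy to get wrong by a factor of ten, so here is a small Python sketch to double-check them. It assumes the -s and -n values add up to form the step, which is what the -s 0 in the commands above suggests. The function names are hypothetical and the measurement count is only a rough expectation.

```python
NANOS_PER_SECOND = 1_000_000_000


def step_nanoseconds(step_s=2, step_nano=0):
    """Total step duration, assuming -s and -n are added together."""
    return step_s * NANOS_PER_SECOND + step_nano


def expected_measurements(timeout_s=10, step_s=2, step_nano=0):
    """Rough number of steps that fit in the timeout (integer arithmetic)."""
    step_ns = step_nanoseconds(step_s, step_nano)
    if step_ns <= 0:
        raise ValueError("the step duration must be positive")
    return (timeout_s * NANOS_PER_SECOND) // step_ns


# -t 10 -s 0 -n 100000000 is a 100 ms step, about 100 steps in 10 seconds.
print(step_nanoseconds(0, 100_000_000) / 1_000_000, "ms")
print(expected_measurements(10, 0, 100_000_000))
```

The actual number of records in the report may differ slightly, depending on how long each measurement takes.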
As always, the exporter's options can be displayed with -h:
$ scaphandre json -h
scaphandre-json
JSON exporter allows you to output the power consumption data in a json file
USAGE:
scaphandre json [OPTIONS]
FLAGS:
-h, --help Prints help information
-V, --version Prints version information
OPTIONS:
-f, --file <file_path> Destination file for the report. [default: ]
-s, --step <step_duration> Set measurement step duration in second. [default: 2]
-n, --step_nano <step_duration_nano> Set measurement step duration in nano second. [default: 0]
-t, --timeout <timeout> Maximum time spent measuring, in seconds. [default: 10]
Powercap_rapl sensor
Prerequisites
At the time of writing, this sensor works only on:
- OS: GNU/Linux
- Intel and AMD x86 CPUs, produced after 2012 (or some laptop cpu prior to 2012)
It needs the following kernel modules to be present and running:
On kernels 5.0 or later: intel_rapl_common
On kernels prior to 5.0: intel_rapl
For AMD processors, it seems that powercap/rapl works only from kernel 5.8 onwards, and from kernel 5.11 onwards for family 19h.
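The kernel version rule above can be checked programmatically. Here is a minimal Python sketch that maps a kernel release string to the expected module name. The function name is hypothetical and the sketch only looks at the major version number.

```python
import platform


def rapl_module_for(kernel_release):
    """Return the module name expected for a release such as '5.4.0-53-generic'."""
    major = int(kernel_release.split(".")[0])
    return "intel_rapl_common" if major >= 5 else "intel_rapl"


if __name__ == "__main__":
    release = platform.release()
    print(f"kernel {release}: expecting module {rapl_module_for(release)}")
```

The result can then be compared with the modules actually loaded on the host.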
Energy consumption data can be directly collected on a physical machine only.
To collect energy consumption on a virtual machine, you may first collect power consumption data from the hypervisor thanks to the qemu exporter and then collect those metrics in the virtual machine thanks to this sensor, with the --vm flag enabled.
Usage
To explicitly call the powercap_rapl sensor from the command line use:
scaphandre -s powercap_rapl EXPORTER # EXPORTER being the exporter name you want to use
You can see the arguments available from the CLI for this sensor with:
scaphandre -s powercap_rapl -h
If running in a virtual machine:
scaphandre --vm -s powercap_rapl EXPORTER
Please refer to doc.rs code documentation for more details.
Options available
- sensor-buffer-per-socket-max-kB: Maximum memory size allowed, in kilobytes, for storing energy consumption for each socket
- sensor-buffer-per-domain-max-kB: Maximum memory size allowed, in kilobytes, for storing energy consumption for each domain
Environment variables
If, in --vm mode, you want to read metrics from a path other than the default /var/scaphandre, set the SCAPHANDRE_POWERCAP_PATH environment variable to the desired path.
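The lookup described above follows the usual "environment variable with a default" pattern. The Python sketch below mirrors that behavior for scripts that need to locate the same files. It is not scaphandre's code, and the function name is hypothetical.

```python
import os

DEFAULT_VM_PATH = "/var/scaphandre"  # default path in --vm mode


def powercap_path(environ=None):
    """Return the metrics path, honoring SCAPHANDRE_POWERCAP_PATH if set."""
    environ = os.environ if environ is None else environ
    return environ.get("SCAPHANDRE_POWERCAP_PATH", DEFAULT_VM_PATH)


if __name__ == "__main__":
    print(powercap_path())
```

Passing the environment as a parameter keeps the function easy to test without modifying the real process environment.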
Troubleshooting
When running scaphandre on Ubuntu 20.xx I get a permission denied error
Since linux kernel package 5.4.0-53.59 in debian/ubuntu, powercap attributes are only accessible by root:
linux (5.4.0-53.59) focal; urgency=medium
* CVE-2020-8694
- powercap: make attributes only readable by root
Therefore, the user running scaphandre needs to have read access to the energy_uj files in /sys/class/powercap.
You can run the init.sh script to apply appropriate permissions to the required files.
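To see whether this is the problem before changing any permission, you can list the counter files the current user cannot read. Here is a minimal Python sketch, assuming every zone appears as a direct entry of /sys/class/powercap. The function names are hypothetical.

```python
import glob
import os


def find_counters(root="/sys/class/powercap"):
    """Return the energy_uj files found one level below root."""
    return sorted(glob.glob(os.path.join(root, "*", "energy_uj")))


def unreadable_counters(root="/sys/class/powercap"):
    """Return the energy_uj files that the current user cannot read."""
    return [path for path in find_counters(root)
            if not os.access(path, os.R_OK)]


if __name__ == "__main__":
    blocked = unreadable_counters()
    if blocked:
        print("No read access to:")
        for path in blocked:
            print(" ", path)
    else:
        print("All energy counters found are readable (or none were found).")
```

Run it as the user that will run scaphandre. As root, every file is reported as readable.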
Why?
We are facing the biggest challenge in the history of mankind: making our activities sustainable enough to not jeopardize our future.
To meet the Paris agreement objective of not going further than +2 degrees Celsius on average in 2100, we need to transform our industries. The tech industry is no exception and we need to make sure that its benefits are not cancelled (or worse) by its negative impact.
Measuring power consumption in tech services infrastructures is not as easy as it seems. We often rely on physical devices to do that and need to build in-house data pipelines to make those metrics useful. Furthermore, this doesn't give fine-grained data about the power consumption of applications/processes, only the power consumption of the host. This is even harder in a virtualization environment or as a public cloud customer.
Even recent scientific research about tech energy consumption relies on data built from statistical assumptions, at both large and narrow scale (because it's all we've got). Despite their weaknesses, these studies all agree that the current consumption of datacenters and the Internet will increase drastically in the coming years.
Scaphandre aims to initiate a collaboration of tech companies and enthusiasts to provide an easy, lightweight, robust and well-understood way to precisely measure energy consumption and make it useful for taking sobriety decisions.
Note that the greenhouse gas emissions related to this energy consumption depend on the energy mix of your country. You may find valuable data about that on Electricity Map.
Compatibility
Scaphandre intends to provide multiple ways to gather power consumption metrics and make understanding tech services footprint possible in many situations. Depending on how you use scaph, you may have some restrictions.
To summarize, scaphandre should provide two ways to estimate the power consumption of a service, process or machine: either by measuring it, using software interfaces that give access to hardware metrics, or by estimating it if measuring is not an option (this is a planned feature, not yet implemented at the time of writing, in December 2020).
In scaphandre, the code responsible for collecting the power consumption data before any further processing is grouped in components called sensors. If you want more details about scaphandre structure, here are the explanations.
The PowercapRAPL sensor enables you to measure the power consumption. It is the most precise solution, but it doesn't work in all contexts. A future sensor is to be developed to support other use cases. Here is the current state of scaphandre's compatibility:
Sensor | x86 bare metal | ARM bare metal | Virtual Machine | Public cloud instance | Container |
---|---|---|---|---|---|
PowercapRAPL | Yes | We don't know yet | Yes, if on a qemu/KVM hypervisor that runs scaphandre and the Qemu exporter | No, until your cloud provider uses scaphandre on its hypervisors | Depends on what you want |
Future estimation based sensor | Future Yes | Future Yes | Future Yes | Future Yes | Future Yes |
Troubleshooting
I get a permission denied error when I run scaphandre, no matter what is the exporter
On some Linux distributions (ubuntu 20.04 for sure), the energy counter files that the PowercapRAPL sensor uses have been readable only by root since late 2020.
To ensure this is your issue and fix that quickly you can run the init.sh script:
bash init.sh
Then run scaphandre. If it does not work, the issue is somewhere else.
I get a no such device error, the intel_rapl or intel_rapl_common kernel modules are present
It can mean that your cpu doesn't support RAPL. Please refer to the compatibility section to be sure.
I can't mount the required kernel modules, getting a Couldn't find XXX modules error
If you are in a situation comparable to this one, you may need to install additional packages.
On ubuntu 20.04 and 20.10, try to install linux-modules-extra-$(uname -r) with apt. Then you should be able to modprobe intel_rapl_common.
Contributing guide
If you are reading this, you may be about to contribute. Just for that, a big thank you! 👏
Feel free to propose pull requests, or open new discussions or issues at will. Scaphandre is a collaborative project and all opinions and propositions shall be heard and studied. The contributions will be received with kindness, gratitude and with an open mind. Remember that we are all dwarfs standing on the shoulders of giants. We all have to learn from others and to give back, with due mutual respect.
Code of conduct
This project adheres to the Rust Code of Conduct, which can be found here.
Ways to contribute
Contributions may take multiple forms:
- 💻 code, of course, but not only (there is a lot more!)
- 📖 documentation
- 🎤 any help on communication: writing blog posts, speaking about scaphandre at conferences, speaking and writing about the responsibility of tech to be sustainable as well!
- 🧬 structuring the project and the community is also a very important topic. Feel free to propose help, or start discussions about that.
This project intends to unite a lot of people to have a lot of positive impact. Any action helping us to get there will be very much appreciated! 🎉
Contact
Discussions and questions about the project are welcome on gitter or by email.
Contribution guidelines
This project intends to use conventional commit messages and the gitflow workflow.
Scaphandre is not only a tool, but a framework. Modules dedicated to collecting energy consumption data from the host are called Sensors. Modules dedicated to sending this data to a given channel or remote system are called Exporters. New Sensors and Exporters are going to be created and all contributions are welcome. For more on the internal structure please jump here.
Additional references for documentation
- /proc/stat explained
- Gathering CPU utilization from /proc/stat
- proc filesystem documentation
- CPU usage on Linux
- Using RAPL to read PP0 and DRAM energy on Haswell
- RAPL reference
- How to measure linux performance avoiding most typical mistakes: CPU
- How to calculate cpu utilization
Powercap/RAPL source code in the kernel
- arch/x86/events/intel/rapl.c
- drivers/powercap