Prometheus exporter for monitoring Storj storage nodes

anclrii, updated 2023-02-09 12:06:21

Stand With Ukraine

About

Storj exporter for Prometheus, written in Python. It pulls node, satellite and payout metrics from the Storj storage node API.

Also check out Storj-Exporter-dashboard for Grafana to visualise metrics for multiple Storj nodes.

Tested with the Storj node versions listed under tests/api_mock/.

0x187C8C43890fe4C91aFabbC62128D383A90548Dd

Usage

  • The exporter can be installed as a Docker container, as a systemd service, or run as a standalone script
  • Make sure the docker run command for the storagenode container includes -p 127.0.0.1:14002:14002 to allow local connections to the Storj node API
  • storagenode is the default value of the STORJ_HOST_ADDRESS environment variable, which sets the address of the storage node container the exporter uses to reach the API
  • If your storagenode container has a different name, set it with both --link=<storagenode-name-here> and -e STORJ_HOST_ADDRESS=<storagenode-name-here> in the docker command
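Before starting the exporter, it can help to confirm that the node API port is reachable from where the exporter will run. The sketch below uses only the Python standard library; the helper name `api_port_open` is hypothetical and not part of the exporter, and it only checks that something is listening on the port (14002 by default), not that it is really the Storj node API.

```python
import socket


def api_port_open(host: str, port: int = 14002, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper: it only proves that something accepts connections
    on the port, not that it is the Storj node API.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timeouts and DNS failures are all OSError subclasses.
        return False


if __name__ == "__main__":
    # On the Docker host, the API should answer on 127.0.0.1 when the
    # storagenode container publishes -p 127.0.0.1:14002:14002.
    print(api_port_open("127.0.0.1", 14002))
```

If this prints False on the host, the -p 127.0.0.1:14002:14002 mapping is the first thing to check.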

Installation

Docker installation

Run the latest build from Docker Hub (the easiest option; this assumes storagenode is the name of the storagenode container):
docker run -d --link=storagenode --name=storj-exporter -p 9651:9651 -e STORJ_HOST_ADDRESS=storagenode anclrii/storj-exporter:latest

The Docker image supports the linux/386, linux/amd64, linux/arm/v6, linux/arm/v7 and linux/arm64 platforms.
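Once the container is running, Prometheus scrapes the exporter on port 9651. To inspect the output by hand, the rough sketch below fetches the exporter's page and parses the `name{labels} value` sample lines of the Prometheus text exposition format. It assumes metrics are served at `http://localhost:9651/metrics` (the conventional Prometheus path), and the parser is deliberately simplified: it skips comment lines and does not handle timestamps or spaces inside label values.

```python
import urllib.request


def parse_exposition(text: str) -> dict:
    """Map 'name{labels}' -> float for each sample line (simplified parser)."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and '# HELP' / '# TYPE' comment lines.
        if not line or line.startswith("#"):
            continue
        # Simplification: the value is the last space-separated field, so
        # lines with a trailing timestamp would be mis-parsed.
        name, value = line.rsplit(" ", 1)
        samples[name] = float(value)
    return samples


def scrape(url: str = "http://localhost:9651/metrics", timeout: float = 10.0) -> dict:
    """Fetch the exporter page and parse it (the URL is an assumption)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_exposition(resp.read().decode("utf-8"))
```

For anything beyond a quick check, Prometheus itself (or the official client libraries' parsers) is the better tool.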

Run multiple instances of the exporter to monitor multiple storagenodes on the same host

In this example, storagenode1, storagenode2 and storagenode3 are the names of storagenode containers running on the same host. Each exporter gets its own host port (9651, 9652, 9653) while still listening on 9651 inside its container. The docker commands would be:

docker run -d --link=storagenode1 --name=storj-exporter1 -p 9651:9651 -e STORJ_HOST_ADDRESS=storagenode1 anclrii/storj-exporter:latest
docker run -d --link=storagenode2 --name=storj-exporter2 -p 9652:9651 -e STORJ_HOST_ADDRESS=storagenode2 anclrii/storj-exporter:latest
docker run -d --link=storagenode3 --name=storj-exporter3 -p 9653:9651 -e STORJ_HOST_ADDRESS=storagenode3 anclrii/storj-exporter:latest
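With many nodes, the pattern above (one exporter per node, host port counting up from 9651, container port fixed at 9651) can be generated rather than typed. A small sketch; the function name `exporter_commands` is hypothetical, and its output simply reproduces the commands shown above.

```python
def exporter_commands(node_names, base_port=9651,
                      image="anclrii/storj-exporter:latest"):
    """Build one docker run command per storagenode container.

    Host ports count up from base_port (9651, 9652, ...); the exporter
    inside each container always listens on 9651.
    """
    commands = []
    for i, node in enumerate(node_names, start=1):
        host_port = base_port + i - 1
        commands.append(
            f"docker run -d --link={node} --name=storj-exporter{i} "
            f"-p {host_port}:9651 -e STORJ_HOST_ADDRESS={node} {image}"
        )
    return commands


if __name__ == "__main__":
    for cmd in exporter_commands(["storagenode1", "storagenode2", "storagenode3"]):
        print(cmd)
```

Each generated exporter then needs its own scrape target in Prometheus (port 9651, 9652, 9653, and so on).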

Systemd service installation

Create a storj_exporter user for the service
useradd --no-create-home --shell /bin/false storj_exporter
Install the package dependencies (python3, python3-pip) and the Python requirements
pip3 install --no-cache-dir -r requirements.txt
Move the storj_exporter directory to the desired location (/opt/ in this example)
mv storj_exporter/ /opt/
chown storj_exporter:storj_exporter /opt/storj_exporter/
Install the systemd service and enable it to start on boot
cp storj_exporter.service /etc/systemd/system/
systemctl daemon-reload
systemctl restart storj_exporter
systemctl enable storj_exporter

Standalone script
python3 /path/to/storj_exporter/

Installing the full monitoring stack (Prometheus + Grafana + Dashboard)

Installation notes and guides can be found in the dashboard README; also see the quick-start guide for setting up the whole stack with docker-compose.

Variables

The following environment variables are available:

| Variable name | Description | Docker default | Standalone default |
| --- | --- | --- | --- |
| STORJ_HOST_ADDRESS | Address of the storage node | storagenode | 127.0.0.1 |
| STORJ_API_PORT | Storage node API port | 14002 | 14002 |
| STORJ_EXPORTER_PORT | Port the exporter opens to expose metrics on | 9651 | 9651 |
| STORJ_COLLECTORS | Space-separated list of collectors to enable | payout sat | payout sat |
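The table maps directly onto a small configuration loader. The sketch below is not the exporter's own code: `load_config` and `ExporterConfig` are hypothetical names, the fallbacks are the standalone defaults from the table, and splitting STORJ_COLLECTORS on whitespace is an assumption based on the default value `payout sat`. Passing a plain dict instead of `os.environ` makes it easy to test.

```python
import os
from dataclasses import dataclass


@dataclass
class ExporterConfig:
    host: str
    api_port: int
    exporter_port: int
    collectors: list


def load_config(env=None) -> ExporterConfig:
    """Read the exporter variables, falling back to the standalone defaults.

    Assumption: STORJ_COLLECTORS is a space-separated list, as the default
    value "payout sat" suggests.
    """
    env = os.environ if env is None else env
    return ExporterConfig(
        host=env.get("STORJ_HOST_ADDRESS", "127.0.0.1"),
        api_port=int(env.get("STORJ_API_PORT", "14002")),
        exporter_port=int(env.get("STORJ_EXPORTER_PORT", "9651")),
        collectors=env.get("STORJ_COLLECTORS", "payout sat").split(),
    )
```

In the Docker image the host default differs (storagenode instead of 127.0.0.1), which is why the docker commands above set STORJ_HOST_ADDRESS explicitly when the container name changes.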

Collectors

By default, the exporter collects node, payout and satellite data from the API. Satellite data is particularly expensive in CPU resources, so disabling it (by leaving sat out of STORJ_COLLECTORS) may be useful on smaller systems.
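To illustrate the trade-off, the hypothetical helper below turns a STORJ_COLLECTORS value into the list of data groups to fetch. It assumes that node data is always collected and that payout and sat (the two values in the default) are the only optional collectors; both assumptions are for illustration and may not match the exporter's internals.

```python
def enabled_collectors(collectors_setting: str = "payout sat") -> list:
    """Return the data groups to collect for a STORJ_COLLECTORS value.

    Assumptions: node data is always collected, and "payout" and "sat"
    are the only optional collectors.
    """
    optional = ("payout", "sat")
    requested = collectors_setting.split()
    unknown = [name for name in requested if name not in optional]
    if unknown:
        raise ValueError(f"unknown collector(s): {', '.join(unknown)}")
    # Fixed order, duplicates ignored.
    return ["node"] + [name for name in optional if name in requested]
```

Under these assumptions, a value of payout (for Docker, -e STORJ_COLLECTORS=payout) would skip the CPU-heavy satellite data.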

Netdata

For Netdata users: Netdata enables its prometheus plugin by default, and the plugin pulls all the data from the exporter every 5 seconds. This results in high CPU spikes on the storagenode, so it is advisable to disable Netdata's prometheus plugin:
cd /etc/netdata
sudo ./edit-config go.d.conf
Then, under "modules:", uncomment "prometheus" and change its value to "no":
```
modules:
  activemq: yes
  [...]
  powerdns_recursor: yes
  prometheus: no
```
After that, restart the netdata service:
sudo systemctl restart netdata

Issues

Read timeout errors. Server crashing.

opened on 2022-11-29 01:17:39 by MattJE9601

My logs are full of warnings like this:

[WARNING]: Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPConnectionPool(host='192.168.1.121', port=14001): Read timed out. (read timeout=10)")': /api/sno/

Even though I'm getting these warnings, it still pulls data successfully, but if I leave the exporter running (especially for many nodes), it crashes my whole server. Is there a way to tune the timeout? Or any other suggestions on how to fix this?
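The README does not document a timeout setting for the exporter. As a hedged illustration of what tuning it could involve, the sketch below fetches a JSON endpoint such as `/api/sno/` with a configurable per-attempt timeout and a bounded number of retries, using only the standard library. `fetch_json` and its parameters are hypothetical and not part of the exporter; the `opener` argument exists only so the retry logic can be tested without a live node.

```python
import json
import time
import urllib.request


def fetch_json(url, timeout=10.0, retries=1, backoff=2.0,
               opener=urllib.request.urlopen):
    """GET url and decode JSON, retrying on timeouts and connection errors.

    timeout is the per-attempt timeout in seconds (10 matches the value in
    the log above); retries is the number of extra attempts after the first.
    """
    last_error = None
    for attempt in range(retries + 1):
        try:
            with opener(url, timeout=timeout) as resp:
                return json.loads(resp.read().decode("utf-8"))
        except OSError as exc:
            # URLError, socket timeouts and connection resets are all OSError subclasses.
            last_error = exc
            if attempt < retries:
                # Wait a little longer before each new attempt.
                time.sleep(backoff * (attempt + 1))
    raise last_error
```

For example, `fetch_json("http://storagenode:14002/api/sno/", timeout=30, retries=2)` would allow up to three attempts of 30 seconds each. Longer timeouts and more retries trade faster failure detection for fewer warnings, so they do not by themselves reduce load on the node.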

DayIgress, DayEgress and UptoDate-State

opened on 2022-07-10 13:05:43 by marcelw94

Hello, unfortunately the following values are missing (DayIgress, DayEgress and UptoDate-State). Can you integrate them again?

Error loading: yesoreyeram-boomtable-panel

opened on 2021-06-17 04:01:51 by daschmidt1994

Since the Grafana update, the dashboard shows this error: Error loading: yesoreyeram-boomtable-panel

Add a Prometheus AlertManager example

opened on 2021-01-06 11:20:41 by anclrii

Might also not be a terrible idea to look into including Prometheus AlertManager in this setup, so that people can get email notifications on certain events, such as a long time since the last ping or audit failures reaching a certain level. This saves them from having to keep an eye on the dashboard all the time, or helps when they are away for a while, possibly catching a problem before it becomes a major issue such as node disqualification. If there is enough interest, and once we have mapped as much as possible into the exporter along with any other pressing additions or improvements, I can start looking into that. I will likely end up implementing it myself anyway, but depending on interest I can re-prioritize as needed.

Originally posted by @Cmdrd in https://github.com/anclrii/Storj-Exporter/issues/11#issuecomment-578614339

Implement code documentation

opened on 2020-01-27 16:50:36 by Cmdrd

While the code base is relatively small, it has taken some time to get up to speed with its functionality, especially since I am not very familiar with the metric functions provided by the Prometheus Python library.

Putting this issue in as a working thread for further documentation discussions.

I currently have a local branch for adding documentation, but definitely post suggestions here. I will open a PR for an initial evaluation.

Consider limiting systemd service auto-restarts

opened on 2020-01-26 10:39:02 by anclrii

Consider increasing RestartSec=5s, or at least adding a limit such as StartLimitBurst=3.

Releases

2023-02-09 12:05:02

2.1.2 2022-06-24 14:09:22

What's Changed

  • Adding currentMonthExpectations metric by @anclrii in https://github.com/anclrii/Storj-Exporter/pull/69

Full Changelog: https://github.com/anclrii/Storj-Exporter/compare/2.1.1...2.1.2

2.1.1 2022-06-13 20:01:50

What's Changed

  • Adding logging and catching some exceptions by @anclrii in https://github.com/anclrii/Storj-Exporter/pull/66

Full Changelog: https://github.com/anclrii/Storj-Exporter/compare/2.1.0...2.1.1

2022-06-12 10:12:14

2022-06-10 16:15:31

2021-03-27 16:41:42
