Create Prometheus Exporter for VyOS
Needs testing, Wishlist, Public, FEATURE REQUEST
Assigned To

Authored By

	dongjunbo
	Nov 7 2018, 1:50 AM

Description

Prometheus already has a number of exporters, listed here:
https://prometheus.io/docs/instrumenting/exporters/

We need to create one for VyOS.

Details

Version: -
Is it a breaking change?: Unspecified (possibly destroys the router)
Issue type: Feature (new functionality)

Related Objects

Mentioned In: T6949: blackbox_exporter for probing endpoints
rVYOSONEX3f933f1642de: T973: add basic frr_exporter implementation (#4150)
rVYOSONEX1601ede1ecc1: Debian: T973: add missing dependency on node-exporter package
rVYOSONEXed873d845372: Merge pull request #4130 from c-po/node-exporter-fix
rVYOSONEX1749c3a99b88: T973: remove irrelevant standard values
rVYOSONEXa0c15a159e54: T973: add basic node_exporter implementation
rVYOSONEXa175bd6518cc: Merge pull request #4048 from rebortg/node_exporter

Event Timeline

dongjunbo created this task. Nov 7 2018, 1:50 AM

syncer triaged this task as Wishlist priority. Nov 7 2018, 7:46 AM

syncer removed projects: VyOS 2.0.x, VyOS 1.2 Crux.

Raeven subscribed. Dec 15 2018, 4:19 PM

albeu subscribed. Feb 18 2019, 12:00 PM

pasik subscribed. Mar 12 2019, 6:08 PM

You can scrape SNMP into Prometheus. I'm not sure whether you want any gauges that SNMP doesn't cover.

https://wiki.vyos.net/wiki/SNMP

Install snmp-exporter, then add the following to /etc/prometheus/prometheus.yml:
...
scrape_configs:
  - job_name: 'snmp'
    static_configs:
    - targets:
      - 10.11.22.33
    metrics_path: /snmp
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 127.0.0.1:9116  # The SNMP exporter's real hostname:port.
...
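
To make the relabeling above concrete, here is a minimal Python sketch that mimics what the three relabel rules do to a single target. It is not Prometheus code: the function name `apply_snmp_relabeling` is made up for illustration, and it only covers the plain copy/replace rules used in this config.

```python
from urllib.parse import urlencode


def apply_snmp_relabeling(target, exporter_addr="127.0.0.1:9116", metrics_path="/snmp"):
    """Mimic the three relabel rules from the scrape config for one target.

    Illustration only: this is not how Prometheus implements relabeling.
    """
    labels = {"__address__": target}
    # Rule 1: copy __address__ into the URL parameter "target".
    labels["__param_target"] = labels["__address__"]
    # Rule 2: keep the router's address as the "instance" label.
    labels["instance"] = labels["__param_target"]
    # Rule 3: point the actual scrape at the SNMP exporter.
    labels["__address__"] = exporter_addr

    query = urlencode({"target": labels["__param_target"]})
    url = f"http://{labels['__address__']}{metrics_path}?{query}"
    return url, labels["instance"]


url, instance = apply_snmp_relabeling("10.11.22.33")
print(url)       # http://127.0.0.1:9116/snmp?target=10.11.22.33
print(instance)  # 10.11.22.33
```

In other words, Prometheus talks to the SNMP exporter on 127.0.0.1:9116, the exporter queries the router over SNMP, and the resulting series still carry the router's address as `instance`.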

SNMP can be used as a workaround, but it is not suitable for much more than a couple of metrics because it is very inefficient. Moreover, the Prometheus node exporter provides many more metrics out of the box.

srosiak subscribed. Feb 27 2020, 9:53 AM

syncer renamed this task from prometheus support to Create Prometheus Exporter for VyOS. Mar 1 2020, 9:45 PM

syncer assigned this task to kroy.

syncer updated the task description.

syncer set Is it a breaking change? to Unspecified (possibly destroys the router).

avanier subscribed. Sep 27 2020, 4:21 PM

Yeah, I'm going to second the motion on this one. I've been down the road of SNMP before, and just navigating the available metrics and trying to figure out what they are is less than straightforward. I mean, I've been at this sysadmin thing for a bit over 10 years now and I couldn't manage something useable out of it. It might be because I'm not a great sysadmin, but a lot of people are not great sysadmins. 😛 Having this feature would be great for accessibility and discoverability.

If we were to turn this into a user story, I believe it could look like this:

As a VyOS administrator,
I want to be able to scrape Prometheus metrics off of the router,
So that I can monitor the health of the router

I think it could also be broken down into the following tasks:

List out the metrics currently exposed by SNMP
Prune out metrics that are irrelevant or technically incompatible with Prometheus from the SNMP list
Build, write or package a Prometheus exporter that would suit the metrics we mean to export
Add to VyOS-Build a step to build and insert the Prometheus exporter
Provide start, stop, and restart scripts to manage the Prometheus service
Provide a thingy to template the Prometheus configuration
Add a configuration item to manage the Prometheus service

From my experience, none of those tasks are particularly complex, it's just a bit of a list to process... and it's certainly not getting done overnight.

Other points of interest / opinion:

The exporter should probably be written in Go if we want to make it easy to write with the rest of the Prom frameworks and we want to make sure we can compile to all the target platforms supported by VyOS.
We should avoid having a constellation of exporters, but favour having a single one. I feel like starting and stopping those would be pretty icky.
Something, something, configuration of the interface on which the service will listen on.
Something, something, security and authentication.

Viacheslav subscribed. Oct 16 2020, 6:27 PM

syncer reassigned this task from kroy to superq. Oct 16 2020, 6:27 PM

syncer added a subscriber: kroy.

Quite interesting; I support this. In fact, some information cannot be captured well through SNMP.

P.S. Please note that building Prometheus depends on a Go build environment.

We should avoid having a constellation of exporters, but favour having a single one. I feel like starting and stopping those would be pretty icky.

Handling multiple services is pretty easy with systemd. Having a super-exporter is an anti-pattern for Prometheus.

Most of the metrics users would want out of VyOS are available in the node_exporter.

Also, starting with SNMP data doesn't seem to make a lot of sense. Reproducing SNMP is a bit of a non-goal in my opinion. Prometheus users are going to expect Prometheus-native data.

https://github.com/tynany/frr_exporter

Recording the exporter's address in this task so that it isn't forgotten.

@jack9603301 Do you know of a version of that FRR exporter that doesn't fork sub processes?

Do you know of a version of that FRR exporter that doesn't fork sub processes?

Please forgive me for not understanding what you mean

The frr_exporter linked uses os/exec to run an external binary, /usr/bin/vtysh. This is not a great way to build an exporter, as it can lead to a fork bomb. There is also the overhead of calling the external binary to gather data.

The frr_exporter linked uses os/exec to run an external binary, /usr/bin/vtysh. This is not a great way to build an exporter, as it can lead to a fork bomb. There is also the overhead of calling the external binary to gather data.

Just let systemd start and manage it.

No, that's not the problem. The exporter itself could potentially create thousands of sub processes if something were to go wrong.

There is a large amount of overhead in this process. Looking over some of the issues in the frr_exporter's bug tracker shows it's pretty slow and problematic.

Most Prometheus data is produced by the exporter on demand; it is not collected and pushed in real time. When Prometheus scrapes, it reads the relevant metrics through the port the exporter exposes. So I don't think it could create thousands of subprocesses or threads. What do you think?

I'm not sure you understand how this works.

Prometheus is a polling-based metrics collection system. When it scrapes the exporter, the exporter has to return the data.

The way the exporter works is that it uses exec to launch the external command to gather the data. This happens in real time for each scrape.

Because Prometheus is polling, and multiple Prometheus servers or humans can hit the /metrics endpoint, the collector must allow for concurrent scrapes.

So each concurrent scrape will fork new sub processes. And looking at the code, the exporter actually calls the sub process multiple times per scrape.

If those processes get stuck, they could build up, even if there are timeouts passed via context handling.
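
The multiplication described above can be illustrated with a small Python simulation. It does not fork anything: threads and a counter stand in for scrapes and child processes. `COMMANDS_PER_SCRAPE = 4` is an arbitrary number chosen for the example, and the sketch assumes that every child of a scrape is still alive (slow or stuck) while other scrapes arrive.

```python
import threading

COMMANDS_PER_SCRAPE = 4  # hypothetical: external commands launched per scrape


class ChildCounter:
    """Tracks how many simulated child processes are alive at once."""

    def __init__(self):
        self._lock = threading.Lock()
        self.alive = 0
        self.peak = 0

    def spawn(self):
        with self._lock:
            self.alive += 1
            self.peak = max(self.peak, self.alive)

    def reap(self):
        with self._lock:
            self.alive -= 1


def simulated_scrape(counter, barrier):
    for _ in range(COMMANDS_PER_SCRAPE):
        counter.spawn()  # stands in for exec'ing an external command
    barrier.wait()       # all scrapes are still waiting on their children
    for _ in range(COMMANDS_PER_SCRAPE):
        counter.reap()


def peak_children(concurrent_scrapes):
    """Return the peak number of simulated children for N overlapping scrapes."""
    counter = ChildCounter()
    barrier = threading.Barrier(concurrent_scrapes)
    threads = [
        threading.Thread(target=simulated_scrape, args=(counter, barrier))
        for _ in range(concurrent_scrapes)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.peak


print(peak_children(5))  # 20
```

Five overlapping scrapes at four commands each hold twenty children at once. If the children never exit, that number only grows with each new scrape.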

I think I understand what you mean. Don't worry. I'm also a user of Prometheus. I know how Prometheus works.

I think the feasible solution is to set the timeout

Once the time limit is exceeded, a SIGKILL signal is sent.
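
A minimal Python sketch of that timeout-then-kill idea, using only the standard library. The helper name `run_with_timeout` is made up for this example, and the child commands are stand-ins rather than anything an exporter would actually run.

```python
import subprocess
import sys


def run_with_timeout(argv, timeout_s):
    """Run argv and return its stdout, or None if it exceeded timeout_s.

    When the timeout expires, subprocess.run() kills the child
    (SIGKILL on POSIX) and then waits for it to exit. That final wait
    can itself block if the child is unable to exit.
    """
    try:
        result = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        return None
    return result.stdout


# Stand-in commands: one finishes quickly, one sleeps past the limit.
fast = [sys.executable, "-c", "print('ok')"]
slow = [sys.executable, "-c", "import time; time.sleep(30)"]

fast_output = run_with_timeout(fast, 10.0)
slow_output = run_with_timeout(slow, 0.5)
print(repr(fast_output))
print(slow_output)
```

The fast command returns its output, while the slow one is killed and yields `None`.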

Timeouts and SIGKILL don't always work. If a process is stuck on I/O, it will not exit.

Forking is _very_ bad, especially for an embedded router OS. Until this is fixed, I would _highly_ recommend against including this exporter. It is too dangerous to use.

That's true. I just want to record, so it isn't forgotten, that another solution is to modify FRR itself and maintain that work in parallel with upstream FRR. In other words, we could patch FRR or maintain a separate branch, build our own version, and expose the metrics directly from its code. But this needs someone to do the work.

It is more efficient to obtain monitoring data from inside the service than from an external plug-in.

If you can, make a patch; the automated build script will then apply it to the FRR source tree when compiling.

The best possible solution would be for FRR to support Prometheus directly, rather than require an exporter.

I agree. So if someone understands FRR's code structure, we could implement the exporter inside FRR itself, following Prometheus conventions, and generate a patch file. The automated build script could then apply the patch and package the result as a DEB.

https://git.freestone.net/cramer/frr-prometheus-stats

Hi all, I found an interesting script via FRRouting's GitHub repo. I came across it only because someone wrote the script and submitted the following bug report:

https://github.com/FRRouting/frr/issues/5445

The address of this prometheus exporter script is as follows:

https://git.freestone.net/cramer/frr-prometheus-stats/-/raw/master/frr-prometheus-stats.py

It may be a useful reference.

This means that maybe we can set up our own exporter based on python3
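
As a rough idea of what a Python 3 exporter involves, here is a minimal standard-library sketch that renders metrics in the Prometheus text exposition format and serves them on /metrics. It is not taken from the linked script. The metric name `example_peer_up`, the hard-coded sample data, and port 9999 are all placeholders invented for illustration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_metric(name, help_text, metric_type, samples):
    """Render one metric family in the Prometheus text exposition format.

    samples is a list of (labels_dict, value) pairs. Label values are
    not escaped here, so they must not contain quotes, backslashes,
    or newlines.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        if labels:
            pairs = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
            lines.append(f"{name}{{{pairs}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


def collect():
    # Hypothetical, hard-coded data; a real exporter would read live state.
    return render_metric(
        "example_peer_up",
        "Whether the example peer is up (1) or down (0).",
        "gauge",
        [({"peer": "192.0.2.1"}, 1), ({"peer": "192.0.2.2"}, 0)],
    )


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = collect().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=9999):
    """Serve /metrics on localhost until interrupted (not called here)."""
    HTTPServer(("127.0.0.1", port), MetricsHandler).serve_forever()


print(collect(), end="")
```

Calling `serve()` would expose the same text at http://127.0.0.1:9999/metrics for a Prometheus server to scrape. Gathering real data safely is the hard part, and the sketch does not address it.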

Is anyone following up on this?

erkin set Issue type to Feature (new functionality). Sep 1 2021, 10:50 AM

syncer edited projects, added VyOS 1.3 Equuleus (1.3.0); removed VyOS 1.3 Equuleus. Nov 6 2021, 11:25 AM

Alexey.Kirillov subscribed. Dec 1 2021, 7:29 AM

Does anyone at least have an example of how to use the SNMP exporter? For example, an snmp.yml, or a way to generate one from the given MIBs?

I do agree having an exporter would be really nice

@anthr76 we have a Telegraf exporter ready; maybe it will work for you?
https://docs.vyos.io/en/latest/configuration/service/monitoring.html

Prometheus-client is already in 1.4:
https://docs.vyos.io/en/latest/configuration/service/monitoring.html#prometheus-client

I wouldn't call telegraf a very good option. It does a very bad job of producing Prometheus metrics.

In T973#124168, @superq wrote:

I wouldn't call telegraf a very good option. It does a very bad job of producing Prometheus metrics.

@superq Are there any known issues?

syncer edited projects, added VyOS 1.3 Equuleus (1.3.3); removed VyOS 1.3 Equuleus (1.3.0). Aug 29 2022, 7:06 AM

tioan subscribed. Oct 16 2022, 11:50 AM

@Viacheslav I want to test this, what should be done?

In T973#137840, @elico wrote:

@Viacheslav I want to test this, what should be done?

Nothing special; just use the configuration described in our documentation.
Prometheus Client exposes all metrics on /metrics (the default path) to be polled by a Prometheus server.

egoistdream subscribed. Nov 30 2022, 10:50 PM

syncer edited projects, added VyOS 1.3 Equuleus (1.3.4); removed VyOS 1.3 Equuleus (1.3.3). Jul 12 2023, 9:45 PM

syncer edited projects, added VyOS 1.3 Equuleus (1.3.5); removed VyOS 1.3 Equuleus (1.3.4). Aug 25 2023, 9:31 PM

syncer edited projects, added VyOS 1.3 Equuleus (1.3.6); removed VyOS 1.3 Equuleus (1.3.5). Dec 17 2023, 11:38 PM

Viacheslav edited projects, added VyOS 1.5 Circinus; removed VyOS 1.3 Equuleus (1.3.6). Feb 2 2024, 4:44 PM

dmbaturin removed superq as the assignee of this task. Jul 2 2024, 7:05 PM

dmbaturin added a subscriber: superq.

Restricted Repository Identity mentioned this in rVYOSONEXa175bd6518cc: Merge pull request #4048 from rebortg/node_exporter. Oct 4 2024, 11:43 AM

Restricted Repository Identity mentioned this in rVYOSONEX1749c3a99b88: T973: remove irrelevant standard values. Oct 4 2024, 11:43 AM

Restricted Repository Identity mentioned this in rVYOSONEXa0c15a159e54: T973: add basic node_exporter implementation.

c-po changed the task status from Open to Needs testing. Oct 5 2024, 8:07 AM

c-po assigned this task to rob.