
VRF specification is needed for telegraf prometheus-client listen-address <address>
Closed, Resolved | Public | FEATURE REQUEST

Description

When we specify a listen address with
set service monitoring telegraf prometheus-client listen-address <address>
we should be able to specify a VRF as well, because the OAM interface is usually placed into a separate VRF for security reasons. If possible, this could look like
set service monitoring telegraf prometheus-client vrf <vrf-name>
Thank you,
Alex

Details

Difficulty level
Easy (less than an hour)
Version
-
Why the issue appeared?
Will be filled on close
Is it a breaking change?
Perfectly compatible
Issue type
Improvement (missing useful functionality)

Event Timeline

Since we have one config file for all plugins and start only one telegraf process, I guess it should be a global telegraf option: set service monitoring telegraf vrf <vrf-name>

I tried to add the vrf option, but running telegraf under ip vrf exec requires additional permissions, and the service does not start:

diff --git a/data/templates/monitoring/override.conf.j2 b/data/templates/monitoring/override.conf.j2
index 9f1b4ebe..63e479af 100644
--- a/data/templates/monitoring/override.conf.j2
+++ b/data/templates/monitoring/override.conf.j2
@@ -1,7 +1,10 @@
+{% set vrf_command = 'ip vrf exec ' ~ vrf ~ ' ' if vrf is vyos_defined else '' %}
 [Unit]
 After=vyos-router.service
 ConditionPathExists=/run/telegraf/vyos-telegraf.conf
 [Service]
+ExecStart=
+ExecStart={{ vrf_command }}/usr/bin/telegraf -config /run/telegraf/vyos-telegraf.conf -config-directory /etc/telegraf/telegraf.d $TELEGRAF_OPTS
 Environment=INFLUX_TOKEN={{ influxdb.authentication.token }}
 CapabilityBoundingSet=CAP_NET_RAW CAP_NET_ADMIN CAP_SYS_ADMIN
 AmbientCapabilities=CAP_NET_RAW CAP_NET_ADMIN
diff --git a/interface-definitions/service-monitoring-telegraf.xml.in b/interface-definitions/service-monitoring-telegraf.xml.in
index 36f40a53..dc014ee1 100644
--- a/interface-definitions/service-monitoring-telegraf.xml.in
+++ b/interface-definitions/service-monitoring-telegraf.xml.in
@@ -306,6 +306,7 @@
                   </leafNode>
                 </children>
               </node>
+              #include <include/interface/vrf.xml.i>
             </children>
           </node>
         </children>
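
For readers less familiar with Jinja, the template change above can be paraphrased in plain Python. This is only a sketch of the logic in the diff, not code from vyos-1x; the function name `exec_start_lines` is hypothetical. The empty `ExecStart=` line is the usual systemd way to clear the command inherited from the base unit before setting a new one.

```python
from typing import List, Optional

# Command taken verbatim from the diff above.
TELEGRAF_CMD = (
    "/usr/bin/telegraf -config /run/telegraf/vyos-telegraf.conf "
    "-config-directory /etc/telegraf/telegraf.d $TELEGRAF_OPTS"
)


def exec_start_lines(vrf: Optional[str] = None) -> List[str]:
    """Return the two ExecStart lines of the drop-in (hypothetical helper).

    Mirrors the Jinja expression: prefix the command with
    'ip vrf exec <vrf> ' only when a VRF is configured.
    """
    vrf_command = f"ip vrf exec {vrf} " if vrf else ""
    return [
        "ExecStart=",  # clears the ExecStart inherited from the base unit
        f"ExecStart={vrf_command}{TELEGRAF_CMD}",
    ]


if __name__ == "__main__":
    print("\n".join(exec_start_lines("foo")))
    print("\n".join(exec_start_lines()))
```

With `vrf="foo"` the second line starts with `ExecStart=ip vrf exec foo /usr/bin/telegraf`, which matches the command shown in the service status; without a VRF the command is unchanged.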

Service status

vyos@r1# systemctl status vyos-telegraf.service
● vyos-telegraf.service - The plugin-driven server agent for reporting metrics into InfluxDB
     Loaded: loaded (/etc/systemd/system/vyos-telegraf.service; disabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/vyos-telegraf.service.d
             └─10-override.conf
     Active: failed (Result: exit-code) since Tue 2022-08-16 16:37:05 EEST; 1s ago
       Docs: https://github.com/influxdata/telegraf
    Process: 10453 ExecStart=ip vrf exec foo /usr/bin/telegraf -config /run/telegraf/vyos-telegraf.conf -config-directory /etc/telegraf/telegraf.d $TELEGRAF_OPTS (code=exited, status=255/EXCEPTION)
   Main PID: 10453 (code=exited, status=255/EXCEPTION)
        CPU: 2ms

Aug 16 16:37:04 r1 systemd[1]: vyos-telegraf.service: Main process exited, code=exited, status=255/EXCEPTION
Aug 16 16:37:04 r1 systemd[1]: vyos-telegraf.service: Failed with result 'exit-code'.
Aug 16 16:37:05 r1 systemd[1]: vyos-telegraf.service: Scheduled restart job, restart counter is at 5.
Aug 16 16:37:05 r1 systemd[1]: Stopped The plugin-driven server agent for reporting metrics into InfluxDB.
Aug 16 16:37:05 r1 systemd[1]: vyos-telegraf.service: Start request repeated too quickly.
Aug 16 16:37:05 r1 systemd[1]: vyos-telegraf.service: Failed with result 'exit-code'.
Aug 16 16:37:05 r1 systemd[1]: Failed to start The plugin-driven server agent for reporting metrics into InfluxDB.
[edit]
vyos@r1#

Log:

Aug 16 16:38:21 r1 sudo[10470]:     vyos : TTY=pts/0 ; PWD=/home/vyos ; USER=root ; COMMAND=/usr/bin/systemctl restart vyos-telegraf
Aug 16 16:38:21 r1 sudo[10470]: pam_unix(sudo:session): session opened for user root(uid=0) by vyos(uid=1003)
Aug 16 16:38:21 r1 systemd[1]: Started The plugin-driven server agent for reporting metrics into InfluxDB.
Aug 16 16:38:21 r1 ip[10473]: mkdir failed for /sys/fs/cgroup/system.slice/vyos-telegraf.service/vrf/foo: Permission denied
Aug 16 16:38:21 r1 ip[10473]: Failed to setup vrf cgroup2 directory
Aug 16 16:38:21 r1 systemd[1]: vyos-telegraf.service: Main process exited, code=exited, status=255/EXCEPTION
Aug 16 16:38:21 r1 systemd[1]: vyos-telegraf.service: Failed with result 'exit-code'.
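
Judging only from the two `ip` log lines above, `ip vrf exec` tries to create a `vrf/<name>` directory under the cgroup of the calling unit. The following Python sketch composes that path and checks whether the current user could create it. The path layout is inferred from the log, and both helper names are hypothetical.

```python
import os
from pathlib import Path


def vrf_cgroup_dir(unit_cgroup: str, vrf: str, root: str = "/sys/fs/cgroup") -> Path:
    """Compose the directory named in the 'mkdir failed' log line.

    Assumption: the layout <root>/<unit cgroup>/vrf/<vrf> is inferred
    from the log output only.
    """
    return Path(root) / unit_cgroup.strip("/") / "vrf" / vrf


def can_create(path: Path) -> bool:
    """Check write+search permission on the nearest existing ancestor."""
    parent = path
    while not parent.exists():
        parent = parent.parent
    return os.access(parent, os.W_OK | os.X_OK)


if __name__ == "__main__":
    target = vrf_cgroup_dir("/system.slice/vyos-telegraf.service", "foo")
    print(target, "creatable:", can_create(target))
```

Run as the telegraf user, a `False` result here would be consistent with the "Permission denied" in the log; `os.access` only checks ordinary file permissions, so it says nothing about capabilities.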

Starting telegraf manually works for me:

root@vyos-lns-1:/etc/systemd/system# ip vrf exec oam /usr/bin/telegraf --debug -config /run/telegraf/vyos-telegraf.conf -config-directory /etc/telegraf/telegraf.d
2022-08-16T16:29:51Z I! : Plugin "outputs.prometheus_client" deprecated since version and will be removed in :
2022-08-16T16:29:51Z I! : Plugin "inputs.cpu" deprecated since version and will be removed in :
2022-08-16T16:29:51Z I! : Plugin "inputs.mem" deprecated since version and will be removed in :
2022-08-16T16:29:51Z I! : Plugin "inputs.linux_sysctl_fs" deprecated since version and will be removed in :
2022-08-16T16:29:51Z I! : Plugin "inputs.ntpq" deprecated since version and will be removed in :
2022-08-16T16:29:51Z I! : Plugin "inputs.net" deprecated since version and will be removed in :
2022-08-16T16:29:51Z I! : Plugin "inputs.kernel" deprecated since version and will be removed in :
2022-08-16T16:29:51Z I! : Plugin "inputs.interrupts" deprecated since version and will be removed in :
2022-08-16T16:29:51Z I! : Plugin "inputs.conntrack" deprecated since version and will be removed in :
2022-08-16T16:29:51Z I! : Plugin "inputs.nstat" deprecated since version and will be removed in :
2022-08-16T16:29:51Z I! : Plugin "inputs.disk" deprecated since version and will be removed in :
2022-08-16T16:29:51Z I! : Plugin "inputs.system" deprecated since version and will be removed in :
2022-08-16T16:29:51Z I! : Plugin "inputs.processes" deprecated since version and will be removed in :
2022-08-16T16:29:51Z I! : Plugin "inputs.ethtool" deprecated since version and will be removed in :
2022-08-16T16:29:51Z I! : Plugin "inputs.internal" deprecated since version and will be removed in :
2022-08-16T16:29:51Z I! : Plugin "inputs.syslog" deprecated since version and will be removed in :
2022-08-16T16:29:51Z I! : Plugin "inputs.diskio" deprecated since version and will be removed in :
2022-08-16T16:29:51Z I! : Plugin "inputs.netstat" deprecated since version and will be removed in :
2022-08-16T16:29:51Z I! : Plugin "inputs.systemd_units" deprecated since version and will be removed in :
2022-08-16T16:29:51Z I! Starting Telegraf 1.23.1
2022-08-16T16:29:51Z I! Loaded inputs: conntrack cpu disk diskio ethtool internal interrupts kernel linux_sysctl_fs mem net netstat nstat ntpq processes syslog system systemd_units
2022-08-16T16:29:51Z I! Loaded aggregators:
2022-08-16T16:29:51Z I! Loaded processors:
2022-08-16T16:29:51Z I! Loaded outputs: prometheus_client
2022-08-16T16:29:51Z I! Tags enabled: host=vyos-lns-1
2022-08-16T16:29:51Z I! [agent] Config: Interval:15s, Quiet:false, Hostname:"vyos-lns-1", Flush Interval:15s
2022-08-16T16:29:51Z D! [agent] Initializing plugins
2022-08-16T16:29:51Z D! [agent] Connecting outputs
2022-08-16T16:29:51Z D! [agent] Attempting connection to [outputs.prometheus_client]
2022-08-16T16:29:51Z I! [outputs.prometheus_client] Listening on http://[::]:9273/metrics
2022-08-16T16:29:51Z D! [agent] Successfully connected to outputs.prometheus_client
2022-08-16T16:29:51Z D! [agent] Starting service inputs
2022-08-16T16:30:06Z D! [outputs.prometheus_client] Wrote batch of 247 metrics in 2.110746ms
2022-08-16T16:30:06Z D! [outputs.prometheus_client] Buffer fullness: 0 / 10000 metrics

In that case, however, the directory /sys/fs/cgroup/system.slice/vyos-telegraf.service/vrf/oam is not created.

The only way I found to start telegraf with ip vrf exec is to comment out
#User=telegraf
in /etc/systemd/system/vyos-telegraf.service and run
chown root:root /run/telegraf

Running telegraf as root is not a good solution, but it lets me move forward with telegraf anyway.

Try adding some capabilities, for example CAP_CHOWN or CAP_DAC_OVERRIDE, or something else:

sudo nano /etc/systemd/system/vyos-telegraf.service.d/10-override.conf

https://github.com/vyos/vyos-1x/blob/1f880973e221b91ac843a27d2e4c0b3de1880b97/data/templates/monitoring/override.conf.j2#L6


CapabilityBoundingSet=CAP_NET_RAW CAP_NET_ADMIN CAP_SYS_ADMIN CAP_DAC_OVERRIDE CAP_CHOWN CAP_LEASE

None of this helps:

Aug 19 14:13:50 ip[4307]: mkdir failed for /sys/fs/cgroup/system.slice/vyos-telegraf.service/vrf: Permission denied
Aug 19 14:13:50 ip[4307]: Failed to setup vrf cgroup2 directory
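
One possible reading of this result, not confirmed anywhere in this thread: in systemd, CapabilityBoundingSet only limits which capabilities the service may ever hold, while a service running as a non-root user actually receives only what AmbientCapabilities lists. The Python sketch below compares the two sets, using the bounding set tried above and the ambient set from the template diff. The parser is deliberately simplified (it ignores systemd's `~` inversion and empty-assignment reset), and the function names are hypothetical.

```python
from typing import Dict, Set

CAP_KEYS = ("CapabilityBoundingSet", "AmbientCapabilities")


def parse_caps(unit_text: str) -> Dict[str, Set[str]]:
    """Collect capability names per directive (simplified parser)."""
    caps: Dict[str, Set[str]] = {key: set() for key in CAP_KEYS}
    for line in unit_text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and key in caps:
            caps[key] |= set(value.split())
    return caps


def bounded_but_not_ambient(unit_text: str) -> Set[str]:
    """Capabilities that are allowed by the bounding set but never granted."""
    caps = parse_caps(unit_text)
    return caps["CapabilityBoundingSet"] - caps["AmbientCapabilities"]


if __name__ == "__main__":
    override = """
[Service]
CapabilityBoundingSet=CAP_NET_RAW CAP_NET_ADMIN CAP_SYS_ADMIN CAP_DAC_OVERRIDE CAP_CHOWN CAP_LEASE
AmbientCapabilities=CAP_NET_RAW CAP_NET_ADMIN
"""
    print(sorted(bounded_but_not_ambient(override)))
```

For the values above, CAP_SYS_ADMIN, CAP_DAC_OVERRIDE, CAP_CHOWN and CAP_LEASE appear only in the bounding set. Whether adding any of them to AmbientCapabilities would fix the mkdir failure was not tested here.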

c-po changed the task status from Open to In progress.Aug 25 2022, 4:52 PM
c-po claimed this task.
c-po changed Difficulty level from Unknown (require assessment) to Easy (less than an hour).
c-po changed Is it a breaking change? from Unspecified (possibly destroys the router) to Perfectly compatible.
c-po changed Issue type from Unspecified (please specify) to Improvement (missing useful functionality).

It seems to be working:

● telegraf.service - The plugin-driven server agent for reporting metrics into InfluxDB
     Loaded: loaded (/lib/systemd/system/telegraf.service; disabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/telegraf.service.d
             └─10-override.conf
     Active: active (running) since Mon 2022-08-29 12:51:47 EEST; 1min 7s ago
       Docs: https://github.com/influxdata/telegraf
   Main PID: 6740 (telegraf)
      Tasks: 9 (limit: 9409)
     Memory: 49.7M
        CPU: 836ms
     CGroup: /system.slice/telegraf.service
             └─vrf
               └─foo
                 └─6740 /usr/bin/telegraf --config /run/telegraf/telegraf.conf --config-directory /etc/telegraf/telegraf.d --pidfile /run/telegraf/telegraf.pid

Aug 29 12:51:48 r14 ip[6740]: 2022-08-29T09:51:48Z I! : Plugin "inputs.disk" deprecated since version  and will be removed in :
Aug 29 12:51:48 r14 ip[6740]: 2022-08-29T09:51:48Z I! : Plugin "inputs.net" deprecated since version  and will be removed in :
Aug 29 12:51:48 r14 ip[6740]: 2022-08-29T09:51:48Z I! Starting Telegraf 1.23.1
Aug 29 12:51:48 r14 ip[6740]: 2022-08-29T09:51:48Z I! Loaded inputs: conntrack cpu disk diskio ethtool internal interrupts kernel linux_sysctl_fs mem net netstat nstat ntpq processes syslog system systemd_units
Aug 29 12:51:48 r14 ip[6740]: 2022-08-29T09:51:48Z I! Loaded aggregators:
Aug 29 12:51:48 r14 ip[6740]: 2022-08-29T09:51:48Z I! Loaded processors:
Aug 29 12:51:48 r14 ip[6740]: 2022-08-29T09:51:48Z I! Loaded outputs: prometheus_client
Aug 29 12:51:48 r14 ip[6740]: 2022-08-29T09:51:48Z I! Tags enabled: host=r14
Aug 29 12:51:48 r14 ip[6740]: 2022-08-29T09:51:48Z I! [agent] Config: Interval:15s, Quiet:false, Hostname:"r14", Flush Interval:15s
Aug 29 12:51:48 r14 ip[6740]: 2022-08-29T09:51:48Z I! [outputs.prometheus_client] Listening on http://[::]:9273/metrics

I'd suggest adding

Restart=always
RestartSec=10

to /usr/share/vyos/templates/telegraf/override.conf.j2, as is done for ntp.service.
Otherwise the telegraf service does not start: during boot it makes 5 start attempts in quick succession, failing with:

Sep 07 11:43:59 vyos-lns-1 systemd[1]: telegraf.service: Failed with result 'exit-code'.
Sep 07 11:43:59 vyos-lns-1 systemd[1]: telegraf.service: Scheduled restart job, restart counter is at 5.
Sep 07 11:43:59 vyos-lns-1 systemd[1]: telegraf.service: Start request repeated too quickly.
Sep 07 11:43:59 vyos-lns-1 systemd[1]: telegraf.service: Failed with result 'exit-code'.

and stays in a failed state.
See the attached boot log.
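
To illustrate why a longer RestartSec avoids the "Start request repeated too quickly" failure, here is a simplified Python model of systemd's start rate limit. It assumes the systemd defaults of 5 starts per 10 seconds (StartLimitBurst and StartLimitIntervalSec) and a service that fails immediately on every start; neither value was checked against this system's configuration, and the function name is hypothetical.

```python
from typing import Optional


def starts_before_limit(
    restart_sec: float,
    burst: int = 5,          # assumed default StartLimitBurst
    interval: float = 10.0,  # assumed default StartLimitIntervalSec
    max_attempts: int = 50,
) -> Optional[int]:
    """Return how many starts happen before one is refused.

    Returns None if the limit is never hit within max_attempts.
    Simplification: each start fails instantly, so consecutive
    starts are exactly restart_sec apart.
    """
    window_begin: Optional[float] = None
    count = 0
    t = 0.0
    for attempt in range(1, max_attempts + 1):
        if window_begin is None or t - window_begin > interval:
            # A new rate-limit window opens with this start.
            window_begin, count = t, 1
        elif count < burst:
            count += 1
        else:
            # Burst exhausted inside the window: this start is refused.
            return attempt - 1
        t += restart_sec
    return None


if __name__ == "__main__":
    print(starts_before_limit(0.1))   # quick restarts
    print(starts_before_limit(10.0))  # RestartSec=10
```

In this model, restarts 0.1 s apart exhaust the burst after 5 starts, while restarts 10 s apart never do, so the service keeps retrying until the VRF is available.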