Page MenuHomeVyOS Platform

Script daemon to offload processing during commit
Closed, ResolvedPublic



There is one obvious source of overhead during the running of conf_mode scripts, contributing to non trivial delays in commit, hence boot. Each script initializes a Config class, which consists in a call to the backend to retrieve the active and session configs; this is expensive. In addition, each conf_mode script incurs the start-up cost of the Python interpreter.


  1. During the commit phase, the configuration state is fixed: active and session config are unchanging, so the initialization of each script is not fetching 'new' information.
  1. Other than that initial config information, the execution of conf_mode scripts does not require the session in any other manner. Thus, with the provided configs, conf_mode scripts may be run 'off-line'.

Approach (initial version complete; link below):

  1. a script daemon, running as a zmq server, that accepts requests to run a specific conf_mode script; the request consists of the script name and, if applicable, a VYOS_TAGNODE_VALUE.
  1. a shim zmq client, in C, that is called _within_ the node definition file; if 'on', it dispatches the arguments (script name, tagnode?); 'off', a no-op, and the execution of the script proceeds as normal. On initialization, it provides the active and session configs to the daemon.
  1. a variant of the Config class ( with constructor taking arguments: active_config_string, session_config_string. This provides enough for conf_mode scripts to run outside of a config session, with reference to a single Config instance, as long as they adhere to a few requirements, detailed below.

A design goal is that this is as unobtrusive as possible; no existing code has been changed. For conf_mode scripts to participate, they will add a few lines (also below); those lines are a no-op if shim is off.

Requirements on conf_mode scripts:

In order to be dispatch-able, a conf_mode script needs

  1. adherence to the proper use of get_config/verify/generate/apply as outlined in the coding guidelines:

  1. not use the following functions from in_session, session_changed, show_config, is_tag, is_multi, is_leaf --- none of these should be ever needed; the one instance of 'is_tag' I find in current scripts is removable.

3) not have a hyphen in the script name, so that it may be imported as module. [No longer required]

Required lines for off-loading:

The example case is; diff is here:

< def get_config():
> def get_config(config=None):
>     if config:
>         conf = config
>     else:
>         conf = Config()
<     conf = Config()

To test:

One will need the vyos-1x and vyatta-cfg branch below (a few lines in vyatta-cfg signal the beginning of commit for initialization); one will need to install 'libzmq3-dev' in your docker image as a build requirement for vyos-1x.

There is one annoying manual intervention required: since the shim reference can not be added to _all_ node.def's at this point, one will need to do the following for the test case:

> /opt/vyatta/share/vyatta-cfg/templates/interfaces/dummy/node.def
> < end: sudo sh -c "VYOS_TAGNODE_VALUE='$VAR(@)' ${vyos_conf_scripts_dir}/"
> ---
>> end: sudo sh -c "${vyshim} VYOS_TAGNODE_VALUE='$VAR(@)' ${vyos_conf_scripts_dir}/"

On the running system, 'sudo systemctl start vyos-configd'; in the shell from which you launch config, 'export vyshim=/usr/sbin/vyshim'; config; add, say, 100 dummy interfaces; commit.

time 100 dummy interfaces:

with daemon and shim:

real 0m7.174s
user 0m1.165s
sys 0m0.750s

without daemon and shim:

real 0m28.267s
user 0m14.436s
sys 0m5.345s

Code is here; note this includes the patch for T2568, but will be rebased when that is pushed.

This is the first version, and many details will evolve quickly, but the logic is well demonstrated by the current example.


Difficulty level
Unknown (require assessment)
Why the issue appeared?
Will be filled on close
Is it a breaking change?
Perfectly compatible
Issue type
Internal change (not visible to end users)

Event Timeline

jestabro changed the task status from Open to In progress.Jun 11 2020, 2:59 AM
jestabro triaged this task as Normal priority.
jestabro created this task.
jestabro created this object in space S1 VyOS Public.

Thank you very much for the POC. Very useful to understand the proposed design.

I would appreciate some insight into what would happen if two users were to commit at the same time. Would it cause an issue?

Looking at the code, I am curious to if there is a particular reasons for loading the configuration in the shim and sending a request to the daemon instead and having the daemon loading it?

Would it also be possible to only have one Config class by passing the configuration content as them as parameter to __init__?

It may also be possible to not have to modify the conf_mode code: It is my understanding that currently the conf_mode code assumes that it is run from the root level of the configuration.
The caller of the get_config could reset the Configuration level to the root before the call, removing the requirement for any change in the current conf code.

And the class VyOSError may need sharing :-)

Finally I was wondering if VYOS_TAGNODE_VALUE should be passed to the shim using the ENV which may remove the need for a regex ..

  1. commit restrictions/permissions have not changed; all handled in the backend.
  2. The daemon can not load the configs, as it does not have/need access to the config session; that's the point.
  3. ?
  4. the conf_mode script needs to reference the global config; re-setting level is basic hygiene --- the script should ask itself 'what if I am called again?'
  5. we want to eat the args in the node.def; so you already have it.

There is another use of is_tag/is_leaf in python/vyos/ is_member, as it can work on both bridge and bond members, and they have different syntax for member interfaces. It would only be possible to hardcode each case and remove the use of is_*

I propose to provide alternatives is_ functions using the new XML code. I will provide a patch for review.

The work to perform the get_config/verify/generate/apply in python is already available in this tree:

The code can be lift and shifted in a daemon with minimal changes.

jestabro changed the task status from In progress to Needs testing.EditedJul 29 2020, 4:27 PM
jestabro added a subscriber: c-po.

What is here:

vyos-configd: a zmq server accepting requests to run specified conf_mode scripts outside of the config session, in batch. For example, if requested from a config session, all queued scripts will be run under a single Python interpreter and a single instance of Config; the speed-up of commit time is notable.

vyshim: a shim vyos-configd client, inserted into the config session, to request off-loading of conf_mode scripts during the commit phase.

conf_mode scripts are opted-in for inclusion in off-loading; if not opted, the shim will pass-through execution of the script to the config session. In case of catastrophic failure (zmq failure; interrupt signal not handled by configd service; etc.), as detected by timeout of communication, execution returns to pass-through.

Current timings:

[N.B. commit times have risen recently, compared to the original numbers in the description of this task; this is due to the practiced use of config_dict and config_diff --- a practice which has allowed @c-po to dramatically reduce code redundency, and improve stability and maintainability. Running under the config daemon shows negligible increase, as those costs are amortized.]

Adding 100 dummy interfaces:

with no daemon running (hence shim var is empty; business as usual):

[email protected]# time commit

real 0m37.382s
user 0m18.176s
sys 0m7.208s

with daemon running:

[email protected]# time commit

real 0m8.027s
user 0m1.307s
sys 0m0.766s

with daemon, but script run as pass-through:

[email protected]# time commit

real 0m37.631s
user 0m18.304s
sys 0m7.131s

catastrophic failure: client times-out, runs script as pass-through

[email protected]# time commit

real 0m38.944s
user 0m18.638s
sys 0m7.360s

The current list of scripts running under daemon below; the remaining (but
for those not able, or not ready; cf. T2649) are being added, and the list
will be updated.

... edit: all scripts have been added, except for those not yet ready (T2649); those that should not be included, since they need to modify config; those that are currently being revised (, the list of those excluded:

['', '', '', '', '', '', '', '', '', '']

Note that the daemon service is stopped for smoketests under qemu, pending
investigation of a regression, and better integration with qemu. smoketests will run fine on the installed image, with daemon enabled (default) --- remember to provide >= 2 nics
to support tests.
(smoketests now in vyos-1x T2832)

erkin set Issue type to Internal change (not visible to end users).Aug 30 2021, 5:34 AM
erkin removed a subscriber: Active contributors.