Currently, using Python for the validators has a high cost: a new Python interpreter must be forked and started for every check, and this is not cheap:
vyos@vyos# time bash -c ""
real 0m0.001s
user 0m0.001s
sys 0m0.000s
[edit]
vyos@vyos# time python3 -c ""
real 0m0.019s
user 0m0.011s
sys 0m0.007s
[edit]
vyos@vyos# time python3 -c "import vyos.ifconfig"
real 0m0.173s
user 0m0.104s
sys 0m0.031s
So, per call, the bash code is roughly 20x faster than starting a bare Python interpreter, and roughly 170x faster than one that imports vyos.ifconfig. However, even bash is not as fast as compiled C would be, since it still forks and execs other programs.
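The same comparison can be reproduced from Python itself. The sketch below times spawning a fresh interpreter per check against calling a validator function inside an already-running process. The `is_valid_port` validator is hypothetical and only serves as a cheap in-process workload.

```python
import subprocess
import sys
import time


def time_spawn(cmd, runs=20):
    """Average seconds to fork/exec `cmd` and wait for it to exit."""
    start = time.perf_counter()
    for _ in range(runs):
        subprocess.run(cmd, check=True)
    return (time.perf_counter() - start) / runs


def time_call(func, arg, runs=20):
    """Average seconds for an in-process call, i.e. no new interpreter."""
    start = time.perf_counter()
    for _ in range(runs):
        func(arg)
    return (time.perf_counter() - start) / runs


def is_valid_port(value):
    # Hypothetical validator used only for the comparison.
    return value.isdecimal() and 1 <= int(value) <= 65535
```

On a VyOS system, `time_spawn(["python3", "-c", "import vyos.ifconfig"])` would approximate the per-check cost of today's Python validators, while `time_call(is_valid_port, "22")` approximates the cost once the interpreter is already running.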
Once the interpreter is running, Python performance is "good enough" for most cases. If the Python code runs in a long-lived process, the initialisation cost of Python becomes a moot point.
For example, talking to a long-lived Python process over a Unix domain socket, using the simple echo server from https://pymotw.com/3/socket/uds.html, gives the following result:
vyos@vyos:~$ time bash -c 'echo "test" | nc -U uds_socket | head -1'
test
real 0m0.004s
user 0m0.003s
sys 0m0.000s
This includes forking the shell, nc and head, and running echo; even so, the performance is close to that of bash alone.
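To make the idea concrete, here is a minimal sketch of a long-lived validation service built on the standard-library `socketserver` module instead of the echo program. The line protocol (`<validator> <value>` in, `ok` or `fail` out), the `port` validator and the socket path are all hypothetical; a real service would import the shared validation code rather than define it inline.

```python
import os
import re
import socket
import socketserver
import tempfile
import threading


def validate_port(value):
    # Hypothetical validator; a real daemon would import the shared validation code.
    return re.fullmatch(r"[0-9]{1,5}", value) is not None and 1 <= int(value) <= 65535


VALIDATORS = {"port": validate_port}


class ValidatorHandler(socketserver.StreamRequestHandler):
    # Hypothetical protocol: each request line is "<validator> <value>",
    # each reply line is "ok" or "fail".
    def handle(self):
        for raw in self.rfile:  # one request per line until the client disconnects
            name, _, value = raw.decode(errors="replace").rstrip("\n").partition(" ")
            check = VALIDATORS.get(name)
            ok = check is not None and check(value)
            self.wfile.write(b"ok\n" if ok else b"fail\n")


def start_server(path):
    """Start the long-lived validator service in a background thread."""
    server = socketserver.ThreadingUnixStreamServer(path, ValidatorHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def query(path, name, value):
    """Client side: one connection per check, as a thin validator wrapper might do."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        sock.sendall(f"{name} {value}\n".encode())
        with sock.makefile("r") as reply:
            return reply.readline().strip() == "ok"


def demo():
    """Start a server on a temporary socket, run two checks, then clean up."""
    path = os.path.join(tempfile.mkdtemp(), "validator.sock")
    server = start_server(path)
    try:
        return [query(path, "port", v) for v in ("22", "70000")]
    finally:
        server.shutdown()
        server.server_close()
        os.unlink(path)
        os.rmdir(os.path.dirname(path))
```

Calling `demo()` returns `[True, False]`. Only the first start of the service pays the interpreter and import cost; every later check is a socket round trip, which is what the nc measurement approximates.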
So why would we want to use a long-lived Python process rather than C/OCaml/Bash for validation?
1 - it keeps down the number of languages used in the project, and Python is a widely known language.
2 - the same validation code can be shared by the configuration, operational-mode and validation code, ensuring consistency.
3 - the C code could be adapted to talk to the Python process over the Unix socket, saving an expensive fork.
4 - long-lived code enables other optimisations.
5 - the framework set up for this would also be available to operational mode, where the tools are more complex and better written in Python.
As an example of optimisation (4), templating is currently used to generate the configuration files for third-party applications. As with forking, the initialisation cost is high, and a long-lived program would improve performance.
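A sketch of why a long-lived process helps templating: the parsed template can be cached for the life of the process with `functools.lru_cache`. Here the standard-library `string.Template` stands in for a real engine (for example Jinja2, whose environment set-up and template parsing are the costly parts), and the `ntp.conf` template is hypothetical.

```python
import functools
import string

# Hypothetical in-memory templates; real templates would be files rendered
# by a richer engine.
TEMPLATES = {"ntp.conf": "server $server iburst\n"}


@functools.lru_cache(maxsize=None)
def load_template(name):
    """Parse a template once; later calls in the same process reuse the result."""
    return string.Template(TEMPLATES[name])


def render(name, **values):
    """Render a cached template with the given substitutions."""
    return load_template(name).substitute(values)
```

In a process started once per commit, the cache is thrown away at exit and every run pays the parsing cost again; in a long-lived process, only the first `render()` for each template does.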
I believe (5), however, is the most compelling point, as the long-term direction of the project should be considered. Currently, the XML is used to generate some files, which are then used by some C code ... It would make sense for the XML to be used by the same Python code that handles the configuration. Once all the logic has moved into Python, this becomes possible. It might even remove the need to run as a daemon at all, since nothing would need to fork and a one-off initialisation cost may then be acceptable. Some other features also become available, but that is off-topic (not performance-related).
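As an illustration of Python consuming the XML directly, the sketch below reads regex constraints from a definition with `xml.etree.ElementTree` and applies them in-process, with no generated files or separate program involved. The XML layout is a simplified, hypothetical stand-in for the real interface definitions, and accepting a value when any listed pattern matches is an assumption of this sketch.

```python
import re
import xml.etree.ElementTree as ET

# Simplified, hypothetical definition; the real schema is richer.
DEFINITION = """
<interfaceDefinition>
  <leafNode name="mtu">
    <properties>
      <constraint>
        <regex>[0-9]{2,5}</regex>
      </constraint>
    </properties>
  </leafNode>
</interfaceDefinition>
"""


def load_constraints(xml_text):
    """Map each leafNode name to its list of compiled regex constraints."""
    root = ET.fromstring(xml_text.strip())
    constraints = {}
    for leaf in root.iter("leafNode"):
        constraints[leaf.get("name")] = [re.compile(r.text) for r in leaf.iter("regex")]
    return constraints


def is_valid(constraints, node, value):
    """Unknown nodes are rejected; a node without patterns accepts any value."""
    if node not in constraints:
        return False
    patterns = constraints[node]
    # Assumption: the value is valid if any listed pattern matches it fully.
    return not patterns or any(p.fullmatch(value) for p in patterns)
```

The constraints would be loaded once at start-up and then reused by the configuration, operational-mode and validation paths alike.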
The project already uses multiple languages, and not all contributors are fluent in all of them. I can count Perl, XML, Shell, C, Python and OCaml (and surely a few DSLs). Python is likely to be the programming language best known among potential contributors.
Having all the code in Python also opens up other options, such as using entry points to generate a standalone application for each validator.
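A sketch of the entry-point idea: one shared validator function is wrapped into a `main()` that packaging tools could expose as a console script. It assumes validators report their result through the exit status (0 for valid, 1 for invalid); the module, function and script names are hypothetical.

```python
import re
import sys


def validate_port(value):
    # Hypothetical validator, shared with the configuration code.
    return re.fullmatch(r"[0-9]{1,5}", value) is not None and 1 <= int(value) <= 65535


def make_main(check):
    """Wrap one validator as a console-script main(); returns the exit status."""
    def main(argv=None):
        args = sys.argv[1:] if argv is None else argv
        return 0 if len(args) == 1 and check(args[0]) else 1
    return main


# Could be exposed as a console_scripts entry point such as
#   validate-port = vyos_validators:port_main
# (hypothetical names). The generated wrapper script calls
# sys.exit(port_main()), so the return value becomes the exit status.
port_main = make_main(validate_port)
```

Each generated script still pays interpreter start-up, so this mainly helps packaging and reuse; the long-lived service remains the option that removes the fork cost.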
I propose to use this ticket to:
- discuss the pros and cons of each approach
- share performance numbers for the different solutions, to allow objective decision-making
- leave the other possibilities out of scope and discuss them in T2407 instead
Part of this discussion has already taken place in: