I'm experimenting with VyOS before switching over from a UniFi USG, and I want to use Terraform to codify changes. By default Terraform performs several operations in parallel, and as I have been adding more code blocks in the Terraform provider, I've started to frequently get 502 Bad Gateway responses from the HTTP API. To try to reproduce this outside of Terraform, I am using a Node program called "autocannon" to make concurrent requests.
When running a single connection it all seems to be fine:
$ npx autocannon --duration 10 --connections 1 --renderStatusCodes --method POST --form '{"data": {"type":"text", "value": "{\"op\": \"showConfig\", \"path\": []}"}, "key": {"type":"text","value":"my-secret-key"}}' 'https://vyos:8443/retrieve'
Running 10s test @ https://vyos:8443/retrieve
1 connections

┌─────────┬────────┬────────┬────────┬────────┬───────────┬──────────┬────────┐
│ Stat    │ 2.5%   │ 50%    │ 97.5%  │ 99%    │ Avg       │ Stdev    │ Max    │
├─────────┼────────┼────────┼────────┼────────┼───────────┼──────────┼────────┤
│ Latency │ 545 ms │ 564 ms │ 603 ms │ 603 ms │ 563.48 ms │ 12.98 ms │ 603 ms │
└─────────┴────────┴────────┴────────┴────────┴───────────┴──────────┴────────┘
┌───────────┬──────┬──────┬───────┬───────┬─────────┬─────────┬──────┐
│ Stat      │ 1%   │ 2.5% │ 50%   │ 97.5% │ Avg     │ Stdev   │ Min  │
├───────────┼──────┼──────┼───────┼───────┼─────────┼─────────┼──────┤
│ Req/Sec   │ 1    │ 1    │ 2     │ 2     │ 1.7     │ 0.46    │ 1    │
├───────────┼──────┼──────┼───────┼───────┼─────────┼─────────┼──────┤
│ Bytes/Sec │ 8 kB │ 8 kB │ 16 kB │ 16 kB │ 13.6 kB │ 3.67 kB │ 8 kB │
└───────────┴──────┴──────┴───────┴───────┴─────────┴─────────┴──────┘
┌──────┬───────┐
│ Code │ Count │
├──────┼───────┤
│ 200  │ 17    │
└──────┴───────┘

Req/Bytes counts sampled once per second.
# of samples: 10

18 requests in 10.03s, 136 kB read
Bumping that up to just 2 concurrent connections, I start getting errors:
$ npx autocannon --duration 10 --connections 2 --renderStatusCodes --method POST --form '{"data": {"type":"text", "value": "{\"op\": \"showConfig\", \"path\": []}"}, "key": {"type":"text","value":"my-secret-key"}}' 'https://vyos:8443/retrieve'
Running 10s test @ https://vyos:8443/retrieve
2 connections

┌─────────┬──────┬───────┬───────┬────────┬──────────┬──────────┬────────┐
│ Stat    │ 2.5% │ 50%   │ 97.5% │ 99%    │ Avg      │ Stdev    │ Max    │
├─────────┼──────┼───────┼───────┼────────┼──────────┼──────────┼────────┤
│ Latency │ 4 ms │ 47 ms │ 98 ms │ 104 ms │ 48.08 ms │ 27.71 ms │ 216 ms │
└─────────┴──────┴───────┴───────┴────────┴──────────┴──────────┴────────┘
┌───────────┬────────┬────────┬────────┬────────┬────────┬─────────┬────────┐
│ Stat      │ 1%     │ 2.5%   │ 50%    │ 97.5%  │ Avg    │ Stdev   │ Min    │
├───────────┼────────┼────────┼────────┼────────┼────────┼─────────┼────────┤
│ Req/Sec   │ 333    │ 333    │ 572    │ 638    │ 557.6  │ 79.96   │ 333    │
├───────────┼────────┼────────┼────────┼────────┼────────┼─────────┼────────┤
│ Bytes/Sec │ 109 kB │ 109 kB │ 180 kB │ 200 kB │ 176 kB │ 23.7 kB │ 109 kB │
└───────────┴────────┴────────┴────────┴────────┴────────┴─────────┴────────┘
┌──────┬───────┐
│ Code │ Count │
├──────┼───────┤
│ 200  │ 2     │
├──────┼───────┤
│ 502  │ 5574  │
└──────┴───────┘

Req/Bytes counts sampled once per second.
# of samples: 10

2 2xx responses, 5574 non 2xx responses
6k requests in 10.02s, 1.76 MB read
Sometimes I'll get a handful of successful ones and other times I'll get all failures when using this tool.
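For anyone who wants to reproduce this without autocannon, here is a minimal Python sketch of the same test: a few threads POSTing the `showConfig` form to `/retrieve` and tallying the status codes. The URL and key are the placeholders from the commands above. The helper names (`encode_multipart`, `post_once`, `run`) are made up for this example, and skipping TLS verification is an assumption on my part (the box uses a self-signed certificate).

```python
import json
import ssl
import urllib.error
import urllib.request
import uuid
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

URL = "https://vyos:8443/retrieve"  # same endpoint as the autocannon runs
API_KEY = "my-secret-key"           # placeholder key, as in the commands above


def encode_multipart(fields, boundary):
    """Encode a dict of text fields as a multipart/form-data body."""
    lines = []
    for name, value in fields.items():
        lines += [
            f"--{boundary}",
            f'Content-Disposition: form-data; name="{name}"',
            "",
            value,
        ]
    lines += [f"--{boundary}--", ""]
    return "\r\n".join(lines).encode("utf-8")


def post_once(_):
    """Send one showConfig request; return the HTTP status or an error name."""
    boundary = uuid.uuid4().hex
    fields = {
        "data": json.dumps({"op": "showConfig", "path": []}),
        "key": API_KEY,
    }
    req = urllib.request.Request(
        URL,
        data=encode_multipart(fields, boundary),
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
    # Assumption: self-signed certificate, so verification is disabled here.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(req, context=ctx, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 502 from the proxy in front of the API
    except OSError as err:
        return type(err).__name__  # connection-level failures


def run(connections, total, worker=post_once):
    """Issue `total` requests over `connections` threads and count the results."""
    with ThreadPoolExecutor(max_workers=connections) as pool:
        return Counter(pool.map(worker, range(total)))


if __name__ == "__main__":
    print(run(connections=2, total=20))
```

Running it with `connections=1` versus `connections=2` should mirror the two autocannon runs above, although it sends far fewer requests per second.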
In the monitor log I also see that the HTTP API is crashing/segfaulting:
Feb 12 22:40:15 kernel: vyos-http-api-s[3490]: segfault at f1f47738 ip 00007fccf226dde3 sp 00007fccf1c89d60 error 6 in libvyosconfig.so.0[7fccf21aa000+10f000] likely on CPU 3 (core 3, socket 0)
Feb 12 22:40:15 kernel: Code: 8b 5f 18 48 c7 03 03 00 00 00 48 8b 5f 10 48 83 fb 01 74 4f 49 83 ef 38 4d 3b 7e 08 0f 82 8c 0e 00 00 49 8d 5f 08 48 83 c3 28 <48> c7 43 f8 0b 04 00 00 48 89 03 48 8d 7b e8 48 c7 47 f8 17 08 00
Feb 12 22:40:15 systemd[1]: vyos-http-api.service: Main process exited, code=killed, status=11/SEGV
As I run autocannon more, the odd thing is that it's not always the same failure :\ As a developer, I know these are the worst kind of bugs :(
Here are some of the various errors I see when running the concurrent requests (each block was from one run of autocannon).
Feb 13 22:40:31 vyos-http-api[59419]: double free or corruption (out)
Feb 13 22:40:31 systemd[1]: vyos-http-api.service: Main process exited, code=killed, status=6/ABRT

Feb 13 22:43:09 kernel: traps: vyos-http-api-s[60009] general protection fault ip:7fcd2df94061 sp:7fcd2f1154f8 error:0 in libvyosconfig.so.0[7fcd2ded7000+10f000]
Feb 13 22:43:09 systemd[1]: vyos-http-api.service: Main process exited, code=killed, status=11/SEGV

Feb 13 22:43:43 kernel: vyos-http-api-s[60088]: segfault at 7f41c01bed56 ip 00007f41c02b3ef4 sp 00007f41c0bed4a8 error 7 in libvyosconfig.so.0[7f41c01b0000+10f000] likely on CPU 1 (core 1, socket 0)
Feb 13 22:43:43 kernel: Code: 00 00 00 0f 1f 00 48 8d 47 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 48 89 f0 48 d1 f8 89 c6 85 c0 0f 88 f0 00 00 00 48 8b 42 28 <48> c7 42 38 ff ff ff ff 48 89 42 20 48 89 42 30 48 8b 0f 48 63 c6
Feb 13 22:43:43 systemd[1]: vyos-http-api.service: Main process exited, code=killed, status=11/SEGV

Feb 13 22:44:20 kernel: vyos-http-api-s[60112]: segfault at bf8 ip 00007f56f129a4dd sp 00007f56f1c23d30 error 4
Feb 13 22:44:20 kernel: vyos-http-api-s[60111]: segfault at 4d ip 00007f56f123c339 sp 00007f56f2424b80 error 4
Feb 13 22:44:20 kernel: in libvyosconfig.so.0[7f56f11e6000+10f000]
Feb 13 22:44:20 kernel: in libvyosconfig.so.0[7f56f11e6000+10f000] likely on CPU 2 (core 2, socket 0)
Feb 13 22:44:20 kernel: likely on CPU 3 (core 3, socket 0)
Feb 13 22:44:20 kernel:
Feb 13 22:44:20 kernel: Code: b8 01 00 00 00 48 83 c4 28 c3 0f 1f 00 e9 3b f1 f4 ff 66 2e 0f 1f 84 00 00 00 00 00 90 48 83 ec 28 48 89 04 24 48 89 5c 24 08 <48> 8b 7b f8 48 c1 ef 0a 48 8d 3c fd ff ff ff ff 48 0f b6 1c 3b 48
Feb 13 22:44:20 kernel: Code: 18 48 89 78 20 48 83 c4 08 c3 e8 92 d1 fa ff eb c6 48 83 ec 48 48 8b 40 10 48 89 44 24 18 48 8b 40 20 48 8b 58 20 48 8b 5b 20 <48> 8b 7b 20 48 8b 77 20 48 89 74 24 10 48 8b 56 20 48 89 54 24 20
Feb 13 22:44:20 systemd[1]: vyos-http-api.service: Main process exited, code=killed, status=11/SEGV

Feb 13 22:45:05 kernel: vyos-http-api-s[60174]: segfault at 0 ip 00007f26b2ecc8dd sp 0000000000000000 error 4 in libvyosconfig.so.0[7f26b2dbf000+10f000] likely on CPU 1 (core 1, socket 0)
Feb 13 22:45:05 kernel: Code: 83 c4 08 48 83 c4 08 41 5f 41 5e 41 5d 41 5c 5d 5b c3 48 83 c8 02 eb cf 90 49 f7 86 e8 00 00 00 01 00 00 00 75 09 49 8b 66 10 <41> 8f 46 10 c3 49 89 c4 48 89 c7 5e 48 89 e2 49 8b 4e 10 e8 3b 78
Feb 13 22:45:05 kernel: vyos-http-api-s[60173]: segfault at f ip 00007f26b2e5d7b6 sp 00007f26b3ffde20 error 4 in libvyosconfig.so.0[7f26b2dbf000+10f000] likely on CPU 0 (core 0, socket 0)
Feb 13 22:45:05 kernel: Code: b8 01 00 00 00 48 83 c4 08 c3 66 2e 0f 1f 84 00 00 00 00 00 48 83 ec 18 48 89 c7 48 83 fb 01 74 57 48 89 1c 24 48 89 7c 24 08 <48> 8b 03 48 8b 37 48 89 fb ff d6 48 89 44 24 10 48 8b 04 24 48 8b
Feb 13 22:45:05 systemd[1]: vyos-http-api.service: Main process exited, code=killed, status=11/SEGV
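Since the library is mapped at a different base address on every restart, the raw `ip` values are hard to compare between runs. Subtracting the mapping base (the first number in `libvyosconfig.so.0[base+size]`) from `ip` gives an offset inside the library that stays comparable across crashes. Below is a rough Python sketch of that arithmetic. The regex assumes the kernel log layout seen in the lines above (both the `ip 0000...` and `ip:...` variants), and `fault_offset` is just a name I made up for this example.

```python
import re

# Matches both "ip 00007f... sp ... error 6 in lib[base+size]" (segfault)
# and "ip:7f... sp:... error:0 in lib[base+size]" (general protection fault).
FAULT_RE = re.compile(
    r"ip[: ](?P<ip>[0-9a-f]+) sp[: ][0-9a-f]+ error[: ][0-9a-f]+"
    r" in (?P<lib>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def fault_offset(line):
    """Return (library, offset of the faulting ip inside it), or None."""
    m = FAULT_RE.search(line)
    if m is None:
        return None  # e.g. interleaved kernel lines that split the message
    return m["lib"], int(m["ip"], 16) - int(m["base"], 16)


if __name__ == "__main__":
    sample = (
        "Feb 13 22:43:09 kernel: traps: vyos-http-api-s[60009] general "
        "protection fault ip:7fcd2df94061 sp:7fcd2f1154f8 error:0 in "
        "libvyosconfig.so.0[7fcd2ded7000+10f000]"
    )
    lib, offset = fault_offset(sample)
    print(lib, hex(offset))
```

The run where two threads crashed at the same time has its kernel messages interleaved across lines, so this sketch returns `None` for those and they would have to be pieced together by hand. If debug symbols for the library were available, the offsets could presumably be passed to a tool like `addr2line`, but I have not tried that.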
Previously I was running an image from June of last year, from when I first started testing VyOS before getting too busy. I tried upgrading to the latest version and am still getting the errors.
1: 1.4-rolling-202302130317 (default boot) (running image)
2: 1.4-rolling-202206250934
Hardware-wise, I'm running on an inexpensive Intel N5105 box from AliExpress with 16 GB of RAM, which should be more than sufficient for this, so I doubt it's a hardware/resources issue.
I've started to strip away pieces of the config to see if anything I had added was the cause of it, but so far no luck. Here is my config:
container {
    network services {
        prefix 10.1.0.0/24
    }
}
firewall {
    all-ping enable
}
interfaces {
    ethernet eth0 {
        address dhcp
        description WAN
        hw-id 60:be:b4:02:92:5f
    }
    ethernet eth1 {
        address 10.10.1.1/24
        description MANAGEMENT
        hw-id 60:be:b4:02:29:db
        vif 10 {
            address 192.168.32.1/21
            description SERVERS
        }
        vif 20 {
            address 10.10.21.1/24
            description TRUSTED
        }
        vif 30 {
            address 192.168.2.1/24
            description IOT
        }
        vif 40 {
            address 10.10.40.1/24
            description VIDEO
        }
        vif 50 {
            address 192.168.50.1/24
            description GUEST
        }
    }
    ethernet eth2 {
        address dhcp
        description WAN2
        hw-id 60:be:b4:02:29:dc
    }
    ethernet eth3 {
        address dhcp
        hw-id 60:be:b4:02:29:dd
    }
    loopback lo {
    }
}
nat {
    source {
        rule 1 {
            destination {
                address 0.0.0.0/0
            }
            outbound-interface eth0
            translation {
                address masquerade
            }
        }
    }
}
protocols {
}
service {
    https {
        api {
            debug
            keys {
                id terraform {
                    key ****************
                }
            }
        }
        virtual-host default {
            listen-address 0.0.0.0
            listen-port 8443
        }
    }
    ntp {
        allow-client {
            address 0.0.0.0/0
            address ::/0
        }
        server time1.vyos.net {
        }
        server time2.vyos.net {
        }
        server time3.vyos.net {
        }
    }
    ssh {
        port 22
    }
}
system {
    config-management {
        commit-revisions 100
    }
    conntrack {
        modules {
            ftp
            h323
            nfs
            pptp
            sip
            sqlnet
            tftp
        }
    }
    console {
        device ttyS0 {
            speed 115200
        }
    }
    domain-name local
    host-name gateway
    login {
        user vyos {
            authentication {
                encrypted-password ****************
                plaintext-password ****************
                public-keys brandencash@parchedmac {
                    key ****************
                    type ssh-rsa
                }
                public-keys personal {
                    key ****************
                    type ssh-ed25519
                }
            }
        }
    }
    name-server 1.1.1.1
    syslog {
        global {
            facility all {
                level info
            }
            facility protocols {
                level debug
            }
        }
    }
    time-zone America/Phoenix
}
Any idea what could be going on here?