I'm experimenting with Vyos before swapping over from a Unifi USG and in doing so I am wanting to use terraform to codify changes. By default terraform will perform several things in parallel and as I have been adding more code blocks in the terraform provider, i've started to notice frequently I am getting 502 Bad Gateway responses from the http api. To try to reproduce this outside of terraform I am using a node program called "autocannon" to allow making concurrent requests.
When running a single connection it all seems to be fine:
```
$ npx autocannon --duration 10 --connections 1 --renderStatusCodes --method POST --form '{"data": {"type":"text", "value": "{\"op\": \"showConfig\", \"path\": []}"}, "key": {"type":"text","value":"my-secret-key"}}' 'https://vyos:8443/retrieve'
Running 10s test @ https://vyos:8443/retrieve
1 connections
┌─────────┬────────┬────────┬────────┬────────┬───────────┬──────────┬────────┐
│ Stat │ 2.5% │ 50% │ 97.5% │ 99% │ Avg │ Stdev │ Max │
├─────────┼────────┼────────┼────────┼────────┼───────────┼──────────┼────────┤
│ Latency │ 545 ms │ 564 ms │ 603 ms │ 603 ms │ 563.48 ms │ 12.98 ms │ 603 ms │
└─────────┴────────┴────────┴────────┴────────┴───────────┴──────────┴────────┘
┌───────────┬──────┬──────┬───────┬───────┬─────────┬─────────┬──────┐
│ Stat │ 1% │ 2.5% │ 50% │ 97.5% │ Avg │ Stdev │ Min │
├───────────┼──────┼──────┼───────┼───────┼─────────┼─────────┼──────┤
│ Req/Sec │ 1 │ 1 │ 2 │ 2 │ 1.7 │ 0.46 │ 1 │
├───────────┼──────┼──────┼───────┼───────┼─────────┼─────────┼──────┤
│ Bytes/Sec │ 8 kB │ 8 kB │ 16 kB │ 16 kB │ 13.6 kB │ 3.67 kB │ 8 kB │
└───────────┴──────┴──────┴───────┴───────┴─────────┴─────────┴──────┘
┌──────┬───────┐
│ Code │ Count │
├──────┼───────┤
│ 200 │ 17 │
└──────┴───────┘
Req/Bytes counts sampled once per second.
# of samples: 10
18 requests in 10.03s, 136 kB read
```
Bumping that up to just 2 concurrent connections I start getting errors.
```
$ npx autocannon --duration 10 --connections 2 --renderStatusCodes --method POST --form '{"data": {"type":"text", "value": "{\"op\": \"showConfig\", \"path\": []}"}, "key": {"type":"text","value":"my-secret-key"}}' 'https://vyos:8443/retrieve'
Running 10s test @ https://vyos:8443/retrieve
2 connections
┌─────────┬──────┬───────┬───────┬────────┬──────────┬──────────┬────────┐
│ Stat │ 2.5% │ 50% │ 97.5% │ 99% │ Avg │ Stdev │ Max │
├─────────┼──────┼───────┼───────┼────────┼──────────┼──────────┼────────┤
│ Latency │ 4 ms │ 47 ms │ 98 ms │ 104 ms │ 48.08 ms │ 27.71 ms │ 216 ms │
└─────────┴──────┴───────┴───────┴────────┴──────────┴──────────┴────────┘
┌───────────┬────────┬────────┬────────┬────────┬────────┬─────────┬────────┐
│ Stat │ 1% │ 2.5% │ 50% │ 97.5% │ Avg │ Stdev │ Min │
├───────────┼────────┼────────┼────────┼────────┼────────┼─────────┼────────┤
│ Req/Sec │ 333 │ 333 │ 572 │ 638 │ 557.6 │ 79.96 │ 333 │
├───────────┼────────┼────────┼────────┼────────┼────────┼─────────┼────────┤
│ Bytes/Sec │ 109 kB │ 109 kB │ 180 kB │ 200 kB │ 176 kB │ 23.7 kB │ 109 kB │
└───────────┴────────┴────────┴────────┴────────┴────────┴─────────┴────────┘
┌──────┬───────┐
│ Code │ Count │
├──────┼───────┤
│ 200 │ 2 │
├──────┼───────┤
│ 502 │ 5574 │
└──────┴───────┘
Req/Bytes counts sampled once per second.
# of samples: 10
2 2xx responses, 5574 non 2xx responses
6k requests in 10.02s, 1.76 MB read
```
Sometimes I'll get a handful of successful ones and other times I'll get all failures when using this tool.
In the `monitor log` I also see the http api is crashing/segfaulting:
```
Feb 12 22:40:15 kernel: vyos-http-api-s[3490]: segfault at f1f47738 ip 00007fccf226dde3 sp 00007fccf1c89d60 error 6 in libvyosconfig.so.0[7fccf21aa000+10f000] likely on CPU 3 (core 3, socket 0)
Feb 12 22:40:15 kernel: Code: 8b 5f 18 48 c7 03 03 00 00 00 48 8b 5f 10 48 83 fb 01 74 4f 49 83 ef 38 4d 3b 7e 08 0f 82 8c 0e 00 00 49 8d 5f 08 48 83 c3 28 <48> c7 43 f8 0b 04 00 00 48 89 03 48 8d 7b e8 48 c7 47 f8 17 08 00
Feb 12 22:40:15 systemd[1]: vyos-http-api.service: Main process exited, code=killed, status=11/SEGV
```
As I run autocannon more, it is odd because its not always the same failure :\ As a developer, I know these are the worst kind of bugs :(
Here are some of the various errors I see when running the concurrent requests.
```
Feb 13 22:40:31 vyos-http-api[59419]: double free or corruption (out)
Feb 13 22:40:31 systemd[1]: vyos-http-api.service: Main process exited, code=killed, status=6/ABRT
```
```
Feb 13 22:43:09 kernel: traps: vyos-http-api-s[60009] general protection fault ip:7fcd2df94061 sp:7fcd2f1154f8 error:0 in libvyosconfig.so.0[7fcd2ded7000+10f000]
Feb 13 22:43:09 systemd[1]: vyos-http-api.service: Main process exited, code=killed, status=11/SEGV
```
```
Feb 13 22:43:43 kernel: vyos-http-api-s[60088]: segfault at 7f41c01bed56 ip 00007f41c02b3ef4 sp 00007f41c0bed4a8 error 7 in libvyosconfig.so.0[7f41c01b0000+10f000] likely on CPU 1 (core 1, socket 0)
Feb 13 22:43:43 kernel: Code: 00 00 00 0f 1f 00 48 8d 47 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 48 89 f0 48 d1 f8 89 c6 85 c0 0f 88 f0 00 00 00 48 8b 42 28 <48> c7 42 38 ff ff ff ff 48 89 42 20 48 89 42 30 48 8b 0f 48 63 c6
Feb 13 22:43:43 systemd[1]: vyos-http-api.service: Main process exited, code=killed, status=11/SEGV
```
```
Feb 13 22:44:20 kernel: vyos-http-api-s[60112]: segfault at bf8 ip 00007f56f129a4dd sp 00007f56f1c23d30 error 4
Feb 13 22:44:20 kernel: vyos-http-api-s[60111]: segfault at 4d ip 00007f56f123c339 sp 00007f56f2424b80 error 4
Feb 13 22:44:20 kernel: in libvyosconfig.so.0[7f56f11e6000+10f000]
Feb 13 22:44:20 kernel: in libvyosconfig.so.0[7f56f11e6000+10f000] likely on CPU 2 (core 2, socket 0)
Feb 13 22:44:20 kernel: likely on CPU 3 (core 3, socket 0)
Feb 13 22:44:20 kernel:
Feb 13 22:44:20 kernel: Code: b8 01 00 00 00 48 83 c4 28 c3 0f 1f 00 e9 3b f1 f4 ff 66 2e 0f 1f 84 00 00 00 00 00 90 48 83 ec 28 48 89 04 24 48 89 5c 24 08 <48> 8b 7b f8 48 c1 ef 0a 48 8d 3c fd ff ff ff ff 48 0f b6 1c 3b 48
Feb 13 22:44:20 kernel: Code: 18 48 89 78 20 48 83 c4 08 c3 e8 92 d1 fa ff eb c6 48 83 ec 48 48 8b 40 10 48 89 44 24 18 48 8b 40 20 48 8b 58 20 48 8b 5b 20 <48> 8b 7b 20 48 8b 77 20 48 89 74 24 10 48 8b 56 20 48 89 54 24 20
Feb 13 22:44:20 systemd[1]: vyos-http-api.service: Main process exited, code=killed, status=11/SEGV
```
```
Feb 13 22:45:05 kernel: vyos-http-api-s[60174]: segfault at 0 ip 00007f26b2ecc8dd sp 0000000000000000 error 4 in libvyosconfig.so.0[7f26b2dbf000+10f000] likely on CPU 1 (core 1, socket 0)
Feb 13 22:45:05 kernel: Code: 83 c4 08 48 83 c4 08 41 5f 41 5e 41 5d 41 5c 5d 5b c3 48 83 c8 02 eb cf 90 49 f7 86 e8 00 00 00 01 00 00 00 75 09 49 8b 66 10 <41> 8f 46 10 c3 49 89 c4 48 89 c7 5e 48 89 e2 49 8b 4e 10 e8 3b 78
Feb 13 22:45:05 kernel: vyos-http-api-s[60173]: segfault at f ip 00007f26b2e5d7b6 sp 00007f26b3ffde20 error 4 in libvyosconfig.so.0[7f26b2dbf000+10f000] likely on CPU 0 (core 0, socket 0)
Feb 13 22:45:05 kernel: Code: b8 01 00 00 00 48 83 c4 08 c3 66 2e 0f 1f 84 00 00 00 00 00 48 83 ec 18 48 89 c7 48 83 fb 01 74 57 48 89 1c 24 48 89 7c 24 08 <48> 8b 03 48 8b 37 48 89 fb ff d6 48 89 44 24 10 48 8b 04 24 48 8b
Feb 13 22:45:05 systemd[1]: vyos-http-api.service: Main process exited, code=killed, status=11/SEGV
```
Previously I was running an image from June last year when I initially was starting to test out Vyos before getting too busy. I tried upgrading to the latest version and still getting the errors.
```
1: 1.4-rolling-202302130317 (default boot) (running image)
2: 1.4-rolling-202206250934
```
Hardware wise, I'm running on an inexpensive Intel N5105 from AliExpress running 16GB RAM, which should be more than sufficient for this, so I doubt it's a hardware/resources issue.
I've started to strip away pieces of the config to see if anything I had added was the cause for it, but so far no luck. Here is my config:
```
container {
network services {
prefix 10.1.0.0/24
}
}
firewall {
all-ping enable
}
interfaces {
ethernet eth0 {
address dhcp
description WAN
hw-id 60:be:b4:02:92:5f
}
ethernet eth1 {
address 10.10.1.1/24
description MANAGEMENT
hw-id 60:be:b4:02:29:db
vif 10 {
address 192.168.32.1/21
description SERVERS
}
vif 20 {
address 10.10.21.1/24
description TRUSTED
}
vif 30 {
address 192.168.2.1/24
description IOT
}
vif 40 {
address 10.10.40.1/24
description VIDEO
}
vif 50 {
address 192.168.50.1/24
description GUEST
}
}
ethernet eth2 {
address dhcp
description WAN2
hw-id 60:be:b4:02:29:dc
}
ethernet eth3 {
address dhcp
hw-id 60:be:b4:02:29:dd
}
loopback lo {
}
}
nat {
source {
rule 1 {
destination {
address 0.0.0.0/0
}
outbound-interface eth0
translation {
address masquerade
}
}
}
}
protocols {
}
service {
https {
api {
debug
keys {
id terraform {
key ****************
}
}
}
virtual-host default {
listen-address 0.0.0.0
listen-port 8443
}
}
ntp {
allow-client {
address 0.0.0.0/0
address ::/0
}
server time1.vyos.net {
}
server time2.vyos.net {
}
server time3.vyos.net {
}
}
ssh {
port 22
}
}
system {
config-management {
commit-revisions 100
}
conntrack {
modules {
ftp
h323
nfs
pptp
sip
sqlnet
tftp
}
}
console {
device ttyS0 {
speed 115200
}
}
domain-name local
host-name gateway
login {
user vyos {
authentication {
encrypted-password ****************
plaintext-password ****************
public-keys brandencash@parchedmac {
key ****************
type ssh-rsa
}
public-keys personal {
key ****************
type ssh-ed25519
}
}
}
}
name-server 1.1.1.1
syslog {
global {
facility all {
level info
}
facility protocols {
level debug
}
}
}
time-zone America/Phoenix
}
```
Any idea what would be going on here?