
Conntrack-sync Internal Cache Growing Uncontrollably
Open, Normal, Public, BUG

Description

For the past few months, conntrack-sync's internal cache on my routers has been growing to an absurd size even though the conntrack table itself stays relatively small. Once it gets large enough, I begin to see intermittent connectivity issues until I restart the service via restart conntrack-sync. That buys a few days before I have to do the same thing again. Here's what the a-side router looks like when comparing conntrack-sync's internal cache with the actual conntrack table, 20 minutes after restarting the service. Note the huge discrepancy: 284 active connections in the conntrack table, but more than 22,000 in the cache:

trae@cr01a-vyos:~$ show conntrack table ipv4 | wc -l
286
trae@cr01a-vyos:~$ show conntrack statistics 
                CPU         Found         Invalid          Insert    Insert fail    Drop       Early drop        Errors       Search restart
-----  -------  ----------  ------------  ---------------  --------  -------------  ---------  ----------------  -----------  ----------------
cpu=0  found=0  invalid=74  insert=0      insert_failed=0  drop=0    early_drop=0   error=9    search_restart=0  (null)=13    (null)=0
cpu=1  found=0  invalid=64  insert=7121   insert_failed=0  drop=0    early_drop=0   error=2    search_restart=0  (null)=235   (null)=0
cpu=2  found=0  invalid=83  insert=73041  insert_failed=0  drop=0    early_drop=0   error=495  search_restart=6  (null)=93    (null)=0
cpu=3  found=0  invalid=73  insert=184    insert_failed=0  drop=0    early_drop=0   error=0    search_restart=0  (null)=75    (null)=0
cpu=4  found=0  invalid=74  insert=0      insert_failed=0  drop=0    early_drop=0   error=885  search_restart=0  (null)=1     (null)=0
cpu=5  found=0  invalid=64  insert=1255   insert_failed=0  drop=0    early_drop=0   error=0    search_restart=0  (null)=0     (null)=0
cpu=6  found=2  invalid=68  insert=0      insert_failed=1  drop=1    early_drop=0   error=0    search_restart=0  (null)=1051  (null)=0
cpu=7  found=0  invalid=71  insert=3046   insert_failed=0  drop=0    early_drop=0   error=32   search_restart=0  (null)=88    (null)=0
trae@cr01a-vyos:~$ show conntrack-sync statist
cache internal:
current active connections:            22199
connections created:                   73875    failed:            0
connections updated:                   38217    failed:            0
connections destroyed:                 51676    failed:            0

external inject:
connections created:                   72473    failed:            0
connections updated:                   28742    failed:            0
connections destroyed:                   270    failed:            0

traffic processed:
          1031447727 Bytes                    411616 Pckts

multicast traffic (active device=bond0.110):
             9065520 Bytes sent              8208184 Bytes recv
              120422 Pckts sent               111498 Pckts recv
                   0 Error send                    0 Error recv

message tracking:
                   0 Malformed msgs                    0 Lost msgs

Main Table Statistics:
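
The discrepancy described above can be checked mechanically. The following is a minimal sketch, not part of the original report: it parses the "cache internal" counters pasted above and compares them with the 284 kernel entries quoted in the description. The helper name parse_sections and the regular expressions are illustrative assumptions about the output layout shown here, not a VyOS or conntrackd API.

```python
import re

# Counters copied from the a-side output in this report.
SAMPLE = """\
cache internal:
current active connections:            22199
connections created:                   73875    failed:            0
connections updated:                   38217    failed:            0
connections destroyed:                 51676    failed:            0

external inject:
connections created:                   72473    failed:            0
connections updated:                   28742    failed:            0
connections destroyed:                   270    failed:            0
"""


def parse_sections(text):
    """Return {section name: {counter name: value}} (hypothetical helper)."""
    sections, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        # A section header is a bare "name:" line with nothing after the colon.
        header = re.fullmatch(r"([a-z ]+):", line)
        if header:
            current = sections.setdefault(header.group(1), {})
            continue
        # A counter line is "name:  <number>"; trailing "failed:" is ignored.
        counter = re.match(r"([a-z ]+):\s+(\d+)", line)
        if counter and current is not None:
            current[counter.group(1)] = int(counter.group(2))
    return sections


stats = parse_sections(SAMPLE)
internal = stats["cache internal"]

# Entries created but never destroyed in the internal cache.
net = internal["connections created"] - internal["connections destroyed"]

kernel_entries = 284  # figure quoted in the description above
ratio = internal["current active connections"] / kernel_entries

print(f"created - destroyed = {net}")
print(f"cache holds about {ratio:.0f}x the kernel table")
```

In this sample the active count equals created minus destroyed, so the counters themselves are consistent. That may point at cache entries not being destroyed when the kernel entry goes away, rather than at a miscounted statistic, though the output alone does not prove it.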

Here are the relevant portions of the configuration:

trae@cr01a-vyos:~$ show conf com | grep -P 'set (system conntrack|service conntrack-sync)'
set service conntrack-sync accept-protocol 'icmp'
set service conntrack-sync accept-protocol 'icmp6'
set service conntrack-sync accept-protocol 'tcp'
set service conntrack-sync accept-protocol 'udp'
set service conntrack-sync disable-external-cache
set service conntrack-sync event-listen-queue-size '100'
set service conntrack-sync failover-mechanism vrrp sync-group 'CR01.INT'
set service conntrack-sync ignore-address 'fe8::/10'
set service conntrack-sync ignore-address 'ff00::/8'
set service conntrack-sync ignore-address '169.254.0.0/16'
set service conntrack-sync ignore-address '224.0.0.0/4'
set service conntrack-sync ignore-address '127.0.0.0/8'
set service conntrack-sync interface bond0.110
set service conntrack-sync sync-queue-size '100'
set system conntrack flow-accounting
set system conntrack modules
set system conntrack table-size '1000000'
set system conntrack timeout icmp '10'
set system conntrack timeout other '60'
set system conntrack timeout tcp close-wait '20'
set system conntrack timeout tcp established '1800'
set system conntrack timeout tcp fin-wait '30'
set system conntrack timeout tcp syn-recv '30'
set system conntrack timeout tcp syn-sent '60'
set system conntrack timeout udp stream '60'

The b side shows similar symptoms and statistics. Please let me know if you need anything else; I can also get you access to the routers.

Details

Version
1.5-rolling-202403120022
Is it a breaking change?
Perfectly compatible
Issue type
Bug (incorrect behavior)

Event Timeline

trae32566 triaged this task as High priority.
trae32566 created this object in space S1 VyOS Public.

Here's the generated configuration from /run/conntrackd/conntrackd.conf:

# Synchronizer settings
Sync {
    Mode FTFW {
        DisableExternalCache on
    }
    Multicast {
        IPv4_address 225.0.0.50
        Group 3780
        IPv4_interface 192.168.15.3
        Interface bond0.110
        SndSocketBuffer 104857600
        RcvSocketBuffer 104857600
        Checksum on
    }
}
Helper {
    Type rpc inet tcp {
        QueueNum 3
        Policy rpc {
            ExpectMax 1
            ExpectTimeout 300
        }
    }
    Type rpc inet udp {
        QueueNum 4
        Policy rpc {
            ExpectMax 1
            ExpectTimeout 300
        }
    }
    Type tns inet tcp {
        QueueNum 5
        Policy tns {
            ExpectMax 1
            ExpectTimeout 300
        }
    }
}

# General settings
General {
    HashSize 262144
    HashLimit 2000000
    LogFile off
    Syslog on
    LockFile /var/lock/conntrack.lock
    UNIX {
        Path /var/run/conntrackd.ctl
    }
    NetlinkBufferSize 2097152
    NetlinkBufferSizeMaxGrowth 104857600
    NetlinkOverrunResync off
    NetlinkEventsReliable on
    Filter From Userspace {
        Address Ignore {
            IPv4_address 169.254.0.0/16
            IPv4_address 224.0.0.0/4
            IPv4_address 127.0.0.0/8
            IPv6_address fe8::/10
            IPv6_address ff00::/8
        }
        Protocol Accept {
            TCP
            UDP
            ICMP
            IPv6-ICMP
        }
    }
}
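
One detail in the address filter above may be worth a second look, independent of the cache growth: the ignored prefix is written as fe8::/10. The sketch below uses Python's standard ipaddress module to show which range that prefix actually covers. It assumes, and this is only an assumption, that the intent was to ignore IPv6 link-local addresses (fe80::/10); nothing in the report confirms this or ties it to the bug.

```python
import ipaddress

configured = "fe8::/10"        # prefix as it appears in the configuration
assumed_intent = "fe80::/10"   # assumption: IPv6 link-local was intended

# strict=False masks off host bits instead of raising ValueError.
configured_net = ipaddress.ip_network(configured, strict=False)
intended_net = ipaddress.ip_network(assumed_intent)

link_local_example = ipaddress.ip_address("fe80::1")

print("configured prefix covers:", configured_net)
print("fe80::1 in configured prefix:", link_local_example in configured_net)
print("fe80::1 in fe80::/10:", link_local_example in intended_net)
```

If link-local traffic was meant to be ignored, the configured prefix does not match it, so those flows would still be eligible for synchronization.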

@trae32566 Can you provide the output of the following commands?

sudo conntrackd -C /run/conntrackd/conntrackd.conf -s  && echo "conntrack_count: " && sudo conntrack -C
sudo conntrackd -C /run/conntrackd/conntrackd.conf -s network
sudo conntrackd -C /run/conntrackd/conntrackd.conf -s cache
sudo conntrackd -C /run/conntrackd/conntrackd.conf -s runtime
sudo conntrackd -C /run/conntrackd/conntrackd.conf -s link
sudo conntrackd -C /run/conntrackd/conntrackd.conf -s queue
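
For repeated collection, the commands above could be gathered in one pass. This is a rough sketch only: the function names build_command and collect are hypothetical, and it assumes passwordless sudo and the configuration path used in this task. The conntrackd invocations are the ones requested above; nothing else about the tool is assumed.

```python
import subprocess

CONF = "/run/conntrackd/conntrackd.conf"

# "" is the plain "-s" call; the rest are the sub-statistics requested above.
SECTIONS = ["", "network", "cache", "runtime", "link", "queue"]


def build_command(section, conf=CONF):
    """Build the argv list for one statistics query (hypothetical helper)."""
    cmd = ["sudo", "conntrackd", "-C", conf, "-s"]
    if section:
        cmd.append(section)
    return cmd


def collect(conf=CONF):
    """Run every query and return {section: stdout}; errors are not raised."""
    report = {}
    for section in SECTIONS:
        result = subprocess.run(
            build_command(section, conf),
            capture_output=True,
            text=True,
            check=False,
        )
        report[section or "general"] = result.stdout
    return report
```

Calling collect() on the router and saving the returned dictionary at intervals would give comparable snapshots over time; the kernel entry count from sudo conntrack -C would still need to be recorded separately.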
Viacheslav changed the task status from Open to Needs reporter action. Apr 9 2024, 4:06 PM
syncer changed the subtype of this task from "Task" to "Bug". Apr 20 2024, 5:10 PM
syncer lowered the priority of this task from High to Normal. May 19 2024, 8:57 AM

@Viacheslav sorry, for some reason I didn't see this until now. In the meantime I moved the routers to 1.4-epa3 to test whether the problem occurs on that version, and it does. Here are my conntrack-sync stats after a week on 1.4-epa3 (I haven't started seeing connectivity issues yet, but I expect to in the next few days once I hit the limit):

trae@cr01a-vyos:~$ show conntrack-sync statist
cache internal:
current active connections:           403218
connections created:                 4998006    failed:            0
connections updated:                11289840    failed:            0
connections destroyed:               4594788    failed:            0

external inject:
connections created:                 3842881    failed:           17
connections updated:                 7867414    failed:           96
connections destroyed:                751144    failed:            0

traffic processed:
        107031061053 Bytes                 126442965 Pckts

multicast traffic (active device=bond0.110):
          1211200280 Bytes sent            864916612 Bytes recv
            16579261 Pckts sent             11514220 Pckts recv
                   0 Error send                    0 Error recv

message tracking:
                   0 Malformed msgs                    0 Lost msgs

Main Table Statistics:

b side:

trae@cr01b-vyos:~$ show conntrack-sync statist
cache internal:
current active connections:           809337
connections created:                 3932455    failed:            0
connections updated:                 8513320    failed:            0
connections destroyed:               3123118    failed:            0

external inject:
connections created:                 4983011    failed:          317
connections updated:                10790990    failed:           38
connections destroyed:               1779258    failed:            0

traffic processed:
         53410121497 Bytes                  54463245 Pckts

multicast traffic (active device=bond0.110):
           887690492 Bytes sent           1214620560 Bytes recv
            11792022 Pckts sent             16615080 Pckts recv
                   0 Error send                    0 Error recv

message tracking:
                   0 Malformed msgs                 2719 Lost msgs

Main Table Statistics:

Here are the requested commands:

trae@cr01a-vyos:~$ sudo conntrackd -C /run/conntrackd/conntrackd.conf -s  && echo "conntrack_count: " && sudo conntrack -C
cache internal:                                                                       
current active connections:           402996                   
connections created:                 4999939    failed:            0
connections updated:                11294139    failed:            0
connections destroyed:               4596943    failed:            0

external inject:
connections created:                 3844364    failed:           17
connections updated:                 7870459    failed:           96
connections destroyed:                751480    failed:            0

traffic processed:
        107040900851 Bytes                 126465152 Pckts

multicast traffic (active device=bond0.110):
          1211687752 Bytes sent            865264996 Bytes recv
            16585610 Pckts sent             11518648 Pckts recv
                   0 Error send                    0 Error recv

message tracking:
                   0 Malformed msgs                    0 Lost msgs

conntrack_count: 
700
trae@cr01a-vyos:~$ sudo conntrackd -C /run/conntrackd/conntrackd.conf -s network
network statistics:
        recv:
                Malformed messages:                        0
                Wrong protocol version:                    0
                Malformed header:                          0
                Malformed payload:                         0
                Bad message type:                          0
                Truncated message:                         0
                Bad message size:                          0
        send:
                Malformed messages:                        0

sequence tracking statistics:
        recv:
                Packets lost:                              0
                Packets before:                            0

multicast traffic (active device=bond0.110):
          1211734900 Bytes sent            865292396 Bytes recv
            16586315 Pckts sent             11519038 Pckts recv
                   0 Error send                    0 Error recv
trae@cr01a-vyos:~$ sudo conntrackd -C /run/conntrackd/conntrackd.conf -s cache
cache:internal  active objects:               403166
        active/total entries:                 403120/      403166
        creation OK/failed:                  5001649/           0
                no memory available:               0
                no space left in cache:            0
        update OK/failed:                   11297958/           0
                entry not found:                   0
        deletion created/failed:             4598529/           0
                entry not found:                   0

external inject:
connections created:                 3845762    failed:           17
connections updated:                 7873391    failed:           96
connections destroyed:                751769    failed:            0
trae@cr01a-vyos:~$ sudo conntrackd -C /run/conntrackd/conntrackd.conf -s runtime
daemon uptime: 5 days 11 h 28 min

netlink stats:
        events received:                    35150047
        events filtered:                         554
        events unknown type:                       0
        catch event failed:                        0
        dump unknown type:                         0
        netlink overrun:                           0
        flush kernel table:                        0
        resync with kernel table:                  0
        current buffer size (in bytes):      2097152

runtime stats:
        child process failed:                      0
                child process segfault:            0
                child process termsig:             0
        select failed:                             0
        wait failed:                               0
        local read failed:                         0
        local unknown request:                     0
trae@cr01a-vyos:~$ sudo conntrackd -C /run/conntrackd/conntrackd.conf -s link
multicast traffic device=bond0.110 status=RUNNING role=ACTIVE:
          1212369568 Bytes sent            865786808 Bytes recv
            16594857 Pckts sent             11525428 Pckts recv
                   0 Error send                    0 Error recv
trae@cr01a-vyos:~$ sudo conntrackd -C /run/conntrackd/conntrackd.conf -s queue
allocated queue nodes:                     0

queue txqueue:
current elements:                          0
maximum elements:                 2147483647
not enough space errors:                   0

queue errorq:
current elements:                          0
maximum elements:                        128
not enough space errors:                   0

queue rsqueue:
current elements:                         97
maximum elements:                     131072
not enough space errors:                   0
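
As a rough way to reason about "hitting the limit", the runtime and cache figures above can be combined into a growth estimate. This is a back-of-the-envelope sketch under loud assumptions: growth is treated as linear since daemon start, the cache is assumed to have started empty, and the limit is assumed to be the HashLimit of 2000000 from the generated configuration posted earlier (the value on 1.4-epa3 may differ). The function names are hypothetical.

```python
import re


def uptime_hours(line):
    """Parse a 'daemon uptime: D days H h M min' line into hours."""
    match = re.search(r"(\d+) days (\d+) h (\d+) min", line)
    if match is None:
        raise ValueError(f"unrecognised uptime line: {line!r}")
    days, hours, minutes = map(int, match.groups())
    return days * 24 + hours + minutes / 60


def hours_until_limit(active, uptime_h, limit):
    """Hours left before 'active' reaches 'limit', assuming linear growth."""
    rate = active / uptime_h
    return (limit - active) / rate


# Values copied from the a-side output above.
active_objects = 403166
up = uptime_hours("daemon uptime: 5 days 11 h 28 min")

hash_limit = 2_000_000  # assumption: HashLimit from the posted config

rate = active_objects / up
remaining = hours_until_limit(active_objects, up, hash_limit)

print(f"growth: about {rate:.0f} cache entries per hour")
print(f"estimated time to limit: about {remaining / 24:.1f} days")
```

The estimate is only as good as the linear-growth assumption; connectivity problems may well appear earlier or later than the point where the cache reaches the configured limit.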
trae32566 changed the task status from Needs reporter action to Open. Jun 6 2024, 6:22 AM