Rewrite load-balancing wan to XML/Python
Closed, Resolved | Public | FEATURE REQUEST

Authored By

	Viacheslav
	Jun 18 2022, 2:52 PM

Description

Rewrite load-balancing wan to XML/Python

Details

Version: 1.5
Is it a breaking change?: Config syntax change (migratable)

Related Objects

Status	Subtype	Assigned	Task
Resolved	FEATURE REQUEST	sarthurdev	T4470 Rewrite load-balancing wan to XML/Python
Wontfix	BUG	Viacheslav	T4422 WAN load-balance status failed on all interfaces if one of them failed
Resolved	FEATURE REQUEST	Viacheslav	T4518 Add XML for CLI conf mode load-balancing wan
Not Applicable	BUG	sarthurdev	T4587 wan load balance issues with 3 or more WANs
Not Applicable	BUG	sarthurdev	T4443 Wan Load Balancing Multiple Regressions
Open	ENHANCEMENT	None	T114 Allow wan load-balancing rules to match against groups
Resolved	BUG	Viacheslav	T4362 Wan Load Balancing - Can't create routing tables
Resolved	FEATURE REQUEST	Viacheslav	T5171 Use XML for conf-mode "load-balancing wan" instead of legacy templates
Resolved	FEATURE REQUEST	Viacheslav	T5203 load-balancing wan add systemd unit instead of old vyatta-wanloadbalance.init

Event Timeline

Viacheslav created this task. (Jun 18 2022, 2:52 PM)

Viacheslav changed the subtype of this task from "Bug" to "Feature Request".

pasik subscribed. (Jun 18 2022, 3:26 PM)

Viacheslav mentioned this in T4443: Wan Load Balancing Multiple Regressions. (Jun 28 2022, 12:18 PM)

marc_s subscribed. (Jun 28 2022, 12:20 PM)

masterit subscribed. (Jun 28 2022, 1:04 PM)

Viacheslav added a subtask: T4422: WAN load-balance status failed on all interfaces if one of them failed. (Jul 1 2022, 1:08 PM)

Viacheslav changed the status of subtask T4518: Add XML for CLI conf mode load-balancing wan from Open to In progress. (Jul 8 2022, 10:07 AM)

Viacheslav closed subtask T4518: Add XML for CLI conf mode load-balancing wan as Resolved. (Jul 29 2022, 6:26 PM)

Viacheslav added a subtask: T4587: wan load balance issues with 3 or more WANs. (Aug 3 2022, 10:47 AM)

Viacheslav added a subtask: T4443: Wan Load Balancing Multiple Regressions.

blackhole subscribed. (Aug 3 2022, 11:23 AM)

It would also be good if the WLB function controlled the main routing table; that would help avoid a lot of confusion between the protocols static configuration and the WLB function. The current documentation does not say anything about how exactly the protocols static 0.0.0.0/0 route must be set with WLB.
From what I have tested:
1) WLB creates additional routing tables and sets PBR rules
2) Without a protocols static route for 0.0.0.0/0 with next-hops to every WLB gateway, local VyOS traffic does not work (nor does traffic to VyOS)

Viacheslav added a subtask: T114: Allow wan load-balancing rules to match against groups. (Aug 11 2022, 8:36 AM)

I have used this feature in the past but not anymore due to the issues listed in the regressions task. We are now running pfsense purely for LB since this (mostly) works as advertised. Looking back at this current implementation there are some very useful features that are missing.

First is the use of RTT timers separately from packet loss to mark a link as faulty. On VyOS we have the response time parameter, but this is very coarse and takes immediate effect, which causes link flapping whenever you have congestion on more than one uplink. This results in high packet loss even though a working 2nd or 3rd tier link is available the entire time. By integrating RTT measurements over time, some hysteresis can be added to the link selection logic.
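To make the hysteresis idea concrete, here is a minimal Python sketch. It is not taken from VyOS or pfsense/dpinger; the class name, the thresholds, and the smoothing factor are all assumptions chosen for illustration. It smooths RTT samples with an exponentially weighted moving average and uses two separate thresholds, so a single slow probe does not flap the link.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LinkMonitor:
    """Hypothetical per-link monitor: smoothed RTT plus two-threshold hysteresis."""
    down_threshold_ms: float = 200.0  # mark the link down above this smoothed RTT
    up_threshold_ms: float = 100.0    # mark it up again only below this smoothed RTT
    alpha: float = 0.2                # EWMA weight given to the newest sample
    smoothed_rtt_ms: Optional[float] = None
    up: bool = True

    def update(self, rtt_ms: float) -> bool:
        """Feed one RTT sample and return the resulting link state."""
        if self.smoothed_rtt_ms is None:
            self.smoothed_rtt_ms = rtt_ms
        else:
            self.smoothed_rtt_ms = (
                self.alpha * rtt_ms + (1 - self.alpha) * self.smoothed_rtt_ms
            )
        # The gap between the two thresholds is what prevents flapping.
        if self.up and self.smoothed_rtt_ms > self.down_threshold_ms:
            self.up = False
        elif not self.up and self.smoothed_rtt_ms < self.up_threshold_ms:
            self.up = True
        return self.up
```

With these assumed values, one 500 ms spike after a 50 ms baseline leaves the link up, a second consecutive spike takes it down, and it only comes back after several good samples in a row.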

Secondly, as stated above, the current implementation manipulates tables and marks packets directly, with zero observability using the normal operation mode tools. I think it would be much clearer to allow selecting which table and VRF a default route is inserted into, to give it a configurable lower priority than a static route (like any other dynamic protocol, so it can be overridden by other options), and to use an observable route priority marker like distance so it's actually possible to see what's going on.

I think the pfsense implementation is as good as any to base this work on (if it's not possible to just port it, which it probably isn't due to PHP/OS differences). It uses dpinger to come up with the values used to switch gateways, which provides considerable flexibility if we were to expose all of the parameters.

@thetooth There is a new feature, failover route, where you can set metrics:
https://github.com/vyos/vyos-1x/pull/1358
It could be extended to do some "load-balancing".

I have been thinking about this over the weekend and looked into your failover implementation. There's nothing wrong with it and it should serve most people's needs. That said, I am not too good with Python, so it was more straightforward to start from scratch.

Working on the idea from c-po's comment, I have put together a PoC here: https://github.com/thetooth/vyos-failover
It's effectively passive IP SLA using ICMP checks, with the routes being the actionable item. It also adds weighted multipath, which is very useful in the setups I do at work (a lot of remote site stuff where multiple wireless links are the best you can get, with more congestion issues than complete outages).
Let me know your thoughts on this, as I'm no network engineer and perhaps people would prefer multiple routes with different metrics (also, I don't see how we can have the same route, e.g. 0.0.0.0/0, defined more than once with the current JSON schema, as this implies support for duplicate keys in the parser).
I've also added an output stream for use with operation mode commands, which would look like this: https://gist.github.com/thetooth/acaceafc75716425462baaea69cfed69
It should be consumable by the telegraf service as well, so performance can be tracked over longer periods or other alerting tools can be used.
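As a rough illustration of what weighted multipath over health-checked links means, here is a small Python sketch. It is not code from the PoC linked above; the function name and the input shape are assumptions. It computes the share of traffic each healthy next hop would receive, in proportion to its weight.

```python
def multipath_shares(next_hops):
    """Return the traffic share per healthy next hop.

    next_hops maps a hypothetical link name to (weight, healthy).
    Unhealthy links are dropped and the rest are normalised by total weight.
    """
    healthy = {name: weight for name, (weight, ok) in next_hops.items() if ok}
    total = sum(healthy.values())
    if total == 0:
        return {}  # no usable next hop
    return {name: weight / total for name, weight in healthy.items()}


if __name__ == "__main__":
    links = {"wan1": (3, True), "wan2": (1, True), "wan3": (2, False)}
    print(multipath_shares(links))
```

When a health check fails, the link simply drops out of the set and the remaining weights are renormalised, which is the behaviour you want with congested wireless links rather than hard outages.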

I think this sort of approach is ultimately more useful for WLB users that are stuck with purely passive ways of trying to keep the internet on in a small office for example.

danhusan subscribed. (Oct 17 2022, 11:33 AM)

The problem is that failover route will not solve multi-WAN scenarios where you have 2 or more links for incoming traffic, e.g. web. Most good infrastructures would have a dedicated management uplink, and also multiple WANs for serving client traffic. That approach increases infrastructure security and provides a much cleaner way to define zone policies. But to do that, all traffic, especially incoming traffic, must be correctly marked. I've tried a lot of ways to configure WLB, but every time VyOS tried to reply from the wrong interface; that's why I created a bug task here

Also, it seems that the issue appears with 3 or more WANs; as I remember, it worked with 2 WAN interfaces

@Nova_Logic I understand your frustration with the old WLB; it is not compatible with policy routes, DNAT, or fwmarks due to the way it's implemented. However, neither WLB nor this new implementation is an ingress-capable tool. That is, these fill a niche in SMB setups where BGP peering is not possible (due to the use of commodity ISPs), or where the cost and/or complexity of operating an IGP, or even physically connecting into something like enterprise ethernet, is just completely out of the question. Despite the limitations, these setups still need a way to switch over from faulted links quickly and reliably so you don't have an office full of people twiddling their thumbs while the internet is down.

That said, at work we use a setup where multiple carrier uplinks are aggregated at a cloud-hosted instance of VyOS; both ends run iBGP, with WireGuard tunnels as the transit networks. This has the advantage over these single-ended approaches that traffic is blind to the break in flow: TCP etc. will only see packet loss as the underlying routing topology changes, but the connection itself does not change from the responder's point of view. The same source address remains on both sides, so they both keep yapping. The disadvantage is of course that you're funneling packets inside of packets, so bandwidth is reduced by a measurable margin. If you're interested in this approach, I documented my lab setup here, so you might give that a try if you're getting desperate.

So you mean that the new WLB implementation (which I assume is what we're discussing here) would not mark incoming packets/sessions to allow VyOS to DNAT and send replies via the correct WAN, as pfsense does, for example?

@Nova_Logic No, it would not function as intended. The reason is this: say you have 3 interfaces, where interface 1 has a metric of 1, interface 2 a metric of 2, and so on. If a packet comes in on one of these interfaces, it is routed to its destination by the appropriate DNAT rule; the source address is the initiator's global unicast address, with the MAC of the router itself. Now when your service replies, its host's routing table looks like

0.0.0.0/0 via routers-localaddr

The destination address is of course the remote global unicast address of the client, and the source is the local area network address (the information is still sufficient at this point). However, the router's table will look like

0.0.0.0/0 via iface1 metric 1
0.0.0.0/0 via iface2 metric 2
0.0.0.0/0 via iface3 metric 3

So the reply will always go out iface1. Source NAT happens post-routing, so from the initiating client's perspective the source address of the reply packet has changed; the reply is therefore invalid and is dropped by any correctly configured firewall.
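The argument above can be modelled in a few lines of Python. This is only a toy model of a plain metric-based lookup over three equal-prefix default routes, not of the actual Linux or VyOS forwarding path, and the names are hypothetical. The point it shows is that the ingress interface plays no part in the decision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str
    iface: str
    metric: int


# Hypothetical router table mirroring the three default routes above.
ROUTER_TABLE = [
    Route("0.0.0.0/0", "iface1", 1),
    Route("0.0.0.0/0", "iface2", 2),
    Route("0.0.0.0/0", "iface3", 3),
]


def select_egress(routes, ingress_iface):
    """Pick the egress interface for a reply packet.

    ingress_iface is accepted only to make explicit that a plain routing
    lookup ignores it: the lowest metric wins regardless of where the
    original request arrived.
    """
    del ingress_iface  # deliberately unused
    return min(routes, key=lambda route: route.metric).iface


if __name__ == "__main__":
    for ingress in ("iface1", "iface2", "iface3"):
        print(ingress, "->", select_egress(ROUTER_TABLE, ingress))
```

Whatever interface the request came in on, the reply is assigned to iface1 in this model, which is the mismatch being described.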

The only way around this is to apply a PBR rule for returning traffic based on its source port (your service's listening port); this traffic is sent to a specific routing table containing the gateway of the expected incoming interface. Obviously this nullifies any use of the WLB-generated table. IGPs don't have this problem because no NAT is taking place, so the router has a consistent view of which flows are active, in the sense that there is a 1:1 mapping between your LAN server's MAC and the originating MAC of the transit network interface.
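The PBR workaround can be sketched the same way. Again this is a toy Python model under assumptions, not VyOS configuration: the table numbers, gateway names, and function name are all made up. A rule keyed on the reply's source port selects a dedicated table, and anything unmatched falls through to the main table.

```python
def lookup_gateway(src_port, pbr_rules, tables, main_gateway):
    """Return the gateway for a reply packet.

    pbr_rules maps a source port to a routing table id (hypothetical ids);
    tables maps a table id to the gateway it contains. Packets that match
    no rule use the main table's gateway.
    """
    table_id = pbr_rules.get(src_port)
    if table_id is None:
        return main_gateway
    return tables[table_id]


if __name__ == "__main__":
    pbr_rules = {443: 102}                          # HTTPS replies use table 102
    tables = {101: "gw-iface1", 102: "gw-iface2"}   # one table per uplink
    print(lookup_gateway(443, pbr_rules, tables, "gw-iface1"))
    print(lookup_gateway(12345, pbr_rules, tables, "gw-iface1"))
```

Note that the port-to-table mapping is static here, which is exactly why it bypasses whatever the WLB-generated table would have chosen.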

scj643 subscribed. (Oct 20 2022, 5:26 PM)

@thetooth But that is exactly what is documented in the current docs: https://docs.vyos.io/en/equuleus/configuration/loadbalancing/index.html

"Upon reception of an incoming packet, when a response is sent, it might be desired to ensure that it leaves from the same interface as the inbound one. This can be achieved by enabling sticky connections in the load balancing"
That is the exact scenario I tried to configure:
3 WANs configured through WLB, with inbound sticky connections for DNAT of ports 80 and 443 to an internal reverse proxy. But that "stickiness" did not work: most of the time VyOS tried to reply to inbound packets from the wrong WAN

Viacheslav added a subtask: T4362: Wan Load Balancing - Can't create routing tables. (Nov 28 2022, 12:53 PM)

Viacheslav changed the status of subtask T4362: Wan Load Balancing - Can't create routing tables from Open to Needs testing. (Apr 3 2023, 3:46 PM)

Viacheslav closed subtask T4362: Wan Load Balancing - Can't create routing tables as Resolved. (Apr 4 2023, 7:28 AM)

Harliff subscribed. (Apr 4 2023, 11:00 AM)

Viacheslav changed the status of subtask T5171: Use XML for conf-mode "load-balancing wan" instead of legacy templates from Open to In progress. (Apr 20 2023, 3:04 PM)

PR https://github.com/vyos/vyos-1x/pull/1973

Viacheslav changed the status of subtask T5171: Use XML for conf-mode "load-balancing wan" instead of legacy templates from In progress to Needs testing. (May 5 2023, 8:09 AM)

Viacheslav changed the status of subtask T5203: load-balancing wan add systemd unit instead of old vyatta-wanloadbalance.init from Open to In progress. (May 5 2023, 10:11 AM)

Viacheslav closed subtask T5203: load-balancing wan add systemd unit instead of old vyatta-wanloadbalance.init as Resolved. (May 8 2023, 7:59 AM)

Viacheslav reopened subtask T5203: load-balancing wan add systemd unit instead of old vyatta-wanloadbalance.init as Needs testing. (May 9 2023, 2:05 PM)

Viacheslav closed subtask T5171: Use XML for conf-mode "load-balancing wan" instead of legacy templates as Resolved. (Jun 13 2023, 11:02 AM)

JeffWDH subscribed. (Nov 16 2023, 12:55 PM)

dmbaturin closed this task as Resolved. (Jan 9 2024, 5:34 PM)

dmbaturin claimed this task.

Viacheslav closed subtask T5203: load-balancing wan add systemd unit instead of old vyatta-wanloadbalance.init as Resolved. (Jan 20 2024, 12:37 PM)

Viacheslav changed the status of subtask T4422: WAN load-balance status failed on all interfaces if one of them failed from Open to Needs reporter action. (Apr 18 2024, 4:29 PM)

Viacheslav closed subtask T4422: WAN load-balance status failed on all interfaces if one of them failed as Wontfix. (Apr 18 2024, 4:39 PM)

Draft PR: https://github.com/vyos/vyos-1x/pull/4108 (WIP)

sarthurdev edited projects, added VyOS 1.5 Circinus; removed VyOS 1.4 Sagitta. (Sep 29 2024, 11:33 AM)

sarthurdev changed Version from 1.4 to 1.5.

sarthurdev moved this task from Open to In Progress on the VyOS 1.5 Circinus board.

syncer triaged this task as Normal priority. (Sep 29 2024, 8:03 PM)

PR: https://github.com/vyos/vyos-1x/pull/4108

syncer moved this task from In Progress to Open on the VyOS 1.5 Circinus board. (Oct 12 2024, 7:48 AM)

vyosbot added a project: Restricted Project. (Oct 14 2024, 8:17 AM)

dmbaturin edited projects, added VyOS Rolling; removed Restricted Project. (Oct 14 2024, 9:26 AM)

dmbaturin changed Is it a breaking change? from Unspecified (possibly destroys the router) to Config syntax change (migratable).

dmbaturin changed Issue type from Unspecified (please specify) to improvement.

syncer added a subscriber: Global Notifications. (Nov 1 2024, 9:19 PM)

sarthurdev mentioned this in rVYOSONEXa03174843512: wlb: T4470: Migrate WAN load balancer to Python/XML. (Feb 18 2025, 10:04 AM)

sarthurdev mentioned this in rVYOSONEXab6382ede233: wlb: T4470: Support WLB op-mode commands.

Restricted Repository Identity mentioned this in rVYOSONEXd6a82c134bed: Merge pull request #4108 from sarthurdev/wlb_python. (Feb 18 2025, 10:04 AM)

sarthurdev closed this task as Resolved. (Feb 19 2025, 6:50 PM)

sarthurdev moved this task from Need Triage to Completed on the VyOS Rolling board.

sarthurdev closed subtask T4587: wan load balance issues with 3 or more WANs as Not Applicable. (Feb 19 2025, 7:13 PM)

sarthurdev closed subtask T4443: Wan Load Balancing Multiple Regressions as Not Applicable.
