
Support for QoS Policy Propagation via BGP (QPPB)
Needs testing, Normal, Public, Feature Request

Authored By zsdc, Jan 13 2022, 2:51 PM
Referenced Files
  Restricted File (x3), Nov 16 2023, 2:57 PM
  F3667657: xdp_arch2.png, Feb 17 2023, 4:50 PM
  F3667656: xdp_arch1.png, Feb 17 2023, 4:50 PM
  F3667652: test_topo.png, Feb 17 2023, 4:50 PM
  F3202318: qppb_artifacts.tar.gz, Sep 27 2022, 8:05 PM
  F3199407: qppb_demo_m2.gns3project, Sep 22 2022, 1:37 PM

Details

Difficulty level: Hard (possibly days)
Version: -
Why the issue appeared?: Will be filled on close
Is it a breaking change?: Perfectly compatible
Issue type: Feature (new functionality)

Event Timeline


Demo QPPB implementation supporting bgp-policy destination mode:


The following comments will include the architecture, demo topology, and debugging techniques.

v.huti changed Difficulty level from Unknown (require assessment) to Hard (possibly days).

You can find the latest version of the demo implementation here:

  1. volodymyrhuti/linux/tree/QPPB_DEMO_V1.1
  2. volodymyrhuti/frr/tree/QPPB_DEMO_V1.1

This update includes:

  1. Implemented the bgp-policy source mode
  2. Implemented a per-interface sysctl switch to configure the QPPB mode
  3. Minor cleanups & fixes

The implementation is mostly done. The remaining plans are:

  • Make an example with QoS configuration
  • Make a clean demo topology
  • Finalize and post the documentation

The latest version of the demo can be found here:

  1. volodymyrhuti/linux/tree/QPPB_DEMO_V1.2
  2. volodymyrhuti/frr/tree/QPPB_DEMO_V1.2

This update includes:

FRR:

  1. Reworked DSCP manipulation methods + moved them into a separate library module
  2. Introduced the bgp-policy <source|destination> per-interface configuration

Kernel:

  1. Restructured the marking logic
  2. Hid the logic under IP_ROUTE_QPPB config
  3. Added basic in-tree documentation, removed dead code, minor cleanups

The latest version of the demo can be found here:

  1. volodymyrhuti/frr/tree/QPPB_DEMO_V1.3
  2. volodymyrhuti/xdp_qppb

During the discussion with @zdc it became clear that the previous implementation was only partially working.
That solution worked at the skb layer, meaning that resources were already allocated and the ingress tc engine was skipped.
However, the marking is expected to be visible to tc so it can be used for ingress policing.
I have moved the logic to the XDP layer to overcome this limitation.

The latest update includes:

  1. XDP program to be used with the ip link / tc infrastructure (an attach sketch follows this list)
  2. Helpers library for FRR to interact with the XDP
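
For reference, the same object can also be attached with plain iproute2 instead of a dedicated
loader (a minimal sketch; the interface and ELF section names are assumptions, and the demo
below uses xdp-loader instead):

    sudo ip link set dev ens4 xdp obj xdp_qppb.o sec xdp   # attach to the interface
    sudo ip link set dev ens4 xdp off                      # detach again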

The next update will include:

  1. DSCP to VRF mappings
  2. Permission/access fix for the helper library. Currently, loading requires disabling
     verification, i.e. via /proc/sys/kernel/unprivileged_bpf_disabled
  3. GNS3 demo topology. This will be much easier now since there is no need for a custom kernel
  4. Some basic functionality/sanity tests

The latest version of the demo can be found here:

  1. volodymyrhuti/frr/tree/QPPB_DEMO_V1.4
  2. volodymyrhuti/xdp_qppb

Changes on the FRR side:

  • Converted the XDP helper library to an optional plugin + BGP hook
  • Minor fixes + cleanups
  • Figured out most of the permission problems

Changes on the XDP side:

  • Converted mappings from the legacy iproute format to the latest libbpf one
  • The new mappings improve the debugging experience by implementing pretty-printing for XDP map dumps
  • Added an xdp-loader for xdp-tools repo

We have also performed another review round with @zdc.
The next step is to do an initial review with somebody from the FRR developers.

DEMO
===============================================

To demonstrate the feature, let's look at the following topology:

topo.png (447×900 px, 87 KB)



The scenario is as follows:

  • R1 has a management interface (loopback); we want to prioritize the loopback traffic
  • R2 announces the prefix for the loopback over BGP with the associated community 60:1
  • R3 has a QPPB map that associates community 60:1 with the DSCP tag AF22 (TOS 0x50 / 80) for the
    subsequent traffic control (see the route-map sketch after this list)
  • C1 and C2 are clients communicating with R1 using low/high priority flows.
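
A rough sketch of the R3 side of this mapping as an frr.conf fragment, relying on the patchset's
`set dscp` route-map action (the ASN and the list/map names here are invented for illustration,
this is not the exact demo config):

   bgp community-list standard QPPB-PRIO permit 60:1
   !
   route-map QPPB permit 10
    match community QPPB-PRIO
    set dscp af22
   !
   router bgp 65003
    address-family ipv4 unicast
     table-map QPPB
    exit-address-family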
---------------------------------------------------------------------------------------------------------
1. Load QPPB for interfaces and restart the FRR service

   [R3]debian@debian:~$ cat /etc/frr/daemons | grep bgp
   bgpd_options="   -A 127.0.0.1 -M vyos_qppb"
   ------------------------
   sudo ./xdp-loader load ens4 xdp_qppb.o  -p /sys/fs/bpf/ 
   sudo ./xdp-loader load ens5 xdp_qppb.o  -p /sys/fs/bpf/ 
   sudo service frr restart

---------------------------------------------------------------------------------------------------------
2. Wait for the bgp daemon to announce the prefixes.
   Verify that the XDP map was populated
   Configure the per-interface QPPB mode

   ------------------------
   [R3]debian@debian:~$ journalctl -b | grep XDP
   ... bgpd[1085]: ... XDP mark prefix [1.0.0.1/32| dscp 80, err 0]
   ... bgpd[1085]: ... XDP mark prefix [24.0.0.0/24| dscp 124, err 0]
   ... bgpd[1085]: ... XDP mark prefix [23.0.0.0/24| dscp 72, err 0]
   ... bgpd[1085]: ... XDP mark prefix [22.0.0.0/24| dscp 40, err 0]

   $ sudo bpftool map list
   6: array  name qppb_mode_map  flags 0x0
   	key 4B  value 4B  max_entries 64  memlock 4096B
   	btf_id 7
   7: lpm_trie  name dscp_map  flags 0x1
   	key 8B  value 1B  max_entries 100  memlock 8192B
   	btf_id 7
   8: percpu_array  name xdp_stats_map  flags 0x0
   	key 4B  value 16B  max_entries 5  memlock 4096B
   	btf_id 7

   [R3]debian@debian:~$ sudo bpftool map dump id 7
   [{
           "key": {
               "prefixlen": 32,
               "src": 16777217
           },
           "value": 80
       },
           ...........
       ,{
           "key": {
               "prefixlen": 32,
               "src": 167772190
           },
           "value": 152
       }
   ]
   
   [R3]debian@debian:~$ dec_to_ip 16777217
                     => 1.0.0.1

   # if.id == 2 (ens4), mode == 2 (BGP_POLICY_SRC)
   $ sudo bpftool map update id 6 key 2 0 0 0  value 2 0 0 0
   # if.id == 3 (ens5), mode == 1 (BGP_POLICY_DST)
   $ sudo bpftool map update id 6 key 3 0 0 0  value 1 0 0 0
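
   The `dec_to_ip` helper used above is not part of the tooling; a minimal shell equivalent
   (bpftool prints the LPM key's address as a plain integer, interpreted here as a
   big-endian IPv4 value):

   dec_to_ip() {
       local n=$1
       echo "$(( (n >> 24) & 255 )).$(( (n >> 16) & 255 )).$(( (n >> 8) & 255 )).$(( n & 255 ))"
   }
   dec_to_ip 16777217     # => 1.0.0.1
   dec_to_ip 167772190    # => 10.0.0.30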

---------------------------------------------------------------------------------------------------------
3. Open Wireshark and verify that the marking is working (or check with tshark, as sketched below)
   [R1] ping -I 1.0.0.1 30.0.0.3 -c 3
   [R3] -> [C1] => 80(0x50) TOS
   [R3] -> [R2] => 80(0x50) TOS
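
   Alternatively, a tshark one-liner can confirm the remark without the GUI (a sketch; assumes
   tshark is installed, interface name per the topology, and DSCP 20 == AF22 == TOS 0x50):

   sudo tshark -i ens4 -Y 'ip.dsfield.dscp == 20' -T fields -e ip.src -e ip.dst -e ip.dsfield.dscp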

---------------------------------------------------------------------------------------------------------
4. Configure the `tc` rules to prioritize traffic associated with management iface
   [R3]
   alias tc="sudo tc"
   tc qdisc del dev ens4 root
   tc qdisc add dev ens4 root handle 1:0 htb default 10
   # create the parent class that children will borrow bandwidth from
   tc class add dev ens4 parent 1:0 classid 1:1 htb rate 100mbit
   # create child classes that reference the parent
   tc class add dev ens4 parent 1:1 classid 1:10 htb rate 1mbit
   tc class add dev ens4 parent 1:1 classid 1:20 htb rate 10mbit ceil 90mbit

4.1 Using classic u32 rules
   tc filter add dev ens4 parent 1:0 prio 1 protocol ip u32 \
                         match ip tos 0x50 0xff flowid 1:20

4.2 Using custom bpf classifier
   tc filter add dev ens4 protocol ip parent 1:0 \
             bpf obj xdp_tc.o sec tc_mark classid 1: direct-action

4.3 Verify that filter is triggered for (un)prioritized traffic
   [R1] ping -I 1.0.0.1 30.0.0.3 -c 3      #   prioritized
   [R1] ping -I 10.0.0.1 30.0.0.3 -c 3     # unprioritized
 
4.4 Set up an iperf server on the clients and verify that traffic shaping is working

   [R1] iperf3 -n 2Gb -B 1.0.0.1  -c 30.0.0.3 &
   [R1] iperf3 -n 2Gb -B 10.0.0.1 -c 30.0.0.4 &

   [R3] tc -s -p -g f ls dev ens4
   [R3] tc -s -p -g q ls dev ens4
   [R3] tc -s -p -g c ls dev ens4
---------------------------------------------------------------------------------------------------------


DEMO Notes (numbers refer to the demo steps above):
=====================

1) You need to load the XDP program before starting FRR so that
   it can find the LPM map on plugin initialization.
   To keep it simple, the VTY interface was not implemented for now;
   the XDP side is accessible via `bpftool`.
3) I'm monitoring packets for TOS/DSCP changes to see whether marking happens.
   In the other approach, a tag is associated with the packet and then
   read by the TC classifier.
4) There are two traffic shaping examples.
   The point is that you have two options for marking:
4.1) Modifying the TOS byte and installing the u32 tc filter to match the value.
   This has a limited range of possible values (8 bits) + needs to modify the packet.
4.2) Using a custom BPF classifier.
    The XDP side extends the packet context and saves the value.
    Afterward, the classifier may read the context and control the shaping behavior
    by setting the `skb->tc_classid` or one of the fields mentioned below.
Therefore, BPF programs attached to the tc BPF hook can, for instance,
read or write the skb’s mark, pkt_type, protocol, priority, queue_mapping,
napi_id, cb[] array, hash, tc_classid or tc_index, vlan metadata, the XDP
transferred custom metadata and various other information. All members of
the struct __sk_buff BPF context used in tc BPF are defined in the
linux/bpf.h system header.

https://docs.cilium.io/en/stable/bpf/#tc-traffic-control

Updated artifacts with additional documentation:

DEB building steps FRR:
------------------------------------------
    ./bootstrap.sh
    ./debian/rules
    ./debian/rules override_dh_prep
    make -j $(nproc --all)
    sudo fakeroot debian/rules binary

Building steps XDP:
------------------------------------------
    make prepare
    make

Latest update:

  1. FRR - QPPB_DEMO_V1.5
  2. TC examples - volodymyrhuti/xdp_qppb
  • Added libbpf as a git submodule + build target
  • Removed previous implementation from the patchset
  • Added the TC example

test_topo.png (482×832 px, 125 KB)

xdp_arch1.png (373×403 px, 20 KB)

xdp_arch2.png (351×413 px, 26 KB)

Intro
=========================================================================================================
The QoS Policy Propagation via BGP feature allows you to classify packets by IP precedence based on
Border Gateway Protocol (BGP) community lists, BGP autonomous system paths, and access lists, thus
allowing classification based on the destination instead of the source address.

After packets have been classified, you can use other quality of service (QoS) features such as committed
access rate (CAR) and Weighted Random Early Detection (WRED) to specify and enforce policies to fit your
business model.

---------------------------------------------------------------------------------------------------------
On a large and complex network, a large number of MF (Multi-Field) classification operations are required,
and routes cannot be classified based on the community attribute, ACL, IP prefix, or AS_Path.
When the network topology keeps changing, configuring or changing routing policies is difficult or even
impossible. Therefore, QPPB was introduced to reduce the configuration workload by configuring
or changing routing policies only on the BGP route sender.

QPPB is implemented as follows:
- Before sending BGP routes, a route sender sets a specific attribute, such as the AS_Path,
  community attribute, or extended community attribute, for BGP routes.
  These attributes are used to identify BGP routes.
- After receiving the BGP routes, a route receiver performs the following operations:
  1. Maps each received BGP route to a QoS local ID, an IP precedence and a traffic behavior
     based on the AS_Path, community attribute, or extended community attribute.
  2. Performs different traffic behaviors for packets transmitted along the routes according
     to their mapped QoS local IDs, IP precedence and traffic behavior.
     A route receiver can define traffic behaviors for the packets transmitted along the
     routes based on the following attributes:
     – ACL
     – AS-Path list
     – Community attribute list
     – Route cost
     – IP prefix list
     – etc ...
  3. Creates a QPPB local policy and defines the mappings between BGP routes and QoS policies in it.
  4. Applies the QPPB local policy to all packets that meet the matching rules on interfaces.

---------------------------------------------------------------------------------------------------------

Cisco has worked hard over the years to streamline the process of table lookup in the routing table
and to reduce per-packet processing for the forwarding process; QPPB can use this same efficient
table-lookup process to reduce classification and marking overhead.

CEF optimizes forwarding by creating a new table that includes entries for the routes in the routing
table. This table is called the Forwarding Information Base (FIB). The FIB optimizes the process of
locating a route by performing a table lookup in the FIB rather than the less-efficient table lookup
of the routing table. In other words, CEF switching crunches the routing table into the FIB, and then
uses the FIB to make the forwarding decisions. (This in itself is somewhat of an oversimplification
of CEF; for more detail, refer to Vijay Bollapragada’s Inside Cisco IOS Software Architecture [Cisco Press, 2000].)

CEF optimizes the creation of new data-link headers by creating a table that contains the new data-
link header associated with each next-hop IP address in the FIB. By doing so, when FIB table lookup
is complete, the header can be added to the packet with little processing.
When QPPB marks a route, it actually marks either or both of two fields inside each entry in the
FIB: the FIB contains IP precedence and QoS group fields in order to support QPPB. Therefore,
when QPPB is configured and CEF crunches the routing table into FIB entries, the
appropriate FIB precedence and QoS group fields are set.

Common Terms
=========================================================================================================
What are CoS, COS, ToS, Diffserv, DSCP, DS? (brief explanation, one sentence per acronym will be enough)

CoS - A 3-bit field that exists only on VLAN-tagged Ethernet at the data link layer (layer 2).
The field specifies a priority value that can be used by quality of service (QoS) disciplines to
differentiate and shape/police network traffic.

COS - Related to legacy telephone systems, COS can define permissions for voice traffic.

ToS - Second byte of the IPv4 header. Also referred to as the differentiated services field (DS field)
which consists of a 6-bit Differentiated Services Code Point (DSCP) field and a 2-bit
Explicit Congestion Notification (ECN) field.

DiffServ - Operates at the IP network layer (layer 3) and is a model for quality-of-service control.
Packets are individually classified and marked; policy decisions are made independently by each node in a path.
(see IntServ - Integrated Services)

DSCP - The first six bits of the IP ToS byte, evaluated to provide more granular classification;
backward-compatible with IP Precedence.

DS - The DiffServ architecture defines the DiffServ (DS) field, which supersedes the ToS field in IPv4
to make per-hop behavior (PHB) decisions about packet classification and traffic conditioning functions,
such as metering, marking, shaping, and policing.
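
The relation between these values can be checked with one line of shell arithmetic, e.g. for the
AF22 / 0x50 / 80 marking used in the demo above:

    tos=0x50              # TOS byte as seen on the wire
    echo $(( tos >> 2 ))  # 20 -> 6-bit DSCP, i.e. AF22
    echo $(( tos & 3 ))   # 0  -> 2-bit ECN
    echo $(( tos >> 5 ))  # 2  -> legacy 3-bit IP precedence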
---------------------------------------------------------------------------------------------------------
Taken from:  https://github.com/temach/reports/blob/master/AN-Lab-1-qos.md \
             #f-what-are-cos-cos-tos-diffserv-dscp-ds-brief-explanation-one-sentence-per-acronym-will-be-enough
As well, good starter guide: https://hackmd.io/@tematibr/BJF7E-n4H
For more guides, check references / attached docs.
=========================================================================================================

Configuration
=========================================================================================================
Cisco Summary Steps (QPPBSection.pdf)
---------------------------------------------------------------------------------------------------------
    | Mode and Function : Command
 ## | Global command; creates a route map entry
 1. | route-map map-tag [permit | deny] [sequence-number]
    |
 ## | Route-map subcommand; used to match IP packets based on parameters matchable with an IP ACL
 2. | match ip address {access-list-number | access-list-name} [... access-list-number | ...  access-list-name]
    |
 ## | Route-map subcommand; used to match IP packets based on their length
 3. | match length minimum-length maximum-length
    |
 ## | Route-map subcommand; sets the IP precedence value using a decimal number or name.
 4. | set ip precedence <number | name>
    |
 ## | Route-map subcommand; sets a group ID in the routing table for classification throughout the network.
 5. | set ip qos-group group-id
    |
 ## | BGP subcommand; used to modify values related to BGP learned routes, including precedence and QoS group
 6. | table-map map-name
    |
 ## | Global command; used to create a community list, which matches values in the BGP community string
 7. | ip community-list community-list-number {permit | deny} community-number
    |
 ## | Global command; used to create an autonomous system (AS) path list, which matches values in the autonomous
    | system number (ASN) path BGP attribute
 8. | ip as-path access-list access-list-number {permit | deny} as-regexp
    |
 ## | BGP subcommand; used to make IOS use the AA:NN format for community values, with AA being the ASN,
    | and NN being a user-defined value
 9. | ip bgp-community new-format
    |
 ## | Interface subcommand; enables QPPB for packets entering the interface, marking IP precedence
 10.| bgp-policy ip-prec-map
    |
 ## | Interface subcommand; enables QPPB for packets entering the interface, marking QoS group
 11.| bgp-policy ip-qos-map

    Steps | Notes:
    ------+-------------------------------------------
      4,5 | `set ip precedence` -> `set dscp`
          | `set ip qos-group`  -> `xdp_tc_mark`
        9 |            N/A (?)
    10,11 | implemented as BPF map, use `bpftool`
          | map holds values from (4,5)


Monitoring QoS Policy Propagation via BGP:
=========================================================================================================
To monitor the QoS Policy Propagation via the BGP feature configuration, use the following optional
commands.
1. show ip bgp         | Displays entries in the Border Gateway Protocol (BGP) routing table to verify whether
                       | the correct community is set on the prefixes.
2. show ip cef network | Displays entries in the forwarding information base (FIB) table based on the
                       | specified IP address to verify whether Cisco Express Forwarding has the correct
                       | precedence value for the prefix.
3. show ip interface   | Displays information about the interface.
4. show ip bgp community-list | Displays routes permitted by the BGP community list to verify whether
                              | the correct prefixes are selected.

    Steps | Notes:
    ------+-------------------------------------------
       3  | Not implemented, use `bpftool` to dump QPPB_MAP instead
       2  | N/A, there is no CEF on linux, use
          |    show  bgp ipv4 <nexthop>
          | CEF alternative is implemented in `xdp_qppb`
          | there are no specific requirements on how this should be implemented,
          | meaning we can implement a more sophisticated classification flow if needed.
          | Current flow (userspace checks are sketched after this table):
          |   1. check if qppb enabled on ingress iface
          |   2. check if a packet should be forwarded - lookup `fib` table
          |    - skip marking if there is no `fib` entry
          |   3. lookup `dscp/qos-id` by SRC/DST (qppb mode - BGP_POLICY_SRC/_DST)
          |   4. modify packet TOS field / extend XDP metadata with `qos-id` tag
          |    - qos-id will be processed on the `tc` ingress hook


=========================================================================================================
FRR documentation for table-map
---------------------------------------------------------------------------------------------------------
.. clicmd:: table-map ROUTE-MAP-NAME
   clihelp:: "BGP table to RIB route download filter\n"
   This feature is used to apply a route-map on route updates from BGP to
   Zebra.  All the applicable match operations are allowed, such as match on
   prefix, next-hop, communities, etc. Set operations for this attach-point are
   limited to metric and next-hop only. Any operation of this feature does not
   affect BGP's internal RIB.

   Supported for ipv4 and ipv6 address families. It works on multi-paths as
   well, however, metric setting is based on the best-path only.

Minimal match-all QPPB map
---------------------------------------------------------------------------------------------------------
   input_dict_1 = {
       "r2": {
           "route_maps": {
               "QPPB": [{
                       "action": "permit",
                       "set": { "dscp": "af11", }
               }]
           }
       }
   }
=========================================================================================================

QEMU GDB debug
=========================================================================================================
Here is a summary of an easy-to-use setup for kernel research and debugging:
compile a kernel for the GNS3 Debian VM and attach a gdb session.
- Save config from GNS host to the build tree
  scp $GNS_HOST:/boot/config-<version> .
- Disable optimization for the targeted module:
     --- a/net/ipv4/Makefile
     ccflags-y += -fno-default-inline -fno-inline -fno-inline-small-functions \
                  -fno-indirect-inlining -fno-inline-functions-called-once
     ccflags-y += -O1 -g3 -ggdb3
- Enable debug config
  XXX: docs/kernel_debug_conf.txt - example options
  XXX: docs/debian.conf           - config used for the screen capture
- Compile Debian packages while accepting the defaults for all the new options.
  Transfer and install them on the test host.
--------------------------------------------------------------
  yes 'n' | make oldconfig
  make -j $(getconf _NPROCESSORS_ONLN) bindeb-pkg LOCALVERSION=-custom
  scp ../*.deb $GNS_HOST:~/
  host> dpkg -i *.deb
--------------------------------------------------------------
- Specify `-s` as a QEMU argument for the GNS3 host and restart
--------------------------------------------------------------
  Configure -> Advanced -> Additional settings -> Options
  -s  Shorthand for -gdb tcp::1234, i.e. open a gdbserver on TCP port 1234.
  add any other debugging arguments, i.e. `earlyprintk, rw, init=/bin/bash ...`

  NOTE: `nokaslr` may be needed if the debugger stops on the wrong line / doesn't find the src.
  You can edit /boot/grub/grub.cfg, add arguments to the `linux` menuentry
--------------------------------------------------------------
- Start debugger instance, load kernel binary, connect to remote `:1234`, load symbols
--------------------------------------------------------------
  function linux_gdb {
    _root=$HOME/Desktop/linux-next
    cd $_root && \
    gdb -ex "add-auto-load-safe-path $_root" \
        -ex "file $_root/vmlinux"            \
        -ex "target remote localhost:1234" \
        -ex "lx-symbols $_root"
  }

  $ linux_gdb
  # The GNS console will freeze, waiting for commands from gdb client
  > b ip_forward
  > c
  ....  R1 ping -> C1
  < gdb triggered
  > p *skb
  > p *net
  > info locals
  > backtrace full 
  .... debugging session
  > delete break
  > c
  ....
  > quit
  $ _
--------------------------------------------------------------

NOTE: if you need a minimal setup to inspect some kernel structure on the fly, check this repo:
https://github.com/cirosantilli/linux-kernel-module-cheat#gdb
This build lets you disable all optimizations but lacks advanced Linux tools/network infra.
You can import the generated QEMU image into your GNS3 project, which requires manually creating a template with init args.

PyTest
=========================================================================================================
Assuming you want to bring up the topology without running the test suite, run the following:
-----
 $ pytest -sv --topology-only ... test_bgp_qppb.py
 unet> help

 Commands:
   help                       :: this help
   sh [hosts] <shell-command> :: execute <shell-command> on <host>
   term [hosts]               :: open shell terminals for hosts
   vtysh [hosts]              :: open vtysh terminals for hosts
   [hosts] <vtysh-command>    :: execute vtysh-command on hosts

 unet> sh r1 ping 10.0.3.1
 unet> vtysh r1 show running
                ....
------
You can use `get_topogen().cli()` to open this CLI from the debugger; check the Helpers section for more.

---------------------------------------------------------------------------------------------------------
                                   Workflow:
---------------------------------------------------------------------------------------------------------
- Enable debug logs in `tests/topotests/pytest.ini`
--------------------------------------------------------------
        addopts = "-sv"
        verbosity = debug
        show_router_config = True
              ...
        log_level = DEBUG
        log_cli_level = DEBUG
              ...
        # shorten date fmt for visibility
        log_file_date_format = %H:%M:%S
        log_cli_date_format =  %H:%M:%S
--------------------------------------------------------------
- Set up a breakpoint before the QoS setup under test, then start the test
  [Optionally] set up pysnooper to generate an execution log
  https://github.com/cool-RR/PySnooper - Never use print for debugging again
  https://github.com/cool-RR/PySnooper/blob/master/ADVANCED_USAGE.md
--------------------------------------------------------------
    os.environ['PYTHONBREAKPOINT'] = 'pudb.set_trace'
    test_qos_setup -> breakpoint()
    # run all tests
    sudo -E PYTHON=python3 pytest-3 --pudb test_bgp_qppb.py
    # run a single test
    sudo -E PYTHON=python3 pytest-3 --pudb test_bgp_qppb.py -k test_qos_topo
    -------------------------
    import pysnooper
    @pysnooper.snoop('/home/vova/snoop/qppb_qos.log', depth=3, max_variable_length=200)
    def test_func(...):
    NOTE: the log is always appended to the file, without prior truncation
    -------------------------
    NOTE: if a test fails before clearing bpffs ...
    sudo umount /tmp/topotests/bgp_qppb_vyos_flow.test_bgp_qppb/r*/bpf
--------------------------------------------------------------
- Start traffic visualizers (check helpers `bw_monitor`)
- Verify that traffic pattern matches the expectations
- Dump topo/local/global/tc/bpf/unet (check helpers)
--------------------------------------------------------------
  sudo bpftool prog tracelog
  > tc_log_stats(r1, "r1-r2-eth0")
--------------------------------------------------------------
- Run and visualize some traffic
--------------------------------------------------------------
for i in range(1, 5):
    client = start_client(
        h1, dst="10.6%d.0.1" % i, port=5200 + i, timeout=10, background=True
    )
--------------------------------------------------------------
- Dump tc/bpf env, check snooper logs for the deep dive
--------------------------------------------------------------
  cd $HOME/snoop
  grep 'BW samples' ...
  grep iperf  ...
  ...
--------------------------------------------------------------
TODO:  gdb workflow?
=========================================================================================================

Helpers
=========================================================================================================
The testing framework is built on top of Linux namespaces (not a VM like QEMU).
These are instantiated with `unshare cat` and are "anonymous", i.e. not listed in `ip netns`.
You can list them with `pgrep cat` and interact using the `mnexec`/`nsenter` tools.
You can find example commands in the test logs, e.g.:
/tmp/topotests/*/exec.log: # DEBUG: topolog: CMD to enter r3: sudo nsenter -a -t 713161
---------------------------------------------------------------------------------------------------------
    # dump vtysh per netns
    function dump_nms {
    for _pid in $(pgrep cat); do
        echo -en "\n\n\n\n===========================\n"
        echo "Namespace: $_pid"
        sudo mnexec -a $_pid vtysh -c 'show run'
        # sudo mnexec -a $_pid bash
    done
    }

    # start `unet` cli from debugger
    tgen.cli()
    # run from the debugger to start a binary in a separate window
    tgen.net.hosts['r3'].run_in_window('bash')
    tgen.net.hosts['r3'].run_in_window('wireshark')

    -------------------------------------------
    # run command in router ns
    function rns {
        local host=$1; shift
        pushd /tmp/topotests/bgp_qppb_vyos_flow.test_bgp_qppb
        sudo mnexec -a $(cat $host.pid) $@
        popd
    }

    function bw_monitor {
      set -x
      CMDS=(
        "export LANG=C; rns r2 iftop -P -B -F 10.0.0.0/8 -m100M"
        "rns r2 nethogs -d 0.1"
        "rns r2 pktstat -B -T -t -w 0.25"
        # "rns r2 bandwhich -n -c"
      )
      _size=$(( ${#CMDS[@]} - 1 ))
      _ses="demo"

      tmux kill-session -t $_ses
      tmux new-session -d -s $_ses
      echo Size: ${#CMDS[@]}
      # read q
      for i in $( seq 1 $_size ); do
        tmux split-window -v -t $_ses:1.$i
      done
      for i in $( seq 0 $_size ); do
        tmux send-keys -t $_ses:1.$((i+1)) \
          "${CMDS[$i]}" Enter "$pass" Enter
      done
      tmux attach-session -t $_ses
    }


        # Working with namespaces
    -------------------------------------------
    from pyroute2 import IPRoute, NetNS, IPDB, NSPopen, setns
    from pprint import pp;
    # Preferably, use rich for improved performance/colors
    # from rich import print as pp;

    local_ns=NetNS('/proc/1/ns/net')
    r1_ns_path="/proc/{}/ns/net".format(r1.net.pid)
    r1_ns=NetNS("/proc/{}/ns/net".format(r1.net.pid))

    pp(locals()); #pp(globals()); pp(topo)
    # dumps the netns environment (routes/interfaces/rules/...)
    pp([x for x in r1_ns.dump()])
    pp(r1_ns.get_default_routes())
    pp(vars(r1_ns))
    pp(dir(r1_ns))
    # setns(r1_ns)
    -------------------------------------------
    # dump bpf log buffer (/sys/kernel/debug/tracing/trace_pipe)
    r1.bpf.trace_print()
    # bpf_print_trace(tgen.gears['r1'].bpf)
    # use methods from test/library, i.e. dump tc stats
    tc_log_stats(r1, iface)


          Network monitoring tools:
    ------------------------------------------------
            Monitoring / Stats
         --------------------------
     iftop, bmon, bwmon-ng, speedometer, pktstat, ifstat, netstat
     tcpdump, wireshark, termshark, tshark, scapy, ngrep
     ethtool, tcptrack, vnstat, ss, lsof
         --------------------------
            Traffic Generating
         --------------------------
     iperf, netperf, socat, nc, ping, nload, iptraf-ng, netsniff-ng
         --------------------------
    nethogs -d 0.1
    bandwhich -n
    pktstat  -B -T -t -w 0.25
       -B | display in Bps
       -T | show totals
       -t | `Top` mode
       -w | time between window updates
    iftop -P -B -F 10.0.0.0/8 -m100M
          | export LANG=C for tmux
       -P | show ports
       -B | display bandwidth in bytes
       -F | ipv4 network

    For more tools check:
    https://github.com/raboof/nethogs#links
    http://netsniff-ng.org/
    ------------------------------------------------

#NOTE: if test fails prior to clearing bpffs ...
sudo umount /tmp/topotests/bgp_qppb_vyos_flow.test_bgp_qppb/r*/bpf
---------------------------------------------------------------------------------------------------------

Notes
---------------------------------------------------------------------------------------------------------
1. Working with XDP - bpftool vs BCC
   # Absolute Beginner's Guide to BCC, XDP, and eBPF
   # https://gist.github.com/satrobit/17eb0ddd4e122425d96f60f45def9627
   The admin's main means of interacting with BPF/XDP objects is the `bpftool` utility
   (a few quick orientation commands are sketched below).
   # You can find a link with the full overview in the references section.

   BCC is a toolkit for creating efficient kernel tracing and manipulation programs
   and includes several useful tools and examples. It makes use of extended BPF (Berkeley Packet Filters),
   formally known as eBPF, a new feature that was first added to Linux 3.15.
   Much of what BCC uses requires Linux 4.1 and above.
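
   A few `bpftool` commands cover most of the day-to-day interaction (a quick-orientation sketch):

   sudo bpftool prog show        # loaded BPF programs
   sudo bpftool map show         # maps and their ids (used by `map dump id N` in the demo)
   sudo bpftool net show         # XDP / tc attachments per interface
   sudo bpftool prog tracelog    # stream bpf_printk output (also used in the PyTest section)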

2. XDP loader
   XXX: a bunch of tools support compiling/loading XDP, but they differ in BPF syntax, etc...
        Currently, I am not sure whether there is a need for a separate loader versus BCC.

3. XDP debugging
   The program keeps track of processed packets and stores the counts in `/sys/fs/bpf/<iface>/xdp_stats_map`.
   This functionality was preserved from the `libbpf` tutorial and requires the `xdp_stats` tool for reading.
   TODO: needs rewriting in Python; it shouldn't be complex...

4. XDP mappings are separate objects that do not depend on FRR.
   If FRR crashes, the mappings may stay intact and keep forwarding the traffic.
   Users may have multiple maps used under different conditions/requirements and can update the FRR handler
   without the control plane noticing the modification.

5. Performance/Scale/Bandwidth  testing:
---------------------------------------------------------------------------------------------------------
NOTE: QoS testing is outside the scope of FRR; therefore, I have implemented only a basic example.
Testing this feature would require a robust framework to generate flows and measure quality/stability.
I would prefer extending one of the following projects for this purpose if the feature gets enough interest.

Other related projects may be integrated with QPPB for a more robust QoS system:
https://github.com/cryzed/TrafficToll
https://github.com/LibreQoE/LibreQoS (thanks to v.gletenko for the reference)

NeST: Network Stack Tester
NeST is a python3 package aiding researchers and beginners alike in emulating real-world networks.
https://nest.nitk.ac.in/#/
https://gitlab.com/nitk-nest/nest
https://gitlab.com/nitk-nest/nest/-/blob/master/nest/experiment/run_exp.py

LNST: Linux Network Stack Test 
Linux Network Stack Test is a tool that supports the development and execution of automated and portable network tests.
https://github.com/LNST-project/lnst
https://github.com/LNST-project/lnst/wiki/Iperf
https://github.com/LNST-project/lnst/blob/master/lnst/Tests/Iperf.py

The FRR test library was initially built on top of `mininet`, which was later replaced with `micronet`.
https://github.com/mininet/mininet (The best way to emulate almost any network on your laptop!)
https://github.com/cnp3/ipmininet  # Mininet extension to make experimenting with IP networks easy
https://github.com/cnp3/ipmininet/blob/72cefde536ca02c650875d4b14d93899824af668/ipmininet/tests/test_tc.py#L47

ipmininet/examples/tc_advanced_network.py
ipmininet/examples/tc_network.py
ipmininet/tests/test_tc.py

XXX (relevant for later??): Iperf at 40Gbps and above 
https://fasterdata.es.net/performance-testing/network-troubleshooting-tools/iperf/multi-stream-iperf3/

Limitations
---------------------------------------------------------------------------------------------------------
1. BPF should be initialized before starting `bgpd`. The logic is as follows:
   - BPF will create the mapping objects on initialization/loading/compilation
   - FRR triggers the plugin init hook, which will look for the mapping files
     # at this point, FRR has not yet changed the user, so we are `root`
     - if the files are not available, QPPB doesn't work and the traffic flow is not affected
     - if the files are available, the dscp tag gets associated with a prefix by your route-map
     # FRR then drops root permissions; it will be unable to open `/sys/fs/bpf/...` from this point on ...
     # an `on/off` CLI would require calling `setuid(0)` to reload the BPF pin.

2. You can set up multiple BPF routers for a topotest, but this was not tested.
   XXX: Host tools may have problems interacting with separate BPF instances.
=========================================================================================================

References
          QPPB examples
---------------------------------------------------------------------------------------------------------
https://www.noction.com/blog/qos-policy-propagation-via-bgp-qppb
https://pierky.wordpress.com/tag/qos/
https://cisconinja.wordpress.com/2009/01/14/qos-policy-propagation-with-bgp/
https://community.cisco.com/t5/service-providers-knowledge-base/asr9000-xr-implementing-qos-policy-propagation-for-bgp-qppb/ta-p/3136639
https://books.google.com.ua/books?id=qq3FAgAAQBAJ&pg=PA278&lpg=PA278&dq=QPPB&ots=mvWNkGAtP2&sig=ACfU3U3lymWe2HvtZVR7pONnOVmXanvdFA&hl=en#v=onepage&q=QPPB&f=false
http://www.h3c.com/en/Support/Resource_Center/HK/Home/Switches/00-Public/Configure/Configuration_Guide/H3C_S6800[S6860][S6861]_(R27xx)_S6820_CG-6W100/08/201906/1201594_294551_0.htm
https://www.cisco.com/c/en/us/td/docs/iosxr/cisco8000/qos/75x/b-qos-cg-8k-75x/classify_packets_to_identify_specific_traffic.html#concept_b2j_5xq_2tb
https://littlewolf.moe/bgp/372/
https://www.youtube.com/watch?v=t024CqVsu6I

                BGP
---------------------------------------------------------------------------------------------------------
https://learningnetwork.cisco.com/s/article/BGP-Zero-to-Hero-Part-1---Establishing-Peering-s
https://jvns.ca/blog/2021/10/05/tools-to-look-at-bgp-routes/

        Iproute documentation
---------------------------------------------------------------------------------------------------------
https://baturin.org/docs/iproute2/#ip-rule-add-tos
http://www.policyrouting.org/iproute2.doc.html#ss9.5
https://docs.pyroute2.org/iproute.html#htb
https://docs.pyroute2.org/netns.html
https://www.lartc.org/lartc.html

             TC guides
---------------------------------------------------------------------------------------------------------
https://www.funtoo.org/Traffic_Control
https://wiki.archlinux.org/title/advanced_traffic_control
https://web.archive.org/web/20190216230807/https://wiki.linuxwall.info/doku.php/en:ressources:dossiers:networking:traffic_control
http://jve.linuxwall.info/ressources/taf/Plug-North-QoS-2011.pdf
https://openwrt.org/docs/guide-user/network/traffic-shaping/packet.scheduler
http://linux-ip.net/pages/diagrams.html
http://linux-ip.net/pages/documents.html
http://linux-ip.net/gl/tc-filters/tc-filters.html
QoS in Linux with TC and Filters - TC + TOS examples:
https://www.linux.com/training-tutorials/qos-linux-tc-and-filters/
TC bpf direct-action
https://qmonnet.github.io/whirl-offload/2020/04/11/tc-bpf-direct-action/
HTB manual - user guide
http://luxik.cdi.cz/~devik/qos/htb/manual/userg.htm
https://www.arvanta.net/~mps/linux-tc.html

https://github.com/sonic-net/SONiC/tree/master/doc/qos
=========================================================================================================
                BPF
=========================================================================================================
Absolute Beginner's Guide to BCC, XDP, and eBPF
https://gist.github.com/satrobit/17eb0ddd4e122425d96f60f45def9627
https://docs.cilium.io/en/stable/bpf/
https://ebpf.io/what-is-ebpf/
https://github.com/borkmann/bpf-docs

EDB - eBPF debugger (not tried it myself, though)
https://github.com/dylandreimerink/edb

Bpftool showcase:
https://qmonnet.github.io/whirl-offload/2021/09/23/bpftool-features-thread/

BCC (BPF Compiler Collection):
https://github.com/iovisor/bcc/blob/master/docs/reference_guide.md
https://github.com/iovisor/bcc/blob/master/docs/tutorial_bcc_python_developer.md

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/bpf/bpf_devel_QA.rst
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/bpf/bpf_design_QA.rst

https://github.com/leandromoreira/linux-network-performance-parameters
https://github.com/zoidbergwill/awesome-ebpf

                 XDP
---------------------------------------------------------------------------------------------------------
https://www.iovisor.org/technology/xdp
https://github.com/dsahern/bpf-progs/blob/master/docs/netdev-0x14-XDP-and-the-cloud.pdf

XDP test suite for Linux kernel.pdf
https://github.com/0voice/linux_kernel_wiki/blob/main/%E8%AE%BA%E6%96%87/%E3%80%8AXDP%20test%20suite%20for%20Linux%20kernel%E3%80%8B.pdf

                Kernel
---------------------------------------------------------------------------------------------------------
https://blog.packagecloud.io/monitoring-tuning-linux-networking-stack-sending-data/
https://blog.packagecloud.io/illustrated-guide-monitoring-tuning-linux-networking-stack-receiving-data/
https://blog.packagecloud.io/monitoring-tuning-linux-networking-stack-receiving-data/

Namespaces:
https://man7.org/linux/man-pages/man7/namespaces.7.html
https://man7.org/linux/man-pages/man7/mount_namespaces.7.html
https://man7.org/linux/man-pages/man1/nsenter.1.html
https://man7.org/linux/man-pages/man1/unshare.1.html

        Tracing/Debugging/Visualizing
---------------------------------------------------------------------------------------------------------
https://wiki.python.org/moin/PythonDebuggingTools
https://github.com/cool-RR/PySnooper
https://stackoverflow.com/questions/25308847/attaching-a-process-with-pdb

https://github.com/goldshtn/linux-tracing-workshop
https://github.com/vinta/awesome-python
https://github.com/Textualize/rich
https://project-awesome.org/#networking
https://realpython.com/python-debugging-pdb/

                FRR
=========================================================================================================
https://docs.frrouting.org/projects/dev-guide/en/latest/topotests-jsontopo.html
https://docs.frrouting.org/projects/dev-guide/en/latest/topotests.html

TC implementation in FRR:
https://sigeryang.net/gsoc2022-frr/
https://github.com/FRRouting/frr/pull/11908

=========================================================================================================
               DSCP
=========================================================================================================
https://www.bytesolutions.com/dscp-tos-cos-presidence-conversion-chart/
https://www.tucny.com/Home/dscp-tos
https://www.ccexpert.us/root-bridge/layer-3-qos-classification-with-dscp.html
https://www.ccexpert.us/routing-switching/dscp-settings-and-terminology.html
https://www.ccexpert.us/ccda/tos.html
https://linuxreviews.org/Type_of_Service_(ToS)_and_DSCP_Values

Note 802.11 (wifi) uses different TOS/DSCP mappings/names
https://github.com/vanhoefm/krackattacks-scripts/blob/research/tests/remote/rutils.py#L303
http://wifi-insider.com/wlan/wmm.htm

SO_PRIORITY and IP_TOS:
https://stackoverflow.com/questions/48095837/setting-dscp-value-to-socket-using-setsockopt
https://stackoverflow.com/questions/37998404/what-is-the-effect-of-setting-a-linux-socket-high-priority
https://gist.github.com/wenjianhn/0f7a9a1e36018a42515c
https://manpages.ubuntu.com/manpages/focal/en/man8/tc-skbprio.8.html
https://www.mediaonfire.com/blog/2013_11_01_dscp_tagging_with_iptables.html
man skbprio
man tc-prio

Dsmark - queueing discipline, DSCP marker/remarker:
https://tldp.org/HOWTO/Adv-Routing-HOWTO/lartc.adv-qdisc.dsmark.html
https://www.lartc.org/lartc.html#LARTC.ADV-QDISC.DSMARK
http://softwareopal.com/qos/default.php?p=ds-29

=========================================================================================================

Dependencies
=========================================================================================================
Dependencies:
---------------------------------------------------------------------------------------------------------
XXX: bcc (latest), pyroute2, libbpf
XXX: run tests on docker and check what is missing

TBD
=========================================================================================================
- Scalability/Performance testing
  - NeST framework has initial support for FRR + bandwidth control/testing.
    Although there is no support for BGP/QPPB/DSCP... so far
  - LNST may be another good place to start
- Following features: IPv6, L3+ interfaces, i.e. bridges/vrfs/tunnels/.../default route marking?
  From my understanding, QEMU uses xdp-generic mode for interfaces instead of native mode.
  We would need to run this on an actual device to see the performance implications of the
  additional FIB lookup on the XDP side.
  --------
  Need to consider other available packet redirection strategies, if there are any relevant
  differences / other implications
  --------
- Possible Classifier/Marker configuration collisions/overlapping testing
- integration with other features like `frr-tc`, and whether a separate CLI is needed (?)
Are there any other test suites we can run on top of the QPPB setup?
  tools/testing/selftests/drivers/net/mlxsw/qos_*.sh

TODO:
---------------------------------------------------------------------------------------------------------
- Implement xdp tools version verification within tests
- Update docker builds / docs / instructions 
- iperf visualization?
- artifacts
- compile the requirements list
v.huti changed the task status from Open to Needs testing.Feb 17 2023, 4:56 PM
v.huti triaged this task as Normal priority.

I'm back after a long break and will follow up on this feature.
Here is a summary of things that have happened since the last update:

1. In my absence, the feature testing got broken as a result of migrating from the `mininet` to the `munet` framework.
From debugging, I have identified the root cause: the bpf fs was not inherited by the `munet` router.
The solution is to hop into the router's mount namespace for the test run (a sketch follows).
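
A sketch of the workaround, entering the router's mount namespace before touching the pinned maps
(the PID file path follows the topotest layout used elsewhere in this task):

    pid=$(cat /tmp/topotests/bgp_qppb_vyos_flow.test_bgp_qppb/r1.pid)
    sudo nsenter --mount -t "$pid" bpftool map show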

2. I have followed up on the CI warnings and errors:
- killed ~250 lines of questionable code
- fixed styling
- added copyrights
- etc...
I still see odd issues on some platforms and need help fixing them.

3. I joined the FRR development meeting last Tuesday and showed a short feature demo.
I showed a new diagram with the configs composing the feature setup (attached here).
The feedback was the following:
- I need to move the plugin invocation further down the software stack and call it from zebra.
This way, the DSCP field will be exposed to the rest of the protocols/daemons.
I have implemented something similar in the first version of the feature and will bring it back.

4. Considering retesting the [dscp -> VRF] mapping subfeature on a new kernel for the next version of QPPB.

{F3899809}

{F3899808}

{F3899807}