
commit archive: reboot not working with sftp
Closed, Resolved · Public · BUG

Description

Using the following configuration will "break" system reboots once the backup target is unreachable.

system {
    config-management {
        commit-archive {
            location sftp://vyos:[email protected]/backups/configs/
        }
        commit-revisions 200
    }
}

Looking at the process list, the initial boot commit is stuck:

frr        949  0.1  0.3  11232  2736 ?        Ss   21:27   0:01 /usr/lib/frr/bfdd -d -F traditional --daemon -A 127.0.0.1
root      1204  0.2  0.5  25292  4212 ?        S    21:28   0:01 /opt/vyatta/sbin/my_commit
root      3494  0.0  0.0   2280   732 ?        S    21:29   0:00  \_ /bin/run-parts --regex=^[a-zA-Z0-9._-]+$ -- /etc/commit/post-hooks.d
root      3508  0.0  2.2  44828 17168 ?        S    21:29   0:00      \_ /usr/bin/perl /etc/commit/post-hooks.d/02vyatta-commit-push.pl
root      3516 63.7  3.3 186584 24976 ?        R    21:29   8:17          \_ python3 -c from vyos.remote import upload; upload("/tmp/config.boot.3508", "sftp://vyos:[email protected]/backups/configs/"
root      1223  1.9  3.1  31812 23800 ?        S    21:28   0:16 ddclient - sleeping for 50 seconds

@erkin, there should be an "unreachable" timeout of 30 to 60 seconds.
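A minimal sketch of the suggested 30 to 60 second limit, assuming the guard is applied by whatever spawns the upload process. In the ticket that caller is the Perl hook 02vyatta-commit-push.pl, so this Python version only illustrates the idea. The helper name run_with_deadline is hypothetical; subprocess.run(timeout=...) is the standard-library call that kills the child once the deadline passes.

```python
import subprocess
import sys


def run_with_deadline(cmd, timeout_s=60.0):
    """Run cmd and return True only if it exits with status 0 within timeout_s.

    Hypothetical helper. When the deadline passes, subprocess.run() kills
    the child, waits for it, and re-raises TimeoutExpired, so a hung upload
    cannot block the caller (and the boot commit) forever.
    """
    try:
        result = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # Stand-in for a hung upload: a child that sleeps longer than the deadline.
    hung = [sys.executable, "-c", "import time; time.sleep(10)"]
    print(run_with_deadline(hung, timeout_s=1.0))  # False
```

A caller-side deadline only bounds the hang from the outside; it does not fix the cause inside the upload itself.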


Details

Difficulty level: Unknown (require assessment)
Version: 1.3.1
Why the issue appeared? Will be filled on close
Is it a breaking change? Perfectly compatible
Issue type: Bug (incorrect behavior)

Related Objects

Mentioned In
1.3.4

Event Timeline

c-po updated the task description.
c-po added a project: VyOS 1.4 Sagitta.
c-po changed Version from 1.3.1-S1 to 1.3.1.
c-po added a subscriber: erkin.

I can confirm this has been the reason I've had issues upgrading from 1.2.x to 1.3.x.
After removing this setting before upgrading, I can now upgrade from 1.2 to 1.3 smoothly, with no OOM errors or other problems.

The issue still exists.
I reproduced it. SSH was accessible at that moment, and so was the SFTP server.

root       1628 30.3 77.4 3359552 3107544 ?     R    08:38   1:05 python3 /usr/libexec/vyos/vyos-boot-config-loader.py /opt/vyatta/etc/config/config.boot
root       2442 98.3  0.7 184564 31196 ?        R    08:38   3:26 python3 -c from vyos.remote import upload; upload("/tmp/config.boot.2438", "sftp://uat:[email protected]/uat/config.boot-vyos.20230719_083833", source_host="")

But after some changes to the config, I could not reproduce it again. Rolling back the changes did not help; the router now boots normally.
Version: VyOS 1.3.2
Platform: VMware

zsdc changed the task status from Open to In progress. Jul 21 2023, 1:29 PM
zsdc claimed this task.
zsdc added a subscriber: zsdc.

To reproduce the problem, all of the following must hold:

  • a static IP address must be configured, or the DHCP server must be fast enough (and the config large enough) that the system obtains an IP address and a route to the remote server before the boot commit finishes
  • none of the routers between VyOS and the archive location may answer with ICMP host unreachable
  • the remote location must be unavailable, or its SSH host key must have changed
  • the configuration must have changed since the last successful upload

There are two problems:

  • socket.create_connection(), used in https://github.com/vyos/vyos-1x/blob/26af45a61bbe8b219b57127a869e723b11886522/python/vyos/remote.py#L172C16-L172C40, has no timeout by default: socket.getdefaulttimeout() returns None. Therefore, if a host is unreachable and nothing makes the connection fail (no "no route", "port unreachable" or other ICMP error comes back), Python waits forever.
  • The MissingHostKeyPolicy waits for interactive input if the SSH host key is not in the known-hosts list. During boot there is no interactive terminal, so nobody can answer "yes", and the script waits forever.
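The first problem can be reproduced locally with a short sketch. It assumes a "silent" peer simulated by a listening socket that never accepts or sends anything: the TCP handshake completes, but no SSH banner ever arrives. A remote host that silently drops packets would stall in the connect step instead; the timeout argument of create_connection() bounds both steps. The read_banner helper is hypothetical and only mirrors the blocking pattern, not the vyos.remote code itself.

```python
import socket


def read_banner(host, port, timeout=None):
    """Connect and wait for the server's first bytes (an SSH server sends a banner).

    Hypothetical helper. With timeout=None the socket is fully blocking, so
    recv() waits indefinitely if the peer never answers. With a number,
    connect() and recv() raise socket.timeout once it expires.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        return sock.recv(256)


if __name__ == "__main__":
    # No global default is set in a fresh interpreter.
    print(socket.getdefaulttimeout())

    # A "silent" peer: listening, so the kernel completes the TCP handshake,
    # but nothing ever calls accept() or sends data.
    silent = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    silent.bind(("127.0.0.1", 0))
    silent.listen(1)
    port = silent.getsockname()[1]
    try:
        read_banner("127.0.0.1", port, timeout=1.0)
    except socket.timeout:
        print("timed out instead of hanging")
    finally:
        silent.close()
```

Calling read_banner("127.0.0.1", port) without a timeout against the same silent peer would block until the process is killed, which matches the stuck upload in the process list above.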

The fix is also simple:

  • set a default timeout before calling create_connection() using socket.setdefaulttimeout(timeout), or pass the timeout to create_connection() explicitly. I think the best solution is to add a default timeout for the case where none is configured.
  • unknown SSH host key fingerprints should be rejected in non-interactive mode.
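Both fixes can be sketched in Python under stated assumptions; this is an illustration, not the code from the PR. The names open_connection, accept_unknown_host_key and DEFAULT_TIMEOUT_S are hypothetical, the 30-second default is taken from the 30 to 60 second range suggested earlier in the ticket, and "non-interactive" is approximated as "stdin is not a TTY". In paramiko itself, refusing unknown keys corresponds to installing paramiko.RejectPolicy via set_missing_host_key_policy().

```python
import socket
import sys

# Hypothetical default; the ticket suggests something in the 30-60 s range.
DEFAULT_TIMEOUT_S = 30.0


def open_connection(host, port, timeout=None):
    """Like socket.create_connection(), but never fully blocking.

    If no timeout is configured, fall back to DEFAULT_TIMEOUT_S so an
    unreachable host raises socket.timeout instead of hanging forever.
    """
    if timeout is None:
        timeout = DEFAULT_TIMEOUT_S
    return socket.create_connection((host, port), timeout=timeout)


def accept_unknown_host_key(fingerprint, interactive=None, ask=input):
    """Decide whether to trust an SSH host key that is not in known_hosts.

    In non-interactive mode (assumed here to mean: stdin is missing or is
    not a TTY, as during boot) the key is rejected immediately instead of
    waiting for an answer that can never come.
    """
    if interactive is None:
        interactive = sys.stdin is not None and sys.stdin.isatty()
    if not interactive:
        return False
    answer = ask(f"Unknown host key {fingerprint}. Accept? [y/N] ")
    return answer.strip().lower() in ("y", "yes")
```

Passing the timeout explicitly, rather than calling socket.setdefaulttimeout(), keeps the change local to this connection instead of affecting every socket the process creates.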

PR for 1.3: https://github.com/vyos/vyos-1x/pull/2106

The changes for 1.4 must differ slightly in error handling, because the module is started differently there.

zsdc changed the task status from In progress to Needs testing. Jul 25 2023, 9:53 AM
syncer moved this task from In Progress to Finished on the VyOS 1.4 Sagitta board.