pki: race condition for acme requested certificates - CA auto import only on the second run
Closed, Resolved · Public · FEATURE REQUEST

Description

Summary

When using the VyOS internal PKI subsystem to request a certificate via ACME, the issuer CA is not automatically imported into the PKI subsystem on the first run due to a race condition.

Use case

It's always a good idea to provide the full certificate chain when running a daemon that uses SSL certificates.

Additional information

This can be reproduced by:

Check that there are no ACME-related certificates on the system:

vyos@vyos# ls /config/auth/letsencrypt/live
ls: cannot access '/config/auth/letsencrypt/live': No such file or directory

Request an ACME certificate from the Let's Encrypt staging API:

set pki certificate LR5.wue4 acme domain-name 'LR5.wue4.vyos.net'
set pki certificate LR5.wue4 acme email 'LR5@vyos.net'
set pki certificate LR5.wue4 acme url 'https://acme-staging-v02.api.letsencrypt.org/directory'

Check installed PKI certificates:

cpo@LR5.wue4# run show pki
Certificate Authorities:
Name    Subject    Issuer CN    Issued    Expiry    Private Key    Parent
------  ---------  -----------  --------  --------  -------------  --------

Certificates:
Name      Type    Subject CN             Issuer CN                            Issued               Expiry               Revoked    Private Key    CA Present
--------  ------  ---------------------  -----------------------------------  -------------------  -------------------  ---------  -------------  ------------
LR5.wue4  Server  CN=lr5.wue4.vyos.net   CN=(STAGING) Wannabe Watercress R11  2025-03-30 11:45:10  2025-06-28 11:45:09  No         Yes            No

Certificate Revocation Lists:
CA Name    Updated    Revokes
---------  ---------  ---------
[edit]

The auto-imported CA chain is missing. Currently, only a reboot auto-imports the issuing CA into the pki ca certificate tree:

cpo@LR5.wue4:~$ show pki
Certificate Authorities:
Name                Subject                                                             Issuer CN                     Issued               Expiry               Private Key    Parent
------------------  ------------------------------------------------------------------  ----------------------------  -------------------  -------------------  -------------  --------
AUTOCHAIN_LR5.wue4  CN=(STAGING) Counterfeit Cashew R10,O=(STAGING) Let's Encrypt,C=US  CN=(STAGING) Pretend Pear X1  2024-03-13 00:00:00  2027-03-12 23:59:59  No             N/A

Certificates:
Name      Type    Subject CN             Issuer CN                            Issued               Expiry               Revoked    Private Key    CA Present
--------  ------  ---------------------  -----------------------------------  -------------------  -------------------  ---------  -------------  ------------------------
LR5.wue4  Server  CN=lr5.wue4.vyos.net   CN=(STAGING) Counterfeit Cashew R10  2025-03-30 11:48:57  2025-06-28 11:48:56  No         Yes            Yes (AUTOCHAIN_LR5.wue4)

Certificate Revocation Lists:
CA Name    Updated    Revokes
---------  ---------  ---------

Details

Version
2025.03.30-0020-rolling
Is it a breaking change?
Perfectly compatible
Issue type
Feature (new functionality)

Event Timeline

c-po changed the task status from Open to In progress.
c-po claimed this task.
c-po triaged this task as Normal priority.
c-po changed Version from - to 2025.03.30-0020-rolling.
c-po updated the task description.

The current fix is necessary, but it's not complete.

The rest of the issue lies in this part of the code: config_dict_mangle_acme()

The problem is that if we use get_config_dict(..., with_pki=True), the config_dict_mangle_acme() function receives only the ACME certificate name and returns a single certificate–private key pair. However, it completely ignores the Let's Encrypt CAs. As a result, find_chain() later does not have enough material to build a full chain.

From my understanding of the overall structure, the expected behavior would be for config_dict_mangle_acme() to additionally check for the presence of a chain.pem file and include all certificates from it as well. A small modification of get_config_dict() is required as well.
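
As a rough illustration of this idea (not the actual VyOS implementation), the following sketch reads an optional chain.pem stored next to the ACME certificate and splits it into individual PEM blocks. The function names, the live_dir parameter and the <live_dir>/<name>/chain.pem layout are assumptions made for this example only.

```python
import re
from pathlib import Path

# One PEM certificate block, armor lines included (non-greedy, spans newlines).
PEM_CERT_RE = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----", re.DOTALL
)


def split_pem_bundle(bundle: str) -> list[str]:
    """Return every PEM certificate found in `bundle`, in file order."""
    return PEM_CERT_RE.findall(bundle)


def load_acme_chain(live_dir: Path, name: str) -> list[str]:
    """Hypothetical helper: CA certificates from chain.pem, or [] if absent."""
    chain_file = live_dir / name / "chain.pem"
    if not chain_file.is_file():
        return []
    return split_pem_bundle(chain_file.read_text())
```

Each returned block could then be handed to the same code path that already handles manually configured CA certificates, so that find_chain() sees the issuing CAs as well.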

Hi @zsdc,

your assumption is correct, but:

config_dict_mangle_acme() is indeed used to "blend in" the Base64 PEM-encoded certificate data into the config dict object. This mimics what happens when someone loads certificate data manually via set pki certificate NAME certificate or set pki certificate NAME private key: the ACME-retrieved certificate data is stored under exactly these keys inside the config dictionary. We do not do anything special here, so all existing code can be reused for loading the full certificate chain.
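
A minimal sketch of this "blend in" step, assuming the config dict mirrors the CLI paths named above (a certificate key and a nested private -> key). The function names are hypothetical and this is not the real config_dict_mangle_acme() code.

```python
def pem_to_base64(pem: str) -> str:
    """Drop the BEGIN/END armor lines and join the Base64 body into one line."""
    lines = [line.strip() for line in pem.strip().splitlines()]
    return "".join(line for line in lines if line and not line.startswith("-----"))


def blend_in_acme(cert_conf: dict, cert_pem: str, key_pem: str) -> dict:
    """Hypothetical sketch: store ACME data under the keys used for manual entries."""
    if "acme" in cert_conf:
        cert_conf.update(
            {
                "certificate": pem_to_base64(cert_pem),
                "private": {"key": pem_to_base64(key_pem)},
            }
        )
    return cert_conf
```

Because the result looks the same as a manually configured certificate, downstream consumers do not need to know whether the data came from ACME or from the CLI.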

Example CLI:

set pki certificate LR5.wue4 acme domain-name 'LR5.wue4.mybll.net'
set pki certificate LR5.wue4 acme email 'foo@bar.com'
cpo@LR5.wue4# run show pki
Certificate Authorities:
Name                Subject                      Issuer CN        Issued               Expiry               Private Key    Parent
------------------  ---------------------------  ---------------  -------------------  -------------------  -------------  --------
AUTOCHAIN_LR5.wue4  CN=R11,O=Let's Encrypt,C=US  CN=ISRG Root X1  2024-03-13 00:00:00  2027-03-12 23:59:59  No             N/A

Certificates:
Name      Type    Subject CN             Issuer CN    Issued               Expiry               Revoked    Private Key    CA Present
--------  ------  ---------------------  -----------  -------------------  -------------------  ---------  -------------  ------------------------
LR5.wue4  Server  CN=lr5.wue4.mybll.net  CN=R11       2025-04-01 11:20:34  2025-06-30 11:20:33  No         Yes            Yes (AUTOCHAIN_LR5.wue4)

Now start up OpenConnect:

set vpn openconnect authentication local-users username random password 'random12345'
set vpn openconnect authentication mode local 'password'
set vpn openconnect network-settings client-ip-settings subnet '10.0.0.0/29'
set vpn openconnect network-settings name-server '1.1.1.1'
set vpn openconnect ssl certificate 'LR5.wue4'

And you will see the full certificate chain added to ocserv:

cpo@LR5.wue4# cat /run/ocserv/cert.pem
-----BEGIN CERTIFICATE-----
MIIFJjCCBA6gAwI...
-----END CERTIFICATE-----

-----BEGIN CERTIFICATE-----
MIIFBjCCAu6...
-----END CERTIFICATE-----
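
To check this without reading the file by eye, a small heuristic like the one below can count the certificates in the bundle. It is only a sketch: the function names are made up for this example, and it counts PEM blocks without verifying that they actually form a valid issuer chain.

```python
def count_certificates(pem_text: str) -> int:
    """Count PEM certificate blocks by their BEGIN armor line."""
    return pem_text.count("-----BEGIN CERTIFICATE-----")


def looks_like_full_chain(pem_text: str) -> bool:
    """Heuristic: a leaf certificate plus at least one CA certificate."""
    return count_certificates(pem_text) >= 2
```

For example, looks_like_full_chain(open('/run/ocserv/cert.pem').read()) would be expected to return True for the output above and False when only the leaf certificate was written.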

I think what you are experiencing is that when you run this in a single commit:

set pki certificate LR5.wue4 acme domain-name 'LR5.wue4.mybll.net'
set pki certificate LR5.wue4 acme email 'foo@bar.com'

set vpn openconnect authentication local-users username random password 'random12345'
set vpn openconnect authentication mode local 'password'
set vpn openconnect network-settings client-ip-settings subnet '10.0.0.0/29'
set vpn openconnect network-settings name-server '1.1.1.1'
set vpn openconnect ssl certificate 'LR5.wue4'

The full chain is missing from /run/ocserv/cert.pem, most likely because of https://vyos.dev/T7307.

c-po moved this task from Open to Finished on the VyOS 1.5 Circinus board.
dmbaturin changed Is it a breaking change? from Unspecified (possibly destroys the router) to Perfectly compatible.
c-po moved this task from Backlog to Finished on the VyOS 1.4 Sagitta (1.4.2) board.