It seems like it should be a fairly simple task to make an install automatically recover from a bad boot or upgrade:
- On a successful boot ("success" meaning boot commit finishes without error), a small string naming the current boot image is written to persistent storage somewhere. /boot?
- A broken image gets installed (add system image broken.img) and the system reboots.
- Commit fails on boot. An at job is scheduled to fire in about 3 minutes (or a configurable delay?), which leaves time to cancel it with atrm for troubleshooting.
- The at job triggers the reboot. It first changes the default boot image back to the last successful one recorded in the first step. Maybe it also mounts the last successful image and copies the failure logs onto it for later troubleshooting and digestion? Something like tar -zcvf /conf/faillog-20191020.tar.gz /var/log
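
The steps above could be sketched roughly as the shell functions below. This is only an illustration of the idea, not how any existing system does it. The state file location, the ROLLBACK_CMD path, and set_default_boot_image are all hypothetical placeholders; the real command for switching the default image would have to be filled in. The only external tools assumed are tar and at (with atq/atrm to list and cancel the pending job).

```shell
#!/bin/sh
# Sketch only. All paths and names below are assumptions for illustration.
STATE_FILE="${STATE_FILE:-/boot/last-good-image}"        # hypothetical location
LOG_DIR="${LOG_DIR:-/var/log}"
ARCHIVE_DIR="${ARCHIVE_DIR:-/conf}"
ROLLBACK_DELAY_MIN="${ROLLBACK_DELAY_MIN:-3}"            # the "configurable?" delay
ROLLBACK_CMD="${ROLLBACK_CMD:-/usr/local/sbin/boot-rollback}"  # hypothetical script

# Call after boot commit succeeds. $1 = name of the running image.
# Writes to a temp file first so a crash cannot leave a half-written record.
record_good_image() {
    printf '%s\n' "$1" > "$STATE_FILE.tmp" && mv "$STATE_FILE.tmp" "$STATE_FILE"
}

# Prints the last known-good image name; returns non-zero if none is recorded.
last_good_image() {
    [ -s "$STATE_FILE" ] && cat "$STATE_FILE"
}

# Packs the log directory into a dated tarball and prints the tarball path.
archive_fail_logs() {
    archive="$ARCHIVE_DIR/faillog-$(date +%Y%m%d).tar.gz"
    tar -zcf "$archive" -C "$(dirname "$LOG_DIR")" "$(basename "$LOG_DIR")" \
        && printf '%s\n' "$archive"
}

# Hypothetical placeholder: replace the body with the real command that
# changes the default boot image on the system in question.
set_default_boot_image() {
    echo "would set default boot image to: $1"
}

# Call when boot commit fails. Queues the rollback with at(1);
# the pending job can be listed with atq and cancelled with atrm.
schedule_rollback() {
    last_good_image >/dev/null || { echo "no known-good image recorded" >&2; return 1; }
    echo "$ROLLBACK_CMD" | at now + "$ROLLBACK_DELAY_MIN" minutes
}

# What the queued job would do: save logs, switch image, reboot.
do_rollback() {
    good=$(last_good_image) || return 1
    archive_fail_logs >/dev/null
    set_default_boot_image "$good" && reboot
}
```

The boot sequence would call record_good_image on success and schedule_rollback on failure, and the hypothetical ROLLBACK_CMD script would just source these functions and run do_rollback. Someone who wants to poke at the broken image runs atq, finds the job number, and cancels it with atrm before the delay runs out.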