- Have a script to verify configuration is correct
- Check that ipmitool successfully connects
- Verify PXE options are set correctly
- Verify every network interface on each node. This could be done with a slim PXE boot image that runs without being installed.
- Verify that all required components (Docker containers, pre-built images, etc.) can be downloaded on each node and on the jump server
- Verify that virtualization and the other BIOS features needed to support PMEM, SGX, etc. are turned on
- Provide more checks in the install scripts and more hints when errors occur.
- Add a lot more debugging hints at the end of the install guide
- What are the common logs to check, and where are they?
- What are the common failure points in the install script, and what do they look like?
- Server doesn’t reboot.
- PXE doesn’t boot.
- Kubernetes isn’t installed on a node.
- Which pod/container logs should be checked at various points?
|
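A pre-install verification script along the lines listed above could start out like the following minimal sketch. It is only an illustration: the function names (`cpu_has_virtualization`, `ipmi_reachable`, `url_reachable`, `run_checks`) are hypothetical, the IPMI check assumes `ipmitool` is on the PATH and that the BMC accepts the `lanplus` interface, and the virtualization check only looks for the `vmx`/`svm` CPU flags in `/proc/cpuinfo` text. PXE, PMEM and SGX checks are not covered.

```python
import shutil
import subprocess
import urllib.request


def cpu_has_virtualization(cpuinfo_text):
    """Return True if a 'flags' line lists vmx (Intel VT-x) or svm (AMD-V)."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            flags = line.split(":", 1)[-1].split()
            if "vmx" in flags or "svm" in flags:
                return True
    return False


def ipmi_reachable(host, user, password, timeout=15):
    """Return True if 'ipmitool ... chassis status' succeeds against the BMC."""
    if shutil.which("ipmitool") is None:
        return False  # ipmitool is not installed on this machine
    cmd = ["ipmitool", "-I", "lanplus", "-H", host,
           "-U", user, "-P", password, "chassis", "status"]
    try:
        result = subprocess.run(cmd, capture_output=True, timeout=timeout)
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def url_reachable(url, timeout=10):
    """Return True if a HEAD request to the URL gets a 2xx/3xx response."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, ValueError):
        return False


def run_checks(checks):
    """Run (name, callable) pairs, print PASS/FAIL, return the failed names."""
    failed = []
    for name, check in checks:
        try:
            ok = bool(check())
        except Exception as exc:  # a crashing check counts as a failure
            print(f"  {name} raised {exc!r}")
            ok = False
        print(f"[{'PASS' if ok else 'FAIL'}] {name}")
        if not ok:
            failed.append(name)
    return failed
```

On a node, the virtualization check could be wired in as `run_checks([("virtualization", lambda: cpu_has_virtualization(open("/proc/cpuinfo").read()))])`; the BMC address, credentials and download URLs for the other checks would come from the installer's own configuration. Printing one PASS/FAIL line per check is a deliberate choice so the output doubles as a debugging hint when the install later fails.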