This article provides instructions for resolving etcd version compatibility issues when upgrading to Kubernetes >= 1.31 in BCM environments.
Problem: During Kubernetes installations or upgrades to versions 1.31 and later, kubeadm commands such as init fail with an etcd version error. BCM 10.0 and 11.0 were released with etcd version 3.5.22.
Root Cause: Kubernetes tightened the minimum supported etcd version in patch releases to prevent clusters from upgrading into a known-bad state that can break control-plane rollouts. Older etcd 3.5.x versions had upgrade bugs (learner promotion/membership inconsistencies) that can cause upgrades to fail. To "fail fast", kubeadm's version gate was backported across supported branches. Unfortunately, the etcd version shipped with BCM Kubernetes is 3.5.22, which is no longer compatible with the latest patch releases of Kubernetes >= 1.31.
Solution: Upgrade etcd to version 3.5.24 or later before attempting a Kubernetes installation or upgrade.
- BCM Version: 10 or 11
- Target Kubernetes Version: >= 1.31
- Current etcd Version: any cm-etcd package older than 3.5.24 (BCM 10.0 and 11.0 ship 3.5.22) must be upgraded
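To confirm whether a given head node or software image is affected, the installed cm-etcd version can be compared against the 3.5.24 floor. The following is an illustrative sketch; version_ge is a hypothetical helper (not a BCM tool) built on sort -V:

```shell
# Illustrative check: is the installed cm-etcd at least 3.5.24?
# version_ge is a hypothetical helper, succeeding if $1 >= $2 (version-aware sort).
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Query the installed version (pick the line for your package manager):
#   rpm -q --qf '%{VERSION}\n' cm-etcd          # RHEL / SLES
#   dpkg-query -W -f '${Version}\n' cm-etcd     # Ubuntu
current=3.5.22   # example value, as shipped with BCM 10.0 / 11.0

if version_ge "$current" 3.5.24; then
    echo "cm-etcd $current meets the kubeadm minimum"
else
    echo "cm-etcd $current must be upgraded before installing Kubernetes >= 1.31"
fi
```

With the shipped 3.5.22, this prints the "must be upgraded" branch.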
The etcd maintainers documented a critical upgrade failure path from etcd 3.5 to 3.6. Under certain sequences, a voting member can revert to a learner because membership changes were persisted only in the v2store (in etcd 3.5) but etcd 3.6 treats the v3store as the source of truth. This can strand upgrades with "too many learner members" errors or propagate incorrect membership information.
Fixes were implemented in etcd 3.5.20+ with additional backports in 3.5.24, including learner-promotion persistence fixes and Go toolchain bumps. Kubernetes raised the minimum etcd version requirement to ensure clusters don't upgrade into this problematic state.
For more detailed information about this issue:
- etcd Blog: Upgrade from 3.5 to 3.6 Issue
- etcd Blog: Upgrade from 3.5 to 3.6 Issue Follow-up
- Kubernetes Pull Request #134861
When using the BCM Kubernetes setup wizard, it defaults to installing the latest patch version for the selected minor version (e.g., 1.32.10 for Kubernetes 1.32.x). Recent patch releases for versions >= 1.31 include the etcd version check that will cause installation to fail.
The installation fails during the kubeadm initialization stage with the following error:
#### stage: kubernetes: Kubeadm Initialize First Node
Initializing kubeadm cluster on node001...
[init] Using Kubernetes version: v1.32.10
[preflight] Running pre-flight checks
[preflight] Some fatal errors occurred:
[ERROR ExternalEtcdVersion]: this version of kubeadm only supports external etcd version >= 3.5.24-0. Current version: 3.5.22
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
error execution phase preflight
BCM will release updated cm-etcd packages to address this issue. At the time of writing, these packages are available for manual download.
Choose undo to clean up the failed cm-kubernetes-setup run. If the setup was aborted instead, it can be cleaned up with:

cm-kubernetes-setup --remove --yes-i-really-mean-it

It may be necessary to provide the label of the Kubernetes cluster with an additional --cluster <label> flag to the above command.
(To see the labels of all installed clusters, run cmsh -c 'kubernetes list'.)
There have been reported cases where the above instructions did not work. If the setup cannot be removed this way, use the following procedure instead.
root@rb-kube:~# wget https://support2.brightcomputing.com/cm-etcd/cleanup-old-install.sh
...
root@rb-kube:~# chmod +x cleanup-old-install.sh
root@rb-kube:~# cmsh -c 'kubernetes list'
Name (key)
------------------
default
k8s-user

In our case we want to remove the k8s-user cluster.

root@rb-kube:~# ./cleanup-old-install.sh k8s-user
firewall role on rb-kube
firewall role on rb-kube
apiserverproxy role on rb-kube
removing kubecluster from apiserverproxy
...

We repeat the cmsh command:

root@rb-kube:~# cmsh -c 'kubernetes list'
Name (key)
------------------
default

Download the appropriate package from: https://support2.brightcomputing.com/cm-etcd/
Verify the downloaded package using the following MD5 checksums:
| BCM Version | Distribution | Package Filename | MD5 Checksum |
|---|---|---|---|
| BCM 10 | RHEL 8 | cm-etcd-3.5.25-100101_cm10.0_960146e38f.x86_64.rpm | 33ecb94d6b16d52dd204432ddbe5b2ac |
| BCM 10 | Ubuntu 24.04 | cm-etcd_3.5.25-100101-cm10.0-960146e38f_amd64.deb | ab9c5bc39f912eb722eab7eb90ddef41 |
| BCM 10 | Ubuntu 20.04 | cm-etcd_3.5.25-100101-cm10.0-960146e38f_amd64.deb | 7c8e78bd066ff5f8d09b5c541005fa76 |
| BCM 10 | Ubuntu 22.04 | cm-etcd_3.5.25-100101-cm10.0-960146e38f_amd64.deb | 4cb03d03eb6b0038c1bd77e673cee9e0 |
| BCM 10 | SLES 15 | cm-etcd-3.5.25-100101_cm10.0_960146e38f.x86_64.rpm | 51980d287d294b50f284c44fc751e488 |
| BCM 10 | RHEL 9 | cm-etcd-3.5.25-100101_cm10.0_960146e38f.x86_64.rpm | 7768f407db9f52e5b9e1ea7733f2997c |
| BCM 11 | RHEL 8 | cm-etcd-3.5.25-100104_cm11.0_5e7f36e727.x86_64.rpm | 644d14e350807f88bbb9f2e8cfefff8c |
| BCM 11 | Ubuntu 24.04 | cm-etcd_3.5.25-100104-cm11.0-5e7f36e727_amd64.deb | 45dd303abe3190cf407ec6c0d02995b4 |
| BCM 11 | Ubuntu 22.04 | cm-etcd_3.5.25-100104-cm11.0-5e7f36e727_amd64.deb | 9fa483e9f6d8b424d454f50755c25ff9 |
| BCM 11 | SLES 15 | cm-etcd-3.5.25-100104_cm11.0_5e7f36e727.x86_64.rpm | 6efa06afd7d5d9ea217e2ce7c0fa6c21 |
| BCM 11 | RHEL 9 | cm-etcd-3.5.25-100104_cm11.0_5e7f36e727.x86_64.rpm | 90f7f7d615bf9b151107c1c45c718cc9 |
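The comparison can also be scripted. The verify_md5 function below is an illustrative helper (not a BCM tool) that compares a downloaded file's MD5 digest against the expected value from the table:

```shell
# Illustrative helper: compare a file's MD5 digest against an expected value.
verify_md5() {
    file=$1
    expected=$2
    actual=$(md5sum "$file" | awk '{print $1}')
    if [ "$actual" = "$expected" ]; then
        echo "checksum OK: $file"
    else
        echo "checksum MISMATCH for $file: got $actual, expected $expected" >&2
        return 1
    fi
}

# Example for the BCM 11 / Ubuntu 24.04 package from the table above:
# verify_md5 cm-etcd_3.5.25-100104-cm11.0-5e7f36e727_amd64.deb 45dd303abe3190cf407ec6c0d02995b4
```

Do not install a package whose checksum does not match; re-download it instead.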
Install the package in the appropriate software image for the etcd nodes before executing the setup.
For example, if the etcd nodes are provisioned from /cm/images/k8s-control-image:
# Example for BCM 11 on Ubuntu 24.04
# first enter the software image chroot
cm-chroot /cm/images/k8s-control-image
# inside it, download and install the appropriate package
wget https://support2.brightcomputing.com/cm-etcd/bcm11/ubuntu2404/cm-etcd_3.5.25-100104-cm11.0-5e7f36e727_amd64.deb
apt install ./cm-etcd_3.5.25-100104-cm11.0-5e7f36e727_amd64.deb
# exit the chroot
exit

After installing the updated package, proceed with (or redo) the Kubernetes setup as normal.
For existing clusters that need to be upgraded to Kubernetes >= 1.31, the etcd version must be updated to meet the minimum requirements.
Follow the same process as Scenario 1 to obtain and verify the updated cm-etcd package. However, for existing clusters, use a rolling update approach to minimize disruption.
Install the new cm-etcd package in the relevant software images as described in Scenario 1.
Instead of updating all etcd nodes simultaneously, update them one at a time:
1. Update the first etcd node:

# On the etcd node, BCM 11 Ubuntu 24.04
wget https://support2.brightcomputing.com/cm-etcd/bcm11/ubuntu2404/cm-etcd_3.5.25-100104-cm11.0-5e7f36e727_amd64.deb
apt install ./cm-etcd_3.5.25-100104-cm11.0-5e7f36e727_amd64.deb

2. Restart the etcd service:

# Note: the package installation may already have prompted for an automatic restart; in that case this step can be skipped
systemctl restart etcd

3. Verify the etcd version and health:

module load etcd && etcdctl endpoint status --cluster --write-out=table

Expected output, showing mixed versions during the rolling update:

+-------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT                | ID               | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://10.141.0.1:2379 | 4a336cbcb0bafdc0 | 3.5.25  | 249 MB  | false     | false      | 8623      | 4355787    | 4355782            |        |
| https://10.141.0.2:2379 | 10cee25dc156ff4a | 3.5.22  | 280 MB  | true      | false      | 8623      | 4355778    | 4355778            |        |
| https://10.141.0.3:2379 | bd786940e5446229 | 3.5.22  | 252 MB  | false     | false      | 8623      | 4355788    | 4355772            |        |
+-------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+

4. Verify cluster health:

etcdctl -w table endpoint --cluster health

Expected output:

+-------------------------+--------+--------------+-------+
| ENDPOINT                | HEALTH | TOOK         | ERROR |
+-------------------------+--------+--------------+-------+
| https://10.141.0.2:2379 | true   | 20.617606ms  |       |
| https://10.141.0.3:2379 | true   | 23.436572ms  |       |
| https://10.141.0.1:2379 | true   | 327.724023ms |       |
+-------------------------+--------+--------------+-------+

5. Repeat for the remaining nodes: continue only if all nodes report a healthy status, then repeat steps 1-4 for each remaining etcd node.
Once all etcd nodes are running version 3.5.24 or later, proceed with the Kubernetes upgrade.
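As a final sanity check before starting the Kubernetes upgrade, the status table can be gated on the 3.5.24 floor. The all_members_at_least function below is an illustrative sketch (not a BCM tool) that parses the table printed by etcdctl endpoint status, assuming VERSION is the fourth |-separated column as in the output shown above:

```shell
# Illustrative sketch: succeed only if every member row in an
# `etcdctl endpoint status --cluster --write-out=table` table (read from stdin)
# reports a VERSION of at least the given minimum (e.g. 3.5.24).
all_members_at_least() {
    awk -F'|' -v min="$1" '
        /^\| http/ {
            gsub(/ /, "", $4)                  # VERSION is the 4th |-separated field
            split($4, v, "."); split(min, m, ".")
            for (i = 1; i <= 3; i++) {
                if (v[i] + 0 < m[i] + 0) { bad = 1; exit }
                if (v[i] + 0 > m[i] + 0) break
            }
        }
        END { exit bad }
    '
}

# Usage on an etcd node:
# module load etcd
# etcdctl endpoint status --cluster --write-out=table | all_members_at_least 3.5.24 \
#     && echo "all members >= 3.5.24, safe to upgrade Kubernetes" \
#     || echo "some members are still below 3.5.24"
```

If any member is still below 3.5.24, finish the rolling etcd update first; otherwise kubeadm's preflight check will fail again during the upgrade.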