
BCM 10/11 - etcd Upgrade Required for Kubernetes >= 1.31

Overview

This article provides instructions for resolving etcd version compatibility issues when upgrading to Kubernetes >= 1.31 in BCM environments.

Problem: During Kubernetes installations or upgrades to versions 1.31 and later, kubeadm commands such as init fail with an etcd version error. BCM 10.0 and 11.0 were released with etcd version 3.5.22.

Root Cause: Kubernetes tightened the minimum supported etcd version in patch releases to prevent clusters from upgrading into a known-bad state that can break control-plane rollouts. Older etcd 3.5.x releases had upgrade bugs (learner promotion and membership inconsistencies) that could cause upgrades to fail. To fail fast, kubeadm's version gate was backported across all supported release branches. Unfortunately, the etcd version shipped with BCM Kubernetes is 3.5.22, which is no longer accepted by the latest patch versions of Kubernetes >= 1.31.

Solution: Upgrade etcd to version 3.5.24 or later before attempting a Kubernetes installation or upgrade.
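To decide whether the upgrade is needed, the running etcd version can be compared against the 3.5.24 minimum. A minimal sketch using sort -V; it assumes `etcd --version` prints a line of the form "etcd Version: 3.5.22", which is the usual output format:

```shell
# Compare the installed etcd version against the 3.5.24 minimum enforced by kubeadm.
# Assumes `etcd --version` prints a line like "etcd Version: 3.5.22".
min=3.5.24
cur=$(etcd --version | awk '/etcd Version/ {print $3}')
if [ "$(printf '%s\n%s\n' "$min" "$cur" | sort -V | head -n1)" = "$min" ]; then
  echo "etcd $cur meets the minimum ($min); no upgrade needed"
else
  echo "etcd $cur is older than $min; upgrade required"
fi
```

Run this on a node where etcd is installed (on BCM nodes, `module load etcd` first so the binary is on the PATH).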

Prerequisites

  • BCM Version: 10 or 11
  • Target Kubernetes Version: >= 1.31
  • Current Etcd Version: any cm-etcd package providing etcd older than 3.5.24 (BCM 10 and 11 shipped 3.5.22) must be upgraded

Background

The etcd maintainers documented a critical upgrade failure path from etcd 3.5 to 3.6. Under certain sequences, a voting member can revert to a learner because membership changes were persisted only in the v2store (in etcd 3.5) but etcd 3.6 treats the v3store as the source of truth. This can strand upgrades with "too many learner members" errors or propagate incorrect membership information.

Fixes were implemented in etcd 3.5.20+ with additional backports in 3.5.24, including learner-promotion persistence fixes and Go toolchain bumps. Kubernetes raised the minimum etcd version requirement to ensure clusters don't upgrade into this problematic state.


Scenario 1: New Kubernetes Setup via cm-kubernetes-setup

When using the BCM Kubernetes setup wizard, it defaults to installing the latest patch version for the selected minor version (e.g., 1.32.10 for Kubernetes 1.32.x). Recent patch releases for versions >= 1.31 include the etcd version check that will cause installation to fail.

Symptom

The installation fails during the kubeadm initialization stage with the following error:

#### stage: kubernetes: Kubeadm Initialize First Node
Initializing kubeadm cluster on node001...
[init] Using Kubernetes version: v1.32.10
[preflight] Running pre-flight checks
[preflight] Some fatal errors occurred:
       [ERROR ExternalEtcdVersion]: this version of kubeadm only supports external etcd version >= 3.5.24-0. Current version: 3.5.22
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
error execution phase preflight

Solution

BCM will release updated cm-etcd packages to address this issue. At the time of writing, these packages are available for manual download.

Step 1: Remove the Failed Kubernetes Setup (If Needed)

Choose undo in the setup wizard to clean up the failed cm-kubernetes-setup run. If the setup was aborted instead, it can be cleaned up with:

cm-kubernetes-setup --remove --yes-i-really-mean-it

It may be necessary to provide the label of the Kubernetes cluster with an additional --cluster <label> flag, for example:

cm-kubernetes-setup --remove --cluster k8s-user --yes-i-really-mean-it

To list the labels of all installed clusters, run cmsh -c 'kubernetes list'.

There have been reported cases where the above instructions did not work. If the setup cannot be removed this way, use the following procedure instead.

Install the cleanup script on the head node:

root@rb-kube:~# wget https://support2.brightcomputing.com/cm-etcd/cleanup-old-install.sh
...
root@rb-kube:~# chmod +x cleanup-old-install.sh

Determine which Kube cluster should be removed:

root@rb-kube:~# cmsh -c 'kubernetes list'
Name (key)
------------------
default
k8s-user

In our case we want to remove the k8s-user cluster.

root@rb-kube:~# ./cleanup-old-install.sh k8s-user
firewall role on rb-kube
firewall role on rb-kube
apiserverproxy role on rb-kube
removing kubecluster from apiserverproxy
...

Verify that the Kube cluster was removed by repeating the cmsh command:

root@rb-kube:~# cmsh -c 'kubernetes list'
Name (key)
------------------
default
Step 2: Download the Updated cm-etcd Package

Download the appropriate package from: https://support2.brightcomputing.com/cm-etcd/

Step 3: Verify Package Integrity

Verify the downloaded package using the following MD5 checksums:

| BCM Version | Distribution | Package Filename | MD5 Checksum |
|-------------|--------------|------------------|--------------|
| BCM 10 | RHEL 8 | cm-etcd-3.5.25-100101_cm10.0_960146e38f.x86_64.rpm | 33ecb94d6b16d52dd204432ddbe5b2ac |
| BCM 10 | Ubuntu 24.04 | cm-etcd_3.5.25-100101-cm10.0-960146e38f_amd64.deb | ab9c5bc39f912eb722eab7eb90ddef41 |
| BCM 10 | Ubuntu 20.04 | cm-etcd_3.5.25-100101-cm10.0-960146e38f_amd64.deb | 7c8e78bd066ff5f8d09b5c541005fa76 |
| BCM 10 | Ubuntu 22.04 | cm-etcd_3.5.25-100101-cm10.0-960146e38f_amd64.deb | 4cb03d03eb6b0038c1bd77e673cee9e0 |
| BCM 10 | SLES 15 | cm-etcd-3.5.25-100101_cm10.0_960146e38f.x86_64.rpm | 51980d287d294b50f284c44fc751e488 |
| BCM 10 | RHEL 9 | cm-etcd-3.5.25-100101_cm10.0_960146e38f.x86_64.rpm | 7768f407db9f52e5b9e1ea7733f2997c |
| BCM 11 | RHEL 8 | cm-etcd-3.5.25-100104_cm11.0_5e7f36e727.x86_64.rpm | 644d14e350807f88bbb9f2e8cfefff8c |
| BCM 11 | Ubuntu 24.04 | cm-etcd_3.5.25-100104-cm11.0-5e7f36e727_amd64.deb | 45dd303abe3190cf407ec6c0d02995b4 |
| BCM 11 | Ubuntu 22.04 | cm-etcd_3.5.25-100104-cm11.0-5e7f36e727_amd64.deb | 9fa483e9f6d8b424d454f50755c25ff9 |
| BCM 11 | SLES 15 | cm-etcd-3.5.25-100104_cm11.0_5e7f36e727.x86_64.rpm | 6efa06afd7d5d9ea217e2ce7c0fa6c21 |
| BCM 11 | RHEL 9 | cm-etcd-3.5.25-100104_cm11.0_5e7f36e727.x86_64.rpm | 90f7f7d615bf9b151107c1c45c718cc9 |

Step 4: Install Package in Software Image

Install the package in the appropriate software image for the etcd nodes before executing the setup.

For example, if the etcd nodes are provisioned from /cm/images/k8s-control-image:

# Example for BCM 11 on Ubuntu 24.04

# first enter the software image chroot
cm-chroot /cm/images/k8s-control-image

# inside it, download and install the appropriate package
wget https://support2.brightcomputing.com/cm-etcd/bcm11/ubuntu2404/cm-etcd_3.5.25-100104-cm11.0-5e7f36e727_amd64.deb
apt install ./cm-etcd_3.5.25-100104-cm11.0-5e7f36e727_amd64.deb

# exit the chroot
exit

Step 5: Proceed with Setup

After installing the updated package, re-run or continue the Kubernetes setup as normal.

Scenario 2: Upgrading Existing Kubernetes Clusters

For existing clusters that need to be upgraded to Kubernetes >= 1.31, the etcd version must be updated to meet the minimum requirements.

Solution

Follow the same process as Scenario 1 to obtain and verify the updated cm-etcd package. However, for existing clusters, use a rolling update approach to minimize disruption.

Step 1: Update Software Image

Install the new cm-etcd package in the relevant software images as described in Scenario 1.

Step 2: Perform Rolling Update

Instead of updating all etcd nodes simultaneously, update them one at a time:

  1. Update First etcd Node:

    # On the etcd node, BCM 11 Ubuntu 24.04
    wget https://support2.brightcomputing.com/cm-etcd/bcm11/ubuntu2404/cm-etcd_3.5.25-100104-cm11.0-5e7f36e727_amd64.deb
    apt install ./cm-etcd_3.5.25-100104-cm11.0-5e7f36e727_amd64.deb
  2. Restart etcd Service:

    # Note: the package installation may already have restarted etcd automatically;
    # if so, this step can be skipped
    systemctl restart etcd
  3. Verify etcd Version and Health:

    module load etcd && etcdctl endpoint status --cluster --write-out=table

    Expected output showing mixed versions during rolling update:

    +-------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
    |        ENDPOINT         |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
    +-------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
    | https://10.141.0.1:2379 | 4a336cbcb0bafdc0 |  3.5.25 |  249 MB |     false |      false |      8623 |    4355787 |            4355782 |        |
    | https://10.141.0.2:2379 | 10cee25dc156ff4a |  3.5.22 |  280 MB |      true |      false |      8623 |    4355778 |            4355778 |        |
    | https://10.141.0.3:2379 | bd786940e5446229 |  3.5.22 |  252 MB |     false |      false |      8623 |    4355788 |            4355772 |        |
    +-------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
    
  4. Verify Cluster Health:

    etcdctl endpoint health --cluster --write-out=table

    Expected output:

    +-------------------------+--------+--------------+-------+
    |        ENDPOINT         | HEALTH |     TOOK     | ERROR |
    +-------------------------+--------+--------------+-------+
    | https://10.141.0.2:2379 |   true |  20.617606ms |       |
    | https://10.141.0.3:2379 |   true |  23.436572ms |       |
    | https://10.141.0.1:2379 |   true | 327.724023ms |       |
    +-------------------------+--------+--------------+-------+
    
  5. Repeat for Remaining Nodes: Continue only if all nodes report healthy status. Repeat steps 1-4 for each remaining etcd node.

Step 3: Proceed with Kubernetes Upgrade

Once all etcd nodes are running version 3.5.24 or later, proceed with the Kubernetes upgrade.
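As a final gate before the Kubernetes upgrade, the member versions can be checked programmatically instead of reading the table by eye. A sketch that parses etcdctl's JSON status output (the `Endpoint` and `Status.version` field names are etcdctl v3's JSON output; verify against your etcdctl version):

```shell
# Print the etcd version reported by every cluster member; all entries should be
# 3.5.24 or newer before starting the Kubernetes upgrade.
module load etcd
etcdctl endpoint status --cluster --write-out=json \
  | python3 -c 'import json, sys
for e in json.load(sys.stdin):
    print(e["Endpoint"], e["Status"]["version"])'
```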