When upgrading to version 11.7.1.4, the IBM Information Server Microservices tier installer migrates all Kubernetes cluster nodes from the Docker runtime to containerd. In a multi-node environment, this involves resetting and rejoining the workers. The worker reset may intermittently fail to clean up the cni0 network interface, leading to further problems in the upgrade flow.
Multi-replica services, such as Kafka, Solr, or ZooKeeper, may show pods stuck in the ContainerCreating state. Inspecting such a pod with "kubectl describe pod POD_NAME" shows the following or a similar error message:
Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "4ae1e49fc763f2062e064cb26cc71607cbc7de2dbdc1b627eb6c57176a7245b1": plugin type="flannel" failed (add): failed to delegate add: failed to set bridge addr: "cni0" already has an IP address different from 10.32.1.1/24
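For example, the following commands list pods that are not fully running and show the events for one of them; the pod name solr-1 and the default namespace are illustrative and may differ in your environment:
# List pods that are not in the Running state
kubectl get pods -n default | grep -v Running
# Show the events for one of the stuck pods (pod name is an example)
kubectl describe pod solr-1 -n default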
Furthermore, the upgrade may fail with the following or similar error:
fatal: [deployment_coordinator]: FAILED! => {"attempts": 150, "changed": false, "cmd": ["kubectl", "rollout", "status", "statefulset", "solr", "-w=false", "-n", "default"], "delta": "0:00:00.155591", "end": "2022-09-26 10:58:04.739568", "rc": 0, "start": "2022-09-26 10:58:04.583977", "stderr": "", "stderr_lines": [], "stdout": "Waiting for 2 pods to be ready...", "stdout_lines": ["Waiting for 2 pods to be ready..."]}
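You can run the same check manually to confirm that the statefulset is still waiting for pods; the solr statefulset and the default namespace are taken from the error above:
# Check whether the solr statefulset has all replicas ready
kubectl rollout status statefulset solr -n default
# Show how many replicas are currently ready
kubectl get statefulset solr -n default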
During worker reset, the installer removes the cni0 network interface with the following command:
ip link delete cni0
For reasons that are not fully understood, this command sometimes fails to delete the interface while still reporting success. As a result, the Microservices tier installer does not catch the error early and continues with the upgrade process, ultimately causing an IP address range conflict in the Flannel service and leading to the errors shown above.
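To confirm the conflict on an affected worker, you can compare the address assigned to cni0 with the subnet allocated by Flannel; /run/flannel/subnet.env is the usual Flannel location, but it may differ in your environment:
# Show the IPv4 address currently assigned to cni0
ip -4 addr show cni0
# Show the subnet Flannel allocated to this node
cat /run/flannel/subnet.env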
The output of the "kubectl get nodes -owide" command may show inconsistent node versions and/or container runtimes:
$ kubectl get nodes -owide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
controlplane.example.com Ready control-plane 3y157d v1.24.2 10.1.1.1 <none> Red Hat Enterprise Linux 3.10.0-1160.76.1.el7.x86_64 containerd://1.6.6
worker1.example.com Ready <none> 27h v1.21.3 10.1.1.2 <none> Red Hat Enterprise Linux 3.10.0-1160.76.1.el7.x86_64 docker://20.10.3
worker2.example.com Ready <none> 27h v1.21.3 10.1.1.3 <none> Red Hat Enterprise Linux 3.10.0-1160.66.1.el7.x86_64 docker://20.10.3
The cni0 interface deletion problem can be solved by resetting and rejoining workers once more. To do this:
Log in to the Microservices tier control plane operating system terminal as the user who installed Microservices tier.
Change your current working directory to the Microservices tier installation directory, e.g.:
cd /opt/IBM/UGinstall/ugdockerfiles
Reset the workers with the following command:
./run_playbook.sh playbooks/platform/kubernetes/reset_workers.yaml -y
Log in to each Microservices tier worker's operating system terminal as root and verify that neither the cni0 nor the flannel.1 interface exists by examining the output of the "ip a" or "ifconfig -a" command. If any of these interfaces still exist, delete them manually with the "ip link delete" command:
ip link delete cni0
ip link delete flannel.1
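If you prefer to check all workers from the control plane instead, the following sketch assumes passwordless SSH access as root and uses the placeholder hostnames from the example output above:
# Report any leftover cni0 or flannel.1 interface on each worker (hostnames are placeholders)
for node in worker1.example.com worker2.example.com; do
    echo "=== ${node} ==="
    ssh root@${node} 'ip link show cni0 2>/dev/null; ip link show flannel.1 2>/dev/null'
done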
Back on the control plane machine, run the following command to rejoin workers:
./run_playbook.sh playbooks/platform/kubernetes/join_workers.yaml -y
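While the playbook runs, you can watch the workers rejoin and return to the Ready state, for example:
# Watch node status until all workers report Ready (press Ctrl+C to stop)
kubectl get nodes -w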
After rejoining the workers, verify that all of the nodes use the containerd runtime by viewing the output of the "kubectl get nodes -owide" command. If any nodes still use the Docker runtime, run the following commands on the control plane node to migrate them:
./run_playbook.sh playbooks/platform/kubernetes/reset_workers.yaml -y
./run_playbook.sh playbooks/upgrade/uninstall_docker.yaml -y
./run_playbook.sh playbooks/install/install_pkg_repos.yaml -y
./run_playbook.sh playbooks/platform/kubernetes/setup_containerd.yaml --limit=workers -y
./run_playbook.sh playbooks/platform/kubernetes/join_workers.yaml -y
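After these playbooks complete, you can confirm that no node still reports the Docker runtime; if the command below produces no output, all nodes have been migrated to containerd:
# List any nodes whose container runtime is still Docker
kubectl get nodes -o wide | grep -i 'docker://'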
Finally, if the "kubectl get nodes -owide" command reports inconsistent versions across the cluster nodes, re-run the upgrade.sh script to upgrade the cluster nodes.
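Once the upgrade completes, you can print a per-node summary of the kubelet version and container runtime to verify that they are consistent, for example:
# Summarize version and runtime for each node
kubectl get nodes -o custom-columns=NAME:.metadata.name,VERSION:.status.nodeInfo.kubeletVersion,RUNTIME:.status.nodeInfo.containerRuntimeVersion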
[{"Type":"MASTER","Line of Business":{"code":"LOB10","label":"Data and AI"},"Business Unit":{"code":"BU059","label":"IBM Software w\/o TPS"},"Product":{"code":"SSZJPZ","label":"IBM InfoSphere Information Server"},"ARM Category":[{"code":"a8m0z0000001i9oAAA","label":"Microservices Tier and Kubernetes Issues"}],"ARM Case Number":"","Platform":[{"code":"PF016","label":"Linux"}],"Version":"11.7.1"}]