    Problem

    When upgrading to version 11.7.1.4, the IBM Information Server Microservices tier installer migrates all Kubernetes cluster nodes from the Docker runtime to containerd. In a multi-node environment, this involves resetting and rejoining the worker nodes. The worker reset may intermittently fail to clean up the cni0 network interface, leading to further problems in the upgrade flow.

    Multi-replica services, such as Kafka, Solr, or ZooKeeper, may show pods stuck in the ContainerCreating state. Inspecting such a pod with "kubectl describe pod POD_NAME" shows the following or a similar error message:
    Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "4ae1e49fc763f2062e064cb26cc71607cbc7de2dbdc1b627eb6c57176a7245b1": plugin type="flannel" failed (add): failed to delegate add: failed to set bridge addr: "cni0" already has an IP address different from 10.32.1.1/24
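    To spot which pods are affected, it can help to list every pod that has not reached the Running phase and then inspect one of them for the sandbox error quoted above. A minimal check, assuming kubectl access on the control plane; the pod name kafka-1 and the default namespace below are only illustrative:
    kubectl get pods -A --field-selector=status.phase!=Running
    kubectl describe pod kafka-1 -n default | grep -A 2 'Failed to create pod sandbox'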
    Furthermore, the upgrade may fail with the following or similar error:
    fatal: [deployment_coordinator]: FAILED! => {"attempts": 150, "changed": false, "cmd": ["kubectl", "rollout", "status", "statefulset", "solr", "-w=false", "-n", "default"], "delta": "0:00:00.155591", "end": "2022-09-26 10:58:04.739568", "rc": 0, "start": "2022-09-26 10:58:04.583977", "stderr": "", "stderr_lines": [], "stdout": "Waiting for 2 pods to be ready...", "stdout_lines": ["Waiting for 2 pods to be ready..."]}

    During the worker reset, the installer removes the cni0 interface with the following command:
    ip link delete cni0
    For unknown reasons, that command sometimes fails to delete the interface while still reporting success. As a result, the Microservices tier installer does not catch the error early and continues with the upgrade process, ultimately causing an IP address range conflict in the Flannel service and leading to the errors shown above.
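    The address range conflict can be confirmed by comparing the pod CIDR that Kubernetes assigned to the affected worker with the address actually configured on its cni0 interface; on a healthy node the two typically match. A minimal check, assuming worker1.example.com is the affected worker:
    kubectl get node worker1.example.com -o jsonpath='{.spec.podCIDR}{"\n"}'
    ip -4 addr show cni0    # run this command on the worker itself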

    Diagnosing The Problem

    The output of the "kubectl get nodes -owide" command may show differing node versions and/or container runtimes:
    $ kubectl get nodes -owide
    NAME                       STATUS   ROLES           AGE      VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                   KERNEL-VERSION                CONTAINER-RUNTIME
    controlplane.example.com   Ready    control-plane   3y157d   v1.24.2   10.1.1.1      <none>        Red Hat Enterprise Linux   3.10.0-1160.76.1.el7.x86_64   containerd://1.6.6
    worker1.example.com        Ready    <none>          27h      v1.21.3   10.1.1.2      <none>        Red Hat Enterprise Linux   3.10.0-1160.76.1.el7.x86_64   docker://20.10.3
    worker2.example.com        Ready    <none>          27h      v1.21.3   10.1.1.3      <none>        Red Hat Enterprise Linux   3.10.0-1160.66.1.el7.x86_64   docker://20.10.3
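    If the wide output is hard to read, the same information can be reduced to the node name, kubelet version, and container runtime with a jsonpath query such as the one below; this is a convenience sketch, not part of the installer tooling:
    kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.kubeletVersion}{"\t"}{.status.nodeInfo.containerRuntimeVersion}{"\n"}{end}'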

    Resolving The Problem

    The cni0 interface deletion problem can be solved by resetting and rejoining workers once more. To do this:
  • Log in to the Microservices tier control plane operating system terminal as the user who installed Microservices tier.
  • Change your current working directory to the Microservices tier installation directory, e.g.: cd /opt/IBM/UGinstall/ugdockerfiles
  • Reset the workers using the following command: ./run_playbook.sh playbooks/platform/kubernetes/reset_workers.yaml -y
  • Log in to the operating system terminal of each Microservices tier worker as root and verify that neither the cni0 nor the flannel.1 interface exists by examining the output of either the ip a or the ifconfig -a command. If any of these interfaces still exist, delete them manually with the ip link delete command (a scripted version of this check is sketched after this list):
    ip link delete cni0
    ip link delete flannel.1
  • Back on the control plane machine, run the following command to rejoin workers: ./run_playbook.sh playbooks/platform/kubernetes/join_workers.yaml -y
    After rejoining the workers, verify that all of the nodes use the containerd runtime by viewing the output of the kubectl get nodes -owide command. If any nodes are still using the Docker runtime, run the following commands on the control plane node to migrate them:
    ./run_playbook.sh playbooks/platform/kubernetes/reset_workers.yaml -y
    ./run_playbook.sh playbooks/upgrade/uninstall_docker.yaml -y
    ./run_playbook.sh playbooks/install/install_pkg_repos.yaml -y
    ./run_playbook.sh playbooks/platform/kubernetes/setup_containerd.yaml --limit=workers -y
    ./run_playbook.sh playbooks/platform/kubernetes/join_workers.yaml -y
    Finally, if the kubectl get nodes -owide command reports version inconsistencies among the cluster nodes, re-run the upgrade.sh script to upgrade the cluster nodes.
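    The per-worker interface verification referenced in the steps above can also be scripted. A minimal sketch, assuming a root shell on each worker after reset_workers.yaml has completed:
    for iface in cni0 flannel.1; do
        if ip link show "$iface" >/dev/null 2>&1; then
            echo "$iface still present, deleting it"
            ip link delete "$iface"
        else
            echo "$iface already removed"
        fi
    done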