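If you cannot resolve the problems above on your own, ask for help and include the following information:

1. Provide these details when asking:

Kubernetes installation method: kubeadm or binary

Kubernetes version, for example 1.19.0

Server OS version, for example CentOS 8.0

Server kernel version, for example 4.18

Virtual machine or physical machine, and which virtualization platform

System log: tail -f /var/log/messages. Search it for error entries yourself; ideally open two terminals, one to restart the failing component and one to watch the log

Logs of the failing container: kubectl logs -f xxxxx -n xxxxxx. The error message is usually near the top

2. Describe your problem in detail, including what led up to it

For example: I installed the cluster and then used create -f to create the metrics resources, but metrics server keeps restarting, and logs -f shows no error messages.

3. What you have already tried

Describe the fixes you have tried. If you have not tried anything, please do not ask in the group.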
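The log search asked for in item 1 can be narrowed with a small filter. The following is only a sketch: the function name `filter_errors` and its keyword list are assumptions made for illustration, not part of the checklist above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only the log lines that look like problems.
# The keyword list is an assumption; extend it for your own components.
filter_errors() {
    grep -iE 'error|fatal|failed|unauthorized|x509'
}

# Example usage (commented out because it needs a live system):
#   tail -n 500 /var/log/messages | filter_errors
#   kubectl logs <pod-name> -n <namespace> | filter_errors
```

Reading from standard input keeps the same filter usable for both the system log and container logs.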

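1    Problem description

I followed the binary installation steps up to the metrics server chapter.

The calico pods do not run normally. Checks so far:

I already added the following to calico-etcd.yaml to select the network interface, but the problem persists.

- name: IP_AUTODETECTION_METHOD
  value: interface=ens160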

[root@k8s-master01 calico]# k logs -f calico-kube-controllers-cdd5755b9-jldd7 -n kube-system

2021-09-10 05:57:37.942 [INFO][1] main.go 88: Loaded configuration from environment config=&config.Config{LogLevel:"info", WorkloadEndpointWorkers:1, ProfileWorkers:1, PolicyWorkers:1, NodeWorkers:1, Kubeconfig:"", DatastoreType:"etcdv3"}

I0910 05:57:37.943785       1 client.go:360] parsed scheme: "endpoint"

I0910 05:57:37.943853       1 endpoint.go:68] ccResolverWrapper: sending new addresses to cc: [{https://192.168.1.121:2379 0  <nil>} {https://192.168.1.122:2379 0  <nil>} {https://192.168.1.123:2379 0  <nil>}]

W0910 05:57:37.943935       1 client_config.go:543] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.

2021-09-10 05:57:37.947 [INFO][1] main.go 109: Ensuring Calico datastore is initialized

W0910 05:57:37.957741       1 clientconn.go:1120] grpc: addrConn.createTransport failed to connect to {https://192.168.1.122:2379 0  <nil>}. Err :connection error: desc = "transport: authentication handshake failed: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"etcd\")". Reconnecting...

……

W0910 05:57:43.411781       1 clientconn.go:1120] grpc: addrConn.createTransport failed to connect to {https://192.168.1.123:2379 0  <nil>}. Err :connection error: desc = "transport: authentication handshake failed: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"etcd\")". Reconnecting...

W0910 05:57:43.479821       1 clientconn.go:1120] grpc: addrConn.createTransport failed to connect to {https://192.168.1.122:2379 0  <nil>}. Err :connection error: desc = "transport: authentication handshake failed: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"etcd\")". Reconnecting...

W0910 05:57:43.750584       1 clientconn.go:1120] grpc: addrConn.createTransport failed to connect to {https://192.168.1.121:2379 0  <nil>}. Err :connection error: desc = "transport: authentication handshake failed: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"etcd\")". Reconnecting...

W0910 05:57:46.986261       1 clientconn.go:1120] grpc: addrConn.createTransport failed to connect to {https://192.168.1.123:2379 0  <nil>}. Err :connection error: desc = "transport: authentication handshake failed: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"etcd\")". Reconnecting...

W0910 05:57:47.439900       1 clientconn.go:1120] grpc: addrConn.createTransport failed to connect to {https://192.168.1.121:2379 0  <nil>}. Err :connection error: desc = "transport: authentication handshake failed: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"etcd\")". Reconnecting...

W0910 05:57:47.515303       1 clientconn.go:1120] grpc: addrConn.createTransport failed to connect to {https://192.168.1.122:2379 0  <nil>}. Err :connection error: desc = "transport: authentication handshake failed: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"etcd\")". Reconnecting...

{"level":"warn","ts":"2021-09-10T05:57:47.948Z","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"endpoint://client-7da517f6-373e-4568-a8d6-6effe3ff783e/192.168.1.121:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest connection error: connection error: desc = \"transport: authentication handshake failed: x509: certificate signed by unknown authority (possibly because of \\\"crypto/rsa: verification error\\\" while trying to verify candidate authority certificate \\\"etcd\\\")\""}

2021-09-10 05:57:47.948 [ERROR][1] client.go 261: Error getting cluster information config ClusterInformation="default" error=context deadline exceeded

2021-09-10 05:57:47.948 [FATAL][1] main.go 114: Failed to initialize Calico datastore error=context deadline exceeded

[root@k8s-master01 calico]# k describe pod calico-kube-controllers-cdd5755b9-jldd7 -n kube-system

Name:                 calico-kube-controllers-cdd5755b9-jldd7

Namespace:            kube-system

Priority:             2000000000

Priority Class Name:  system-cluster-critical

Node:                 k8s-node02/192.168.1.125

Start Time:           Fri, 10 Sep 2021 13:56:24 +0800

Labels:               k8s-app=calico-kube-controllers

pod-template-hash=cdd5755b9

Annotations:          <none>

Status:               Running

IP:                   192.168.1.125

IPs:

IP:          192.168.1.125

Controlled By:  ReplicaSet/calico-kube-controllers-cdd5755b9

Containers:

calico-kube-controllers:

Container ID:  docker://ab3d09c8c85e8937f2501e6ea70f3ad3f087a7c0ab9f73d3832340d79a3a1d46

Image:         registry.cn-beijing.aliyuncs.com/dotbalo/kube-controllers:v3.15.3

Image ID:      docker-pullable://registry.cn-beijing.aliyuncs.com/dotbalo/kube-controllers@sha256:e59cc3287cb44ef835ef78e51b3835eabcddf8b16239a4d78abba5bb8260281c

Port:           <none>

Host Port:      <none>

State:          Waiting

Reason:       CrashLoopBackOff

Last State:     Terminated

Reason:       Error

Exit Code:    1

Started:      Fri, 10 Sep 2021 14:08:22 +0800

Finished:     Fri, 10 Sep 2021 14:08:32 +0800

Ready:          False

Restart Count:  7

Readiness:      exec [/usr/bin/check-status -r] delay=0s timeout=1s period=10s #success=1 #failure=3

Environment:

ETCD_ENDPOINTS:       <set to the key 'etcd_endpoints' of config map 'calico-config'>  Optional: false

ETCD_CA_CERT_FILE:    <set to the key 'etcd_ca' of config map 'calico-config'>         Optional: false

ETCD_KEY_FILE:        <set to the key 'etcd_key' of config map 'calico-config'>        Optional: false

ETCD_CERT_FILE:       <set to the key 'etcd_cert' of config map 'calico-config'>       Optional: false

ENABLED_CONTROLLERS: policy,namespace,serviceaccount,workloadendpoint,node

Mounts:

/calico-secrets from etcd-certs (rw)

/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-5pqcv

Conditions:

Type              Status

Initialized       True

Ready             False

ContainersReady   False

PodScheduled      True

Volumes:

etcd-certs:

Type:       Secret (a volume populated by a Secret)

SecretName: calico-etcd-secrets

Optional:   false

kube-api-access-5pqcv:

Type:                    Projected (a volume that contains injected data from multiple sources)

TokenExpirationSeconds:  3607

ConfigMapName:           kube-root-ca.crt

ConfigMapOptional:       <nil>

DownwardAPI:             true

QoS Class:                   BestEffort

Node-Selectors:              kubernetes.io/os=linux

Tolerations:                 CriticalAddonsOnly op=Exists

node-role.kubernetes.io/master:NoSchedule

node.kubernetes.io/not-ready:NoExecute op=Exists for 300s

node.kubernetes.io/unreachable:NoExecute op=Exists for 300s

Events:

Type    Reason     Age                 From               Message

----    ------     ----                ----               -------

Normal  Scheduled  15m                 default-scheduler  Successfully assigned kube-system/calico-kube-controllers-cdd5755b9-jldd7 to k8s-node02

Normal  Pulled     14m (x4 over 15m)   kubelet            Container image "registry.cn-beijing.aliyuncs.com/dotbalo/kube-controllers:v3.15.3" already present on machine

Normal  Created    14m (x4 over 15m)   kubelet            Created container calico-kube-controllers

Normal  Started    14m (x4 over 15m)   kubelet            Started container calico-kube-controllers

Warning Unhealthy  14m (x8 over 15m)   kubelet            Readiness probe failed: Failed to read status file status.json: open status.json: no such file or directory

Warning BackOff    34s (x64 over 15m)  kubelet            Back-off restarting failed container
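The x509 "certificate signed by unknown authority" errors in the log above suggest that the CA handed to calico may not be the CA that signed the etcd certificates. One quick first check is to compare the two CA files byte for byte. This is a minimal sketch under assumptions: `same_ca` is a hypothetical helper, and the Secret key name `etcd-ca` and the on-disk CA path are assumed from the calico-etcd manifest layout and the etcdctl command used in this post.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if two PEM files are byte-identical.
same_ca() {
    cmp -s "$1" "$2"
}

# Example usage on a control-plane node (commented out: needs a live cluster).
# The Secret key "etcd-ca" and the CA path below are assumptions.
#   kubectl -n kube-system get secret calico-etcd-secrets \
#       -o jsonpath='{.data.etcd-ca}' | base64 -d > /tmp/calico-etcd-ca.pem
#   same_ca /tmp/calico-etcd-ca.pem /etc/kubernetes/pki/etcd/etcd-ca.pem \
#       && echo "CA matches" || echo "CA differs"
```

A byte comparison only tells you whether the files are the same; two differently formatted copies of one certificate would also be reported as different.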

2    Fresh cluster installation using the binary method

2.1   Kubernetes version

[root@k8s-master01 selinux]# kubectl version

Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.3", GitCommit:"ca643a4d1f7bfe34773c74f79527be4afd95bf39", GitTreeState:"clean", BuildDate:"2021-07-15T21:04:39Z", GoVersion:"go1.16.6", Compiler:"gc", Platform:"linux/amd64"}

Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.3", GitCommit:"ca643a4d1f7bfe34773c74f79527be4afd95bf39", GitTreeState:"clean", BuildDate:"2021-07-15T20:59:07Z", GoVersion:"go1.16.6", Compiler:"gc", Platform:"linux/amd64"}

2.2   Docker version

[root@k8s-master01 selinux]# docker version

Client: Docker Engine - Community

Version:           20.10.8

API version:       1.41

Go version:        go1.16.6

Git commit:        3967b7d

Built:            Fri Jul 30 19:55:49 2021

OS/Arch:           linux/amd64

Context:           default

Experimental:      true

Server: Docker Engine - Community

Engine:

Version:          20.10.8

API version:      1.41 (minimum version 1.12)

Go version:      go1.16.6

Git commit:       75249d8

Built:            Fri Jul 30 19:54:13 2021

OS/Arch:          linux/amd64

Experimental:     false

containerd:

Version:          1.4.9

GitCommit:       e25210fe30a0a703442421b0f60afac609f950a3

runc:

Version:          1.0.1

GitCommit:        v1.0.1-0-g4144b63

docker-init:

Version:          0.19.0

GitCommit:        de40ad0

2.3 Server OS version

CentOS 7.9

2.4 Kernel version

[root@k8s-master01 selinux]# uname -a

Linux k8s-master01 4.19.12-1.el7.elrepo.x86_64 #1 SMP Fri Dec 21 11:06:36 EST 2018 x86_64 x86_64 x86_64 GNU/Linux

2.5 Platform

VMware ESXi 6.0, six virtual machines in total, none of them cloned

2.6 Network layout

2.6.1  Host network:

192.168.1.121 - 126; the apiserver IP address is 192.168.1.120

2.6.2  Pod network

K8s Pod network: 172.16.0.0/12

2.6.3  Service network

K8s Service network: 10.10.0.0/16

2.7 Component status

2.7.1  etcd

[root@k8s-master01 ~]# export ETCDCTL_API=3

[root@k8s-master01 ~]# etcdctl --endpoints="192.168.1.123:2379,192.168.1.122:2379,192.168.1.121:2379" --cacert=/etc/kubernetes/pki/etcd/etcd-ca.pem --cert=/etc/kubernetes/pki/etcd/etcd.pem --key=/etc/kubernetes/pki/etcd/etcd-key.pem  endpoint status --write-out=table

+--------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+

|      ENDPOINT      |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |

+--------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+

| 192.168.1.123:2379 |  43697987d6800e1 |  3.4.13 |  2.7 MB |      true |      false |      1053 |      33548 |              33548 |        |

| 192.168.1.122:2379 | 17de01a4b21e79c5 |  3.4.13 |  2.7 MB |     false |      false |      1053 |      33548 |              33548 |        |

| 192.168.1.121:2379 | 74ae75b016cc5110 |  3.4.13 |  2.7 MB |     false |      false |      1053 |      33548 |              33548 |        |

+--------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+

2.7.2  kubelet

[root@k8s-master01 calico]# systemctl status kubelet -l

kubelet.service - Kubernetes Kubelet

Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)

Drop-In: /etc/systemd/system/kubelet.service.d

└─10-kubelet.conf

Active: active (running) since Fri 2021-09-10 10:37:09 CST; 2h 11min ago

Docs: https://github.com/kubernetes/kubernetes

Main PID: 1492 (kubelet)

Tasks: 28

Memory: 182.2M

CGroup: /system.slice/kubelet.service

├─ 1492 /usr/local/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.kubeconfig --kubeconfig=/etc/kubernetes/kubelet.kubeconfig --config=/etc/kubernetes/kubelet-conf.yml --pod-infra-container-image=registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.4.1 --network-plugin=cni --cni-conf-dir=/etc/cni/net.d --cni-bin-dir=/opt/cni/bin --node-labels=node.kubernetes.io/node= --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 --image-pull-progress-deadline=30m

├─22843 /opt/cni/bin/calico

└─22878 /opt/cni/bin/calico

Sep 10 12:48:05 k8s-master01 kubelet[1492]: E0910 12:48:05.671381    1492 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/717c76ea-76f4-4955-aae1-b81523fc6833/etc-hosts with error exit status 1" pod="kube-system/metrics-server-64c6c494dc-cjlww"

Sep 10 12:48:05 k8s-master01 kubelet[1492]: E0910 12:48:05.675672    1492 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/7514f3ad-79e9-42de-8fc7-d3c66ae63586/etc-hosts with error exit status 1" pod="kube-system/coredns-684d86ff88-v2x7v"

Sep 10 12:48:12 k8s-master01 kubelet[1492]: I0910 12:48:12.818836    1492 scope.go:111] "RemoveContainer" containerID="48628d9c91bcdd9f6ed48a7db3e5dd744dccf0512242d06b58e63fb9ef4b2d4e"

Sep 10 12:48:12 k8s-master01 kubelet[1492]: E0910 12:48:12.819844    1492 pod_workers.go:190] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"calico-node\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=calico-node pod=calico-node-ssqbk_kube-system(b3e3dc8e-10bc-42ea-a5cf-04da5cc50a6b)\"" pod="kube-system/calico-node-ssqbk" podUID=b3e3dc8e-10bc-42ea-a5cf-04da5cc50a6b

Sep 10 12:48:15 k8s-master01 kubelet[1492]: E0910 12:48:15.717323    1492 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/7514f3ad-79e9-42de-8fc7-d3c66ae63586/etc-hosts with error exit status 1" pod="kube-system/coredns-684d86ff88-v2x7v"

Sep 10 12:48:15 k8s-master01 kubelet[1492]: E0910 12:48:15.721152    1492 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/717c76ea-76f4-4955-aae1-b81523fc6833/etc-hosts with error exit status 1" pod="kube-system/metrics-server-64c6c494dc-cjlww"

Sep 10 12:48:23 k8s-master01 kubelet[1492]: I0910 12:48:23.821023    1492 scope.go:111] "RemoveContainer" containerID="48628d9c91bcdd9f6ed48a7db3e5dd744dccf0512242d06b58e63fb9ef4b2d4e"

Sep 10 12:48:23 k8s-master01 kubelet[1492]: E0910 12:48:23.822048    1492 pod_workers.go:190] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"calico-node\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=calico-node pod=calico-node-ssqbk_kube-system(b3e3dc8e-10bc-42ea-a5cf-04da5cc50a6b)\"" pod="kube-system/calico-node-ssqbk" podUID=b3e3dc8e-10bc-42ea-a5cf-04da5cc50a6b

Sep 10 12:48:25 k8s-master01 kubelet[1492]: E0910 12:48:25.747377    1492 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/7514f3ad-79e9-42de-8fc7-d3c66ae63586/etc-hosts with error exit status 1" pod="kube-system/coredns-684d86ff88-v2x7v"

Sep 10 12:48:25 k8s-master01 kubelet[1492]: E0910 12:48:25.752155    1492 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/717c76ea-76f4-4955-aae1-b81523fc6833/etc-hosts with error exit status 1" pod="kube-system/metrics-server-64c6c494dc-cjlww"

The kubelet service on every node other than master01 reports the following errors

[root@k8s-master02 ~]# systemctl status kubelet -l

kubelet.service - Kubernetes Kubelet

Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)

Drop-In: /etc/systemd/system/kubelet.service.d

└─10-kubelet.conf

Active: active (running) since Fri 2021-09-10 10:37:01 CST; 2h 12min ago

Docs: https://github.com/kubernetes/kubernetes

Main PID: 1571 (kubelet)

Tasks: 15

Memory: 162.6M

CGroup: /system.slice/kubelet.service

└─1571 /usr/local/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.kubeconfig --kubeconfig=/etc/kubernetes/kubelet.kubeconfig --config=/etc/kubernetes/kubelet-conf.yml --pod-infra-container-image=registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.4.1 --network-plugin=cni --cni-conf-dir=/etc/cni/net.d --cni-bin-dir=/opt/cni/bin --node-labels=node.kubernetes.io/node= --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 --image-pull-progress-deadline=30m

Sep 10 12:48:45 k8s-master02 kubelet[1571]: I0910 12:48:45.437795    1571 scope.go:111] "RemoveContainer" containerID="87f8e22822a7b98c896d49604f854e239583fe3412e104aaff15d0a14781c3ac"

Sep 10 12:48:45 k8s-master02 kubelet[1571]: E0910 12:48:45.438864    1571 pod_workers.go:190] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"calico-node\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=calico-node pod=calico-node-8dwz2_kube-system(24f3455e-25fc-45da-b7ef-9d4c13538358)\"" pod="kube-system/calico-node-8dwz2" podUID=24f3455e-25fc-45da-b7ef-9d4c13538358

Sep 10 12:48:50 k8s-master02 kubelet[1571]: I0910 12:48:50.437436    1571 scope.go:111] "RemoveContainer" containerID="d87459075069d32e963986883a2a3de417c99e833f5b51eba4c1fca313e5bec9"

Sep 10 12:48:50 k8s-master02 kubelet[1571]: E0910 12:48:50.438035    1571 pod_workers.go:190] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"calico-kube-controllers\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=calico-kube-controllers pod=calico-kube-controllers-cdd5755b9-48f28_kube-system(802dd17f-50d3-44d3-b945-3588d30b7cda)\"" pod="kube-system/calico-kube-controllers-cdd5755b9-48f28" podUID=802dd17f-50d3-44d3-b945-3588d30b7cda

Sep 10 12:48:57 k8s-master02 kubelet[1571]: I0910 12:48:57.437418    1571 scope.go:111] "RemoveContainer" containerID="87f8e22822a7b98c896d49604f854e239583fe3412e104aaff15d0a14781c3ac"

Sep 10 12:48:57 k8s-master02 kubelet[1571]: E0910 12:48:57.438692    1571 pod_workers.go:190] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"calico-node\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=calico-node pod=calico-node-8dwz2_kube-system(24f3455e-25fc-45da-b7ef-9d4c13538358)\"" pod="kube-system/calico-node-8dwz2" podUID=24f3455e-25fc-45da-b7ef-9d4c13538358

Sep 10 12:49:04 k8s-master02 kubelet[1571]: I0910 12:49:04.438012    1571 scope.go:111] "RemoveContainer" containerID="d87459075069d32e963986883a2a3de417c99e833f5b51eba4c1fca313e5bec9"

Sep 10 12:49:04 k8s-master02 kubelet[1571]: E0910 12:49:04.438570    1571 pod_workers.go:190] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"calico-kube-controllers\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=calico-kube-controllers pod=calico-kube-controllers-cdd5755b9-48f28_kube-system(802dd17f-50d3-44d3-b945-3588d30b7cda)\"" pod="kube-system/calico-kube-controllers-cdd5755b9-48f28" podUID=802dd17f-50d3-44d3-b945-3588d30b7cda

Sep 10 12:49:10 k8s-master02 kubelet[1571]: I0910 12:49:10.439566    1571 scope.go:111] "RemoveContainer" containerID="87f8e22822a7b98c896d49604f854e239583fe3412e104aaff15d0a14781c3ac"

Sep 10 12:49:10 k8s-master02 kubelet[1571]: E0910 12:49:10.440623    1571 pod_workers.go:190] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"calico-node\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=calico-node pod=calico-node-8dwz2_kube-system(24f3455e-25fc-45da-b7ef-9d4c13538358)\"" pod="kube-system/calico-node-8dwz2" podUID=24f3455e-25fc-45da-b7ef-9d4c13538358

2.7.3  kube-proxy

[root@k8s-master01 calico]# systemctl status kube-proxy -l

kube-proxy.service - Kubernetes Kube Proxy

Loaded: loaded (/usr/lib/systemd/system/kube-proxy.service; enabled; vendor preset: disabled)

Active: active (running) since Fri 2021-09-10 10:36:36 CST; 2h 13min ago

Docs: https://github.com/kubernetes/kubernetes

Main PID: 1000 (kube-proxy)

Tasks: 5

Memory: 52.6M

CGroup: /system.slice/kube-proxy.service

└─1000 /usr/local/bin/kube-proxy --config=/etc/kubernetes/kube-proxy.conf --v=2

Sep 10 12:48:46 k8s-master01 kube-proxy[1000]: I0910 12:48:46.522496    1000 proxier.go:1034] Not syncing ipvs rules until Services and Endpoints have been received from master

Sep 10 12:48:46 k8s-master01 kube-proxy[1000]: I0910 12:48:46.524785    1000 proxier.go:1034] Not syncing ipvs rules until Services and Endpoints have been received from master

Sep 10 12:48:53 k8s-master01 kube-proxy[1000]: E0910 12:48:53.480329    1000 reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1beta1.EndpointSlice: failed to list *v1beta1.EndpointSlice: Unauthorized

Sep 10 12:49:16 k8s-master01 kube-proxy[1000]: I0910 12:49:16.523566    1000 proxier.go:1034] Not syncing ipvs rules until Services and Endpoints have been received from master

Sep 10 12:49:16 k8s-master01 kube-proxy[1000]: I0910 12:49:16.525915    1000 proxier.go:1034] Not syncing ipvs rules until Services and Endpoints have been received from master

Sep 10 12:49:16 k8s-master01 kube-proxy[1000]: E0910 12:49:16.870887    1000 reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.Service: failed to list *v1.Service: Unauthorized

Sep 10 12:49:39 k8s-master01 kube-proxy[1000]: E0910 12:49:39.660888    1000 reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1beta1.EndpointSlice: failed to list *v1beta1.EndpointSlice: Unauthorized

Sep 10 12:49:46 k8s-master01 kube-proxy[1000]: I0910 12:49:46.523929    1000 proxier.go:1034] Not syncing ipvs rules until Services and Endpoints have been received from master

Sep 10 12:49:46 k8s-master01 kube-proxy[1000]: I0910 12:49:46.526263    1000 proxier.go:1034] Not syncing ipvs rules until Services and Endpoints have been received from master

Sep 10 12:50:08 k8s-master01 kube-proxy[1000]: E0910 12:50:08.267885    1000 reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.Service: failed to list *v1.Service: Unauthorized
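The repeated Unauthorized lines above are easy to lose among the informational ipvs messages. As a rough sketch, the hypothetical helper `unauthorized_watches` below (the name and the sed pattern are assumptions based only on the log format shown here) lists which API types kube-proxy was refused access to.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the API types kube-proxy could not watch
# because the apiserver answered "Unauthorized". Reads log lines on stdin.
unauthorized_watches() {
    sed -n 's/.*Failed to watch \(\*[^:]*\): .*Unauthorized.*/\1/p' | LC_ALL=C sort -u
}

# Example usage (commented out: needs systemd and a kube-proxy unit):
#   journalctl -u kube-proxy --no-pager | unauthorized_watches
```

If every watched type shows up in the output, the credentials kube-proxy presents are being rejected as a whole, which points at its token or certificate rather than at a single missing RBAC rule.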

2.7.4  kube-apiserver

kube-apiserver keeps logging the following messages (highlighted in red in the original output)

[root@k8s-master01 calico]# systemctl status kube-apiserver -l

kube-apiserver.service - Kubernetes API Server

Loaded: loaded (/usr/lib/systemd/system/kube-apiserver.service; enabled; vendor preset: disabled)

Active: active (running) since Fri 2021-09-10 10:36:36 CST; 2h 14min ago

Docs: https://github.com/kubernetes/kubernetes

Main PID: 999 (kube-apiserver)

Tasks: 8

Memory: 432.8M

CGroup: /system.slice/kube-apiserver.service

└─999 /usr/local/bin/kube-apiserver --v=2 --logtostderr=true --allow-privileged=true --bind-address=0.0.0.0 --secure-port=6443 --insecure-port=0 --advertise-address=192.168.1.121 --service-cluster-ip-range=10.10.0.0/16 --service-node-port-range=30000-32767 --etcd-servers=https://192.168.1.121:2379,https://192.168.1.122:2379,https://192.168.1.123:2379 --etcd-cafile=/etc/etcd/ssl/etcd-ca.pem --etcd-certfile=/etc/etcd/ssl/etcd.pem --etcd-keyfile=/etc/etcd/ssl/etcd-key.pem --client-ca-file=/etc/kubernetes/pki/ca.pem --tls-cert-file=/etc/kubernetes/pki/apiserver.pem --tls-private-key-file=/etc/kubernetes/pki/apiserver-key.pem --kubelet-client-certificate=/etc/kubernetes/pki/apiserver.pem --kubelet-client-key=/etc/kubernetes/pki/apiserver-key.pem --service-account-key-file=/etc/kubernetes/pki/sa.pub --service-account-signing-key-file=/etc/kubernetes/pki/sa.key --service-account-issuer=https://kubernetes.default.svc.cluster.local --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname --enable-admission-plugins=NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,DefaultTolerationSeconds,NodeRestriction,ResourceQuota --authorization-mode=Node,RBAC --enable-bootstrap-token-auth=true --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.pem --proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.pem --proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client-key.pem --requestheader-allowed-names=aggregator --requestheader-group-headers=X-Remote-Group --requestheader-extra-headers-prefix=X-Remote-Extra- --requestheader-username-headers=X-Remote-Usera --feature-gates=EphemeralContainers=true

Sep 10 12:50:27 k8s-master01 kube-apiserver[999]: I0910 12:50:27.652902     999 clientconn.go:948] ClientConn switching balancer to "pick_first"

Sep 10 12:50:27 k8s-master01 kube-apiserver[999]: I0910 12:50:27.653172     999 balancer_conn_wrappers.go:78] pickfirstBalancer: HandleSubConnStateChange: 0xc00e506c70, {CONNECTING <nil>}

Sep 10 12:50:27 k8s-master01 kube-apiserver[999]: I0910 12:50:27.678964     999 balancer_conn_wrappers.go:78] pickfirstBalancer: HandleSubConnStateChange: 0xc00e506c70, {READY <nil>}

Sep 10 12:50:27 k8s-master01 kube-apiserver[999]: I0910 12:50:27.681622     999 controlbuf.go:508] transport: loopyWriter.run returning. connection error: desc = "transport is closing"

Sep 10 12:50:42 k8s-master01 kube-apiserver[999]: I0910 12:50:42.807898     999 client.go:360] parsed scheme: "passthrough"

Sep 10 12:50:42 k8s-master01 kube-apiserver[999]: I0910 12:50:42.807965     999 passthrough.go:48] ccResolverWrapper: sending update to cc: {[{https://192.168.1.122:2379  <nil> 0 <nil>}] <nil> <nil>}

Sep 10 12:50:42 k8s-master01 kube-apiserver[999]: I0910 12:50:42.807988     999 clientconn.go:948] ClientConn switching balancer to "pick_first"

Sep 10 12:50:42 k8s-master01 kube-apiserver[999]: I0910 12:50:42.808211     999 balancer_conn_wrappers.go:78] pickfirstBalancer: HandleSubConnStateChange: 0xc00df9a1f0, {CONNECTING <nil>}

Sep 10 12:50:42 k8s-master01 kube-apiserver[999]: I0910 12:50:42.833075     999 balancer_conn_wrappers.go:78] pickfirstBalancer: HandleSubConnStateChange: 0xc00df9a1f0, {READY <nil>}

Sep 10 12:50:42 k8s-master01 kube-apiserver[999]: I0910 12:50:42.834371     999 controlbuf.go:508] transport: loopyWriter.run returning. connection error: desc = "transport is closing"

[root@k8s-master01 calico]#

2.7.5  kube-controller-manager

[root@k8s-master01 calico]# systemctl status kube-controller-manager -l

kube-controller-manager.service - Kubernetes Controller Manager

Loaded: loaded (/usr/lib/systemd/system/kube-controller-manager.service; enabled; vendor preset: disabled)

Active: active (running) since Fri 2021-09-10 10:36:36 CST; 2h 14min ago

Docs: https://github.com/kubernetes/kubernetes

Main PID: 1006 (kube-controller)

Tasks: 6

Memory: 101.1M

CGroup: /system.slice/kube-controller-manager.service

└─1006 /usr/local/bin/kube-controller-manager --v=2 --logtostderr=true --address=127.0.0.1 --root-ca-file=/etc/kubernetes/pki/ca.pem --cluster-signing-cert-file=/etc/kubernetes/pki/ca.pem --cluster-signing-key-file=/etc/kubernetes/pki/ca-key.pem --service-account-private-key-file=/etc/kubernetes/pki/sa.key --kubeconfig=/etc/kubernetes/controller-manager.kubeconfig --leader-elect=true --cluster-signing-duration=876000h0m0s --use-service-account-credentials=true --node-monitor-grace-period=40s --node-monitor-period=5s --pod-eviction-timeout=2m0s --controllers=*,bootstrapsigner,tokencleaner --allocate-node-cidrs=true --cluster-cidr=172.16.0.0/12 --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.pem --node-cidr-mask-size=24 --feature-gates=EphemeralContainers=true

Sep 10 10:37:11 k8s-master01 kube-controller-manager[1006]: W0910 10:37:11.138214    1006 authorization.go:184] No authorization-kubeconfig provided, so SubjectAccessReview of authorization tokens won't work.

Sep 10 10:37:11 k8s-master01 kube-controller-manager[1006]: I0910 10:37:11.138277    1006 controllermanager.go:175] Version: v1.21.3

Sep 10 10:37:11 k8s-master01 kube-controller-manager[1006]: I0910 10:37:11.148097    1006 tlsconfig.go:178] loaded client CA [0/"request-header::/etc/kubernetes/pki/front-proxy-ca.pem"]: "kubernetes" [] issuer="<self>" (2021-09-08 07:50:00 +0000 UTC to 2026-09-07 07:50:00 +0000 UTC (now=2021-09-10 02:37:11.148072604 +0000 UTC))

Sep 10 10:37:11 k8s-master01 kube-controller-manager[1006]: I0910 10:37:11.148376    1006 tlsconfig.go:200] loaded serving cert ["Generated self signed cert"]: "localhost@1631241427" [serving] validServingFor=[127.0.0.1,localhost,localhost] issuer="localhost-ca@1631241426" (2021-09-10 01:37:03 +0000 UTC to 2022-09-10 01:37:03 +0000 UTC (now=2021-09-10 02:37:11.148361186 +0000 UTC))

Sep 10 10:37:11 k8s-master01 kube-controller-manager[1006]: I0910 10:37:11.148619    1006 named_certificates.go:53] loaded SNI cert [0/"self-signed loopback"]: "apiserver-loopback-client@1631241431" [serving] validServingFor=[apiserver-loopback-client] issuer="apiserver-loopback-client-ca@1631241429" (2021-09-10 01:37:07 +0000 UTC to 2022-09-10 01:37:07 +0000 UTC (now=2021-09-10 02:37:11.148602782 +0000 UTC))

Sep 10 10:37:11 k8s-master01 kube-controller-manager[1006]: I0910 10:37:11.148671    1006 secure_serving.go:197] Serving securely on [::]:10257

Sep 10 10:37:11 k8s-master01 kube-controller-manager[1006]: I0910 10:37:11.148771    1006 tlsconfig.go:240] Starting DynamicServingCertificateController

Sep 10 10:37:11 k8s-master01 kube-controller-manager[1006]: I0910 10:37:11.148840    1006 dynamic_cafile_content.go:167] Starting request-header::/etc/kubernetes/pki/front-proxy-ca.pem

Sep 10 10:37:11 k8s-master01 kube-controller-manager[1006]: I0910 10:37:11.149474    1006 deprecated_insecure_serving.go:53] Serving insecurely on 127.0.0.1:10252

Sep 10 10:37:11 k8s-master01 kube-controller-manager[1006]: I0910 10:37:11.149979    1006 leaderelection.go:243] attempting to acquire leader lease kube-system/kube-controller-manager..

2.8   apiserver access test

telnet 192.168.1.120 8443 succeeds from every node

[root@k8s-master02 etcd]# telnet 192.168.1.120 8443

Trying 192.168.1.120...

Connected to 192.168.1.120.

Escape character is '^]'.

[root@k8s-master02 etcd]# curl https://192.168.1.120:8443/healthz -k

ok[root@k8s-master02 etcd]#

2.9 Check kubelet certificate issuance

The kubelet certificate was issued successfully on every node

[root@k8s-node02 pki]# ls -l /var/lib/kubelet/pki/

total 12

-rw------- 1 root root 1248 Sep 10 10:37 kubelet-client-2021-09-10-10-37-40.pem

lrwxrwxrwx 1 root root   59 Sep 10 10:37 kubelet-client-current.pem -> /var/lib/kubelet/pki/kubelet-client-2021-09-10-10-37-40.pem

-rw-r--r-- 1 root root 2266 Sep  8 15:04 kubelet.crt

-rw------- 1 root root 1679 Sep  8 15:04 kubelet.key

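3    Attempted fixes

It looks like something is wrong with the etcd certificates, so I tried regenerating them.

3.1 Check that the component configuration is consistent [problem persists]

I recreated the configuration files and service unit files for every component on master01 and copied them to the other nodes.

3.2 Delete all generated etcd certificate files, regenerate the certificates, and recreate calico [problem persists]

4    Final fix: upgrade calico from 3.15 to 3.19, regenerate the kube-proxy certificate, delete and recreate the ServiceAccount, and restart the services

Delete the old calico:

kubectl delete -f calico-etcd.yaml

Download the new calico YAML file to the local machine:

https://gitee.com/dukuan/k8s-ha-install/blob/manual-installation-v1.22.x/calico/calico.yaml

Change the POD_CIDR field in calico.yaml to your own Pod network, then run create -f.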
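The POD_CIDR substitution can be scripted. The sketch below assumes the downloaded manifest contains the literal placeholder text `POD_CIDR`, as described above; the function name `set_pod_cidr` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace every POD_CIDR placeholder in a manifest.
# Usage: set_pod_cidr <manifest> <cidr>
set_pod_cidr() {
    local file=$1 cidr=$2
    # '#' is the sed delimiter so the '/' in the CIDR needs no escaping.
    sed "s#POD_CIDR#${cidr}#g" "$file" > "${file}.tmp" && mv "${file}.tmp" "$file"
}

# Example usage (commented out: needs the downloaded manifest and a cluster):
#   set_pod_cidr calico.yaml 172.16.0.0/12
#   kubectl create -f calico.yaml
```

Writing to a temporary file and renaming it avoids relying on `sed -i`, whose syntax differs between GNU and BSD sed.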

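Same as before: the pods keep restarting because the liveness and readiness checks fail.

Run kubectl logs -f calico-node

It looks like kube-proxy authentication is the problem, so regenerate the kube-proxy certificate.

I tried the 3.15 calico-etcd.yaml again and it still does not work; the 3.19 one does.

Apparently, after the certificates are regenerated, the kube-proxy ServiceAccount and ClusterRoleBinding have to be deleted and recreated.

kubectl create serviceaccount kube-proxy -n kube-system

kubectl create clusterrolebinding system:kube-proxy         --clusterrole system:node-proxier         --serviceaccount kube-system:kube-proxy

After deleting and recreating the ServiceAccount and ClusterRoleBinding created by these two steps and restarting the services, everything worked.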
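The delete-and-recreate step can be written as one small function. This is a sketch, not the exact sequence the author ran: the function name `recreate_kube_proxy_rbac` and the `KUBECTL` override variable are assumptions added for illustration, while the create commands are the two shown above.

```shell
#!/usr/bin/env bash
# KUBECTL can be overridden (for example KUBECTL=echo) to preview the calls.
KUBECTL=${KUBECTL:-kubectl}

# Hypothetical helper: drop and recreate the kube-proxy ServiceAccount
# and its ClusterRoleBinding.
recreate_kube_proxy_rbac() {
    # Remove the old objects; --ignore-not-found keeps this re-runnable.
    $KUBECTL delete clusterrolebinding system:kube-proxy --ignore-not-found
    $KUBECTL delete serviceaccount kube-proxy -n kube-system --ignore-not-found
    # Recreate them with the same commands used in this post.
    $KUBECTL create serviceaccount kube-proxy -n kube-system
    $KUBECTL create clusterrolebinding system:kube-proxy \
        --clusterrole system:node-proxier \
        --serviceaccount kube-system:kube-proxy
}

# Example usage (commented out: needs cluster-admin access):
#   recreate_kube_proxy_rbac
#   systemctl restart kube-proxy    # then restart the other services as needed
```

Setting `KUBECTL=echo` before calling the function prints the four commands without touching the cluster, which is a cheap way to review them first.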

