If you cannot work through the problems above on your own, ask for help as follows:
1. Provide the following information with your question (a sketch of the commands for collecting it follows this list):
k8s installation method: kubeadm or binary
k8s version, e.g. 1.19.0
Server OS version, e.g. CentOS 8.0
Server kernel version, e.g. 4.18
Virtual machine or physical machine, and which virtualization platform
System logs: tail -f /var/log/messages, and search for "error" yourself; it is best to open two windows, one to restart the failing component and one to watch the logs
Logs of the abnormal container: kubectl logs -f xxxxx -n xxxxxx; the error message is usually near the top
2. Describe your problem in detail, including what led up to it. For example: I installed the cluster, then used create -f to create the metrics resources, but the metrics server keeps restarting, and with logs -f I do not see any abnormal log output.
3. Describe the fixes you have already tried; if you have not tried anything yet, please do not ask in the group.
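For convenience, the environment details requested above can be gathered with a few standard commands; this is only a sketch added for reference, not part of the original checklist:

# Versions and platform
kubectl version --short        # k8s client and server versions
cat /etc/redhat-release        # OS release on CentOS/RHEL
uname -r                       # kernel version
# Logs: run these in two terminals, one restarting the failing component, one watching the logs
tail -f /var/log/messages | grep -i error
kubectl logs -f <pod-name> -n <namespace>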
1 Problem description
I followed the binary installation steps all the way to the metrics server chapter. The calico pods do not run properly. I checked, and I had already added the following to calico-etcd.yaml to select the network interface, but the problem persists:
- name: IP_AUTODETECTION_METHOD
  value: "interface=ens160"
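For context, this environment variable goes under the calico-node container's env list in calico-etcd.yaml; a rough sketch of the surrounding structure (other entries elided, exact layout depends on the manifest version):

            env:
              # ...existing entries from the stock manifest...
              - name: IP                         # already present in the manifest
                value: "autodetect"
              - name: IP_AUTODETECTION_METHOD    # added entry: pin autodetection to ens160
                value: "interface=ens160"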
[root@k8s-master01 calico]# k logs -f calico-kube-controllers-cdd5755b9-jldd7 -n kube-system
2021-09-10 05:57:37.942 [INFO][1] main.go 88: Loaded configuration from environment config=&config.Config{LogLevel:"info", WorkloadEndpointWorkers:1, ProfileWorkers:1, PolicyWorkers:1, NodeWorkers:1, Kubeconfig:"", DatastoreType:"etcdv3"}
I0910 05:57:37.943785       1 client.go:360] parsed scheme: "endpoint"
I0910 05:57:37.943853       1 endpoint.go:68] ccResolverWrapper: sending new addresses to cc: [{https://192.168.1.121:2379 0 <nil>} {https://192.168.1.122:2379 0 <nil>} {https://192.168.1.123:2379 0 <nil>}]
W0910 05:57:37.943935       1 client_config.go:543] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
2021-09-10 05:57:37.947 [INFO][1] main.go 109: Ensuring Calico datastore is initialized
W0910 05:57:37.957741       1 clientconn.go:1120] grpc: addrConn.createTransport failed to connect to {https://192.168.1.122:2379 0 <nil>}. Err :connection error: desc = "transport: authentication handshake failed: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"etcd\")". Reconnecting...
……
W0910 05:57:43.411781       1 clientconn.go:1120] grpc: addrConn.createTransport failed to connect to {https://192.168.1.123:2379 0 <nil>}. Err :connection error: desc = "transport: authentication handshake failed: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"etcd\")". Reconnecting...
W0910 05:57:43.479821       1 clientconn.go:1120] grpc: addrConn.createTransport failed to connect to {https://192.168.1.122:2379 0 <nil>}. Err :connection error: desc = "transport: authentication handshake failed: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"etcd\")". Reconnecting...
W0910 05:57:43.750584       1 clientconn.go:1120] grpc: addrConn.createTransport failed to connect to {https://192.168.1.121:2379 0 <nil>}. Err :connection error: desc = "transport: authentication handshake failed: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"etcd\")". Reconnecting...
W0910 05:57:46.986261       1 clientconn.go:1120] grpc: addrConn.createTransport failed to connect to {https://192.168.1.123:2379 0 <nil>}. Err :connection error: desc = "transport: authentication handshake failed: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"etcd\")". Reconnecting...
W0910 05:57:47.439900       1 clientconn.go:1120] grpc: addrConn.createTransport failed to connect to {https://192.168.1.121:2379 0 <nil>}. Err :connection error: desc = "transport: authentication handshake failed: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"etcd\")". Reconnecting...
W0910 05:57:47.515303       1 clientconn.go:1120] grpc: addrConn.createTransport failed to connect to {https://192.168.1.122:2379 0 <nil>}. Err :connection error: desc = "transport: authentication handshake failed: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"etcd\")". Reconnecting...
{"level":"warn","ts":"2021-09-10T05:57:47.948Z","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"endpoint://client-7da517f6-373e-4568-a8d6-6effe3ff783e/192.168.1.121:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest connection error: connection error: desc = \"transport: authentication handshake failed: x509: certificate signed by unknown authority (possibly because of \\\"crypto/rsa: verification error\\\" while trying to verify candidate authority certificate \\\"etcd\\\")\""}
2021-09-10 05:57:47.948 [ERROR][1] client.go 261: Error getting cluster information config ClusterInformation="default" error=context deadline exceeded
2021-09-10 05:57:47.948 [FATAL][1] main.go 114: Failed to initialize Calico datastore error=context deadline exceeded
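Given the x509 error above, one useful check is whether the etcd CA that Calico mounts from the calico-etcd-secrets Secret actually matches the CA etcd is running with; a rough sketch (the Secret key names follow the stock calico-etcd.yaml, the paths follow this install):

# Compare the CA handed to Calico with etcd's real CA
kubectl get secret calico-etcd-secrets -n kube-system -o jsonpath='{.data.etcd-ca}' | base64 -d > /tmp/calico-etcd-ca.pem
diff /tmp/calico-etcd-ca.pem /etc/kubernetes/pki/etcd/etcd-ca.pem && echo "CA matches"
# Verify the client certificate in the Secret was signed by that CA
kubectl get secret calico-etcd-secrets -n kube-system -o jsonpath='{.data.etcd-cert}' | base64 -d > /tmp/calico-etcd-cert.pem
openssl verify -CAfile /etc/kubernetes/pki/etcd/etcd-ca.pem /tmp/calico-etcd-cert.pem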
[root@k8s-master01 calico]# k describe pod calico-kube-controllers-cdd5755b9-jldd7 -n kube-system
Name: calico-kube-controllers-cdd5755b9-jldd7
Namespace: kube-system
Priority: 2000000000
Priority Class Name: system-cluster-critical
Node: k8s-node02/192.168.1.125
Start Time: Fri, 10 Sep 2021 13:56:24 +0800
Labels: k8s-app=calico-kube-controllers
pod-template-hash=cdd5755b9
Annotations: <none>
Status: Running
IP: 192.168.1.125
IPs:
IP: 192.168.1.125
Controlled By: ReplicaSet/calico-kube-controllers-cdd5755b9
Containers:
calico-kube-controllers:
Container ID: docker://ab3d09c8c85e8937f2501e6ea70f3ad3f087a7c0ab9f73d3832340d79a3a1d46
Image: registry.cn-beijing.aliyuncs.com/dotbalo/kube-controllers:v3.15.3
Image ID: docker-pullable://registry.cn-beijing.aliyuncs.com/dotbalo/kube-controllers@sha256:e59cc3287cb44ef835ef78e51b3835eabcddf8b16239a4d78abba5bb8260281c
Port: <none>
Host Port: <none>
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Fri, 10 Sep 2021 14:08:22 +0800
Finished: Fri, 10 Sep 2021 14:08:32 +0800
Ready: False
Restart Count: 7
Readiness: exec [/usr/bin/check-status -r] delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
ETCD_ENDPOINTS: <set to the key 'etcd_endpoints' of config map 'calico-config'> Optional: false
ETCD_CA_CERT_FILE: <set to the key 'etcd_ca' of config map 'calico-config'> Optional: false
ETCD_KEY_FILE: <set to the key 'etcd_key' of config map 'calico-config'> Optional: false
ETCD_CERT_FILE: <set to the key 'etcd_cert' of config map 'calico-config'> Optional: false
ENABLED_CONTROLLERS: policy,namespace,serviceaccount,workloadendpoint,node
Mounts:
/calico-secrets from etcd-certs (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-5pqcv
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
etcd-certs:
Type: Secret (a volume populated by a Secret)
SecretName: calico-etcd-secrets
Optional: false
kube-api-access-5pqcv:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoSClass: BestEffort
Node-Selectors: kubernetes.io/os=linux
Tolerations: CriticalAddonsOnly op=Exists
node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 15m default-scheduler Successfully assigned
kube-system/calico-kube-controllers-cdd5755b9-jldd7 to k8s-node02
Normal Pulled 14m (x4 over 15m) kubelet Container image
"registry.cn-beijing.aliyuncs.com/dotbalo/kube-controllers:v3.15.3"
already present on machine
Normal Created 14m (x4 over 15m) kubelet Created container
calico-kube-controllers
Normal Started 14m (x4 over 15m) kubelet Started container
calico-kube-controllers
Warning Unhealthy 14m (x8 over 15m) kubelet Readiness probe failed: Failed to
read status file status.json: open status.json: no such file or directory
Warning BackOff 34s (x64 over 15m) kubelet Back-off restarting failed
container
2 Fresh cluster installed from binaries
2.1 Kubernetes version
[root@k8s-master01 selinux]# kubectl version
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.3", GitCommit:"ca643a4d1f7bfe34773c74f79527be4afd95bf39", GitTreeState:"clean", BuildDate:"2021-07-15T21:04:39Z", GoVersion:"go1.16.6", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.3", GitCommit:"ca643a4d1f7bfe34773c74f79527be4afd95bf39", GitTreeState:"clean", BuildDate:"2021-07-15T20:59:07Z", GoVersion:"go1.16.6", Compiler:"gc", Platform:"linux/amd64"}
2.2 Docker version
[root@k8s-master01 selinux]# docker version
Client: Docker Engine - Community
Version: 20.10.8
API version: 1.41
Go version: go1.16.6
Git commit: 3967b7d
Built: Fri Jul 30 19:55:49 2021
OS/Arch: linux/amd64
Context: default
Experimental: true
Server: Docker Engine - Community
Engine:
Version: 20.10.8
API version: 1.41 (minimum version 1.12)
Go version: go1.16.6
Git commit: 75249d8
Built: Fri Jul 30 19:54:13 2021
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.4.9
GitCommit: e25210fe30a0a703442421b0f60afac609f950a3
runc:
Version: 1.0.1
GitCommit: v1.0.1-0-g4144b63
docker-init:
Version: 0.19.0
GitCommit: de40ad0
2.3 Server OS version
CentOS 7.9
2.4 Kernel version
[root@k8s-master01 selinux]# uname -a
Linux k8s-master01 4.19.12-1.el7.elrepo.x86_64 #1 SMP Fri Dec 21 11:06:36 EST 2018 x86_64 x86_64 x86_64 GNU/Linux
2.5 Platform
VMware ESXi 6.0, six virtual machines in total, none of them cloned.
2.6 Network segments
2.6.1 Host network
192.168.1.121-126; the apiserver IP is 192.168.1.120
2.6.2 Pod network
K8s Pod CIDR: 172.16.0.0/12
2.6.3 Service network
K8s Service CIDR: 10.10.0.0/16
2.7 Component status
2.7.1 etcd
[root@k8s-master01 ~]# export ETCDCTL_API=3
[root@k8s-master01 ~]# etcdctl --endpoints="192.168.1.123:2379,192.168.1.122:2379,192.168.1.121:2379" \
  --cacert=/etc/kubernetes/pki/etcd/etcd-ca.pem \
  --cert=/etc/kubernetes/pki/etcd/etcd.pem \
  --key=/etc/kubernetes/pki/etcd/etcd-key.pem \
  endpoint status --write-out=table
+--------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|      ENDPOINT      |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+--------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| 192.168.1.123:2379 |  43697987d6800e1 |  3.4.13 |  2.7 MB |      true |      false |      1053 |      33548 |              33548 |        |
| 192.168.1.122:2379 | 17de01a4b21e79c5 |  3.4.13 |  2.7 MB |     false |      false |      1053 |      33548 |              33548 |        |
| 192.168.1.121:2379 | 74ae75b016cc5110 |  3.4.13 |  2.7 MB |     false |      false |      1053 |      33548 |              33548 |        |
+--------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
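The same certificates can also be used for a quick health check of each member (just a convenience sketch, same flags as the status command above):

etcdctl --endpoints="192.168.1.121:2379,192.168.1.122:2379,192.168.1.123:2379" \
  --cacert=/etc/kubernetes/pki/etcd/etcd-ca.pem \
  --cert=/etc/kubernetes/pki/etcd/etcd.pem \
  --key=/etc/kubernetes/pki/etcd/etcd-key.pem \
  endpoint health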
2.7.2 kubelet
[root@k8s-master01 calico]# systemctl status kubelet -l
● kubelet.service - Kubernetes Kubelet
Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-kubelet.conf
Active: active (running) since Fri 2021-09-10 10:37:09 CST; 2h 11min ago
Docs: https://github.com/kubernetes/kubernetes
Main PID: 1492 (kubelet)
Tasks: 28
Memory: 182.2M
CGroup: /system.slice/kubelet.service
├─ 1492 /usr/local/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.kubeconfig
--kubeconfig=/etc/kubernetes/kubelet.kubeconfig
--config=/etc/kubernetes/kubelet-conf.yml
--pod-infra-container-image=registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.4.1
--network-plugin=cni --cni-conf-dir=/etc/cni/net.d --cni-bin-dir=/opt/cni/bin
--node-labels=node.kubernetes.io/node=
--tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
--image-pull-progress-deadline=30m
├─22843 /opt/cni/bin/calico
└─22878 /opt/cni/bin/calico
Sep10 12:48:05 k8s-master01 kubelet[1492]: E0910 12:48:05.671381 1492 cadvisor_stats_provider.go:151]
"Unable to fetch pod etc hosts stats" err="failed to get stats
failed command 'du' ($ nice -n 19 du -x -s -B 1) on path
/var/lib/kubelet/pods/717c76ea-76f4-4955-aae1-b81523fc6833/etc-hosts with error
exit status 1" pod="kube-system/metrics-server-64c6c494dc-cjlww"
Sep10 12:48:05 k8s-master01 kubelet[1492]: E0910 12:48:05.675672 1492 cadvisor_stats_provider.go:151]
"Unable to fetch pod etc hosts stats" err="failed to get stats
failed command 'du' ($ nice -n 19 du -x -s -B 1) on path
/var/lib/kubelet/pods/7514f3ad-79e9-42de-8fc7-d3c66ae63586/etc-hosts with error
exit status 1" pod="kube-system/coredns-684d86ff88-v2x7v"
Sep10 12:48:12 k8s-master01 kubelet[1492]: I0910 12:48:12.818836 1492 scope.go:111]
"RemoveContainer"
containerID="48628d9c91bcdd9f6ed48a7db3e5dd744dccf0512242d06b58e63fb9ef4b2d4e"
Sep10 12:48:12 k8s-master01 kubelet[1492]: E0910 12:48:12.819844 1492 pod_workers.go:190] "Error
syncing pod, skipping" err="failed to \"StartContainer\"
for \"calico-node\" with CrashLoopBackOff: \"back-off 5m0s
restarting failed container=calico-node pod=calico-node-ssqbk_kube-system(b3e3dc8e-10bc-42ea-a5cf-04da5cc50a6b)\""
pod="kube-system/calico-node-ssqbk"
podUID=b3e3dc8e-10bc-42ea-a5cf-04da5cc50a6b
Sep10 12:48:15 k8s-master01 kubelet[1492]: E0910 12:48:15.717323 1492 cadvisor_stats_provider.go:151]
"Unable to fetch pod etc hosts stats" err="failed to get stats
failed command 'du' ($ nice -n 19 du -x -s -B 1) on path
/var/lib/kubelet/pods/7514f3ad-79e9-42de-8fc7-d3c66ae63586/etc-hosts with error
exit status 1" pod="kube-system/coredns-684d86ff88-v2x7v"
Sep10 12:48:15 k8s-master01 kubelet[1492]: E0910 12:48:15.721152 1492 cadvisor_stats_provider.go:151]
"Unable to fetch pod etc hosts stats" err="failed to get stats
failed command 'du' ($ nice -n 19 du -x -s -B 1) on path
/var/lib/kubelet/pods/717c76ea-76f4-4955-aae1-b81523fc6833/etc-hosts with error
exit status 1" pod="kube-system/metrics-server-64c6c494dc-cjlww"
Sep10 12:48:23 k8s-master01 kubelet[1492]: I0910 12:48:23.821023 1492 scope.go:111]
"RemoveContainer"
containerID="48628d9c91bcdd9f6ed48a7db3e5dd744dccf0512242d06b58e63fb9ef4b2d4e"
Sep10 12:48:23 k8s-master01 kubelet[1492]: E0910 12:48:23.822048 1492 pod_workers.go:190] "Error
syncing pod, skipping" err="failed to \"StartContainer\"
for \"calico-node\" with CrashLoopBackOff: \"back-off 5m0s
restarting failed container=calico-node
pod=calico-node-ssqbk_kube-system(b3e3dc8e-10bc-42ea-a5cf-04da5cc50a6b)\""
pod="kube-system/calico-node-ssqbk"
podUID=b3e3dc8e-10bc-42ea-a5cf-04da5cc50a6b
Sep10 12:48:25 k8s-master01 kubelet[1492]: E0910 12:48:25.747377 1492 cadvisor_stats_provider.go:151]
"Unable to fetch pod etc hosts stats" err="failed to get stats
failed command 'du' ($ nice -n 19 du -x -s -B 1) on path
/var/lib/kubelet/pods/7514f3ad-79e9-42de-8fc7-d3c66ae63586/etc-hosts with error
exit status 1" pod="kube-system/coredns-684d86ff88-v2x7v"
Sep10 12:48:25 k8s-master01 kubelet[1492]: E0910 12:48:25.752155 1492 cadvisor_stats_provider.go:151]
"Unable to fetch pod etc hosts stats" err="failed to get stats
failed command 'du' ($ nice -n 19 du -x -s -B 1) on path
/var/lib/kubelet/pods/717c76ea-76f4-4955-aae1-b81523fc6833/etc-hosts with error
exit status 1" pod="kube-system/metrics-server-64c6c494dc-cjlww"
The kubelet service on every node except master01 shows the following errors:
[root@k8s-master02 ~]# systemctl status kubelet -l
● kubelet.service - Kubernetes Kubelet
Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-kubelet.conf
Active: active (running) since Fri 2021-09-10 10:37:01 CST; 2h 12min ago
Docs: https://github.com/kubernetes/kubernetes
Main PID: 1571 (kubelet)
Tasks: 15
Memory: 162.6M
CGroup: /system.slice/kubelet.service
└─1571 /usr/local/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.kubeconfig
--kubeconfig=/etc/kubernetes/kubelet.kubeconfig
--config=/etc/kubernetes/kubelet-conf.yml
--pod-infra-container-image=registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.4.1
--network-plugin=cni --cni-conf-dir=/etc/cni/net.d --cni-bin-dir=/opt/cni/bin
--node-labels=node.kubernetes.io/node=
--tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
--image-pull-progress-deadline=30m
Sep10 12:48:45 k8s-master02 kubelet[1571]: I0910 12:48:45.437795 1571 scope.go:111]
"RemoveContainer"
containerID="87f8e22822a7b98c896d49604f854e239583fe3412e104aaff15d0a14781c3ac"
Sep10 12:48:45 k8s-master02 kubelet[1571]: E0910 12:48:45.438864 1571 pod_workers.go:190] "Error
syncing pod, skipping" err="failed to \"StartContainer\"
for \"calico-node\" with CrashLoopBackOff: \"back-off 5m0s
restarting failed container=calico-node
pod=calico-node-8dwz2_kube-system(24f3455e-25fc-45da-b7ef-9d4c13538358)\""
pod="kube-system/calico-node-8dwz2" podUID=24f3455e-25fc-45da-b7ef-9d4c13538358
Sep10 12:48:50 k8s-master02 kubelet[1571]: I0910 12:48:50.437436 1571 scope.go:111]
"RemoveContainer"
containerID="d87459075069d32e963986883a2a3de417c99e833f5b51eba4c1fca313e5bec9"
Sep10 12:48:50 k8s-master02 kubelet[1571]: E0910 12:48:50.438035 1571 pod_workers.go:190] "Error
syncing pod, skipping" err="failed to \"StartContainer\"
for \"calico-kube-controllers\" with CrashLoopBackOff: \"back-off
5m0s restarting failed container=calico-kube-controllers pod=calico-kube-controllers-cdd5755b9-48f28_kube-system(802dd17f-50d3-44d3-b945-3588d30b7cda)\""
pod="kube-system/calico-kube-controllers-cdd5755b9-48f28"
podUID=802dd17f-50d3-44d3-b945-3588d30b7cda
Sep10 12:48:57 k8s-master02 kubelet[1571]: I0910 12:48:57.437418 1571
scope.go:111] "RemoveContainer"
containerID="87f8e22822a7b98c896d49604f854e239583fe3412e104aaff15d0a14781c3ac"
Sep10 12:48:57 k8s-master02 kubelet[1571]: E0910 12:48:57.438692 1571 pod_workers.go:190] "Error
syncing pod, skipping" err="failed to \"StartContainer\"
for \"calico-node\" with CrashLoopBackOff: \"back-off 5m0s
restarting failed container=calico-node
pod=calico-node-8dwz2_kube-system(24f3455e-25fc-45da-b7ef-9d4c13538358)\""
pod="kube-system/calico-node-8dwz2" podUID=24f3455e-25fc-45da-b7ef-9d4c13538358
Sep10 12:49:04 k8s-master02 kubelet[1571]: I0910 12:49:04.438012 1571 scope.go:111]
"RemoveContainer"
containerID="d87459075069d32e963986883a2a3de417c99e833f5b51eba4c1fca313e5bec9"
Sep10 12:49:04 k8s-master02 kubelet[1571]: E0910 12:49:04.438570 1571 pod_workers.go:190] "Error
syncing pod, skipping" err="failed to \"StartContainer\"
for \"calico-kube-controllers\" with CrashLoopBackOff:
\"back-off 5m0s restarting failed container=calico-kube-controllers
pod=calico-kube-controllers-cdd5755b9-48f28_kube-system(802dd17f-50d3-44d3-b945-3588d30b7cda)\""
pod="kube-system/calico-kube-controllers-cdd5755b9-48f28"
podUID=802dd17f-50d3-44d3-b945-3588d30b7cda
Sep10 12:49:10 k8s-master02 kubelet[1571]: I0910 12:49:10.439566 1571 scope.go:111]
"RemoveContainer"
containerID="87f8e22822a7b98c896d49604f854e239583fe3412e104aaff15d0a14781c3ac"
Sep10 12:49:10 k8s-master02 kubelet[1571]: E0910 12:49:10.440623 1571 pod_workers.go:190] "Error
syncing pod, skipping" err="failed to \"StartContainer\"
for \"calico-node\" with CrashLoopBackOff: \"back-off 5m0s
restarting failed container=calico-node
pod=calico-node-8dwz2_kube-system(24f3455e-25fc-45da-b7ef-9d4c13538358)\""
pod="kube-system/calico-node-8dwz2" podUID=24f3455e-25fc-45da-b7ef-9d4c13538358
2.7.3 kube-proxy
[root@k8s-master01 calico]# systemctl status kube-proxy -l
● kube-proxy.service - Kubernetes Kube Proxy
Loaded: loaded (/usr/lib/systemd/system/kube-proxy.service; enabled; vendor preset: disabled)
Active: active (running) since Fri 2021-09-10 10:36:36 CST; 2h 13min ago
Docs: https://github.com/kubernetes/kubernetes
Main PID: 1000 (kube-proxy)
Tasks: 5
Memory: 52.6M
CGroup: /system.slice/kube-proxy.service
└─1000 /usr/local/bin/kube-proxy --config=/etc/kubernetes/kube-proxy.conf --v=2
Sep10 12:48:46 k8s-master01 kube-proxy[1000]: I0910 12:48:46.522496 1000 proxier.go:1034] Not syncing ipvs
rules until Services and Endpoints have been received from master
Sep10 12:48:46 k8s-master01 kube-proxy[1000]: I0910 12:48:46.524785 1000 proxier.go:1034] Not syncing ipvs
rules until Services and Endpoints have been received from master
Sep10 12:48:53 k8s-master01 kube-proxy[1000]: E0910 12:48:53.480329 1000 reflector.go:138]
k8s.io/client-go/informers/factory.go:134: Failed to watch
*v1beta1.EndpointSlice: failed to list *v1beta1.EndpointSlice: Unauthorized
Sep10 12:49:16 k8s-master01 kube-proxy[1000]: I0910 12:49:16.523566 1000 proxier.go:1034] Not syncing ipvs
rules until Services and Endpoints have been received from master
Sep10 12:49:16 k8s-master01 kube-proxy[1000]: I0910 12:49:16.525915 1000 proxier.go:1034] Not syncing ipvs
rules until Services and Endpoints have been received from master
Sep 10 12:49:16 k8s-master01 kube-proxy[1000]:E0910 12:49:16.870887 1000
reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch
*v1.Service: failed to list *v1.Service: Unauthorized
Sep10 12:49:39 k8s-master01 kube-proxy[1000]: E0910 12:49:39.660888 1000 reflector.go:138]
k8s.io/client-go/informers/factory.go:134: Failed to watch
*v1beta1.EndpointSlice: failed to list *v1beta1.EndpointSlice: Unauthorized
Sep10 12:49:46 k8s-master01 kube-proxy[1000]: I0910 12:49:46.523929 1000 proxier.go:1034] Not syncing ipvs
rules until Services and Endpoints have been received from master
Sep10 12:49:46 k8s-master01 kube-proxy[1000]: I0910 12:49:46.526263 1000 proxier.go:1034] Not syncing ipvs
rules until Services and Endpoints have been received from master
Sep 1012:50:08 k8s-master01 kube-proxy[1000]: E0910 12:50:08.267885 1000 reflector.go:138]
k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.Service: failed
to list *v1.Service: Unauthorized
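The Unauthorized errors above are what eventually pointed at kube-proxy's credentials. A quick way to confirm such a failure is to talk to the apiserver with the same kubeconfig kube-proxy uses; the kube-proxy.kubeconfig path below is an assumption based on the usual binary-install layout (kube-proxy.conf references it via clientConnection.kubeconfig):

# Find the kubeconfig kube-proxy actually uses
grep -i kubeconfig /etc/kubernetes/kube-proxy.conf
# Listing resources with that identity should reproduce the Unauthorized error
kubectl --kubeconfig=/etc/kubernetes/kube-proxy.kubeconfig get svc -A
# Inspect the client certificate embedded in that kubeconfig (if it is certificate-based)
kubectl config view --kubeconfig=/etc/kubernetes/kube-proxy.kubeconfig --raw \
  -o jsonpath='{.users[0].user.client-certificate-data}' | base64 -d | openssl x509 -noout -subject -dates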
2.7.4 kube-apiserver
The following errors keep repeating in the kube-apiserver log:
[root@k8s-master01 calico]# systemctl status kube-apiserver -l
● kube-apiserver.service - Kubernetes API Server
Loaded: loaded (/usr/lib/systemd/system/kube-apiserver.service; enabled; vendor preset: disabled)
Active: active (running) since Fri 2021-09-10 10:36:36 CST; 2h 14min ago
Docs: https://github.com/kubernetes/kubernetes
Main PID: 999 (kube-apiserver)
Tasks: 8
Memory: 432.8M
CGroup: /system.slice/kube-apiserver.service
└─999 /usr/local/bin/kube-apiserver --v=2 --logtostderr=true --allow-privileged=true --bind-address=0.0.0.0
--secure-port=6443 --insecure-port=0 --advertise-address=192.168.1.121
--service-cluster-ip-range=10.10.0.0/16 --service-node-port-range=30000-32767
--etcd-servers=https://192.168.1.121:2379,https://192.168.1.122:2379,https://192.168.1.123:2379
--etcd-cafile=/etc/etcd/ssl/etcd-ca.pem --etcd-certfile=/etc/etcd/ssl/etcd.pem
--etcd-keyfile=/etc/etcd/ssl/etcd-key.pem
--client-ca-file=/etc/kubernetes/pki/ca.pem
--tls-cert-file=/etc/kubernetes/pki/apiserver.pem --tls-private-key-file=/etc/kubernetes/pki/apiserver-key.pem
--kubelet-client-certificate=/etc/kubernetes/pki/apiserver.pem
--kubelet-client-key=/etc/kubernetes/pki/apiserver-key.pem
--service-account-key-file=/etc/kubernetes/pki/sa.pub --service-account-signing-key-file=/etc/kubernetes/pki/sa.key
--service-account-issuer=https://kubernetes.default.svc.cluster.local
--kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
--enable-admission-plugins=NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,DefaultTolerationSeconds,NodeRestriction,ResourceQuota
--authorization-mode=Node,RBAC --enable-bootstrap-token-auth=true
--requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.pem
--proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.pem
--proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client-key.pem
--requestheader-allowed-names=aggregator
--requestheader-group-headers=X-Remote-Group
--requestheader-extra-headers-prefix=X-Remote-Extra-
--requestheader-username-headers=X-Remote-Usera
--feature-gates=EphemeralContainers=true
Sep10 12:50:27 k8s-master01 kube-apiserver[999]: I0910 12:50:27.652902 999 clientconn.go:948] ClientConn
switching balancer to "pick_first"
Sep10 12:50:27 k8s-master01 kube-apiserver[999]: I0910 12:50:27.653172 999 balancer_conn_wrappers.go:78]
pickfirstBalancer: HandleSubConnStateChange: 0xc00e506c70, {CONNECTING
<nil>}
Sep10 12:50:27 k8s-master01 kube-apiserver[999]: I0910 12:50:27.678964 999 balancer_conn_wrappers.go:78] pickfirstBalancer:
HandleSubConnStateChange: 0xc00e506c70, {READY <nil>}
Sep10 12:50:27 k8s-master01 kube-apiserver[999]: I0910 12:50:27.681622 999 controlbuf.go:508] transport:
loopyWriter.run returning. connection error: desc = "transport is
closing"
Sep10 12:50:42 k8s-master01 kube-apiserver[999]: I0910 12:50:42.807898 999 client.go:360] parsed scheme:
"passthrough"
Sep10 12:50:42 k8s-master01 kube-apiserver[999]: I0910 12:50:42.807965 999 passthrough.go:48] ccResolverWrapper:
sending update to cc: {[{https://192.168.1.122:2379 <nil> 0 <nil>}] <nil>
<nil>}
Sep10 12:50:42 k8s-master01 kube-apiserver[999]: I0910 12:50:42.807988 999 clientconn.go:948] ClientConn
switching balancer to "pick_first"
Sep10 12:50:42 k8s-master01 kube-apiserver[999]: I0910 12:50:42.808211 999 balancer_conn_wrappers.go:78]
pickfirstBalancer: HandleSubConnStateChange: 0xc00df9a1f0, {CONNECTING
<nil>}
Sep10 12:50:42 k8s-master01 kube-apiserver[999]: I0910 12:50:42.833075 999 balancer_conn_wrappers.go:78]
pickfirstBalancer: HandleSubConnStateChange: 0xc00df9a1f0, {READY <nil>}
Sep10 12:50:42 k8s-master01 kube-apiserver[999]: I0910 12:50:42.834371 999 controlbuf.go:508] transport:
loopyWriter.run returning. connection error: desc = "transport is
closing"
[root@k8s-master01 calico]#
2.7.5 kube-controller-manager
[root@k8s-master01 calico]# systemctl status kube-controller-manager -l
kube-controller-manager.service - Kubernetes Controller Manager
Loaded: loaded (/usr/lib/systemd/system/kube-controller-manager.service; enabled; vendor preset: disabled)
Active: active (running) since Fri 2021-09-10 10:36:36 CST; 2h 14min ago
Docs: https://github.com/kubernetes/kubernetes
Main PID: 1006 (kube-controller)
Tasks: 6
Memory: 101.1M
CGroup: /system.slice/kube-controller-manager.service
└─1006 /usr/local/bin/kube-controller-manager --v=2 --logtostderr=true
--address=127.0.0.1 --root-ca-file=/etc/kubernetes/pki/ca.pem
--cluster-signing-cert-file=/etc/kubernetes/pki/ca.pem --cluster-signing-key-file=/etc/kubernetes/pki/ca-key.pem
--service-account-private-key-file=/etc/kubernetes/pki/sa.key
--kubeconfig=/etc/kubernetes/controller-manager.kubeconfig --leader-elect=true
--cluster-signing-duration=876000h0m0s --use-service-account-credentials=true
--node-monitor-grace-period=40s --node-monitor-period=5s
--pod-eviction-timeout=2m0s --controllers=*,bootstrapsigner,tokencleaner
--allocate-node-cidrs=true --cluster-cidr=172.16.0.0/12
--requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.pem
--node-cidr-mask-size=24 --feature-gates=EphemeralContainers=true
Sep10 10:37:11 k8s-master01 kube-controller-manager[1006]: W0910
10:37:11.138214 1006
authorization.go:184] No authorization-kubeconfig provided, so
SubjectAccessReview of authorization tokens won't work.
Sep10 10:37:11 k8s-master01 kube-controller-manager[1006]: I0910
10:37:11.138277 1006
controllermanager.go:175] Version: v1.21.3
Sep10 10:37:11 k8s-master01 kube-controller-manager[1006]: I0910
10:37:11.148097 1006 tlsconfig.go:178]
loaded client CA
[0/"request-header::/etc/kubernetes/pki/front-proxy-ca.pem"]:
"kubernetes" [] issuer="<self>" (2021-09-08 07:50:00
+0000 UTC to 2026-09-07 07:50:00 +0000 UTC (now=2021-09-10 02:37:11.148072604
+0000 UTC))
Sep10 10:37:11 k8s-master01 kube-controller-manager[1006]: I0910
10:37:11.148376 1006 tlsconfig.go:200]
loaded serving cert ["Generated self signed cert"]:
"localhost@1631241427" [serving] validServingFor=[127.0.0.1,localhost,localhost]
issuer="localhost-ca@1631241426" (2021-09-10 01:37:03 +0000 UTC to
2022-09-10 01:37:03 +0000 UTC (now=2021-09-10 02:37:11.148361186 +0000 UTC))
Sep10 10:37:11 k8s-master01 kube-controller-manager[1006]: I0910
10:37:11.148619 1006
named_certificates.go:53] loaded SNI cert [0/"self-signed loopback"]:
"apiserver-loopback-client@1631241431" [serving]
validServingFor=[apiserver-loopback-client]
issuer="apiserver-loopback-client-ca@1631241429" (2021-09-10 01:37:07
+0000 UTC to 2022-09-10 01:37:07 +0000 UTC (now=2021-09-10 02:37:11.148602782
+0000 UTC))
Sep10 10:37:11 k8s-master01 kube-controller-manager[1006]: I0910
10:37:11.148671 1006
secure_serving.go:197] Serving securely on [::]:10257
Sep10 10:37:11 k8s-master01 kube-controller-manager[1006]: I0910
10:37:11.148771 1006 tlsconfig.go:240]
Starting DynamicServingCertificateController
Sep10 10:37:11 k8s-master01 kube-controller-manager[1006]: I0910
10:37:11.148840 1006
dynamic_cafile_content.go:167] Starting
request-header::/etc/kubernetes/pki/front-proxy-ca.pem
Sep10 10:37:11 k8s-master01 kube-controller-manager[1006]: I0910
10:37:11.149474 1006
deprecated_insecure_serving.go:53] Serving insecurely on 127.0.0.1:10252
Sep10 10:37:11 k8s-master01 kube-controller-manager[1006]: I0910
10:37:11.149979 1006 leaderelection.go:243]
attempting to acquire leader lease kube-system/kube-controller-manager..
2.8 apiserver connectivity test
telnet 192.168.1.120 8443 succeeds from every node:
[root@k8s-master02 etcd]# telnet 192.168.1.120 8443
Trying 192.168.1.120...
Connected to 192.168.1.120.
Escape character is '^]'.
[root@k8s-master02 etcd]#
curl https://192.168.1.120:8443/healthz -k
ok[root@k8s-master02 etcd]#
2.9 kubelet certificate issuance
The kubelet client certificate was issued successfully on every node:
[root@k8s-node02 pki]# ls -l /var/lib/kubelet/pki/
total 12
-rw------- 1 root root 1248 Sep 10 10:37 kubelet-client-2021-09-10-10-37-40.pem
lrwxrwxrwx 1 root root   59 Sep 10 10:37 kubelet-client-current.pem -> /var/lib/kubelet/pki/kubelet-client-2021-09-10-10-37-40.pem
-rw-r--r-- 1 root root 2266 Sep  8 15:04 kubelet.crt
-rw------- 1 root root 1679 Sep  8 15:04 kubelet.key
3 Attempted fixes
It looked like the etcd certificates were the problem, so I tried regenerating them.
3.1 Checked configuration consistency across components [problem persists]
I rebuilt the configuration files and systemd service files for each component on master01 and copied them to the other nodes.
3.2 Deleted all previously generated etcd certificate files, regenerated the certificates, and re-created calico (roughly the steps sketched below) [problem persists]
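Roughly what this regeneration looked like. This is only a sketch: the CSR and config file names (etcd-csr.json, ca-config.json, the kubernetes profile) are assumptions based on the usual cfssl workflow for this kind of install, and the new certificate also has to be distributed to every etcd member and etcd restarted:

cd /etc/kubernetes/pki/etcd
# Re-issue the etcd certificate from the existing etcd CA (file names assumed)
cfssl gencert -ca=etcd-ca.pem -ca-key=etcd-ca-key.pem \
  -config=ca-config.json -profile=kubernetes etcd-csr.json | cfssljson -bare etcd
# Base64-encode the new files, put them back into the calico-etcd-secrets
# section of calico-etcd.yaml, then re-create Calico
base64 -w0 etcd-ca.pem; echo; base64 -w0 etcd.pem; echo; base64 -w0 etcd-key.pem; echo
kubectl delete -f calico-etcd.yaml && kubectl create -f calico-etcd.yaml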
4 Final solution: upgrade calico from 3.15 to 3.19, regenerate the kube-proxy certificate, delete and recreate the kube-proxy ServiceAccount, and restart the services
Delete the old calico:
kubectl delete -f calico-etcd.yaml
Download the new calico YAML file locally:
https://gitee.com/dukuan/k8s-ha-install/blob/manual-installation-v1.22.x/calico/calico.yaml
Change the POD_CIDR field in calico.yaml to your Pod CIDR (see the sed sketch below), then create -f. Same as before: the pods keep restarting because the liveness and readiness probes fail.
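For reference, the POD_CIDR substitution mentioned above can be done with sed; this assumes the downloaded manifest uses a literal POD_CIDR placeholder, as described:

# Replace the POD_CIDR placeholder with this cluster's Pod network
POD_SUBNET="172.16.0.0/12"
sed -i "s#POD_CIDR#${POD_SUBNET}#g" calico.yaml
kubectl create -f calico.yaml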
Running kubectl logs -f on the calico-node pods, it looks like kube-proxy authentication is the problem, so regenerate the kube-proxy certificate.
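For a certificate-based kube-proxy kubeconfig, re-issuing the client certificate with cfssl looks roughly like this; the CSR/config file names are assumptions, and if the kubeconfig is ServiceAccount-token based (as in some binary-install guides), recreating the ServiceAccount below is what actually matters:

cd /etc/kubernetes/pki
# Re-issue the kube-proxy client certificate from the cluster CA (file names assumed)
cfssl gencert -ca=ca.pem -ca-key=ca-key.pem \
  -config=ca-config.json -profile=kubernetes kube-proxy-csr.json | cfssljson -bare kube-proxy
ls -l kube-proxy.pem kube-proxy-key.pem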
I tried the 3.15 calico-etcd.yaml again and it still did not work; the 3.19 manifest does.
It appears that after regenerating the certificates, the kube-proxy ServiceAccount and ClusterRoleBinding have to be deleted and recreated:
kubectl create serviceaccount kube-proxy -n kube-system
kubectl create clusterrolebinding system:kube-proxy --clusterrole system:node-proxier --serviceaccount kube-system:kube-proxy
After deleting the sa and clusterrolebinding created in these two steps, recreating them, and restarting the services, everything works (full sequence sketched below).
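Putting the last steps together, the sequence was: delete, recreate, refresh kube-proxy's credentials, restart. A sketch, assuming the kube-proxy kubeconfig is built from the kube-proxy ServiceAccount token (the kubeconfig path and user entry name are assumptions):

# Recreate the ServiceAccount and ClusterRoleBinding
kubectl delete clusterrolebinding system:kube-proxy
kubectl delete serviceaccount kube-proxy -n kube-system
kubectl create serviceaccount kube-proxy -n kube-system
kubectl create clusterrolebinding system:kube-proxy \
  --clusterrole system:node-proxier --serviceaccount kube-system:kube-proxy
# Rebuild the kube-proxy kubeconfig from the new ServiceAccount token
SECRET=$(kubectl -n kube-system get sa kube-proxy -o jsonpath='{.secrets[0].name}')
JWT_TOKEN=$(kubectl -n kube-system get secret "$SECRET" -o jsonpath='{.data.token}' | base64 -d)
kubectl config set-credentials kubernetes --token="$JWT_TOKEN" \
  --kubeconfig=/etc/kubernetes/kube-proxy.kubeconfig   # user entry name must match the existing kubeconfig
# Copy the refreshed kubeconfig to every node, restart kube-proxy, and recreate the calico pods
systemctl restart kube-proxy
kubectl delete pod -n kube-system -l k8s-app=calico-node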