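With nothing better to do today, I got careless and ran the following on my test K8s cluster:
yum -y update
Afterwards the cluster would not come back up: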
kubectl get nodes
E1213 12:23:44.334665 1480 memcache.go:238]
couldn't get current server API group list:
Get "https://172.16.250.100:6443/api?timeout=32s": dial tcp 172.16.250.100:6443:
connect: connection refused
The connection to the server 172.16.250.100:6443
was refused - did you specify the right host or port?
Checking the basics: SELinux is disabled, swap is off, firewalld is stopped, and no containers are running.
[root@k8s-master ~]# getenforce
Disabled
[root@k8s-master ~]# systemctl status firewalld
● firewalld.service - firewalld - dynamic firewall daemon
Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
Active: inactive (dead)
Docs: man:firewalld(1)
[root@k8s-master ~]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
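The swap check is mentioned above but its output is not shown. A minimal sketch of one way to verify it, assuming a Linux host where active swap areas are listed in /proc/swaps (the helper name swap_is_off is made up for illustration):

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit 0) when no swap area is active.
# /proc/swaps contains one header line followed by one line per active swap area,
# so "off" means nothing but the header is present.
swap_is_off() {
    local swaps_file="${1:-/proc/swaps}"   # path is overridable for testing
    [ "$(tail -n +2 "$swaps_file" | grep -c .)" -eq 0 ]
}
```

Running `swap_is_off && echo "swap is off"` on each node would confirm the state the kubelet expects.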
Next, check the kubelet service logs:
[root@k8s-master ~]# journalctl -xefu kubelet
-- Logs begin at Tue 2022-12-13 12:21:55 CST. --
Dec 13 12:22:52 k8s-master systemd[1]: Started kubelet: The Kubernetes Node Agent.
-- Subject: Unit kubelet.service has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit kubelet.service has finished starting up.
--
-- The start-up result is done.
Dec 13 12:22:55 k8s-master kubelet[1113]: E1213 12:22:55.394853 1113 run.go:74] "command failed"
err="failed to parse kubelet flag: unknown flag: --network-plugin"
Dec 13 12:22:55 k8s-master systemd[1]: kubelet.service: main process exited, code=exited, status=1/FAILURE
Dec 13 12:22:55 k8s-master systemd[1]: Unit kubelet.service entered failed state.
Dec 13 12:22:55 k8s-master systemd[1]: kubelet.service failed.
Dec 13 12:23:05 k8s-master systemd[1]: kubelet.service holdoff time over, scheduling restart.
Dec 13 12:23:05 k8s-master systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
-- Subject: Unit kubelet.service has finished shutting down
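The decisive line in that log is the "unknown flag" error. As a rough sketch, a small filter can pull such flags out of the journal so they are not lost among the systemd restart messages (the function name kubelet_unknown_flags is hypothetical, not part of any Kubernetes tooling):

```shell
#!/usr/bin/env bash
# Hypothetical filter: reads kubelet log text on stdin and prints each
# distinct flag that the kubelet rejected as unknown.
kubelet_unknown_flags() {
    grep -o 'unknown flag: --[A-Za-z0-9-]*' \
        | sed 's/^unknown flag: //' \
        | sort -u
}
```

It could be fed with `journalctl -u kubelet --no-pager | kubelet_unknown_flags`. On kubeadm-provisioned nodes the rejected flag usually comes from /var/lib/kubelet/kubeadm-flags.env, though that location is an assumption about this particular cluster.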
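Searching for that error message revealed the cause: the update had silently upgraded Kubernetes to 1.26. The kubelet's built-in Docker support (dockershim) was deprecated in 1.20 and removed in 1.24, and the --network-plugin flag was removed along with it, so the 1.26 kubelet refuses to start with the existing Docker-based configuration and the cluster is unusable.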
[root@k8s-master ~]# rpm -qa|grep kube
kubernetes-cni-1.1.1-0.x86_64
kubelet-1.26.0-0.x86_64
kubectl-1.26.0-0.x86_64
kubeadm-1.26.0-0.x86_64
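To make that version check explicit, here is a minimal sketch that compares a kubelet version against 1.24, the release in which dockershim was removed. The helper name dockershim_removed is made up for illustration, and it relies on GNU `sort -V` being available:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit 0) if the given Kubernetes version
# is 1.24.0 or newer, i.e. a release without the built-in dockershim.
dockershim_removed() {
    local version="${1#v}"       # accept "v1.26.0" as well as "1.26.0"
    local threshold="1.24.0"
    # sort -V orders version strings; if the threshold sorts first (or ties),
    # the given version is >= the threshold.
    [ "$(printf '%s\n%s\n' "$threshold" "$version" | sort -V | head -n1)" = "$threshold" ]
}
```

On a node it could be used as `dockershim_removed "$(rpm -q --qf '%{VERSION}' kubelet)" && echo "this kubelet cannot use Docker directly"`.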
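Fix:
Downgrade the Kubernetes packages on every node in the cluster to 1.22. Docker support is already deprecated in that release but still present (it was only removed in 1.24), so 1.22 still works with Docker.
This is a test environment; in production, follow the official guidance strictly.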
[root@k8s-master ~]# yum downgrade kubelet-1.22.0-0.x86_64 \
kubeadm-1.22.0-0.x86_64 \
kubectl-1.22.0-0.x86_64
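Since the downgrade has to be repeated on every node, a loop may save some typing. This is only a sketch: it assumes passwordless root SSH to the three node names that appear in this post, the function names are invented for illustration, and it defaults to a dry run that prints the commands instead of executing them.

```shell
#!/usr/bin/env bash
KUBE_VERSION="1.22.0-0"                    # version-release used in this post
NODES="k8s-master k8s-node1 k8s-node2"     # node names from this cluster

# Build the command line to run on one node. The explicit kubelet restart
# is an addition; the post itself only reloads systemd.
downgrade_cmd() {
    local v="$1"
    printf 'yum -y downgrade kubelet-%s kubeadm-%s kubectl-%s && systemctl daemon-reload && systemctl restart kubelet' "$v" "$v" "$v"
}

# Print (DRY_RUN=1, the default) or execute (DRY_RUN=0) the downgrade on each node.
downgrade_all() {
    local node
    for node in $NODES; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "ssh root@${node} '$(downgrade_cmd "$KUBE_VERSION")'"
        else
            ssh "root@${node}" "$(downgrade_cmd "$KUBE_VERSION")"
        fi
    done
}
```

Calling `downgrade_all` prints three ssh commands for review; `DRY_RUN=0 downgrade_all` would actually run them.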
Reload systemd and check the service status:
[root@k8s-master ~]# systemctl daemon-reload
[root@k8s-master ~]# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Drop-In: /usr/lib/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since Tue 2022-12-13 12:33:16 CST; 2min 1s ago
……
Hint: Some lines were ellipsized, use -l to show in full.
With the downgrade installed, a quick check shows the whole cluster is back to normal:
[root@k8s-master ~]# kubectl get nodes,pods,svc -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
node/k8s-master Ready control-plane,master 13d v1.22.0 172.16.250.100 <none> CentOS Linux 7 (Core) 5.4.226-1.el7.elrepo.x86_64 docker://20.10.21
node/k8s-node1 Ready <none> 13d v1.22.0 172.16.250.101 <none> CentOS Linux 7 (Core) 5.4.226-1.el7.elrepo.x86_64 docker://20.10.21
node/k8s-node2 Ready <none> 13d v1.22.0 172.16.250.102 <none> CentOS Linux 7 (Core) 5.4.226-1.el7.elrepo.x86_64 docker://20.10.21
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod/nginx-6799fc88d8-2tcbt 1/1 Running 2 13d 10.244.1.4 k8s-node1 <none> <none>
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 13d <none>
service/nginx NodePort 10.99.187.196 <none> 80:32226/TCP 13d app=nginx
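To keep a future yum -y update from repeating this, the Kubernetes packages can be excluded from routine updates. The sketch below appends an exclude line to the yum repo definition. It assumes the repo file is /etc/yum.repos.d/kubernetes.repo with a single [kubernetes] section, which is common but not confirmed for this cluster, and the function name pin_kube_packages is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: add "exclude=kubelet kubeadm kubectl" to a yum repo file
# so a plain "yum update" skips those packages. Assumes the file holds a single
# repo section, because the line is appended at the end of the file.
pin_kube_packages() {
    local repo_file="$1"
    # Leave the file alone if an exclude line is already present.
    if grep -q '^exclude=' "$repo_file"; then
        return 0
    fi
    printf '\nexclude=kubelet kubeadm kubectl\n' >> "$repo_file"
}
```

Usage would be `pin_kube_packages /etc/yum.repos.d/kubernetes.repo`. A deliberate upgrade later can bypass the exclusion with yum's `--disableexcludes=kubernetes` option, where `kubernetes` is the assumed repo id.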