K8S故障处理:coredns状态异常处理记录

在Kubernetes中,coredns作为一个服务发现的配置中心,K8S中创建的 Service 和 Pod 都会在其中自动生成相应的 DNS 记录,能够用作集群内部的dns解析,作为一个重要组件,coredns一般都采用daemonSet方式部署。

本文记录了几种场景下coredns异常问题。

图片[1]-K8S故障处理:coredns状态异常处理记录-不念博客

1. 服务器重启后coredns异常

服务器重启后,K8S集群内coredns状态为CrashLoopBackOff,coredns版本:1.8.3,查看coredns日志如下

[root@reg ~]# kubectl  logs -f coredns-bbckw   -n kube-system
plugin/forward: no nameservers found

查看/etc/resolv.conf的dns配置,配置里一个 nameserver 都没有

[root@reg ~]# cat /etc/resolv.conf
# Generated by NetworkManager
search localhost

会导致 coredns 无法起来刷日志,查看configmap 的 coredns 的配置里有如下配置

        forward . /etc/resolv.conf {
            max_concurrent 1000
        }

coredns的/etc/resolv.conf是挂载的宿主机的,会读取文件里的

nameserver配置,所有coredns无法启动的原因找到了,临时解决方式就是重新配置/etc/resolv.conf配置

为了防止下次重启再次丢失,彻底解决方式如下

 编辑该文件,在[main]下面加入dns=none

vi /etc/NetworkManager/NetworkManager.conf
[main]
dns=none

2. 服务器重启后coredns异常

场景:/etc/resolv.conf文件丢失

[root@02-3 ~]# kubectl  get pods -owide -n  kube-system
NAME                                  READY   STATUS              RESTARTS   AGE    IP             NODE           NOMINATED NODE   READINESS GATES
coredns-5mftn                         1/1     Running             2          174d   172.27.0.214   10.10.28.10    <none>           <none>
coredns-9bnl8                         1/1     Running             2          62d    172.27.4.31    10.10.28.141   <none>           <none>
coredns-9tmcj                         0/1     ContainerCreating   0          105s   <none>         10.10.28.116   <none>           <none>

查看pod的日志

[root@02-3 ~]# kubectl describe pods coredns-9tmcj -n kube-system
........
Events:
  Type     Reason                  Age                  From               Message
  ----     ------                  ----                 ----               -------
  Normal   Scheduled               2m11s                default-scheduler  Successfully assigned kube-system/coredns-9tmcj to 10.10.28.116
  Warning  FailedCreatePodSandBox  1s (x11 over 2m11s)  kubelet            Failed to create pod sandbox: open /etc/resolv.conf: no such file or directory

检查dns的配置文件

[root@02-3 ~]# ll /etc/resolv.conf
ls: 无法访问/etc/resolv.conf: 没有那个文件或目录

这个报错比较明显,解决方式同场景一

3. 服务器重启后coredns异常

coredns部署后状态在CrashLoopBackOff和Error不停转换

[root@02-3 ~]# kubectl  get pods -owide -n  kube-system
NAME                                  READY   STATUS             RESTARTS   AGE    IP             NODE           NOMINATED NODE   READINESS GATES
coredns-5mftn                         1/1     Running            2          174d   172.27.0.214   10.10.28.10    <none>           <none>
coredns-9bnl8                         1/1     Running            2          62d    172.27.4.31    10.10.28.141   <none>           <none>
coredns-n2knx                         0/1     CrashLoopBackOff   2          37s    172.27.2.167   10.10.28.116   <none>           <none>

查看coredns的pod日志

[root@02-3 ~]# kubectl describe pods coredns-n2knx   -n kube-system
.....
Events:
  Type     Reason     Age                  From               Message
  ----     ------     ----                 ----               -------
  Normal   Scheduled  2m4s                 default-scheduler  Successfully assigned kube-system/coredns-n2knx to 10.10.28.116
  Warning  BackOff    31s (x10 over 2m1s)  kubelet            Back-off restarting failed container
  Normal   Pulled     17s (x5 over 2m4s)   kubelet            Container image "reg.wps.lan:5000/wps/coredns:1.8.3" already present on machine
  Normal   Created    17s (x5 over 2m4s)   kubelet            Created container coredns
  Normal   Started    17s (x5 over 2m3s)   kubelet            Started container coredns

从日志上看到coredns无法启动,直接查看docker的日志

[root@02-3 ~]# docker ps -a | grep coredns
d8bfd8b06015   3885a5b7f138                                   "/coredns -conf /etc…"   26 seconds ago   Exited (1) 25 seconds ago             k8s_coredns_coredns-n2knx_kube-system_a78e5c88-6776-4a4e-ab48-d380b0f18c93_4
e7368c926cb0   reg.wps.lan:5000/wps/pause:3.3                 "/pause"                 2 minutes ago    Up 2 minutes                          k8s_POD_coredns-n2knx_kube-system_a78e5c88-6776-4a4e-ab48-d380b0f18c93_0
[root@02-3 ~]# 
[root@02-3 ~]# docker logs -f d8bfd8b06015
.:53
[INFO] plugin/reload: Running configuration MD5 = 4fe39d48e613d0d0369935d7aafb5fa6
CoreDNS-1.8.3
linux/amd64, go1.16, 4293992
[FATAL] plugin/loop: Loop (127.0.0.1:59055 -> :53) detected for zone ".", see https://coredns.io/plugins/loop#troubleshooting. Query: "HINFO 184772892781744701.6546771560531965938."

检查dns配置,nameserver的dns配置成了127.0.0.1,修改成正确的dns地址即可

[root@02-3 ~]# cat /etc/resolv.conf 
; generated by /usr/sbin/dhclient-script
search openstacklocal novalocal
nameserver 127.0.0.1
© 版权声明
THE END