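Pods stuck in the Terminating or ContainerCreating status can have several root causes. One of the most common is that ETCD holds an invalid or outdated list of master leases. The API server builds the endpoints of the default kubernetes Service from these leases, so a lease left behind by a control-plane node that no longer exists keeps routing some requests to a dead address. The result is random "connection refused" errors when pods try to connect to the Kubernetes API server. To fix it, you need kubectl and permission to exec into the ETCD Manager Pod.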
Find the ETCD version of your Kubernetes cluster
kops get cluster --full -o yaml
Look for the etcdClusters configuration; it contains the ETCD version.
Note it down for a later step.
For example: version: 3.5.9
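Reading the version by eye works, but the lookup can also be scripted. The following is a minimal sketch, assuming the kops output is block-style YAML in which the etcdClusters key is followed by list items that each carry a plain `version:` field. The function name etcd_versions is hypothetical and not part of kops.

```shell
# Print the distinct "version:" values found inside the etcdClusters
# section of a kops cluster spec read from stdin.
# Assumption: block-style YAML indented with spaces, unquoted versions.
etcd_versions() {
  awk '
    /^ *etcdClusters:/ {
      # Remember how far the key is indented so the end of the section can be detected.
      key_indent = index($0, "e") - 1
      inside = 1
      next
    }
    inside {
      line = $0
      sub(/^ */, "", line)
      ind = length($0) - length(line)
      # A non-list line at the same or a shallower indent closes the section.
      if (line != "" && ind <= key_indent && substr(line, 1, 1) != "-") {
        inside = 0
        next
      }
      if ($1 == "version:") print $2
      else if ($1 == "-" && $2 == "version:") print $3
    }
  ' | sort -u
}

# Usage (requires kops access to the cluster):
#   kops get cluster --full -o yaml | etcd_versions
```

If the main and events ETCD clusters run different versions, the function prints more than one line; in that case check the YAML by hand and use the version of the main cluster.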
Find ETCD Manager Main Pod
kubectl -n kube-system get pod | grep etcd-manager-main
Note the pod name.
For example: etcd-manager-main-i-0054d377e2464e22f
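To avoid copying the pod name by hand, it can be captured in a variable. This is a small sketch, assuming the default table output of kubectl get pod, where the pod name is the first column. The helper name etcd_manager_pods and the POD variable are hypothetical.

```shell
# Print the NAME column of every etcd-manager-main pod found in
# `kubectl get pod` table output read from stdin.
etcd_manager_pods() {
  awk '$1 ~ /^etcd-manager-main-/ { print $1 }'
}

# Usage (assumes kubectl is configured for the cluster):
#   POD=$(kubectl -n kube-system get pod | etcd_manager_pods | head -n 1)
#   echo "$POD"
```

On a cluster with several control-plane nodes the function prints one pod per node; the sketch simply takes the first one.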
Exec into the ETCD Manager Main Pod
kubectl exec -it -n kube-system etcd-manager-main-i-0054d377e2464e22f -- sh
Prepare etcdctl
Once you are inside the ETCD Manager Main Pod, run the following commands to prepare etcdctl.
The value of ETCD_VERSION must match the version you noted in the first step.
ETCD_VERSION=3.5.9
ETCDDIR=/opt/etcd-v$ETCD_VERSION
CERTDIR=/rootfs/srv/kubernetes/kube-apiserver/
alias etcdctl="ETCDCTL_API=3 $ETCDDIR/etcdctl --cacert=$CERTDIR/etcd-ca.crt --cert=$CERTDIR/etcd-client.crt --key=$CERTDIR/etcd-client.key --endpoints=https://127.0.0.1:4001"
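If the version or the certificate directory is wrong, the alias only fails later with a less obvious error. A quick check like the one below can catch that early. It is a sketch under the assumption that the binary and certificate file names are exactly those used in the alias above; the function name check_etcdctl_files is hypothetical.

```shell
# Return 0 only if the etcdctl binary and the three client certificate
# files used by the alias exist; report each missing file on stderr.
check_etcdctl_files() {
  etcd_dir=$1
  cert_dir=$2
  missing=0
  for f in "$etcd_dir/etcdctl" \
           "$cert_dir/etcd-ca.crt" \
           "$cert_dir/etcd-client.crt" \
           "$cert_dir/etcd-client.key"; do
    if [ ! -e "$f" ]; then
      echo "missing: $f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage inside the pod, after setting ETCDDIR and CERTDIR:
#   check_etcdctl_files "$ETCDDIR" "$CERTDIR" && echo "etcdctl is ready"
```

A "missing" line for the etcdctl binary usually means ETCD_VERSION does not match the version the cluster actually runs.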
List ETCD members
etcdctl member list
Get current master leases
etcdctl get --prefix --keys-only /registry/masterleases
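Each key returned here ends with the IP address of the control-plane node that owns the lease, so the output can be compared with the nodes that are actually alive. The sketch below assumes the keys have the form /registry/masterleases/<ip> and that you supply the list of live control-plane IPs yourself (for example from kubectl get nodes -o wide, run outside the pod). The function names lease_ips and stale_lease_ips are hypothetical.

```shell
# Reduce `etcdctl get --prefix --keys-only` output on stdin to bare IPs.
# Blank lines and unrelated keys are dropped.
lease_ips() {
  sed -n 's|^/registry/masterleases/||p'
}

# Print lease IPs from stdin that are NOT among the IPs given as arguments.
# Arguments: the IPs of the control-plane nodes known to be alive.
stale_lease_ips() {
  lease_ips | while IFS= read -r ip; do
    stale=1
    for active in "$@"; do
      if [ "$ip" = "$active" ]; then
        stale=0
      fi
    done
    if [ "$stale" -eq 1 ]; then
      echo "$ip"
    fi
  done
}

# Usage inside the pod (the IPs are placeholders for your own nodes):
#   etcdctl get --prefix --keys-only /registry/masterleases | stale_lease_ips 10.0.1.10 10.0.3.30
```

Any IP printed by stale_lease_ips belongs to a lease with no matching live control-plane node, which is the situation this fix targets.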
Delete all master leases
etcdctl del --prefix /registry/masterleases/
This is the fastest way to delete all outdated master leases in ETCD.
Don’t worry: every active control-plane (master) node will re-establish its connection and register its lease again automatically.
Re-check current master leases
etcdctl get --prefix --keys-only /registry/masterleases
At this point, the master leases should list only the active control-plane nodes.
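The re-check can be turned into a pass/fail test by counting the lease keys and comparing the count with the number of control-plane nodes you expect. This is a minimal sketch, assuming one lease key per active control-plane node; the function name lease_count_ok is hypothetical.

```shell
# Count master lease keys on stdin and compare with the expected number
# of control-plane nodes given as the first argument.
# Returns 0 when the counts match, non-zero otherwise.
lease_count_ok() {
  expected=$1
  # grep -c exits non-zero when it counts 0 matches, hence the "|| true".
  actual=$(grep -c '^/registry/masterleases/' || true)
  echo "found $actual lease(s), expected $expected"
  [ "$actual" -eq "$expected" ]
}

# Usage inside the pod, for a cluster with three control-plane nodes:
#   etcdctl get --prefix --keys-only /registry/masterleases | lease_count_ok 3
```

If the count is still too low right after the deletion, wait a few seconds and run the check again, since the API servers need a moment to register their leases.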