Kubernetes – Fix Pod status stuck at Terminating or ContainerCreating

A Pod can get stuck in the Terminating or ContainerCreating status for several reasons. One of the most common is that ETCD holds an invalid or outdated list of master leases. When that happens, Pods that try to reach the Kubernetes API server intermittently get a "connection refused" error. To fix it, you need kubectl and permission to exec into the ETCD Manager Pod.

Find ETCD version of your Kubernetes cluster

kops get cluster --full -o yaml

Look for the etcdClusters configuration; it contains the ETCD version.

Note it down for the next step.
For example, version: 3.5.9
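If the YAML output is long, the version can be pulled out with a small filter. The sketch below is one possible way to do it, assuming the cluster spec contains an etcdClusters list whose entries each carry a version field. The function name etcd_versions is made up for this example and is not part of kops.

```shell
# Hypothetical helper: print every "version:" value found inside the
# etcdClusters block of a kops cluster spec read from stdin.
# Assumes block-style YAML with space indentation.
etcd_versions() {
  awk '
    NF == 0 { next }
    # Remember how far the etcdClusters key is indented.
    /^ *etcdClusters:/ { match($0, /^ */); indent = RLENGTH; in_block = 1; next }
    # A non-list line at the same or a shallower indent ends the block.
    in_block {
      match($0, /^ */)
      if (RLENGTH <= indent && $1 != "-") in_block = 0
    }
    in_block {
      for (i = 1; i < NF; i++) if ($i == "version:") print $(i + 1)
    }
  '
}

# Intended usage (needs kops and access to the cluster state):
#   kops get cluster --full -o yaml | etcd_versions
```

A cluster usually has more than one ETCD cluster entry (for example main and events), so the filter may print the version more than once.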

Find ETCD Manager Main Pod

kubectl -n kube-system get pod | grep etcd-manager-main

Note the pod name.
For example, etcd-manager-main-i-0054d377e2464e22f
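To capture just the pod names for reuse in later commands, the listing can be filtered on its first column. This is a minimal sketch, assuming the default kubectl table output where the pod name is the first field. The function name etcd_manager_main_pods is hypothetical.

```shell
# Hypothetical helper: read "kubectl get pod" output from stdin and print
# only the names of pods that start with etcd-manager-main-.
etcd_manager_main_pods() {
  awk '$1 ~ /^etcd-manager-main-/ { print $1 }'
}

# Intended usage (needs kubectl and cluster access):
#   kubectl -n kube-system get pod --no-headers | etcd_manager_main_pods
```

If several names are printed, there is likely one such pod per control-plane node.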

Exec into the ETCD Manager Main Pod

kubectl exec -it -n kube-system etcd-manager-main-i-0054d377e2464e22f -- sh

Prepare etcdctl

Once you are inside the ETCD Manager Main Pod, run the following commands to prepare etcdctl.

The value of ETCD_VERSION must match the version you noted in the Find ETCD version step.

ETCD_VERSION=3.5.9
ETCDDIR=/opt/etcd-v$ETCD_VERSION
CERTDIR=/rootfs/srv/kubernetes/kube-apiserver/
alias etcdctl="ETCDCTL_API=3 $ETCDDIR/etcdctl --cacert=$CERTDIR/etcd-ca.crt --cert=$CERTDIR/etcd-client.crt --key=$CERTDIR/etcd-client.key --endpoints=https://127.0.0.1:4001"
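A wrong ETCD_VERSION or certificate path only shows up later as a confusing "not found" or TLS error, so it can help to check the files first. The following is a sketch of such a check, assuming the same directory layout as the variables above. The function name check_etcdctl_files is invented for this example.

```shell
# Hypothetical helper: report any missing file that the etcdctl alias
# depends on. Arguments: etcd directory, certificate directory.
# Returns 0 when all four files exist, 1 otherwise.
check_etcdctl_files() {
  etcd_dir=$1
  cert_dir=$2
  missing=0
  for f in "$etcd_dir/etcdctl" \
           "$cert_dir/etcd-ca.crt" \
           "$cert_dir/etcd-client.crt" \
           "$cert_dir/etcd-client.key"; do
    if [ ! -e "$f" ]; then
      echo "missing: $f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Intended usage inside the pod, after setting the variables above:
#   check_etcdctl_files "$ETCDDIR" "$CERTDIR"
```

If the binary is reported missing, listing /opt shows which etcd versions the pod actually ships.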

List ETCD Members

etcdctl member list

Get current master leases

etcdctl get --prefix --keys-only /registry/masterleases
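Each key under /registry/masterleases ends with the IP address of an API server, and the --keys-only output puts a blank line after every key. The sketch below reduces that output to a plain list of IPs, which is easier to compare against the control-plane nodes you expect. The function name lease_ips is hypothetical.

```shell
# Hypothetical helper: read "etcdctl get --prefix --keys-only" output from
# stdin and print only the part after /registry/masterleases/ (the IP).
# Blank lines and unrelated keys are dropped.
lease_ips() {
  sed -n 's|^/registry/masterleases/||p'
}

# Intended usage inside the pod, with the etcdctl alias defined:
#   etcdctl get --prefix --keys-only /registry/masterleases | lease_ips
```

Any IP in this list that does not belong to a running control-plane node is a stale lease.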

Delete all master leases

etcdctl del --prefix /registry/masterleases/

This is the fastest way to delete all outdated master leases in ETCD.

Don’t worry: every active control-plane (master) node will re-establish its connection and register its lease again automatically.

Re-check current master leases

etcdctl get --prefix --keys-only /registry/masterleases

At this point, the master leases should list only the active control-plane nodes.
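The re-check can also be done mechanically. The sketch below assumes you have a text file with the IPs of the control-plane nodes you expect, one per line, and that the lease keys have been reduced to bare IPs (for example by stripping the /registry/masterleases/ prefix). The function name unexpected_leases and the file name expected-ips.txt are made up for this example.

```shell
# Hypothetical helper: read lease IPs from stdin (one per line) and print
# those that do NOT appear in the file of expected IPs given as $1.
# -x matches whole lines only, so 10.0.1.1 is not confused with 10.0.1.10.
unexpected_leases() {
  # grep exits with 1 when nothing is printed; that is the good case here,
  # so the exit status is deliberately ignored.
  grep -vxF -f "$1" || true
}

# Intended usage, with the etcdctl alias defined:
#   etcdctl get --prefix --keys-only /registry/masterleases \
#     | sed -n 's|^/registry/masterleases/||p' \
#     | unexpected_leases expected-ips.txt
```

Empty output means every remaining lease belongs to an expected node.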