- Incident summary/Résumé de l'incident: Some nodes are unschedulable due to a planned update on Kubernetes 1.11.5.
- Start time/Heure de début: 02-26-2019 6PM UTC
- Impact / Périmètre affecté: Nodes on Kubernetes cluster version 1.11.5
- Impact type / Type d'impact : Nodes management
- Estimated date to recovery / Date de résolution estimée : 02-27-2019 1AM UTC
- Actions undertaken / Actions entreprises : All of our nodes will be updated to 1.11.7 to reduce the load on the OpenStack API.
- Affected hosts / Hôtes affectés: Maximum 1 node per cluster.
Details :
The new Kubelet version overloads the OpenStack API.
A few old nodes are unschedulable. New node deployments are blocked until resolution.
Update(s):
Date: 2019-02-26 01:55:35 UTC Waiting for the OpenStack API to stabilize; the request count has decreased considerably.
Date: 2019-02-26 01:26:41 UTC 140 nodes updated.
Still the same load on the OpenStack API.
Investigating.
Date: 2019-02-26 00:54:00 UTC Fix deployed on all impacted nodes.
Starting Kubelet gracefully.
Date: 2019-02-26 00:34:53 UTC Successfully fixed 3 nodes.
Attempting the fix on 35 nodes.
Around 100 impacted nodes are still pending.
Date: 2019-02-25 23:56:36 UTC Root cause identified.
We have started rolling out a full update to 1.11.7.
Our main goal is to reduce the load on the OpenStack API.