OVHcloud Public Cloud Status

Worker nodes creation
Incident Report for Public Cloud
Resolved
- Incident summary: Some nodes are unschedulable due to a planned update on Kubernetes 1.11.5.
- Start time: 02-26-2019 6PM UTC
- Impact: Nodes on Kubernetes cluster version 1.11.5
- Impact type: Node management
- Estimated date of recovery: 02-27-2019 1AM UTC
- Actions undertaken: All nodes will be updated to 1.11.7 to reduce the load on the OpenStack API.
- Affected hosts: Maximum 1 node per cluster.

Details :
The new Kubelet version overloads the OpenStack API.
A few old nodes are unschedulable. New node deployments are blocked until resolution.
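
To check whether one of your own nodes is affected, you can look at the `spec.unschedulable` field that Kubernetes sets on a node that cannot accept new pods. The following is a minimal sketch, assuming you have saved the output of `kubectl get nodes -o json` and parsed it into a dictionary. The function name and the sample node names (`node-a`, `node-b`) are hypothetical and used only for illustration.

```python
import json


def unschedulable_nodes(node_list: dict) -> list:
    """Return the names of nodes whose spec marks them as unschedulable."""
    names = []
    for node in node_list.get("items", []):
        # spec.unschedulable is absent or false on schedulable nodes
        if node.get("spec", {}).get("unschedulable", False):
            names.append(node["metadata"]["name"])
    return names


# Hypothetical sample shaped like `kubectl get nodes -o json` output
sample = json.loads("""
{
  "items": [
    {"metadata": {"name": "node-a"}, "spec": {}},
    {"metadata": {"name": "node-b"}, "spec": {"unschedulable": true}}
  ]
}
""")

print(unschedulable_nodes(sample))
```

In the plain `kubectl get nodes` listing, the same nodes show `SchedulingDisabled` in the STATUS column.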

Update(s):

Date: 2019-02-26 01:55:35 UTC
Waiting for the OpenStack API to stabilize; the request count has decreased significantly.

Date: 2019-02-26 01:26:41 UTC
140 nodes updated.

The load on the OpenStack API is unchanged.

Investigating.

Date: 2019-02-26 00:54:00 UTC
Fix deployed on all impacted nodes.
Starting Kubelet gracefully.

Date: 2019-02-26 00:34:53 UTC
Successfully fixed 3 nodes.
Attempting the fix on 35 nodes.
Around 100 impacted nodes are still pending.

Date: 2019-02-25 23:56:36 UTC
Root cause identified.
We have started rolling out a full update to 1.11.7.
Our main goal is the reduction of load on OpenStack API.
Posted Feb 25, 2019 - 23:35 UTC