* Incident on cluster003 summary: all hosts unavailable on cluster003
* Start time: 2019-01-05 12h11
* Impact: the cluster is unreachable; no website hosted on this cluster is working
* Impact type: Service unavailable
* Estimated time to recovery: 1 hour
* Actions undertaken: investigating and trying to reduce the load on the cluster
Update(s):
Date: 2019-01-05 12:32:11 UTC
* Summary: some hosts are still unavailable, but they no longer disrupt the service
* Time: 2019-01-05 13h15
* Impact: none
* Recovery time: 2019-01-05 13h15
* Actions undertaken: still taking care of the last faulty hosts and keeping an eye on the cluster
Date: 2019-01-05 11:59:28 UTC
* Summary: service is progressively coming back online
* Time: 2019-01-05 12h58
* Impact: service partially unavailable
* Estimated time to recovery: 30 minutes
* Actions undertaken: rebooting the last failing host and restarting Apache on the others
Posted Jan 05, 2019 - 11:35 UTC
This incident affected: Web Hosting || Datacenter GRA (Cluster002, Cluster003, Cluster006, Cluster007, Cluster011, Cluster012, Cluster013, Cluster014, Cluster015, Cluster017, Cluster020, Cluster021, Cluster023, Cluster024, Cluster025, Cluster026, Cluster027, Cluster028, Cluster029, Cluster030, Cluster031).