Following maintenance on the Ceph cluster, the privatesql040.p19 instance froze and had to be rebooted. The outage lasted from 08:23 to 08:53.
Update(s):
Date: 2016-09-28 11:20:19 UTC This morning, the Ceph team had a polrad replaced on one of their storage servers, then reintegrated that server into the farm.
When a server is added back to a farm, Ceph has to check the integrity of the data that server holds. In this case, that represents ~24 TB of data, and files can be fragmented pretty much everywhere.
When a host using a Ceph disk tries to access a file that has not been checked yet, it has to wait, which led to a huge amount of I/O wait on our hosts.
The Ceph team is continuing its investigation to understand how they can avoid such problems.
Date: 2016-09-28 09:52:32 UTC We still have issues on Ceph. High latency is expected. We are still investigating.
Posted Sep 28, 2016 - 09:47 UTC
This incident affected: Web Hosting || Datacenter GRA (Cluster002, Cluster003, Cluster006, Cluster007, Cluster011, Cluster012, Cluster013, Cluster014, Cluster015, Cluster017, Cluster020, Cluster021, Cluster023, Cluster024, Cluster025, Cluster026, Cluster027, Cluster028, Cluster029, Cluster030, Cluster031).