Network & Infrastructure Status

OVHcloud Network Status

rbx-g5-a9

Incident Report for Network & Infrastructure

Resolved

Line card 4 of the router rebooted.

LC/0/4/CPU0:Jun 1 17:26:35 UTC: obflmgr[73]: %DIAG-DIAG-6-INFO : goldxr_OBFL_online_coalescer(L#1274): Coalescing OBFL records for diags
LC/0/4/CPU0:Jun 2 00:16:11 UTC: obflmgr[73]: %DIAG-DIAG-6-INFO : goldxr_OBFL_online_coalescer(L#1274): Coalescing OBFL records for diags
LC/0/4/CPU0:Jun 2 07:05:47 UTC: obflmgr[73]: %DIAG-DIAG-6-INFO : goldxr_OBFL_online_coalescer(L#1274): Coalescing OBFL records for diags
LC/0/4/CPU0:Jun 2 13:55:23 UTC: obflmgr[73]: %DIAG-DIAG-6-INFO : goldxr_OBFL_online_coalescer(L#1274): Coalescing OBFL records for diags
LC/0/4/CPU0:Jun 2 20:44:59 UTC: obflmgr[73]: %DIAG-DIAG-6-INFO : goldxr_OBFL_online_coalescer(L#1274): Coalescing OBFL records for diags
LC/0/4/CPU0:Jun 3 03:34:35 UTC: obflmgr[73]: %DIAG-DIAG-6-INFO : goldxr_OBFL_online_coalescer(L#1274): Coalescing OBFL records for diags
LC/0/4/CPU0:Jun 3 10:24:35 UTC: obflmgr[73]: %DIAG-DIAG-6-INFO : goldxr_OBFL_online_coalescer(L#1274): Coalescing OBFL records for diags
LC/0/4/CPU0:Jun 3 15:06:18 UTC: prm_server_ty[297]: %PLATFORM-NP-3-ECC : prm_ser_check: Single-bit ECC error detected: NP 5, block 0x1d (SMI), offset 2, memid 557, name INT2_MEM, addr 0x00001df9, bit 2147483648, ext info 0xffffffff 0xffffffff 0xffffffff 0xffffffff, action 2 (Reset)
LC/0/4/CPU0:Jun 3 15:06:18 UTC: pfm_node_lc[287]: %PLATFORM-NP-0-NON_RECOVERABLE_SOFT_ERROR : Set|prm_server_ty[168018]|Network Processor Unit(0x1008005)| A non-recoverable soft error has been detected on NP5. The linecard will be rebooted.
LC/0/4/CPU0:Jun 3 15:06:18 UTC: pfm_node_lc[287]: %PLATFORM-PFM-0-CARD_RESET_REQ : pfm_dev_sm_perform_recovery_action, Card reset requested by: Process ID: 168018 (prm_server_ty), Fault Sev: 0, Target node: 0/4/CPU0, CompId: 0x1f, Device Handle: 0x1008005, CondID: 1034, Fault Reason: A non-recoverable soft error has been detected on NP5. The linecard will be rebooted.
LC/0/4/CPU0:Jun 3 15:06:18 UTC: syslog_dev[89]: pfm_node_lc[287]: Request Graceful Reboot via Sysmgr: Reason: pfm_dev_sm_perform_recovery_action, Card reset requested by: Process ID: 168018 (prm_server_ty), Fault Sev: 0, Target node: 0/4/CPU0, CompId: 0x1f, Device Handle: 0x1008005, CondID: 1034, Fault Reason: A non-recoverable soft error has been detected on NP5. The linecard will be rebooted.
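To separate the routine messages in a log like the one above from the ones that matter, the severity digit embedded in each IOS XR message code (`%CATEGORY-GROUP-SEVERITY-MNEMONIC`, with 0 as the most severe and 7 as debug) can be extracted. The following is a minimal sketch, not a tool used in this incident; the function name `classify` and the sample list are illustrative assumptions, and the sample lines are shortened copies of the lines above.

```python
import re

# IOS XR message codes have the form %CATEGORY-GROUP-SEVERITY-MNEMONIC,
# where SEVERITY is one digit from 0 (emergency) to 7 (debug).
MSG_CODE = re.compile(r"%([A-Z0-9_]+)-([A-Z0-9_]+)-([0-7])-([A-Z0-9_]+)")

SEVERITY_NAMES = ["emergency", "alert", "critical", "error",
                  "warning", "notice", "informational", "debug"]


def classify(line):
    """Return (severity, severity_name, mnemonic), or None if the line has no message code."""
    match = MSG_CODE.search(line)
    if match is None:
        return None
    severity = int(match.group(3))
    return severity, SEVERITY_NAMES[severity], match.group(4)


# Shortened copies of lines from the log above (hypothetical sample list).
sample = [
    "LC/0/4/CPU0:Jun 3 10:24:35 UTC: obflmgr[73]: %DIAG-DIAG-6-INFO : goldxr_OBFL_online_coalescer(L#1274)",
    "LC/0/4/CPU0:Jun 3 15:06:18 UTC: prm_server_ty[297]: %PLATFORM-NP-3-ECC : prm_ser_check: Single-bit ECC error detected",
    "LC/0/4/CPU0:Jun 3 15:06:18 UTC: pfm_node_lc[287]: %PLATFORM-NP-0-NON_RECOVERABLE_SOFT_ERROR : Set|prm_server_ty[168018]",
]

# Print only messages at severity 3 (error) or more severe.
for line in sample:
    result = classify(line)
    if result is not None and result[0] <= 3:
        print(result)
```

With these three lines, the periodic OBFL message (severity 6) is filtered out and only the ECC error and the non-recoverable soft error remain.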

Update(s):

Date: 2015-06-08 14:05:30 UTC
All ports have been switched over. We are removing the faulty card.

Date: 2015-06-08 13:44:05 UTC
The card booted correctly.
We are switching the ports over one by one.

Date: 2015-06-08 12:41:44 UTC
We have just received the 24x10G card.
We are inserting it into a new slot.

Date: 2015-06-03 16:20:31 UTC
The link has been moved:

RP/0/RSP0/CPU0:rbx-g5-a9#sh inter desc | i rbx2b-101
Wed Jun 3 16:15:16.606 UTC
BE4011 up up rbx2b-101ab-n56-vrack
Te0/4/0/15 up up rbx2b-101a-n56-vrack
Te0/5/0/13 up up rbx2b-101b-n56-vrack

Date: 2015-06-03 16:09:05 UTC
Normally there would be no impact, except that the LAG of one PCC switch
had both of its member ports configured on the same card:

RP/0/RSP0/CPU0:rbx-g5-a9#sh inter description | i rbx2b-101
Wed Jun 3 16:01:20.286 UTC
BE4011 up up rbx2b-101ab-n56-vrack
Te0/4/0/15 up up rbx2b-101a-n56-vrack
Te0/4/0/22 up up rbx2b-101b-n56-vrack

We are moving one of the ports to another card.
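The condition described in this update (every member of a bundle sitting on one line card) can be checked mechanically from the interface names, since IOS XR physical interfaces are named rack/slot/module/port. The sketch below is illustrative only: the helper names and the dictionary layout are assumptions, and the membership of BE4011 is inferred from the interface descriptions in the outputs of this report rather than from a bundle listing.

```python
import re

# IOS XR physical interface names follow rack/slot/module/port, e.g. Te0/4/0/15.
IFACE = re.compile(r"^[A-Za-z]+(\d+)/(\d+)/(\d+)/(\d+)$")


def slot_of(name):
    """Return the slot (line card) number of a physical interface name."""
    match = IFACE.match(name)
    if match is None:
        raise ValueError(f"not a rack/slot/module/port interface name: {name!r}")
    return int(match.group(2))


def single_card_bundles(bundles):
    """Return the names of bundles whose member ports all sit on one line card."""
    return [
        name
        for name, members in bundles.items()
        if len({slot_of(member) for member in members}) == 1
    ]


# Membership assumed from the interface descriptions shown in this report.
before_move = {"BE4011": ["Te0/4/0/15", "Te0/4/0/22"]}
after_move = {"BE4011": ["Te0/4/0/15", "Te0/5/0/13"]}

print(single_card_bundles(before_move))  # ['BE4011']
print(single_card_bundles(after_move))   # []
```

Before the move, both members are in slot 4, so a reboot of that card takes the whole bundle down; after the move, the members are spread over slots 4 and 5.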

In parallel, we are working with Cisco on replacing the card.

Date: 2015-06-03 16:07:28 UTC
The incident lasted 8 minutes.

LC/0/4/CPU0:Jun 3 15:14:06 UTC: ifmgr[200]: %PKT_INFRA-LINEPROTO-5-UPDOWN : Line protocol on Interface TenGigE0/4/0/22, changed state to Up
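The 8-minute figure can be checked against the timestamps in the logs. The sketch below assumes the outage runs from the ECC fault message (Jun 3 15:06:18 UTC) to the line-protocol Up message on TenGigE0/4/0/22 (Jun 3 15:14:06 UTC); the log lines carry no year, so 2015 is taken from the report date. The function name `outage_seconds` is illustrative.

```python
from datetime import datetime

# Log timestamps have no year, so one is prepended before parsing.
FMT = "%Y %b %d %H:%M:%S"


def outage_seconds(start, end, year=2015):
    """Return the number of seconds between two 'Mon D HH:MM:SS' log timestamps."""
    t_start = datetime.strptime(f"{year} {start}", FMT)
    t_end = datetime.strptime(f"{year} {end}", FMT)
    return (t_end - t_start).total_seconds()


seconds = outage_seconds("Jun 3 15:06:18", "Jun 3 15:14:06")
print(seconds)       # 468.0
print(seconds / 60)  # 7.8
```

Under that assumption the interruption is 7 minutes 48 seconds, which rounds to the 8 minutes reported.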

Posted Jun 03, 2015 - 16:06 UTC