OVHcloud Private Cloud Status

rbx-s50-6k
Incident Report for Hosted Private Cloud
Resolved
rbx-s50-6k, one of the head-end routers of the Roubaix PCC network, has crashed. We are rebooting it.

Nov 22 04:47:14 CET: %DIAG-SP-3-TEST_FAIL: Module 7: TestSPRPInbandPing{ID=2} has failed. Error code = 0xC3 (DIAG_CHECK_RP_PAK_ERROR)
Nov 22 04:47:14 CET: %DIAG-SP-3-MINOR: Module 7: Online Diagnostics detected a Minor Error. Please use 'show diagnostic result ' to see test results.
Nov 22 04:47:36 CET: %DIAG-SP-3-TEST_FAIL: Module 7: TestSPRPInbandPing{ID=2} has failed. Error code = 0xC3 (DIAG_CHECK_RP_PAK_ERROR)
Nov 22 04:48:00 CET: %DIAG-SP-3-TEST_FAIL: Module 7: TestSPRPInbandPing{ID=2} has failed. Error code = 0xC3 (DIAG_CHECK_RP_PAK_ERROR)
Nov 22 04:48:22 CET: %DIAG-SP-3-TEST_FAIL: Module 7: TestSPRPInbandPing{ID=2} has failed. Error code = 0xC3 (DIAG_CHECK_RP_PAK_ERROR)
Nov 22 04:48:45 CET: %DIAG-SP-3-TEST_FAIL: Module 7: TestSPRPInbandPing{ID=2} has failed. Error code = 0xC3 (DIAG_CHECK_RP_PAK_ERROR)
Nov 22 04:48:45 CET: %CONST_DIAG-SP-3-HM_TEST_FAIL: Module 7 TestSPRPInbandPing consecutive failure count:5
Nov 22 04:48:45 CET: %CONST_DIAG-SP-6-HM_TEST_INFO: CPU util(5sec): SP=56% RP=71% Traffic=0%
netint_thr_active[0], Tx_Rate[600], Rx_Rate[147], dev=3[IPv4, fail=5], 4[IPv4, fail=5]
Nov 22 04:49:12 CET: %DIAG-SP-3-TEST_FAIL: Module 7: TestSPRPInbandPing{ID=2} has failed. Error code = 0xC3 (DIAG_CHECK_RP_PAK_ERROR)
Nov 22 04:49:36 CET: %DIAG-SP-3-TEST_FAIL: Module 7: TestSPRPInbandPing{ID=2} has failed. Error code = 0xC3 (DIAG_CHECK_RP_PAK_ERROR)
Nov 22 04:50:13 CET: %DIAG-SP-3-TEST_FAIL: Module 7: TestSPRPInbandPing{ID=2} has failed. Error code = 0xC3 (DIAG_CHECK_RP_PAK_ERROR)
Nov 22 04:50:35 CET: %DIAG-SP-3-TEST_FAIL: Module 7: TestSPRPInbandPing{ID=2} has failed. Error code = 0xC3 (DIAG_CHECK_RP_PAK_ERROR)
Nov 22 04:50:56 CET: %DIAG-SP-3-TEST_FAIL: Module 7: TestSPRPInbandPing{ID=2} has failed. Error code = 0xC3 (DIAG_CHECK_RP_PAK_ERROR)
Nov 22 04:50:56 CET: %CONST_DIAG-SP-3-HM_TEST_FAIL: Module 7 TestSPRPInbandPing consecutive failure count:10
Nov 22 04:50:56 CET: %CONST_DIAG-SP-6-HM_TEST_INFO: CPU util(5sec): SP=41% RP=84% Traffic=0%
netint_thr_active[0], Tx_Rate[600], Rx_Rate[146], dev=3[IPv4, fail=10], 4[IPv4, fail=10]
Nov 22 04:51:18 CET: %DIAG-SP-3-TEST_FAIL: Module 7: TestSPRPInbandPing{ID=2} has failed. Error code = 0xC3 (DIAG_CHECK_RP_PAK_ERROR)
Nov 22 04:51:38 CET: %HA_EM-6-LOG: Mandatory.go_sprping.tcl: 1Process Forced Exit- MAXRUN timer expired.
Nov 22 04:51:38 CET: %HA_EM-6-LOG: Mandatory.go_sprping.tcl: while executing
Nov 22 04:51:38 CET: %HA_EM-6-LOG: Mandatory.go_sprping.tcl: "if [catch {cli_exec $cli1(fd) "diagnostic action mod $card test TestSPRPInbandPing default"} result] {
Nov 22 04:51:38 CET: %HA_EM-6-LOG: Mandatory.go_sprping.tcl: error $result $errorInfo
Nov 22 04:51:38 CET: %HA_EM-6-LOG: Mandatory.go_sprping.tcl: } else {
Nov 22 04:51:38 CET: %HA_EM-6-LOG: Mandatory.go_sprping.tcl: set c..."
Nov 22 04:51:38 CET: %HA_EM-6-LOG: Mandatory.go_sprping.tcl: (file "tmpsys:/eem_policy/Mandatory.go_sprping.tcl" line 78)
Nov 22 04:51:38 CET: %HA_EM-6-LOG: Mandatory.go_sprping.tcl: Tcl policy execute failed: 1Process Forced Exit- MAXRUN timer expired.
Queued messages:
Nov 22 04:51:59 CET: %SYS-3-LOGGER_FLUSHING: System pausing to ensure console debugging output.

Nov 22 04:51:59 CET: %C6K_PLATFORM-2-PEER_RESET: RP is being reset by the SP
Nov 22 04:52:06 CET: %SYS-SP-3-LOGGER_FLUSHING: System pausing to ensure console debugging output.

Nov 22 04:52:05 CET: %SYS-SP-3-CPUHOG: Task is running for (4428)msecs, more than (2000)msecs (90/90),process = Crash writer.
-Traceback= 2
Nov 22 04:52:05 CET: %SYS-SP-3-CPUHOG: Task is running for (4432)msecs, more than (2000)msecs (90/90),process = Crash writer.
-Traceback= 419B5A30 114C 7D0 5A 5A 41D2C1C0 443A9780
Nov 22 04:52:06 CET: %OIR-SP-6-CONSOLE: Changing console ownership to switch processor



No warm reboot Storage
*** System received an unknown failure ***
signal= 0x0, code= 0x0, context= 0x443ab2f4
PC = 0x417bf3f0, Cause = 0x1020, Status Reg = 0x34008102
Exit at the end of BOOT string
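
For readers who want to tally the diagnostic failures in a console capture like the one above, here is a minimal Python sketch. It assumes the log is available as plain text lines in the format shown here; the function name count_test_failures and the regular expression are illustrative and are not part of any Cisco or OVH tooling.

```python
import re
from collections import Counter

# Matches lines of the form seen in the capture above, e.g.
# "Nov 22 04:47:14 CET: %DIAG-SP-3-TEST_FAIL: Module 7: TestSPRPInbandPing{ID=2} has failed. ..."
# The pattern is an assumption based on those lines only.
TEST_FAIL = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d{2}:\d{2}:\d{2}) \w+: "
    r"%DIAG-SP-3-TEST_FAIL: Module (?P<module>\d+): "
    r"(?P<test>\w+)\{ID=\d+\} has failed"
)


def count_test_failures(lines):
    """Count TEST_FAIL messages per (module number, test name)."""
    counts = Counter()
    for line in lines:
        match = TEST_FAIL.match(line)
        if match:
            key = (int(match.group("module")), match.group("test"))
            counts[key] += 1
    return counts


# Three lines copied from the capture; only two are TEST_FAIL messages.
sample = [
    "Nov 22 04:47:14 CET: %DIAG-SP-3-TEST_FAIL: Module 7: TestSPRPInbandPing{ID=2} has failed. Error code = 0xC3 (DIAG_CHECK_RP_PAK_ERROR)",
    "Nov 22 04:47:14 CET: %DIAG-SP-3-MINOR: Module 7: Online Diagnostics detected a Minor Error. Please use 'show diagnostic result ' to see test results.",
    "Nov 22 04:47:36 CET: %DIAG-SP-3-TEST_FAIL: Module 7: TestSPRPInbandPing{ID=2} has failed. Error code = 0xC3 (DIAG_CHECK_RP_PAK_ERROR)",
]

failures = count_test_failures(sample)
for (module, test), n in failures.items():
    print(f"Module {module}: {test} failed {n} time(s)")
```

Keying the counter on both module and test name keeps the tally usable if a capture contains failures from more than one line card or diagnostic test.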


Update(s):

Date: 2013-11-22 06:02:45 UTC
We have detected no anomalies following the router reboot.

Date: 2013-11-22 05:40:25 UTC
The router is up and operating normally. We are running checks to make sure there were no side effects on production.

Date: 2013-11-22 05:20:02 UTC
We are having difficulty booting the router with the correct TCAM configuration. We are rebooting the chassis again.

Date: 2013-11-22 04:55:16 UTC
The router is rebooting from the CF card onto which we have loaded the IOS image and the configuration backup.

Date: 2013-11-22 04:48:38 UTC
We have a problem with the supervisor's CF card. This card stores the IOS image and the configuration. We are preparing a new CF card.

Date: 2013-11-22 04:18:23 UTC
The router is rebooting. We are preparing to replace card 7, which caused the crash, if necessary.
Posted Nov 22, 2013 - 04:16 UTC