OVHcloud Network Status

192.95.32.0/24 and 192.95.33.0/24
Incident Report for Network & Infrastructure
Resolved
The pair of N5 switches (Nexus 5000) handling these two networks crashed.
%SYSMGR-2-HAP_FAILURE_SUP_RESET: System reset due to service "eth_port_sec" in vdc 1 has had a hap failure

They have just finished rebooting, and the FEXes are coming back up.

We are investigating.

Update(s):

Date: 2014-10-15 11:52:18 UTC
The port configuration is correct; a shut/no shut resolved the problem for the remaining servers.

Everything is operational again for 192.95.32.0/24 and 192.95.33.0/24.
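To check whether a given server address was within the scope of this incident, a minimal Python sketch using only the standard `ipaddress` module could look like this. The test addresses are hypothetical examples, not servers from this report.

```python
import ipaddress

# The two /24 prefixes named in this incident report.
AFFECTED_PREFIXES = [
    ipaddress.ip_network("192.95.32.0/24"),
    ipaddress.ip_network("192.95.33.0/24"),
]


def is_affected(address: str) -> bool:
    """Return True if `address` falls inside one of the affected prefixes."""
    ip = ipaddress.ip_address(address)
    return any(ip in net for net in AFFECTED_PREFIXES)
```

For example, `is_affected("192.95.33.254")` returns `True` and `is_affected("192.95.34.1")` returns `False`. The two prefixes are adjacent and together form `192.95.32.0/23`, which `ipaddress.collapse_addresses(AFFECTED_PREFIXES)` confirms.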



Date: 2014-10-15 11:32:44 UTC
All FEXes are UP again.

8 servers are still partially unreachable; I am looking into it.


Date: 2014-10-15 11:18:12 UTC
The FEXes are gradually coming back.

sh fex
FEX     FEX           FEX            FEX                  Fex
Number  Description   State          Model                Serial
------------------------------------------------------------------------
100     FEX0100       Online         N2K-C2248TP-E-1GE    SSI16370ACF
101     FEX0101       Online         N2K-C2248TP-E-1GE    SSI16370ABZ
102     FEX0102       Connected      N2K-C2248TP-1GE      SSI1603063C
105     FEX0105       Connected      N2K-C2248TP-E-1GE    SSI16370AG7
109     FEX0109       Online         N2K-C2248TP-E-1GE    SSI16370EDR
111     FEX111        Online         N2K-C2248TP-E-1GE    SSI16370ED7
---     --------      Connected      N2K-C2248TP-E-1GE    SSI16370ECS
---     --------      Connected      N2K-C2248TP-E-1GE    SSI16370EDT
---     --------      Connected      N2K-C2248TP-1GE      SSI16080AR8
---     --------      Connected      N2K-C2248TP-E-1GE    SSI16370EDX
---     --------      Connected      N2K-C2248TP-E-1GE    SSI16370E7W
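The `---` rows above appear to be FEXes that are connected but not yet matched to a FEX number; their serials correspond to FEX 103, 104, 106, 107 and 108 in the 11:01 output. A minimal Python sketch for listing every FEX that is not yet `Online` in output like this is shown below. It assumes every model string starts with `N2K-` (true for all rows in this report), which lets the State column contain spaces such as `Check Upg Seq`; the sample text is a subset of the output above.

```python
import re

# Sample: a subset of the `sh fex` output quoted in this report.
SHOW_FEX_SAMPLE = """\
FEX FEX FEX FEX Fex
Number Description State Model Serial
------------------------------------------------------------------------
100 FEX0100 Online N2K-C2248TP-E-1GE SSI16370ACF
102 FEX0102 Connected N2K-C2248TP-1GE SSI1603063C
--- -------- Connected N2K-C2248TP-E-1GE SSI16370ECS
"""

# Assumption: every model string starts with "N2K-", so the lazy State group
# can absorb multi-word states such as "Check Upg Seq".
ROW = re.compile(r"^(\S+)\s+(\S+)\s+(.+?)\s+(N2K-\S+)\s+(\S+)$")


def fexes_not_online(output: str) -> list[tuple[str, str, str]]:
    """Return (number, state, serial) for every FEX row whose state is not Online."""
    pending = []
    for line in output.splitlines():
        m = ROW.match(line.strip())
        if m is None:
            continue  # header, separator or blank line
        number, _desc, state, _model, serial = m.groups()
        if state != "Online":
            pending.append((number, state, serial))
    return pending
```

Matching the returned serials against an earlier `sh fex` snapshot is one way to tell which numbered FEX each `---` row corresponds to.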


Date: 2014-10-15 11:01:05 UTC
Current situation:
We are in a degraded but stable state; only the servers on FEX 102 (rack T01C52) are impacted.
Servers on the other FEXes are still reachable.


During the ISSU upgrade we hit a bug (in addition to the port-security bug) that blocked FEX 102 and the vPC.

sh fex
FEX     FEX           FEX            FEX                  Fex
Number  Description   State          Model                Serial
------------------------------------------------------------------------
100     FEX0100       Online         N2K-C2248TP-E-1GE    SSI16370ACF
101     FEX0101       Online         N2K-C2248TP-E-1GE    SSI16370ABZ
102     FEX0102       Check Upg Seq  N2K-C2248TP-1GE      SSI1603063C
103     FEX0103       Online         N2K-C2248TP-1GE      SSI16080AR8
104     FEX0104       Online         N2K-C2248TP-E-1GE    SSI16370E7W
105     FEX0105       Online         N2K-C2248TP-E-1GE    SSI16370AG7
106     FEX0106       Online         N2K-C2248TP-E-1GE    SSI16370ECS
107     FEX0107       Online         N2K-C2248TP-E-1GE    SSI16370EDT
108     FEX0108       Online         N2K-C2248TP-E-1GE    SSI16370EDX
109     FEX0109       Online         N2K-C2248TP-E-1GE    SSI16370EDR
111     FEX111        Online         N2K-C2248TP-E-1GE    SSI16370ED7

Eth1/31 vPC nodUpgrad trunk full 10G SFP-H10GB-C
Eth1/32 vPC nodUpgrad trunk full 10G SFP-H10GB-C

=> Ethernet1/32 is down (LC upgrade in progress)

Actions in progress:
We are going to upgrade the second N5 and then reload the first N5.

FEXes 100 and 101 will fail over automatically to the second N5 because they are already up to date.
FEXes 103 to 111 will reload, which will make their servers unreachable while each FEX reboots.
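The plan above splits the FEXes by software version. The hypothetical Python sketch below models only that decision; the per-FEX versions are assumptions drawn from the text (FEX 100 and 101 already on 6.0(2)N2(5), FEX 103 to 111 still on 6.0(2)N2(2)), and FEX 102, stuck in `Check Upg Seq`, is left out.

```python
# Hypothetical model of the plan: versions per FEX are assumed, not read from a switch.
TARGET = "6.0(2)N2(5)"
OLD = "6.0(2)N2(2)"


def plan_fex_actions(running: dict[int, str], target: str = TARGET) -> tuple[list[int], list[int]]:
    """Split FEXes into (fail over to the upgraded N5, reload to pick up the target image)."""
    failover = sorted(fex for fex, ver in running.items() if ver == target)
    reload_ = sorted(fex for fex, ver in running.items() if ver != target)
    return failover, reload_


running = {100: TARGET, 101: TARGET}
running.update({fex: OLD for fex in (103, 104, 105, 106, 107, 108, 109, 111)})
failover, to_reload = plan_fex_actions(running)
```

Here `failover` is `[100, 101]` and `to_reload` lists FEX 103 to 111 (there is no FEX 110), matching the actions described above.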


Date: 2014-10-15 10:33:32 UTC
0% -- FAIL. Return code -1.

Remaining action::
"Module(s) 103, 104, 105, 106, 107, 108, 109, 111 still need to be upgraded".

Install has failed. Return code 0x40930020 (Non-disruptive upgrade of a module failed).
Please identify the cause of the failure, and try 'install all' again.


The ISSU failed on FEX 102,

and the second N5 has just crashed again (same error).

Date: 2014-10-15 09:56:14 UTC
2014 Oct 15 11:54:43 sw %$ VDC-1 %$ %SATCTRL-FEX108-2-SATCTRL_IMAGE: FEX108 Image update complete. Install pending
2014 Oct 15 11:54:56 sw %$ VDC-1 %$ %SATCTRL-FEX107-2-SATCTRL_IMAGE: FEX107 Image update complete. Install pending
2014 Oct 15 11:55:37 sw %$ VDC-1 %$ %SATCTRL-FEX109-2-SATCTRL_IMAGE: FEX109 Image update c
[####################] 100% -- SUCCESS

Module 100: Non-disruptive upgrading.
[# ] 0%


Date: 2014-10-15 09:55:55 UTC
show install all status
There is an on-going installation...
Enter Ctrl-C to go back to the prompt.

Continuing with installation process, please wait.
The login will be disabled until the installation is completed.

Performing supervisor state verification.
SUCCESS

Supervisor non-disruptive upgrade successful.

Pre-loading modules.

The FEXes are currently being upgraded.

Date: 2014-10-15 09:43:04 UTC
Compatibility check is done:
Module  bootable  Impact          Install-type  Reason
------  --------  --------------  ------------  ------
1       yes       non-disruptive  reset
3       yes       non-disruptive  rolling
100     yes       non-disruptive  rolling
101     yes       non-disruptive  rolling
102     yes       non-disruptive  rolling
103     yes       non-disruptive  rolling
104     yes       non-disruptive  rolling
105     yes       non-disruptive  rolling
106     yes       non-disruptive  rolling
107     yes       non-disruptive  rolling
108     yes       non-disruptive  rolling
109     yes       non-disruptive  rolling
111     yes       non-disruptive  rolling



Images will be upgraded according to following table:
Module  Image             Running-Version         New-Version             Upg-Required
------  ----------------  ----------------------  ----------------------  ------------
1       system            6.0(2)N2(2)             6.0(2)N2(5)             yes
1       kickstart         6.0(2)N2(2)             6.0(2)N2(5)             yes
1       bios              v3.6.0(05/09/2012)      v3.6.0(05/09/2012)      no
1       power-seq         v1.0                    v3.0                    yes
1       SFP-uC            v1.0.0.0                v1.0.0.0                no
3       power-seq         v2.0                    v2.0                    no
100     fexth             6.0(2)N2(2)             6.0(2)N2(5)             yes
101     fexth             6.0(2)N2(2)             6.0(2)N2(5)             yes
102     fexth             6.0(2)N2(2)             6.0(2)N2(5)             yes
103     fexth             6.0(2)N2(2)             6.0(2)N2(5)             yes
104     fexth             6.0(2)N2(2)             6.0(2)N2(5)             yes
105     fexth             6.0(2)N2(2)             6.0(2)N2(5)             yes
106     fexth             6.0(2)N2(2)             6.0(2)N2(5)             yes
107     fexth             6.0(2)N2(2)             6.0(2)N2(5)             yes
108     fexth             6.0(2)N2(2)             6.0(2)N2(5)             yes
109     fexth             6.0(2)N2(2)             6.0(2)N2(5)             yes
111     fexth             6.0(2)N2(2)             6.0(2)N2(5)             yes
1       microcontroller   v1.2.0.1                v1.2.0.1                no
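To pull out which modules actually need an upgrade from a table like the one above, a minimal Python sketch follows. It assumes each data row has exactly five whitespace-separated fields starting with a numeric module ID, which holds for every row shown here; the sample is a subset of the table.

```python
from collections import defaultdict

# Sample: a subset of the upgrade table quoted in this report.
UPGRADE_TABLE_SAMPLE = """\
Module Image Running-Version New-Version Upg-Required
------ ---------------- ---------------------- ---------------------- ------------
1 system 6.0(2)N2(2) 6.0(2)N2(5) yes
1 bios v3.6.0(05/09/2012) v3.6.0(05/09/2012) no
100 fexth 6.0(2)N2(2) 6.0(2)N2(5) yes
102 fexth 6.0(2)N2(2) 6.0(2)N2(5) yes
"""


def modules_to_upgrade(table: str) -> dict[str, list[int]]:
    """Map image name -> sorted module numbers whose Upg-Required column is 'yes'."""
    result: dict[str, list[int]] = defaultdict(list)
    for line in table.splitlines():
        fields = line.split()
        # Skip the header, the dashed separator and anything that is not a 5-field data row.
        if len(fields) != 5 or not fields[0].isdigit():
            continue
        module, image, _running, _new, required = fields
        if required == "yes":
            result[image].append(int(module))
    return {image: sorted(mods) for image, mods in result.items()}
```

Run against the full table, it would report `system`, `kickstart` and `power-seq` for module 1, and `fexth` for modules 100 to 109 and 111.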


Date: 2014-10-15 09:22:55 UTC
Image download to the switches is done; starting the upgrade.

Date: 2014-10-15 08:46:52 UTC
Note: the upgrade is performed live using Cisco ISSU (In-Service Software Upgrade), so there should be no service interruption.

Date: 2014-10-15 08:39:24 UTC
The switches are running version 6.0(2)N2(2).

It is still early in North America, so we are going to upgrade the pair to 6.0(2)N2(5), which includes port-security bug fixes.
Posted Oct 15, 2014 - 08:32 UTC
This incident affected: Infrastructure || BHS (BHS1, BHS2, BHS3, BHS4, BHS5, BHS6, BHS7).
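The 08:39 update moves the pair from 6.0(2)N2(2) to 6.0(2)N2(5). Comparing such release strings as plain text is fragile (as strings, "6.0(2)N2(10)" would sort before "6.0(2)N2(5)"), so a minimal Python sketch parses the numeric fields and compares them as integer tuples. It assumes the `X.Y(Z)NA(B)` shape seen in this report; other NX-OS release strings, for example with a letter suffix, are rejected rather than guessed.

```python
import re

# Assumption: release strings follow the "X.Y(Z)NA(B)" shape used in this report.
VERSION = re.compile(r"^(\d+)\.(\d+)\((\d+)\)N(\d+)\((\d+)\)$")


def parse_version(text: str) -> tuple[int, ...]:
    """Turn e.g. '6.0(2)N2(5)' into (6, 0, 2, 2, 5) for numeric comparison."""
    m = VERSION.match(text)
    if m is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(g) for g in m.groups())


def needs_upgrade(running: str, target: str) -> bool:
    """True if the running release is strictly older than the target release."""
    return parse_version(running) < parse_version(target)
```

With this sketch, `needs_upgrade("6.0(2)N2(2)", "6.0(2)N2(5)")` is `True`, while comparing a release with itself returns `False`.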