OVHcloud Network Status

ams-1-6k
Incident Report for Network & Infrastructure
Resolved
Following the deployment of "wrr-queue" on all 10G interfaces of the
network, line card 2 on the ams-1-6k router went into a fault state.

Jul 29 08:52:14 GMT: %PM_SCP-SP-2-LCP_FW_ERR_INFORM: Module 2 is experiencing the following error: RO[2] (166004 noncritical int in the last 10s, they are now disabled). ROINTMSK[2]:
2E9=0xC,00F=0x728,024=0x1FFF,0E8=0x4,052=0x0,04C=0x1E,049=0x0,09D=0x2FFF,009=0x0,00C=0x0,

Traffic going through this card was impacted. We shut down the port
and traffic recovered. We are currently restarting
the card.
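
"wrr-queue" refers to weighted round-robin (WRR) scheduling between an interface's egress queues: the queues are served in turn, and each may send an amount proportional to its weight before the next one gets its turn. The following is a minimal, generic Python sketch of that idea, assuming three hypothetical queues (q1, q2, q3) with illustrative weights; it does not model the line card's hardware scheduler or the configuration applied in this incident.

```python
from collections import deque
from typing import Deque, Dict, List


def wrr_schedule(queues: Dict[str, Deque[str]], weights: Dict[str, int]) -> List[str]:
    """Drain the queues using weighted round-robin.

    In each round, queue q may send up to weights[q] packets before the
    scheduler moves on to the next queue. Weights must be positive,
    otherwise a non-empty queue could never be drained.
    """
    if any(w < 1 for w in weights.values()):
        raise ValueError("weights must be positive integers")
    sent: List[str] = []
    while any(queues.values()):
        for name, q in queues.items():
            for _ in range(weights[name]):
                if not q:
                    break  # queue empty: hand the turn to the next queue
                sent.append(q.popleft())
    return sent


if __name__ == "__main__":
    # Hypothetical queues and weights, for illustration only.
    demo_queues = {
        "q1": deque(f"a{i}" for i in range(4)),
        "q2": deque(f"b{i}" for i in range(4)),
        "q3": deque(f"c{i}" for i in range(4)),
    }
    print(wrr_schedule(demo_queues, {"q1": 1, "q2": 2, "q3": 4}))
```

Counting whole packets per turn keeps the sketch short; hardware schedulers may account in bytes instead.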

Update(s):

Date: 2010-07-29 17:34:03 UTC
Done.

We nevertheless had to disable MPLS: with MPLS enabled, the
router does not have enough RAM and will crash. The margin
is about 10 MB... so we disabled MPLS on
ldn-1 as well.

The router is stable.

What a day...

Date: 2010-07-29 16:19:10 UTC
We are reloading the router and putting it back into production.

Date: 2010-07-29 15:58:11 UTC
We are going to replace the 10G card, then restart the router
and restore traffic. We will see whether the router crashes again.
If it does, which is no longer very likely, we will replace the
sup card.

Date: 2010-07-29 09:53:00 UTC
A hardware problem is most likely the cause of these issues.
We will go on site to replace the hardware: the 10G card,
the sup, or both. The drive from Roubaix takes about 3 hours.
Traffic is flowing through London and Frankfurt.

Date: 2010-07-29 09:43:28 UTC
Jul 29 11:40:34 40G.ams-1-6k.routers.ovh.net 920: Jul 29 10:40:10 GMT: %SYS-2-MALLOCFAIL: Memory allocation of 65536 bytes failed from 0x41044EEC, alignment 8
Jul 29 11:40:34 40G.ams-1-6k.routers.ovh.net 921: Pool: Processor Free: 1395584 Cause: Memory fragmentation
Jul 29 11:40:34 40G.ams-1-6k.routers.ovh.net 922: Alternate Pool: None Free: 0 Cause: No Alternate pool
Jul 29 11:40:34 40G.ams-1-6k.routers.ovh.net 923: -Process= "IP RIB Update", ipl= 0, pid= 164
Jul 29 11:40:34 40G.ams-1-6k.routers.ovh.net 924: -Traceback= 4102C83C 4103246C 41044EF4 413C2334 413C2578 4228B548 40641B40 42307BD0 409D3998 4098445C 4098457C
Jul 29 11:40:34 40G.ams-1-6k.routers.ovh.net 925: Jul 29 10:40:13 GMT: %FIB-3-NORPXDRQELEMS: Exhausted XDR queuing elements while preparing message for slot/cpu 6/0
Jul 29 11:40:34 40G.ams-1-6k.routers.ovh.net 926: -Process= "IP RIB Update", ipl= 0, pid= 164
Jul 29 11:40:34 40G.ams-1-6k.routers.ovh.net 927: -Traceback= 413C273C 4228B548 40641B40 42307BD0 409D3998 4098445C 4098457C
Jul 29 11:40:34 40G.ams-1-6k.routers.ovh.net 928: Jul 29 10:40:13 GMT: %FIB-3-UPDATEFAIL: Update of prefix 124.138.241.0/-256 failed, resulting in it being deleted.
Jul 29 11:40:48 40G.ams-1-6k.routers.ovh.net 929: Jul 29 10:40:17 GMT: %FIB-3-NOMEM: Malloc Failure, disabling DCEF
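
The log lines in this report use the Cisco IOS system message format %FACILITY-SEVERITY-MNEMONIC: text, where SEVERITY ranges from 0 (emergencies) to 7 (debugging); %SYS-2-MALLOCFAIL is therefore a critical message and %FIB-3-UPDATEFAIL an error. A minimal Python sketch, assuming only that format, that splits such a line into its fields (the IosMessage and parse_ios_message names are illustrative, not part of any Cisco tool):

```python
import re
from typing import NamedTuple, Optional

# Cisco IOS logging severity levels, indexed by their numeric value 0-7.
SEVERITY_NAMES = (
    "emergencies", "alerts", "critical", "errors",
    "warnings", "notifications", "informational", "debugging",
)

# %FACILITY-SEVERITY-MNEMONIC: text
# The facility part may itself contain hyphens (e.g. "PM_SCP-SP"), so it is
# matched lazily, up to the first "-<0-7>-MNEMONIC:" that completes the match.
MSG_RE = re.compile(
    r"%(?P<facility>[A-Z0-9_]+(?:-[A-Z0-9_]+)*?)"
    r"-(?P<severity>[0-7])"
    r"-(?P<mnemonic>[A-Z0-9_]+):\s*(?P<text>.*)"
)


class IosMessage(NamedTuple):
    facility: str
    severity: int
    severity_name: str
    mnemonic: str
    text: str


def parse_ios_message(line: str) -> Optional[IosMessage]:
    """Parse one syslog line; return None for lines without a %... tag,
    such as the "Pool:" and "-Traceback=" continuation lines."""
    m = MSG_RE.search(line)
    if m is None:
        return None
    sev = int(m.group("severity"))
    return IosMessage(
        facility=m.group("facility"),
        severity=sev,
        severity_name=SEVERITY_NAMES[sev],
        mnemonic=m.group("mnemonic"),
        text=m.group("text"),
    )


line = ("Jul 29 11:40:48 40G.ams-1-6k.routers.ovh.net 929: Jul 29 10:40:17 GMT: "
        "%FIB-3-NOMEM: Malloc Failure, disabling DCEF")
msg = parse_ios_message(line)
print(msg.facility, msg.severity_name, msg.mnemonic)  # FIB errors NOMEM
```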


Date: 2010-07-29 09:42:08 UTC
ams-1-6k#sh mem stat
Head Total(b) Used(b) Free(b) Lowest(b) Largest(b)
Processor 44B219D0 927819312 715646936 212172376 0 1836520
I/O 8000000 67108864 12821888 54286976 54219792 54104760
ams-1-6k#reload

System configuration has been modified. Save? [yes/no]:
% Please answer 'yes' or 'no'.

System configuration has been modified. Save? [yes/no]:
% Please answer 'yes' or 'no'.

System configuration has been modified. Save? [yes/no]:
% Please answer 'yes' or 'no'.

System configuration has been modified. Save? [yes/no]: no
Proceed with reload? [confirm]
Connection closed by foreign host.


The router crashed again. We managed to reload it.

Date: 2010-07-29 09:10:14 UTC
The router is back. We are going to put it back into the backbone.

Date: 2010-07-29 08:27:28 UTC
We are taking the opportunity to upgrade IOS to a new version, 17a.

Date: 2010-07-29 08:21:26 UTC
Isolating the router caused service interruptions.

We have access to the router again. We are saving the config and
restarting it.

Date: 2010-07-29 08:08:22 UTC
The router has crashed.

Jul 29 10:07:09 40G.ams-1-6k.routers.ovh.net 6774: Jul 29 09:06:51 GMT: %C6KFIB-4-DISABLED: Hardware FIB forwarding disabled, reverting to only software forwarding.
Jul 29 10:07:13 40G.ams-1-6k.routers.ovh.net 6775: Jul 29 09:06:53 GMT: %FIB-2-FIBDOWN: CEF has been disabled due to a low memory condition.
Jul 29 10:07:13 40G.ams-1-6k.routers.ovh.net 6776: It can be re-enabled by configuring "ip cef [distributed]"

We are isolating it from the network.

Date: 2010-07-29 08:02:53 UTC
Jul 29 10:02:54 40G.ams-1-6k.routers.ovh.net 6703: Jul 29 09:02:34 GMT: %QM-2-TCAM_BAD_LOU: Bad TCAM LOU operation in ACL


Date: 2010-07-29 08:01:57 UTC
ams-1-6k#sh mem stat
Head Total(b) Used(b) Free(b) Lowest(b) Largest(b)
Processor 44B1D6B0 927836496 891342240 36494256 0 4132240
I/O 8000000 67108864 11948344 55160520 53479168 55056824
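
Comparing sh mem stat snapshots is easier with percentages. A minimal Python sketch, assuming the seven-column row layout shown above (pool name, Head address, then Total, Used, Free, Lowest and Largest, all in bytes); parse_mem_stat is a hypothetical helper, not an IOS command. It also checks whether the largest free block could hold a 65536-byte request, the size that failed in the MALLOCFAIL messages.

```python
from typing import Dict, NamedTuple


class PoolStats(NamedTuple):
    total: int
    used: int
    free: int
    lowest: int
    largest: int


def parse_mem_stat(output: str) -> Dict[str, PoolStats]:
    """Parse the table printed by 'sh mem stat' into per-pool byte counts."""
    pools: Dict[str, PoolStats] = {}
    for line in output.strip().splitlines():
        fields = line.split()
        # Data rows have 7 fields: name, Head (hex), then five byte counts.
        # The header row has only 6 fields and is skipped here.
        if len(fields) != 7 or not fields[2].isdigit():
            continue
        total, used, free, lowest, largest = (int(x) for x in fields[2:])
        pools[fields[0]] = PoolStats(total, used, free, lowest, largest)
    return pools


# Snapshot copied from the 08:01:57 UTC update.
SNAPSHOT = """
Head Total(b) Used(b) Free(b) Lowest(b) Largest(b)
Processor 44B1D6B0 927836496 891342240 36494256 0 4132240
I/O 8000000 67108864 11948344 55160520 53479168 55056824
"""

for name, p in parse_mem_stat(SNAPSHOT).items():
    used_pct = 100 * p.used / p.total
    print(f"{name}: {used_pct:.1f}% used, largest free block {p.largest} B, "
          f"fits 65536 B: {p.largest >= 65536}")
```

For this snapshot the Processor pool comes out at about 96.1% used; the same calculation on the 09:42:08 UTC snapshot gives about 77.1%.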


Date: 2010-07-29 08:01:09 UTC
Jul 29 10:00:31 40G.ams-1-6k.routers.ovh.net 6687: Jul 29 09:00:10 GMT: %SYS-3-CPUHOG: Task is running for (2000)msecs, more than (2000)msecs (33/3),process = CEF Reloader.
Jul 29 10:00:31 40G.ams-1-6k.routers.ovh.net 6688: -Traceback= 41D7B360 41042F5C 413C3E60 413C487C 413C4F48 41044C40 41044C2C
Jul 29 10:00:33 40G.ams-1-6k.routers.ovh.net 6689: Jul 29 09:00:12 GMT: %SYS-2-MALLOCFAIL: Memory allocation of 65536 bytes failed from 0x410433D8, alignment 8
Jul 29 10:00:33 40G.ams-1-6k.routers.ovh.net 6690: Pool: Processor Free: 7057952 Cause: Memory fragmentation
Jul 29 10:00:33 40G.ams-1-6k.routers.ovh.net 6691: Alternate Pool: None Free: 0 Cause: No Alternate pool
Jul 29 10:00:33 40G.ams-1-6k.routers.ovh.net 6692: -Process= "CEF Reloader", ipl= 0, pid= 146
Jul 29 10:00:33 40G.ams-1-6k.routers.ovh.net 6693: -Traceback= 4102AD28 41030958 410433E0 413C26A0 413C3E04 413C487C 413C4F48 41044C40 41044C2C
Posted Jul 29, 2010 - 07:59 UTC