Buffer utilization is high on this switch (12b shows no abnormal sign).
This is caused by the AFM (ACL Feature Manager) process, which is never a very good sign...
rbx6-12b-n56# sh system internal mts buffers summary
node  sapno  recv_q  pers_q  npers_q  log_q
sup   175    0       9       0        0
sup   377    0       0       0        47
sup   608    0       159     0        0
sup   284    0       4       0        0
sup   351    0       0       0        17
rbx6-12b-n56# sh system internal mts sup sap 608 description
Afm SAP
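To make the reading of the buffer summary above explicit, here is a minimal Python sketch that parses that output and lists the SAPs whose deepest queue reaches a threshold. The function names and the threshold of 100 are assumptions made for illustration only; they are not Cisco guidance or part of any tool used during the incident.

```python
def parse_mts_summary(text):
    """Parse 'sh system internal mts buffers summary' output into dict rows."""
    rows = []
    for line in text.strip().splitlines():
        fields = line.split()
        # Skip the header and anything that is not a 6-column data row.
        if len(fields) != 6 or not fields[1].isdigit():
            continue
        node, sap, recv_q, pers_q, npers_q, log_q = fields
        rows.append({
            "node": node,
            "sap": int(sap),
            "recv_q": int(recv_q),
            "pers_q": int(pers_q),
            "npers_q": int(npers_q),
            "log_q": int(log_q),
        })
    return rows


def busy_saps(rows, threshold=100):
    """Return SAP numbers whose deepest queue is at or above the threshold.

    The default threshold is an arbitrary, illustrative value.
    """
    return [
        r["sap"] for r in rows
        if max(r["recv_q"], r["pers_q"], r["npers_q"], r["log_q"]) >= threshold
    ]


SUMMARY = """\
node sapno recv_q pers_q npers_q log_q
sup 175 0 9 0 0
sup 377 0 0 0 47
sup 608 0 159 0 0
sup 284 0 4 0 0
sup 351 0 0 0 17
"""

print(busy_saps(parse_mts_summary(SUMMARY)))  # -> [608]
```

With the figures above, only SAP 608 (the AFM SAP) stands out, with 159 messages in pers_q.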
We are investigating the root cause, but this looks like it will need a reload.
Update(s):
Date: 2016-04-30 11:41:03 UTC
Transceiver replaced, fex116 is up, buffers and switch are OK; we can return to normal activity.
Date: 2016-04-30 11:27:23 UTC
The FEXes have been brought back up and redundancy is restored on all FEXes except 116.
FEX 116 flapped on the 12B side: it has a faulty optic => being fixed by the datacentre team.
Date: 2016-04-30 10:48:27 UTC
rbx6-12b-n56# sh fex
  FEX        FEX         FEX           FEX               Fex
Number   Description    State         Model             Serial
------------------------------------------------------------------------
100      fex100         Online    N2K-C2248TP-E-1GE   SSI181709KY
101      fex101         Online    N2K-C2248TP-E-1GE   FOX1844G5AX
102      fex102         Online    N2K-C2248TP-E-1GE   FOX1901G31F
103      fex103         Online    N2K-C2248TP-E-1GE   FOX1901G2YS
104      fex104         Online    N2K-C2248TP-E-1GE   FOX1844G75X
105      fex105         Online    N2K-C2248TP-E-1GE   FOX1905GDWS
106      fex106         Online    N2K-C2248TP-E-1GE   FOX1844GJHP
Date: 2016-04-30 10:31:58 UTC
Switch reload done.
We shut down the port-channels to the FEXes to avoid saturating the switch again by bringing up the 1000 Ethernet ports all at once.
We are bringing the FEXes back up one at a time while monitoring the buffers.
rbx6-12b-n56# sh fex
  FEX        FEX         FEX           FEX               Fex
Number   Description    State         Model             Serial
------------------------------------------------------------------------
100      fex100         Online    N2K-C2248TP-E-1GE   SSI181709KY
101      fex101         Online    N2K-C2248TP-E-1GE   FOX1844G5AX
102      fex102         Online    N2K-C2248TP-E-1GE   FOX1901G31F
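The one-by-one bring-up described in this update can be sketched as a simple loop. This is only an illustration under stated assumptions: `enable` and `queue_depth` are hypothetical callables that the operator would supply (for example wrappers around a CLI session), and `max_depth=50` is an arbitrary limit. The actual procedure was carried out by hand.

```python
def staged_bring_up(fex_ids, enable, queue_depth, max_depth=50):
    """Enable FEXes one at a time and stop if the queue depth gets too high.

    enable(fex_id) and queue_depth() are hypothetical callables supplied by
    the caller. Returns (fexes_brought_up, fex_that_exceeded_limit_or_None).
    """
    brought_up = []
    for fex_id in fex_ids:
        enable(fex_id)  # hypothetical: re-enable the port-channel to this FEX
        if queue_depth() > max_depth:
            # Stop here so the remaining FEXes do not add more load.
            return brought_up, fex_id
        brought_up.append(fex_id)
    return brought_up, None


# Simulated run: every enabled FEX adds 10 messages to the queue.
state = {"depth": 0}


def fake_enable(fex_id):
    state["depth"] += 10


def fake_queue_depth():
    return state["depth"]


done, stopped_at = staged_bring_up([100, 101, 102], fake_enable, fake_queue_depth)
print(done, stopped_at)  # -> [100, 101, 102] None
```

A real run would also wait between steps so that the buffers have time to settle before the next FEX is enabled.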
Date: 2016-04-30 10:16:42 UTC
CPU before the reload: snmpd is hammering the switch, which appears to be a consequence.
Wild guess, to be confirmed with Cisco: ETHPM struggles => causes the AFM buffer build-up => SNMP struggles.
Together they consume all the CPU and we end up in a vicious circle...
rbx6-12b-n56# sh system internal processes cpu
top - 12:10:39 up 315 days, 19:11, 3 users, load average: 1.28, 1.45, 1.15
Tasks: 240 total, 3 running, 236 sleeping, 0 stopped, 1 zombie
Cpu(s): 2.9%us, 1.7%sy, 0.0%ni, 95.0%id, 0.0%wa, 0.0%hi, 0.5%si, 0.0%st
Mem: 8243352k total, 3861200k used, 4382152k free, 288k buffers
Swap: 0k total, 0k used, 0k free, 1463832k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
28326 root      20   0  348m  38m  25m R 52.8  0.5  30903:01 snmpd
 4458 root      20   0  321m  70m  19m R 33.9  0.9  19053:50 ethpm
 3994 root      20   0  310m  41m  15m S 17.0  0.5  15171:01 stats_client
 8423 nicolas.  20   0  3620 1528 1140 R  7.5  0.0   0:00.07 top
 4050 root      20   0  321m  32m  20m S  3.8  0.4   5352:08 pm
 4174 root      20   0  442m  73m  26m S  3.8  0.9   6659:44 netstack
 4170 root      20   0  297m  49m  20m S  1.9  0.6   1567:58 satmgr
    1 root      20   0  2004  664  580 S  0.0  0.0   5:19.84 init
    2 root      15  -5     0    0    0 S  0.0  0.0   0:00.01 kthreadd
    3 root      RT  -5     0    0    0 S  0.0  0.0   0:11.29 migration/0
    4 root      15  -5     0    0    0 S  0.0  0.0  94:25.26 ksoftirqd/0
    5 root      RT  -5     0    0    0 S  0.0  0.0   5:09.96 watchdog/0
    6 root      RT  -5     0    0    0 S  0.0  0.0   0:14.36 migration/1
Date: 2016-04-30 10:02:24 UTC
No spanning tree instance exists.
rbx6-12b-n56# sh platform afm info copp-tbls | diff
8,10c8,10
< 0 default 64000 6250 51700252190 4151275828
< 1 stp 2500000 4687 1214117872 0
< 2 lacp 128000 4687 574984688 0
---
> 0 default 64000 6250 51700312959 4151275828
> 1 stp 2500000 4687 1214119104 0
> 2 lacp 128000 4687 574985296 0
15c15
< 7 sat control 62500000 65535 2318965670683 0
---
> 7 sat control 62500000 65535 2318968001023 0
25c25
< 18 cdp 128000 4687 159709968 0
---
> 18 cdp 128000 4687 159710144 0
28,29c28,29
< 21 mgmt/ipv6-mgmt* 1500000 4687 139677728087 5781405
< 23 arp/ipv6-nd 8000 3515 16452102544 630004096
---
> 21 mgmt/ipv6-mgmt* 1500000 4687 139677925157 5781405
> 23 arp/ipv6-nd 8000 3515 16452118836 630004096
33c33
< 27 hsrp vrrp/ipv6-hsrp 128000 250 2987080360 85648746
---
> 27 hsrp vrrp/ipv6-hsrp 128000 250 2987083756 85648746
44c44
< 41 excp/ipv6-excp** 8000 4687 5679291770 384144830
---
> 41 excp/ipv6-excp** 8000 4687 5679301982 384144830
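To see how fast each CoPP class is growing between the two snapshots in the diff above, the per-class deltas can be computed from the `<` (previous) and `>` (current) lines. The sketch below is illustrative: the function names are made up for this example, and it only assumes that the last two columns of each row are cumulative counters, without assuming what each one measures.

```python
def parse_diff(diff_text):
    """Map class index -> {'name': ..., '<': (c1, c2), '>': (c1, c2)}."""
    snapshots = {}
    for line in diff_text.splitlines():
        fields = line.split()
        # Skip hunk headers such as '15c15' and '---' separators.
        if len(fields) < 7 or fields[0] not in ("<", ">"):
            continue
        side, index = fields[0], int(fields[1])
        # Class names may contain spaces ("sat control"), so the four
        # numeric columns are taken from the right-hand end of the row.
        name = " ".join(fields[2:-4])
        counters = (int(fields[-2]), int(fields[-1]))
        entry = snapshots.setdefault(index, {"name": name})
        entry[side] = counters
    return snapshots


def counter_deltas(snapshots):
    """Return {class name: (delta of counter 1, delta of counter 2)}."""
    deltas = {}
    for _, entry in sorted(snapshots.items()):
        if "<" in entry and ">" in entry:
            old, new = entry["<"], entry[">"]
            deltas[entry["name"]] = (new[0] - old[0], new[1] - old[1])
    return deltas


DIFF = """\
8,10c8,10
< 0 default 64000 6250 51700252190 4151275828
< 1 stp 2500000 4687 1214117872 0
< 2 lacp 128000 4687 574984688 0
---
> 0 default 64000 6250 51700312959 4151275828
> 1 stp 2500000 4687 1214119104 0
> 2 lacp 128000 4687 574985296 0
15c15
< 7 sat control 62500000 65535 2318965670683 0
---
> 7 sat control 62500000 65535 2318968001023 0
"""

for name, delta in counter_deltas(parse_diff(DIFF)).items():
    print(name, delta)
```

On this excerpt the last counter does not move for any class, while the "sat control" class shows the largest increase in the other counter (2330340).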
We are collecting some logs and reloading the box; no downtime, traffic is forwarded by 12a.
Posted Apr 30, 2016 - 09:53 UTC