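Handling multiple default gateways on a production server

The production server 172.18.30.67 has four network ports.

eno1 and eno2 are two 10-gigabit fiber ports; enp26s0f0 and enp26s0f1 are two gigabit Ethernet ports.

Before bonding was set up, eno2 obtained an address through NetworkManager's DHCP. eno1 and eno2 were then bonded, but eno2 was still acting on its own: its address was not removed after bonding, so the server ended up with two default gateways. The routing table looked like this: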

[root@renhe-18-30-67 ~]# ip r
default via 172.18.29.254 dev eno2 
default via 172.18.31.254 dev br0.199 proto static metric 426 
172.18.28.0/23 dev eno2 proto kernel scope link src 172.18.28.67 
172.18.30.0/23 dev br0.199 proto kernel scope link src 172.18.30.67 metric 426 
192.168.122.0/24 dev virbr0 proto kernel scope link src 192.168.122.1
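The symptom in this routing table (more than one `default` line) can be checked mechanically. Below is a minimal sketch, not part of the original procedure; the function names `count_default_routes` and `default_route_devs` are made up for illustration, and both simply parse `ip route` output from stdin.

```shell
# Count the default routes in `ip route` output read from stdin.
count_default_routes() {
    awk '$1 == "default" { n++ } END { print n + 0 }'
}

# Print the device of each default route, one per line.
default_route_devs() {
    awk '$1 == "default" { for (i = 2; i < NF; i++) if ($i == "dev") print $(i + 1) }'
}
```

Usage would be something like `ip r | count_default_routes`; a result greater than 1 means several default gateways are configured, and `ip r | default_route_devs` shows which interfaces they belong to.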

eno2 had obtained the address 172.18.28.67 from DHCP:

ifconfig eno2
eno2: flags=6211<UP,BROADCAST,RUNNING,SLAVE,MULTICAST>  mtu 1500
        inet 172.18.28.67  netmask 255.255.254.0  broadcast 172.18.29.255
        ether b4:05:5d:08:e0:d8  txqueuelen 1000  (Ethernet)
        RX packets 81768780  bytes 5398659503 (5.0 GiB)
        RX errors 0  dropped 7  overruns 0  frame 0
        TX packets 10044620  bytes 467566304 (445.9 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

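In this situation, the procedure is as follows.

The bonding layout is: physical NIC -> bond -> bond.xxx -> br.xxx

First check the bonding mode:

cat /sys/class/net/bond0/bonding/mode
active-backup 1

Then check the state of the bond and its slaves: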

cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: eno1 (primary_reselect always)
Currently Active Slave: eno1
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eno1
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 1
Permanent HW addr: b4:05:5d:08:e0:d8
Slave queue ID: 0

Slave Interface: eno2
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 1
Permanent HW addr: b4:05:5d:08:e0:d9
Slave queue ID: 0

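Confirm the active slave once more:

cat /sys/class/net/bond0/bonding/active_slave
eno1

Then confirm that the second default gateway is reachable:

ping -I br0.199 172.18.31.254

From the above, the active NIC is eno1, so taking eno2 down will not affect anything.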
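The same check can be wrapped in a small guard before touching a slave interface. This is only a sketch under assumptions: `active_slave_of` and `safe_to_touch` are hypothetical helper names, and `SYSFS_ROOT` is an invented override that exists only so the functions can be exercised outside a real host (it defaults to `/sys`).

```shell
# Print the currently active slave of bond $1.
# SYSFS_ROOT is a hypothetical override for testing; it defaults to /sys.
active_slave_of() {
    cat "${SYSFS_ROOT:-/sys}/class/net/$1/bonding/active_slave"
}

# Succeed only if interface $2 is NOT the active slave of bond $1.
# Returns 2 if the bond's active_slave file cannot be read.
safe_to_touch() {
    active=$(active_slave_of "$1") || return 2
    [ -n "$active" ] && [ "$active" != "$2" ]
}
```

For example, `safe_to_touch bond0 eno2 && echo "eno2 is not carrying traffic for bond0"` would print the message on the server described here, where the active slave is eno1.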

Next steps:

First find the PID of the dhclient process, which here is 2252:

    1   2252   2252   2252 ?            -1 Ss       0   8:50 /sbin/dhclient -1 -q -lf /var/lib/dhclient/dhclient--eno2.lease -pf /var/run/dhclient-eno2.pid -H renhe-18-30-67 eno2
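The process line above has the column layout of `ps axj` (PPID, PID, PGID, SID, TTY, TPGID, STAT, UID, TIME, COMMAND); that is an assumption, since the exact command used is not shown. Under that assumption, a rough sketch for pulling the PID out of such output could look like this, where `dhclient_pid_for` is a made-up helper name:

```shell
# Read `ps axj`-style lines on stdin and print the PID (2nd column) of every
# dhclient process whose last argument is the interface name given as $1.
dhclient_pid_for() {
    awk -v ifname="$1" '$10 ~ /(^|\/)dhclient$/ && $NF == ifname { print $2 }'
}
```

It would be used as `ps axj | dhclient_pid_for eno2`. The `-pf /var/run/dhclient-eno2.pid` argument in the process line also names a PID file, so reading that file should give the same number.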

Then deal with it:

systemctl stop NetworkManager
systemctl disable NetworkManager
# kill dhclient
kill -9 2252

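After this, the IP address on eno2 will disappear after a while.

If it does not disappear:

ip link set eno2 down

Then wait a moment and bring it back up:

ip link set eno2 up

That is all.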
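To confirm that the address has really gone from eno2, the output of `ip -4 -o addr show dev eno2` can be checked. The helper below is only an illustrative sketch (`ipv4_addrs` is a hypothetical name); it extracts whatever follows each `inet` keyword in that one-line output format.

```shell
# Print the IPv4 addresses (with prefix length) found in
# `ip -4 -o addr show` output read from stdin, one per line.
ipv4_addrs() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "inet") print $(i + 1) }'
}
```

An empty result from `ip -4 -o addr show dev eno2 | ipv4_addrs` means no IPv4 address is left on the interface; at that point `ip r` should also show only the default route through br0.199.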

Because this is a production server that cannot be taken offline, the whole operation had to be done this cautiously.

