首发于 1比特
wireshark检查网络延迟问题

wireshark检查网络延迟问题

wireshark可以说是网络问题排查的神器,里面的功能非常多,也很实用,本篇文章就是为一次课程试验,使用wireshark排查典型的网络场景。

一 试验环境搭建

采用的主机是版本都是centos 8.5版本,ip分别为192.168.31.50和192.168.31.200

[root@localhost ~]# cat /etc/centos-release
CentOS Linux release 8.5.2111
[root@localhost ~]# ip addr show ens33
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 00:0c:29:cc:ff:99 brd ff:ff:ff:ff:ff:ff
    inet 192.168.31.50/24 brd 192.168.31.255 scope global noprefixroute ens33
       valid_lft forever preferred_lft forever
    inet6 fe80::31bf:afff:c603:4fa6/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
[root@MiWiFi-RA72-srv wrk-master]# ip addr show ens33
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 00:0c:29:ed:16:d1 brd ff:ff:ff:ff:ff:ff
    inet 192.168.31.200/24 brd 192.168.31.255 scope global dynamic noprefixroute ens33
       valid_lft 39956sec preferred_lft 39956sec
    inet6 fe80::20c:29ff:feed:16d1/64 scope link 
       valid_lft forever preferred_lft forever
[root@MiWiFi-RA72-srv wrk-master]# 

试验环境采用docker搭建:

docker run --network=host --name=good -itd nginx
docker run --name nginx --network=host -itd feisky/nginx:latenc

验证下是否正常:

[root@localhost ~]#  curl http://192.168.31.50:8080
<h1>Welcome to nginx!</h1>
[root@localhost ~]#  curl http://192.168.31.50
<h1>Welcome to nginx!</h1>

均可正常运行,

1.1 奇怪的小问题

下面采用hping3 测试下两个nginx的性能:

[root@localhost ~]# hping3 -c 3 -S -p 80 192.168.31.50
HPING 192.168.31.50 (ens33 192.168.31.50): S set, 40 headers + 0 data bytes
--- 192.168.31.50 hping statistic ---
3 packets transmitted, 0 packets received, 100% packet loss
round-trip min/avg/max = 0.0/0.0/0.0 ms

很奇怪,hping3 进入无法正常请求,首先反应是不是防火墙:

[root@localhost ~]# systemctl status firewalld 
● firewalld.service - firewalld - dynamic firewall daemon
   Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
   Active: inactive (dead)
     Docs: man:firewalld(1)

防火墙是关闭的,于是我抓包看看发生了什么:

[root@localhost ~]# tcpdump -i lo  port 80 -w good_nginx.pcap
dropped privs to tcpdump
tcpdump: listening on lo, link-type EN10MB (Ethernet), capture size 262144 bytes
^C120 packets captured
240 packets received by filter
0 packets dropped by kernel
[root@localhost ~]# tshark -r good_nginx.pcap 
Running as user "root" and group "root". This could be dangerous.
    1   0.000000 192.168.31.50 → 192.168.31.50 TCP 54 1797 → 80 [SYN] Seq=0 Win=512 Len=0
    2   0.000013 192.168.31.50 → 192.168.31.50 TCP 58 80 → 1797 [SYN, ACK] Seq=0 Ack=1 Win=43690 Len=0 MSS=65495
    3   0.000016 192.168.31.50 → 192.168.31.50 TCP 54 1797 → 80 [RST] Seq=1 Win=0 Len=0
    4   1.000585 192.168.31.50 → 192.168.31.50 TCP 54 1798 → 80 [SYN] Seq=0 Win=512 Len=0
    5   1.000597 192.168.31.50 → 192.168.31.50 TCP 58 80 → 1798 [SYN, ACK] Seq=0 Ack=1 Win=43690 Len=0 MSS=65495
    6   1.000601 192.168.31.50 → 192.168.31.50 TCP 54 1798 → 80 [RST] Seq=1 Win=0 Len=0
    7   2.001198 192.168.31.50 → 192.168.31.50 TCP 54 1799 → 80 [SYN] Seq=0 Win=512 Len=0
    8   2.001209 192.168.31.50 → 192.168.31.50 TCP 58 80 → 1799 [SYN, ACK] Seq=0 Ack=1 Win=43690 Len=0 MSS=65495
    9   2.001213 192.168.31.50 → 192.168.31.50 TCP 54 1799 → 80 [RST] Seq=1 Win=0 Len=0

经过tshark的分析,SYN包后,主动发起了RST包,那看看检查下是不是半连接数量设置小了那:

#sysctl -a|grep tcp_max_syn_backlog 
net.ipv4.tcp_max_syn_backlog = 128

查看下监听端口:

[root@localhost ~]# ss -ln|grep 80
tcp   LISTEN 0      128                                                   0.0.0.0:8080             0.0.0.0:*           
tcp   LISTEN 0      128                                                   0.0.0.0:80               0.0.0.0:*           
tcp   LISTEN 0      128                                                      [::]:80                  [::]:*           
[root@localhost ~]# 

好吧,看起来只能通过netstat查看连接失败的tcp连接是什么情况:

#netstat -s
TcpExt:
    387 resets received for embryonic SYN_RECV sockets

这个数字再持续增加,说明是syn_recv状态下,收到RST被中断了(后面发现正常的包也是这样的,这就是hping3 -S 发送SYN包的机制,这就是上次试验的DoS攻击啊)。然后我突发奇想,因为是单机,所以我试了下127.0.0.1,结果竟然是通的,抓包的信息是一样的,换了一台机器也是可以hping3通过:

[root@localhost ~]# hping3 -I lo   -n -c 3 -S -p 80 192.168.31.50 
HPING 192.168.31.50 (lo 192.168.31.50): S set, 40 headers + 0 data bytes
len=44 ip=192.168.31.50 ttl=64 DF id=0 sport=80 flags=SA seq=0 win=43690 rtt=2.8 ms
len=44 ip=192.168.31.50 ttl=64 DF id=0 sport=80 flags=SA seq=1 win=43690 rtt=1.4 ms
len=44 ip=192.168.31.50 ttl=64 DF id=0 sport=80 flags=SA seq=2 win=43690 rtt=2.0 ms
--- 192.168.31.50 hping statistic ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 1.4/2.0/2.8 ms
[root@localhost ~]# hping3    -n -c 3 -S -p 80 127.0.0.1
HPING 127.0.0.1 (lo 127.0.0.1): S set, 40 headers + 0 data bytes
len=44 ip=127.0.0.1 ttl=64 DF id=0 sport=80 flags=SA seq=0 win=43690 rtt=5.8 ms
len=44 ip=127.0.0.1 ttl=64 DF id=0 sport=80 flags=SA seq=1 win=43690 rtt=3.4 ms
len=44 ip=127.0.0.1 ttl=64 DF id=0 sport=80 flags=SA seq=2 win=43690 rtt=2.0 ms
--- 127.0.0.1 hping statistic ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 2.0/3.7/5.8 ms
[root@localhost ~]# ^C
[root@localhost ~]# hping3    -n -c 3 -S -p 80 192.168.31.50
HPING 192.168.31.50 (ens33 192.168.31.50): S set, 40 headers + 0 data bytes
--- 192.168.31.50 hping statistic ---
3 packets transmitted, 0 packets received, 100% packet loss
round-trip min/avg/max = 0.0/0.0/0.0 ms

通过另外一台机器,完全正常,不明就里,知道者麻烦告知,谢谢:

[root@MiWiFi-RA72-srv ~]# hping3  -c 3 -S -p 80 192.168.31.50 
HPING 192.168.31.50 (ens33 192.168.31.50): S set, 40 headers + 0 data bytes
len=46 ip=192.168.31.50 ttl=64 DF id=0 sport=80 flags=SA seq=0 win=29200 rtt=2.0 ms
len=46 ip=192.168.31.50 ttl=64 DF id=0 sport=80 flags=SA seq=1 win=29200 rtt=1.4 ms
len=46 ip=192.168.31.50 ttl=64 DF id=0 sport=80 flags=SA seq=2 win=29200 rtt=2.0 ms
--- 192.168.31.50 hping statistic ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 1.4/1.8/2.0 ms

二 开始试验

2.1 hping3 延迟测试

用hping3 测试延迟,这个两个都差不多,说明-c指定发送的包个数,-S发送SYN报文,-p指定服务器端的端口,下面跟着的是IP。

[root@MiWiFi-RA72-srv ~]# hping3  -c 3 -S -p 80 192.168.31.50 
HPING 192.168.31.50 (ens33 192.168.31.50): S set, 40 headers + 0 data bytes
len=46 ip=192.168.31.50 ttl=64 DF id=0 sport=80 flags=SA seq=0 win=29200 rtt=1.8 ms
len=46 ip=192.168.31.50 ttl=64 DF id=0 sport=80 flags=SA seq=1 win=29200 rtt=1.3 ms
len=46 ip=192.168.31.50 ttl=64 DF id=0 sport=80 flags=SA seq=2 win=29200 rtt=2.7 ms
--- 192.168.31.50 hping statistic ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 1.3/1.9/2.7 ms
[root@MiWiFi-RA72-srv ~]# hping3  -c 3 -S -p 8080 192.168.31.50 
HPING 192.168.31.50 (ens33 192.168.31.50): S set, 40 headers + 0 data bytes
len=46 ip=192.168.31.50 ttl=64 DF id=0 sport=8080 flags=SA seq=0 win=29200 rtt=0.6 ms
len=46 ip=192.168.31.50 ttl=64 DF id=0 sport=8080 flags=SA seq=1 win=29200 rtt=1.2 ms
len=46 ip=192.168.31.50 ttl=64 DF id=0 sport=8080 flags=SA seq=2 win=29200 rtt=2.0 ms
--- 192.168.31.50 hping statistic ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.6/1.2/2.0 ms

看了下,两个nginx的端口延迟差不多,都在2ms左右。

2.2 wrk测试应用性能

wrk的测试选项如下:

#  --latency     打印延迟统计
# -c 100 保持100个连接
# -t 4 个线程
#--timeout 2     socket请求超时时间  
 [root@MiWiFi-RA72-srv wrk-master]# wrk --latency -c 100 -t 4 --timeout 2 http://192.168.31.50/

测试80端口和8080端口:

[root@MiWiFi-RA72-srv wrk-master]# wrk --latency -c 100 -t 4 --timeout 2 http://192.168.31.50:8080/
Running 10s test @ http://192.168.31.50:8080/
  4 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    41.64ms    6.15ms  83.39ms   96.82%
    Req/Sec   601.11     81.26     1.05k    80.00%
  Latency Distribution
     50%   42.08ms
     75%   42.69ms
     90%   43.51ms
     99%   48.09ms
  24007 requests in 10.04s, 19.48MB read
Requests/sec:   2391.46
Transfer/sec:      1.94MB
[root@MiWiFi-RA72-srv wrk-master]# 
[root@MiWiFi-RA72-srv wrk-master]# 
[root@MiWiFi-RA72-srv wrk-master]# 
[root@MiWiFi-RA72-srv wrk-master]# wrk --latency -c 100 -t 4 --timeout 2 http://192.168.31.50
Running 10s test @ http://192.168.31.50
  4 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     5.29ms    7.57ms 105.21ms   95.68%
    Req/Sec     5.81k     2.07k   10.87k    63.66%
  Latency Distribution
     50%    3.36ms
     75%    5.91ms
     90%    9.66ms
     99%   39.66ms
  232036 requests in 10.09s, 188.76MB read
Requests/sec:  23005.97
Transfer/sec:     18.71MB

两个端口的差距还是很大的,一个Request/sec为2391个,且90%的延迟在43ms,而第二个Request/sec为2万3千多,90%的延迟在9ms左右,下面就要排查下原因。

2.3 原因排查

同样抓包,然后用wireshark打开,我们用两种方式看: 通过流量图查看 第一种方式,通过过滤流,然后查看”统计“里面的”流量图“。

  1. 我们先在wireshark中的列表中,右键,选择”追踪流“ 查看TCP流视图 , 然后关闭弹出窗口,这个好处是再过滤框中,自动显示过滤刚才选中的流。


  1. 查看流量图

注意勾选 限制显示过滤器,选择TCP Flow和任何地址。 通过前面的时间来看,连续性没多大问题,每个包之间时间差比较小。

再看延迟大的情况:


虽然这种方式也可以看到有的延迟增大,但是还不够直观,可以采用第二种方式。

往返时间统计图

  1. 通过”统计“菜单进入到"TCP流图形” 选择“往返时间”


可以很清楚的看到,延迟大的往返时间很多延迟40ms以上,而正常的延迟,大部分点都在10ms之内。

延迟确认 再通过第一种方法,可以看到,延迟大的都是确认包,而这个40ms,是TCP延迟确认的最小超时时间。

延迟确认: TCP对确认的一种优化,因为如果单独发确认包,信息携带的比较少,所以不是每次收到请求立刻回复确认包,而是延迟等一会(40ms这里),然后看看是否有包需要发送,有需要发送包的情况下,直接将ACK带过去,如果没有需要发送的数据,再单独发送确认包,所以就有40ms的延迟了。

查看TCP帮助,TCP_QUICKACK ,只有TCP套接字设置了这个选项才会开启快速确认。

     TCP_QUICKACK (since Linux 2.4.4)
              Enable  quickack  mode  if  set  or  disable quickack mode if cleared.  In quickack mode, acks are sent immediately, rather than delayed if
              needed in accordance to normal TCP operation.  This flag is not permanent, it only enables a switch to or from quickack  mode.   Subsequent
              operation  of  the  TCP  protocol  will  once again enter/leave quickack mode depending on internal protocol processing and factors such as
              delayed ack timeouts occurring and data transfer.  This option should not be used in code intended to be portable.

这个时间和系统时钟频率有关系,它是一个宏定义:

#define TCP_DELACK_MAX  ((unsigned)(HZ/5))
#define TCP_DELACK_MIN   ((unsigned)(HZ/25))

看下本机时钟频率:

[root@MiWiFi-RA72-srv wrk-master]# cat /boot/config-4.18.0-305.3.1.el8.x86_64 |grep 'CONFIG_HZ='
CONFIG_HZ=1000

那么,最大时间就是1000/5 =200(ms),最小延迟时间1000/25= 40(ms),关闭延迟确认:

setsockopt(sock_fd,IPPROTO_TCP,TCP_QUICKACK,(char*)&value,sizeof(int));

那么来确认下wrk是否开启了这个选项:

[root@MiWiFi-RA72-srv wrk-master]# strace -e trace=network -f  wrk --latency -c 100 -t 4 --timeout 2 http://192.168.31.50
strace: Process 21316 attached
strace: Process 21318 attached
strace: Process 21319 attached
Running 10s test @ http://192.168.31.50
  4 threads and 100 connections
strace: Process 21317 attached
[pid 21317] socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 8