使用ab压测工具压测出go http client的time wait问题记录
原创背景
最近开发了一个微服务项目,一个处理流程里存在多个远程调用,为了找出服务的薄弱点,需要对每个远程调用都进行一下压测。
环境准备
我使用ab测试工具进行的压测,安装方式很简单.
(linux环境)
yum install httpd
压测命令
ab -c100 -n1000 http://xxx/xxx/get_tag_deviceid_md5?deviceid_md5=d9c1f0c52061443281d79c1407aa635d
ab -c200 -n1000 http://xxx/xxx/get_tag_deviceid_md5?deviceid_md5=d9c1f0c52061443281d79c1407aa635d
ab -c260 -n1000 http://xxx/xxx/get_tag_deviceid_md5?deviceid_md5=d9c1f0c52061443281d79c1407aa635d
ab -c270 -n1000 http://xxx/xxx/get_tag_deviceid_md5?deviceid_md5=d9c1f0c52061443281d79c1407aa635d
ab -c320 -n1000 http://xxx/xxx/get_tag_deviceid_md5?deviceid_md5=d9c1f0c52061443281d79c1407aa635d
之后我们就可以看到压测出来的qps和时延分布了。
ab -c240 -n1000 http://xxx:port/xxx/get_tag_deviceid_md5?deviceid_md5=d9c1f0c52061443281d79c1407aa635d
Benchmarking xxx.xxx.xxx.xx (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Completed 600 requests
Completed 700 requests
Completed 800 requests
Completed 900 requests
Completed 1000 requests
Finished 1000 requests
Server Software:
Server Hostname: xxx
Server Port: 61001
Document Path: /xxx/get_tag_deviceid_md5?deviceid_md5=d9c1f0c52061443281d79c1407aa635d
Document Length: 209 bytes
Concurrency Level: 240
Time taken for tests: 0.114 seconds
Complete requests: 1000
Failed requests: 0
Write errors: 0
Total transferred: 333000 bytes
HTML transferred: 209000 bytes
Requests per second: 8786.03 [#/sec] (mean)
Time per request: 27.316 [ms] (mean)
Time per request: 0.114 [ms] (mean, across all concurrent requests)
Transfer rate: 2857.18 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 1 1 0.9 1 4
Processing: 5 21 4.7 23 28
Waiting: 4 21 4.7 23 27
Total: 8 23 4.0 24 30
Percentage of the requests served within a certain time (ms)
50% 24
66% 24
75% 25
80% 25
90% 26
95% 27
98% 29
99% 30
100% 30 (longest request)
通过对比我发现我的服务在qps超过8786的时候出现了时延超过30ms的情况, 然后我用
netstat -n | grep -i 61001 | grep -i time_wait|wc -l
查看http链接time wait的情况发现存在24个,这个目前看是正常的。
于是我压测了一下我的服务, 服务里面是直接使用的如下方式调用的
resp, err := http.Get("http://xxx:port/xxx/get_tag_deviceid_md5?deviceid_md5=d9c1f0c52061443281d79c1407aa635d")
if err != nil {
// log here
} else {
// 解析
getHttpResponse(resp)
resp.Body.Close()
// log here
}
然后从我的服务进行压测
ab -n1000 -c270 'http://localhost:10086/xxx/api/v1/xxx_recommend/xxx_get?position_id=123&request_time=1619340375100&media_app_id=32050&platid=1&ad_type=10001&imei=891198043565563&oaid=123&oaid_md5=20RCB962AC59075B964B07152D234B70&android_id=123&android_id_md5=20RCB962AC59075B964B07152D234B70&channelid=10086&sign=c6755df84caa6f88d4bca662fb634b1d&request_id=77bb8c4197d5db206bcbdbb0b99e5d21'
这个情况下qps只有3100多,但是看一下timewait居然有400多,查了一下资料,发现go的http 池默认其实只会复用两个连接,其他的都会作为短连接关闭掉,这才是导致大量TIME WAIT的原因。
网上说的修复方式是:
var myClient *http.Client
func init(){
defaultRoundTripper := http.DefaultTransport
defaultTransportPointer, ok := defaultRoundTripper.(*http.Transport)
if !ok {
return
defaultTransport := *defaultTransportPointer // dereference it to get a copy of the struct that the pointer points to
defaultTransport.MaxIdleConns = 200
defaultTransport.MaxIdleConnsPerHost = 200
myClient = &http.Client{Transport: &defaultTransport}
}
// 调用处:
resp, err := myClient.Get("http://xxx:port/xxx/get_tag_deviceid_md5?deviceid_md5=d9c1f0c52061443281d79c1407aa635d")
if err != nil {
// log here