Recently we had a problem with a buggy update to a piece of third-party client software. It produced a flood of valid but nonsensical requests targeting our system.
This post details how we added dynamic rate limiting to our HAProxy load balancers, heavily throttling only a very specific set of HTTP requests caused by the client bug, while maintaining regular operations for other requests, even on the same URLs.
The files described in this article are available in a GitHub repository for easy access.
Stampede
What made things interesting was that the client software was mostly fine, but a single background sync feature repeatedly (and quite relentlessly) uploaded a tremendous number of small objects, even though they had already been sent, creating lots of duplicate copies on the backend. At the same time, the interactive portion of the application was working nicely. Moreover, even though the problematic update was distributed to a wide audience, only a certain usage pattern would trigger the bug, and only in a comparatively small portion of installs.
Because the requests came in at a high frequency with almost no effort on the client side, the more heavyweight asynchronous server-side processing was not able to keep up, leading to a slowly but continuously growing queue of outstanding requests.
While fair queueing made sure that most users did not notice much of a slowdown in their regular work with the system at first, it was clear that we needed a way to resolve this situation on our side until a fixed client update could be developed and rolled out.
Options
The most obvious solution would have been to revoke access for the affected OAuth Client ID, but it would also have been the one with the most drastic side-effects. Effectively, the application would have stopped working for all customers, including those who either did not yet have the broken update installed or whose behavior had not triggered the bug. Clearly not a good option.
Another course of action we considered for a short moment was to introduce a rate limit using the Client ID as a discriminator. It would have had the same broad side-effects as locking them out completely, affecting lots of innocent users. Basically anything just taking the Client ID into account would hit more users than necessary.
Implemented Fix
What we came up with is a rate limiting configuration based on the user’s access token instead of the client software, and the specific API call the broken client flooded. While the approach itself is not particularly ingenious, the implementation of the corresponding HAProxy configuration turned out to be a little trickier than anticipated. Most examples are based on the sender’s IP address; however, we did not want to punish all users behind the same NATing company firewall as one single offender.
So without further ado, here is the relevant snippet from haproxy.cfg:
frontend fe_api_ssl
  bind 192.168.0.1:443 ssl crt /etc/haproxy/ssl/api.pem no-sslv3 ciphers ...
  default_backend be_api

  tcp-request inspect-delay 5s

  acl document_request path_beg -i /v2/documents
  acl is_upload hdr_beg(Content-Type) -i multipart/form-data
  acl too_many_uploads_by_user sc0_gpc0_rate() gt 100
  acl mark_seen sc0_inc_gpc0 gt 0

  stick-table type string size 100k store gpc0_rate(60s)

  tcp-request content track-sc0 hdr(Authorization) if METH_POST document_request is_upload

  # Conditions are evaluated left to right, so mark_seen comes first:
  # the counter must be incremented before its rate is checked.
  use_backend be_429_slow_down if mark_seen too_many_uploads_by_user

backend be_429_slow_down
  timeout tarpit 2s
  errorfile 500 /etc/haproxy/errorfiles/429.http
  http-request tarpit
Let’s go through these in some more detail.
First of all, right after declaring the frontend’s name to be fe_api_ssl, we bind the appropriate IP address and port, and set up the TLS settings with the certificate/private key and a set of ciphers (left out for brevity).
Then we set the default_backend to be be_api. This will handle the default case of all requests that are not rate limited.
The next line, tcp-request inspect-delay, is required to ensure the following checks have all required information available. Leaving it out will even cause HAProxy to issue a warning, because we are using TCP related metrics a few lines further down. Setting the delay like this will make HAProxy wait at most 5 seconds for the request data to arrive before it starts evaluating the inspection rules. Not setting it would provoke race conditions, because the rules would be run immediately upon arrival of the first, potentially incomplete, data, leading to unpredictable results.
The next block contains ACL rule definitions. It is important to note that they are not yet evaluated here. The ACL names merely get bound to the rule following them.
- document_request checks whether the requested resource’s path begins (path_beg) with the string /v2/documents, performing a case-insensitive comparison (-i).
- is_upload checks whether the value of the Content-Type header begins (hdr_beg) with the string multipart/form-data, again case-insensitive. This is the Content-Type the broken client sends from its buggy code path. The other client features might access the same resource, but with different content types. We do not want to limit those.
- too_many_uploads_by_user is a little more involved. It checks whether the average increment rate of the General Purpose Counter (GPC) "0" is greater than 100 over the configured time period. We will get back to that in a moment.
- mark_seen defines that on its execution the General Purpose Counter "0" should be incremented. This is the counter whose rate of increase is checked in too_many_uploads_by_user.
Next up we define a lookup table (the stick-table) to keep track of string objects (type string) with up to 100,000 table rows. The content of that string will be the Authorization header value, i.e. the user’s access token (next line). The value stored alongside each token is the General Purpose Counter "0"’s rate of increase over 1 minute.
So much for the definition of rules. Now we will actually inspect an incoming request’s content (tcp-request content). We enable tracking of the session’s Authorization header’s value in the aforementioned stick-table under certain conditions. Those are listed after the if keyword (logical AND is the default). In this particular case we are only interested in tracking HTTP POST requests (METH_POST) that are document_requests (as defined before in the ACL of that name) and have the right Content-Type (is_upload ACL).
Notice that so far the too_many_uploads_by_user and mark_seen ACLs have not yet been executed, because they have only been declared.
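The tracking condition can be expressed as a plain predicate. The following Python sketch mirrors the three ACLs combined with logical AND; the function name should_track and the dictionary-based header representation are illustrative assumptions, not anything HAProxy provides.

```python
def should_track(method, path, headers):
    """Return True when a request matches METH_POST, document_request and is_upload."""
    # Header names are case-insensitive in HTTP, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}

    is_post = method == "POST"
    # path_beg -i: case-insensitive prefix match on the path
    document_request = path.lower().startswith("/v2/documents")
    # hdr_beg(Content-Type) -i: case-insensitive prefix match on the header value
    is_upload = normalized.get("content-type", "").lower().startswith("multipart/form-data")

    return is_post and document_request and is_upload


print(should_track("POST", "/v2/documents/123",
                   {"Content-Type": "multipart/form-data; boundary=abc"}))  # True
print(should_track("POST", "/v2/documents/123",
                   {"Content-Type": "application/json"}))                   # False
print(should_track("GET", "/v2/documents/123", {}))                         # False
```

Only requests for which this predicate holds get an entry in the table, which is why other features of the same client, using other content types on the same URLs, remain unaffected.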
They are executed now as part of the use_backend directive. This will select a backend other than the default one in case the too_many_uploads_by_user ACL matches. For this check to ever yield any meaningful result, we must ensure the GPC is actually incremented, so that the stick-table contains values other than 0 for each user access token.
This is where the mark_seen pseudo-ACL comes into play. Its only purpose is to increment the GPC for the tracking entry in the stick-table. It might seem a bit strange to do it like this, but remember, the ACL declaration did not actually do anything yet, but only connected names and actions/checks to be executed later.
In effect, whenever a request comes in that matches the conditions (POST method, correct path, correct Content-Type) a counter is incremented. If the rate of increase goes above 100 per minute, the request will be forwarded to the special be_429_slow_down backend.
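The resulting decision can be sketched in a few lines of Python. This is a simplified model, assuming the counter is incremented for every tracked request before the rate is compared against the limit; the function names mirror the ACL names, while choose_backend, PERIOD_S and LIMIT are hypothetical helpers for illustration.

```python
from collections import defaultdict, deque

PERIOD_S = 60.0  # mirrors gpc0_rate(60s)
LIMIT = 100      # mirrors "gt 100"

_events = defaultdict(deque)  # access token -> timestamps of counted requests


def mark_seen(token, now):
    """Count the request; always true, like an increment followed by 'gt 0'."""
    _events[token].append(now)
    return True


def too_many_uploads_by_user(token, now):
    """True when more than LIMIT requests were counted within PERIOD_S."""
    events = _events[token]
    while events and events[0] <= now - PERIOD_S:
        events.popleft()
    return len(events) > LIMIT


def choose_backend(token, now):
    # `and` short-circuits left to right, so the increment must come first.
    if mark_seen(token, now) and too_many_uploads_by_user(token, now):
        return "be_429_slow_down"
    return "be_api"


# One token sends 150 uploads within 15 seconds.
backends = [choose_backend("Bearer token-a", now=i * 0.1) for i in range(150)]
print(backends.count("be_api"))            # 100
print(backends.count("be_429_slow_down"))  # 50
```

The first 100 requests pass through; from the 101st on, the rate exceeds the limit and the request is diverted. If the two checks were swapped in this sketch, the rate check would fail first and the increment would never run, so the counter would stay at 0.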
If the requests come in slowly enough, they will be handled by the default backend.
The be_429_slow_down backend uses the so-called tarpit feature, usually used to tie up an attacker’s resources by keeping a request open for a defined period of time before closing it. The HTTP tarpit option sends an error to the client. Unfortunately, at the time of writing, HAProxy does not allow the specification of a particular HTTP response code for tarpits, but always defaults to 500. As we want to both slow broken clients down and inform them about the particular error cause, we use a little hack: using errorfile we specify a custom file, 429.http, to be sent for 500, which in fact contains an HTTP 429 response. This goes against best practices, but works nicely nevertheless:
HTTP/1.1 429 Too Many Requests
Cache-Control: no-cache
Connection: close
Content-Type: text/plain
Retry-After: 60

Too Many Requests (HAP429).
See the HAProxy documentation for details.
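Since the error file is sent to the client as a raw HTTP response, it has to be a complete response in itself: status line, headers, an empty line, then the body. The following Python sketch builds such a file's content with CRLF line endings; the helper name build_errorfile is a hypothetical one chosen for illustration.

```python
def build_errorfile(status, reason, body, retry_after_s):
    """Return the raw bytes of a complete HTTP response for use as an error file."""
    lines = [
        f"HTTP/1.1 {status} {reason}",
        "Cache-Control: no-cache",
        "Connection: close",
        "Content-Type: text/plain",
        f"Retry-After: {retry_after_s}",
        "",  # the empty line separates headers from the body
        body,
    ]
    return ("\r\n".join(lines) + "\r\n").encode("ascii")


raw = build_errorfile(429, "Too Many Requests", "Too Many Requests (HAP429).", 60)
head, _, payload = raw.partition(b"\r\n\r\n")
print(head.split(b"\r\n")[0].decode())  # HTTP/1.1 429 Too Many Requests
print(payload.decode().strip())         # Too Many Requests (HAP429).
```

Writing the returned bytes to a file such as 429.http gives the content shown above. The Retry-After value of 60 matches the one-minute window used for the rate.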
Conclusion
Most examples found online for rate limiting with HAProxy are based purely on ports and IP addresses, not on higher level protocol information. It took us a little while to put together all the pieces and wrap our heads around the concept of HAProxy’s counters, stick-tables and the timing of ACL evaluation.
The config described above has been in production for a few weeks now and works flawlessly, keeping our backend servers safe from problematic clients. Should the need for other limits arise in the future, we now have an effective way to handle those in a fine-grained way.