We have a microservice-based Spring Boot architecture in which we use Spring WebClient (which internally uses Reactor Netty) for communication between services.
In production, our services were intermittently getting "connection reset by peer" exceptions. No logs for the affected requests could be found in the called service.
This is how we were initialising our WebClient earlier:
webClient = WebClient.builder().build();
To fix this, we disabled connection pooling and initialised our WebClient as below, after which the exception no longer occurred.
webClient = WebClient.builder().clientConnector(new ReactorClientHttpConnector(HttpClient.newConnection())).build();
But how can we fix this with connection pooling enabled, since disabling connection pooling comes with its own disadvantages?
Reactor Netty version: 1.0.9
Spring boot version: 2.5.3
Exception:
2021-08-16 12:20:24,095 WARN [reactor-http-epoll-1] reactor.util.Loggers$Slf4JLogger: [id:04a24430-45, L:/10.0.8.88:33848 - R:172.20.0.20/172.20.0.20:3148] The connection observed an error, the request cannot be retried as the headers/body were sent
io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer
2021-08-16 12:20:24,100 ERROR [reactor-http-epoll-1] reactor.util.Loggers$Slf4JLogger: Operator called default onErrorDropped
reactor.core.Exceptions$ErrorCallbackNotImplemented: org.springframework.web.reactive.function.client.WebClientRequestException: readAddress(..) failed: Connection reset by peer; nested exception is io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer
Caused by: org.springframework.web.reactive.function.client.WebClientRequestException: readAddress(..) failed: Connection reset by peer; nested exception is io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer
at org.springframework.web.reactive.function.client.ExchangeFunctions$DefaultExchangeFunction.lambda$wrapException$9(ExchangeFunctions.java:141)
Suppressed: reactor.core.publisher.FluxOnAssembly$OnAssemblyException:
Error has been observed at the following site(s):
|_ checkpoint ⇢ Request to GET http://172.20.0.20:3148/v1/users/referral/ec148ff3-5dd9-473f-a7f0-cb180a5e21f0 [DefaultWebClient]
Stack trace:
at org.springframework.web.reactive.function.client.ExchangeFunctions$DefaultExchangeFunction.lambda$wrapException$9(ExchangeFunctions.java:141)
at reactor.core.publisher.MonoErrorSupplied.subscribe(MonoErrorSupplied.java:55)
at reactor.core.publisher.Mono.subscribe(Mono.java:4338)
at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onError(FluxOnErrorResume.java:103)
at reactor.core.publisher.FluxPeek$PeekSubscriber.onError(FluxPeek.java:222)
at reactor.core.publisher.FluxPeek$PeekSubscriber.onError(FluxPeek.java:222)
at reactor.core.publisher.FluxPeek$PeekSubscriber.onError(FluxPeek.java:222)
at reactor.core.publisher.MonoNext$NextSubscriber.onError(MonoNext.java:93)
at reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain.onError(MonoFlatMapMany.java:204)
at reactor.core.publisher.SerializedSubscriber.onError(SerializedSubscriber.java:124)
at reactor.core.publisher.FluxRetryWhen$RetryWhenMainSubscriber.whenError(FluxRetryWhen.java:225)
at reactor.core.publisher.FluxRetryWhen$RetryWhenOtherSubscriber.onError(FluxRetryWhen.java:274)
at reactor.core.publisher.FluxConcatMap$ConcatMapImmediate.drain(FluxConcatMap.java:414)
at reactor.core.publisher.FluxConcatMap$ConcatMapImmediate.onNext(FluxConcatMap.java:251)
at reactor.core.publisher.EmitterProcessor.drain(EmitterProcessor.java:491)
at reactor.core.publisher.EmitterProcessor.tryEmitNext(EmitterProcessor.java:299)
at reactor.core.publisher.SinkManySerialized.tryEmitNext(SinkManySerialized.java:100)
at reactor.core.publisher.InternalManySink.emitNext(InternalManySink.java:27)
at reactor.core.publisher.FluxRetryWhen$RetryWhenMainSubscriber.onError(FluxRetryWhen.java:190)
at reactor.core.publisher.MonoCreate$DefaultMonoSink.error(MonoCreate.java:189)
at reactor.netty.http.client.HttpClientConnect$HttpObserver.onUncaughtException(HttpClientConnect.java:384)
at reactor.netty.ReactorNetty$CompositeConnectionObserver.onUncaughtException(ReactorNetty.java:647)
at reactor.netty.resources.DefaultPooledConnectionProvider$DisposableAcquire.onUncaughtException(DefaultPooledConnectionProvider.java:219)
at reactor.netty.resources.DefaultPooledConnectionProvider$PooledConnection.onUncaughtException(DefaultPooledConnectionProvider.java:467
@RitikaDangal
Please capture the traffic with Wireshark and share it. Is it possible that some network component (e.g. a firewall) closes the connection because of inactivity? If you configure maxIdleTime for the connection pool, do you still see the issue? (https://projectreactor.io/docs/netty/release/reference/index.html#connection-pool-timeout)
@violetagg
I tried capturing the traffic with Wireshark but did not see anything useful there; everything was at the network layer.
Will configure maxIdleTime and monitor for a day or two.
Thanks
Hi @RitikaDangal, @violetagg,
We were also facing a very similar issue with communication between Spring Boot based microservices deployed in Kubernetes.
Reactor Netty version: 1.0.10
Spring boot version: 2.5.4
We were also using
webClient = WebClient.builder().build();
and observed that once a request had completed, the next request made after about 20 minutes of inactivity failed with the same connection reset by peer error that you described. The request after that would succeed, because the failure closed the channel and a new one was created. Most likely Kubernetes was closing the connections on its end after 20 minutes.
We tried setting maxIdleTime through environment variables with
reactor.netty.pool.maxIdleTime: 600000
but it did not seem to update maxIdleTime. We use spring-boot-starter-webflux.
So we added a custom connector to the WebClient, as shown below.
var provider = ConnectionProvider.builder("custom-name")
    .maxConnections(500)
    .pendingAcquireTimeout(Duration.ofSeconds(45))
    .maxIdleTime(Duration.ofSeconds(600))
    .build();
HttpClient client = HttpClient.create(provider).compress(true);
WebClient webClient = WebClient.builder()
    .clientConnector(new ReactorClientHttpConnector(client))
    .build();
After this, the connection reset by peer exceptions were fixed. Any request made after 10 minutes of idle time now causes the existing channel to be closed and a new channel to be created.
@violetagg We used the following connection provider and the issue is now resolved.
ConnectionProvider provider = ConnectionProvider.builder("fixed")
    .maxConnections(500)
    .maxIdleTime(Duration.ofSeconds(20))
    .maxLifeTime(Duration.ofSeconds(60))
    .pendingAcquireTimeout(Duration.ofSeconds(60))
    .evictInBackground(Duration.ofSeconds(120))
    .build();
this.webClient = WebClient.builder()
    .clientConnector(new ReactorClientHttpConnector(HttpClient.create(provider)))
    .build();
Thanks
After configuring maxIdleTime, it works.
But why?
Does it mean that a connection in the connection pool had been closed by the remote peer, but was still in the pool and still marked as available, so that acquiring it and reading from or writing to the remote peer caused this exception?
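The scenario in this question can be pictured with a toy model. The sketch below is not Reactor Netty's implementation; the class IdlePoolSketch, its methods, and the 20-minute peer timeout are all hypothetical and exist only for illustration. It assumes the remote side (or something in between) silently drops connections after an idle period, and that the pool has no way of knowing this unless it applies its own, shorter idle limit.

```java
import java.time.Duration;
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of an idle-connection pool (hypothetical, not Reactor Netty code). */
public class IdlePoolSketch {

    /** A pooled connection that remembers when it was last used (millis). */
    static final class Conn {
        final int id;
        long lastUsedMillis;
        Conn(int id, long now) { this.id = id; this.lastUsedMillis = now; }
    }

    private final Deque<Conn> idle = new ArrayDeque<>();
    private final long maxIdleMillis; // negative means "never evict"
    private int nextId = 1;

    IdlePoolSketch(Duration maxIdleTime) {
        this.maxIdleMillis = maxIdleTime == null ? -1 : maxIdleTime.toMillis();
    }

    /** Hands out a pooled connection, discarding any that sat idle too long. */
    Conn acquire(long now) {
        while (!idle.isEmpty()) {
            Conn c = idle.pollFirst();
            boolean tooOld = maxIdleMillis >= 0 && now - c.lastUsedMillis > maxIdleMillis;
            if (!tooOld) {
                return c; // reused as-is: the pool cannot tell whether the peer dropped it
            }
        }
        return new Conn(nextId++, now); // nothing reusable, open a new connection
    }

    void release(Conn c, long now) {
        c.lastUsedMillis = now;
        idle.addFirst(c);
    }

    /** Assumed peer behaviour: connections idle longer than the timeout are dropped. */
    static boolean droppedByPeer(Conn c, long now, long peerIdleTimeoutMillis) {
        return now - c.lastUsedMillis > peerIdleTimeoutMillis;
    }

    public static void main(String[] args) {
        long peerTimeout = Duration.ofMinutes(20).toMillis(); // assumed value
        long later = Duration.ofMinutes(25).toMillis();

        IdlePoolSketch noLimit = new IdlePoolSketch(null);
        noLimit.release(noLimit.acquire(0), 0);
        Conn stale = noLimit.acquire(later);
        System.out.println("no maxIdleTime, dead connection reused: "
                + droppedByPeer(stale, later, peerTimeout));

        IdlePoolSketch tenMinutes = new IdlePoolSketch(Duration.ofMinutes(10));
        tenMinutes.release(tenMinutes.acquire(0), 0);
        Conn fresh = tenMinutes.acquire(later);
        System.out.println("maxIdleTime=10m, dead connection reused: "
                + droppedByPeer(fresh, later, peerTimeout));
    }
}
```

In this model the pool without an idle limit hands back a connection the peer has already dropped, while the pool whose idle limit is shorter than the peer's timeout opens a new one instead. That is consistent with the reports above, where a maxIdleTime below the observed 20 minutes made the error go away.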
Thanks @violetagg 🙏
In this case, however, we could also retry on WebClientRequestException, which would also resolve this issue. Am I right?
@TDtianzhenjiu you have to be careful with request retries (for example, if the requests are not idempotent: https://www.rfc-editor.org/rfc/rfc9110.html#section-9.2.2)
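To make the idempotency caveat concrete, here is a minimal sketch of a retry guard in plain Java. The class RetrySafety and its methods are hypothetical names introduced for illustration; the set of methods follows RFC 9110, which defines PUT, DELETE, and the safe methods (GET, HEAD, OPTIONS, TRACE) as idempotent. It assumes the connection reset surfaces as an IOException somewhere in the cause chain, as in the stack trace above, where WebClientRequestException wraps Netty's NativeIoException.

```java
import java.io.IOException;
import java.util.Set;

/** Hypothetical helper deciding whether a failed request is safe to retry. */
public class RetrySafety {

    // Idempotent methods per RFC 9110 section 9.2.2. HTTP method names are case-sensitive.
    private static final Set<String> IDEMPOTENT =
            Set.of("GET", "HEAD", "OPTIONS", "TRACE", "PUT", "DELETE");

    static boolean isIdempotent(String method) {
        return method != null && IDEMPOTENT.contains(method);
    }

    /**
     * Retry only if attempts remain, the method is idempotent, and the failure
     * is an I/O error (for example a connection reset) somewhere in the cause chain.
     */
    static boolean shouldRetry(String method, Throwable error, int attemptsSoFar, int maxAttempts) {
        if (attemptsSoFar >= maxAttempts || !isIdempotent(method)) {
            return false;
        }
        for (Throwable t = error; t != null; t = t.getCause()) {
            if (t instanceof IOException) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Throwable reset = new RuntimeException("request failed",
                new IOException("Connection reset by peer"));
        System.out.println("GET:  " + shouldRetry("GET", reset, 0, 3));
        System.out.println("POST: " + shouldRetry("POST", reset, 0, 3));
    }
}
```

In a Reactor pipeline, a predicate along these lines could be supplied to a retry specification's filter. Retrying is a complement to a sensible maxIdleTime rather than a replacement for it, because a POST that hits a stale connection would still fail under this guard.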
The connection provider configuration posted above (maxIdleTime of 20 seconds, maxLifeTime of 60 seconds, evictInBackground every 120 seconds) solved the issue for me as well.