Java S3 upload of a large file (~1.5 TB) fails with ResetException; the file is read/processed via an InputStream

1 follower

I have an application running in Java. I have a large file that I encrypt and upload to S3. Since the file is too large to hold in memory, I use PipedInputStream and PipedOutputStream to do the encryption: I wrap the PipedInputStream in a BufferedInputStream and pass that to the S3 PutObjectRequest. I have calculated the size of the encrypted object and set it on the ObjectMetadata. Here are some code snippets:

PipedInputStream pis = new PipedInputStream(uploadFileInfo.getPout(), MAX_BUFFER_SIZE);
BufferedInputStream bis = new BufferedInputStream(pis, MAX_BUFFER_SIZE);
LOG.info("Is mark supported? " + bis.markSupported());
PutObjectRequest putObjectRequest = new PutObjectRequest(uploadFileInfo.getS3TargetBucket(),
                        uploadFileInfo.getS3TargetObjectKey() + ".encrypted",
                        bis, metadata);
// Set read limit to more than the expected stream size, i.e. 20 MB
// https://github.com/aws/aws-sdk-java/issues/427
LOG.info("set read limit to " + (MAX_BUFFER_SIZE + 1));
putObjectRequest.getRequestClientOptions().setReadLimit(MAX_BUFFER_SIZE + 1);
Upload upload = transferManager.upload(putObjectRequest);
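
For context, here is a minimal sketch of how the rest of the setup described above might fit together. The helpers computeEncryptedSize and encryptTo and the plaintext stream are hypothetical placeholders, not the actual application code:

import java.io.IOException;
import java.io.OutputStream;
import java.io.PipedOutputStream;
import com.amazonaws.services.s3.model.ObjectMetadata;

ObjectMetadata metadata = new ObjectMetadata();
// S3 requires the final length up front; without it the SDK buffers the
// entire stream in memory in order to compute it.
metadata.setContentLength(computeEncryptedSize(plaintextLength)); // hypothetical helper
PipedOutputStream pout = new PipedOutputStream(); // i.e. uploadFileInfo.getPout()
// A writer thread pushes encrypted bytes into the pipe while the SDK reads
// from the PipedInputStream on the other end.
new Thread(() -> {
    try (OutputStream out = pout) {
        encryptTo(plaintextStream, out); // hypothetical encryption step
    } catch (IOException e) {
        LOG.error("Encryption failed", e);
    }
}).start();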

My stack trace shows that the exception is thrown when reset() is called on the BufferedInputStream.

[UPLOADER_TRACKER] ERROR com.xxx.yyy.zzz.handler.TrackProgressHandler - Exception from S3 transfer 
com.amazonaws.ResetException: The request to the service failed with a retryable reason, but resetting the request input stream has failed. See exception.getExtraInfo or debug-level logging for the original failure that caused this retry.;  If the request involves an input stream, the maximum stream buffer size can be configured via request.getRequestClientOptions().setReadLimit(int)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.resetRequestInputStream(AmazonHttpClient.java:1423)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1240)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1113)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:770)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:744)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:726)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:686)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:668)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:532)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:512)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5052)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4998)
    at com.amazonaws.services.s3.AmazonS3Client.doUploadPart(AmazonS3Client.java:3734)
    at com.amazonaws.services.s3.AmazonS3Client.uploadPart(AmazonS3Client.java:3719)
    at com.amazonaws.services.s3.transfer.internal.UploadCallable.uploadPartsInSeries(UploadCallable.java:258)
    at com.amazonaws.services.s3.transfer.internal.UploadCallable.uploadInParts(UploadCallable.java:189)
    at com.amazonaws.services.s3.transfer.internal.UploadCallable.call(UploadCallable.java:121)
    at com.amazonaws.services.s3.transfer.internal.UploadMonitor.call(UploadMonitor.java:143)
    at com.amazonaws.services.s3.transfer.internal.UploadMonitor.call(UploadMonitor.java:48)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Resetting to invalid mark
    at java.io.BufferedInputStream.reset(BufferedInputStream.java:448)
    at com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:120)
    at com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:120)
    at com.amazonaws.services.s3.internal.InputSubstream.reset(InputSubstream.java:110)
    at com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:120)
    at com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:120)
    at com.amazonaws.services.s3.internal.InputSubstream.reset(InputSubstream.java:110)
    at com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:120)
    at com.amazonaws.services.s3.internal.MD5DigestCalculatingInputStream.reset(MD5DigestCalculatingInputStream.java:105)
    at com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:120)
    at com.amazonaws.event.ProgressInputStream.reset(ProgressInputStream.java:168)
    at com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:120)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.resetRequestInputStream(AmazonHttpClient.java:1421)
    ... 22 more
[UPLOADER_TRACKER] ERROR com.xxx.yyy.zzz.handler.TrackProgressHandler - Reset exception caught ==> If the request involves an input stream, the maximum stream buffer size can be configured via request.getRequestClientOptions().setReadLimit(int)
    (stack trace identical to the one above)

However, I am already setting the readLimit to MAX_BUFFER_SIZE + 1, per the reliability tip from AWS. Has anyone run into this before? Side note: because I am encrypting the file, I have to use an InputStream rather than a File or FileInputStream. I also do not have permission to write to the local disk.

6 comments
Can you post the link where you got that tip? I would have expected it to be 1, not plus 1.
What is the value of MAX_BUFFER_SIZE? How do you know it is sufficient?
Are you specifying the file's length in the metadata? Otherwise, the documentation says the entire file must be held in memory before it is written: "Content length for the data stream must be specified in the object metadata parameter; Amazon S3 requires it be passed in before the data is uploaded. Failure to specify a content length will cause the entire contents of the input stream to be buffered locally in memory so that the content length can be calculated, which can result in negative performance problems."
@JimGarrison It is currently 40 MB. I know that is not sufficient, but I may have multiple files to upload, so I cannot hold an entire file in memory. Apologies if you meant sufficient in some other sense; if so, please explain.
Yes, I am specifying the object content length, computed from the encryption algorithm. Files of up to 20 GB are uploaded regularly.
java
amazon-web-services
amazon-s3
aws-sdk
java-11
Pranay Sharma
Posted on 2021-04-28
2 answers
Parsifal
Posted on 2022-03-31
0 upvotes

I think you have misread that advice. Quoting the link you provided, with emphasis added:

For example, if the maximum expected size of a data stream is 100,000 bytes, then set the read limit to 100,001 (100,000 + 1) bytes. The mark and reset will always work for 100,000 bytes or less. Be aware that this might cause some streams to buffer that number of bytes into memory.

As I understand it, this configures the client so that it can buffer the source stream's content locally when that stream does not itself support mark/reset. That is consistent with the documentation for RequestClientOptions.DEFAULT_STREAM_BUFFER_SIZE:

Used to enable mark-and-reset for non-mark-and-resettable, non-file input streams.

In other words, it exists to buffer the entire source stream on the client, not to specify how large a chunk to send from it. In your case I believe it is effectively ignored, because (1) you are not buffering the entire stream, and (2) the stream you pass in supports mark/reset itself. A small standalone demo of the underlying failure follows.
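
To make the "Resetting to invalid mark" failure concrete, here is a tiny self-contained example (not from the question): once a BufferedInputStream has consumed more bytes than the read limit passed to mark(), reset() throws exactly the IOException seen in the trace above.

import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;

public class MarkResetDemo {
    public static void main(String[] args) throws Exception {
        BufferedInputStream in = new BufferedInputStream(
                new ByteArrayInputStream(new byte[100]), 10);
        in.mark(10);            // the mark is only guaranteed for the next 10 bytes
        in.read(new byte[50]);  // consume well past the read limit
        in.reset();             // throws java.io.IOException: Resetting to invalid mark
    }
}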

A multipart upload, which is what TransferManager is doing in your case, splits the input stream into parts of at least 5 MB (the actual part size depends on the declared size of the stream; for a 1.5 TB file it is roughly 158 MB). The parts are uploaded with the UploadPart API call, which tries to send an entire part at once. If a part fails for a retryable reason, the client attempts to reset the stream to the beginning of that part. A sketch of the part-size arithmetic follows.
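
Here is a sketch of that arithmetic, mirroring the description above rather than the SDK's internal code verbatim:

long MIN_PART_SIZE = 5L * 1024 * 1024;  // smallest part S3 accepts, 5 MiB
long MAX_PARTS = 10_000;                // hard cap on parts per multipart upload

long contentLength = 1_536L * 1024 * 1024 * 1024;  // ~1.5 TiB declared size
// Each part must be large enough that 10,000 parts cover the whole object.
long partSize = Math.max(MIN_PART_SIZE, (contentLength + MAX_PARTS - 1) / MAX_PARTS);
// partSize == 164,926,745 bytes, i.e. roughly 158 MiB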

You may be able to make this work by setting the read limit on your BufferedInputStream to something large enough to hold a single part. The calculation the transfer manager uses is here; it is the size of the file divided by 10,000 (the maximum number of parts in a multipart upload), so again about 158 MiB. To be safe I would use 200 MiB (since I am sure you have files larger than 1.5 TB); see the sketch below.
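
Applied to the question's code, that suggestion would look roughly like this (200 MiB is the safety margin suggested above; note that both the buffer and the read limit must cover a full part):

int readLimit = 200 * 1024 * 1024 + 1;  // 200 MiB (+1 per the AWS tip), larger than any single part
BufferedInputStream bis = new BufferedInputStream(pis, readLimit);
// ... build the PutObjectRequest with bis as before, then:
putObjectRequest.getRequestClientOptions().setReadLimit(readLimit);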

If it were me, however, I would probably use the low-level multipart upload methods directly. In my opinion the main benefit of TransferManager is uploading a File, where it can use multiple threads to perform concurrent part uploads. With a stream you have to process each part sequentially anyway.
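
A rough sketch of that low-level flow with the v1 SDK (readNextChunk is a hypothetical helper that drains up to partSize bytes from the encrypted pipe and returns an empty array at end of stream; abort-on-failure handling is omitted). Because each part is fully buffered in a byte array, a retried part is simply re-read from the array and no mark/reset is ever needed on the pipe:

import java.io.ByteArrayInputStream;
import java.util.ArrayList;
import java.util.List;
import com.amazonaws.services.s3.model.*;

String uploadId = s3.initiateMultipartUpload(
        new InitiateMultipartUploadRequest(bucket, key)).getUploadId();
List<PartETag> partETags = new ArrayList<>();
long partSize = 200L * 1024 * 1024;
byte[] part;
for (int partNumber = 1;
        (part = readNextChunk(encryptedStream, partSize)).length > 0;  // hypothetical helper
        partNumber++) {
    UploadPartResult result = s3.uploadPart(new UploadPartRequest()
            .withBucketName(bucket)
            .withKey(key)
            .withUploadId(uploadId)
            .withPartNumber(partNumber)
            .withInputStream(new ByteArrayInputStream(part))
            .withPartSize(part.length));
    partETags.add(result.getPartETag());
}
s3.completeMultipartUpload(
        new CompleteMultipartUploadRequest(bucket, key, uploadId, partETags));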

In fact, if it were me, I would seriously reconsider uploading a single 1.5 TB file at all. Yes, you can do it. But I cannot imagine downloading the entire file every time you want to read from it; I would expect you to download a byte range instead. And in that case you may find it just as easy to work with 1,500 files of 1 GB each.
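
For reference, a ranged read with the v1 SDK looks like this (bucket and key are placeholders); only the requested bytes are transferred:

import java.io.InputStream;
import com.amazonaws.services.s3.model.GetObjectRequest;
import com.amazonaws.services.s3.model.S3Object;

GetObjectRequest rangeRequest = new GetObjectRequest(bucket, key)
        .withRange(0, 1024 * 1024 - 1);  // first 1 MiB; the range is inclusive
try (S3Object object = s3.getObject(rangeRequest);
     InputStream content = object.getObjectContent()) {
    // process just this slice of the object
}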

Sergiu Indrie
Posted on 2022-03-31
0 upvotes

This appears to be a known issue with the S3 SDK and BufferedInputStream.

See https://github.com/aws/aws-sdk-java/issues/427#issuecomment-273550783

The simplest solution (even if not ideal) is to save the file locally and then pass the File object to the S3 SDK, like this:

InputStream inputStream = ...;
File tempFile = File.createTempFile("upload-temp", "");
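
The snippet is cut off here; a self-contained sketch of the same idea (the bucket name, key, and stream source are placeholders) might look like:

import java.io.File;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
InputStream inputStream = openEncryptedStream();  // hypothetical source
File tempFile = File.createTempFile("upload-temp", ".encrypted");
try {
    // Spool the stream to disk so the SDK has a resettable, known-length source.
    Files.copy(inputStream, tempFile.toPath(), StandardCopyOption.REPLACE_EXISTING);
    s3.putObject("my-bucket", "my-key.encrypted", tempFile);
} finally {
    tempFile.delete();
}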