Collectives™ on Stack Overflow
Find centralized, trusted content and collaborate around the technologies you use most.
Learn more about Collectives
Teams
Q&A for work
Connect and share knowledge within a single location that is structured and easy to search.
Learn more about Teams
I am trying to upload a file around ~5GB size as below but, it throws the error
string longer than 2147483647 bytes
. It sounds like there is a limit of 2 GB to upload. Is there a way to upload data in chunks? Can anyone provide guidance?
logger.debug(attachment_path)
currdir = os.path.abspath(os.getcwd())
os.chdir(os.path.dirname(attachment_path))
headers = self._headers
headers['Content-Type'] = content_type
headers['X-Override-File'] = 'true'
if not os.path.exists(attachment_path):
raise Exception, "File path was invalid, no file found at the path %s" % attachment_path
filesize = os.path.getsize(attachment_path)
fileToUpload = open(attachment_path, 'rb').read()
logger.info(filesize)
logger.debug(headers)
r = requests.put(self._baseurl + 'problems/' + problemID + "/" + attachment_type + "/" + urllib.quote(os.path.basename(attachment_path)),
headers=headers,data=fileToUpload,timeout=300)
ERROR:
string longer than 2147483647 bytes
UPDATE:
def read_in_chunks(file_object,chunk_size=30720*30720):
"""Lazy function (generator) to read a file piece by piece.
Default chunk size: 1k."""
while True:
data = file_object.read(chunk_size)
if not data:
break
yield data
f = open(attachment_path)
for piece in read_in_chunks(f):
r = requests.put(self._baseurl + 'problems/' + problemID + "/" + attachment_type + "/" + urllib.quote(os.path.basename(attachment_path)),
headers=headers,data=piece,timeout=300)
Your question has been asked on the requests bug tracker; their suggestion is to use streaming upload. If that doesn't work, you might see if a chunk-encoded request works.
[edit]
Example based on the original code:
# Using `with` here will handle closing the file implicitly
with open(attachment_path, 'rb') as file_to_upload:
r = requests.put(
"{base}problems/{pid}/{atype}/{path}".format(
base=self._baseurl,
# It's better to use consistent naming; search PEP-8 for standard Python conventions.
pid=problem_id,
atype=attachment_type,
path=urllib.quote(os.path.basename(attachment_path)),
headers=headers,
# Note that you're passing the file object, NOT the contents of the file:
data=file_to_upload,
# Hard to say whether this is a good idea with a large file upload
timeout=300,
I can't guarantee this would run as-is, since I can't realistically test it, but it should be close. The bug tracker comments I linked to also mention that sending multiple headers may cause issues, so if the headers you're specifying are actually necessary, this may not work.
Regarding chunk encoding: This should be your second choice. Your code was not specifying 'rb' as the mode for open(...), so changing that should probably make the code above work. If not, you could try this.
def read_in_chunks():
# If you're going to chunk anyway, doesn't it seem like smaller ones than this would be a good idea?
chunk_size = 30720 * 30720
# I don't know how correct this is; if it doesn't work as expected, you'll need to debug
with open(attachment_path, 'rb') as file_object:
while True:
data = file_object.read(chunk_size)
if not data:
break
yield data
# Same request as above, just using the function to chunk explicitly; see the `data` param
r = requests.put(
"{base}problems/{pid}/{atype}/{path}".format(
base=self._baseurl,
pid=problem_id,
atype=attachment_type,
path=urllib.quote(os.path.basename(attachment_path)),
headers=headers,
# Call the chunk function here and the request will be chunked as you specify
data=read_in_chunks(),
timeout=300,
–
–
–
–
–
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.