I am trying to upload a file of about 5 GB as shown below, but it fails with the error "string longer than 2147483647 bytes". It looks like there is a 2 GB limit on uploads. Is there a way to upload the data in chunks? Can anyone provide guidance?

logger.debug(attachment_path)
currdir = os.path.abspath(os.getcwd())
os.chdir(os.path.dirname(attachment_path))
headers = self._headers
headers['Content-Type'] = content_type
headers['X-Override-File'] = 'true'
if not os.path.exists(attachment_path):
    raise Exception, "File path was invalid, no file found at the path %s" % attachment_path
filesize = os.path.getsize(attachment_path) 
fileToUpload = open(attachment_path, 'rb').read()
logger.info(filesize)
logger.debug(headers)
r = requests.put(self._baseurl + 'problems/' + problemID + "/" + attachment_type + "/" + urllib.quote(os.path.basename(attachment_path)), 
                 headers=headers,data=fileToUpload,timeout=300)

ERROR:

string longer than 2147483647 bytes

UPDATE:

def read_in_chunks(file_object,chunk_size=30720*30720):
    """Lazy function (generator) to read a file piece by piece.
    Default chunk size: 1k."""
    while True:
        data = file_object.read(chunk_size)
        if not data:
            break
        yield data
f = open(attachment_path)
for piece in read_in_chunks(f):
    r = requests.put(self._baseurl + 'problems/' + problemID + "/" + attachment_type + "/" + urllib.quote(os.path.basename(attachment_path)),
                     headers=headers,data=piece,timeout=300)

Your question has been asked on the requests bug tracker; their suggestion is to use streaming upload. If that doesn't work, you might see if a chunk-encoded request works.

[edit]

Example based on the original code:

# Using `with` here will handle closing the file implicitly
with open(attachment_path, 'rb') as file_to_upload:
    r = requests.put(
        "{base}problems/{pid}/{atype}/{path}".format(
            base=self._baseurl,
            # It's better to use consistent naming; search PEP-8 for standard Python conventions.
            pid=problem_id,
            atype=attachment_type,
            path=urllib.quote(os.path.basename(attachment_path)),
        ),
        headers=headers,
        # Note that you're passing the file object, NOT the contents of the file:
        data=file_to_upload,
        # Hard to say whether this is a good idea with a large file upload
        timeout=300,
    )

I can't guarantee this would run as-is, since I can't realistically test it, but it should be close. The bug tracker comments I linked to also mention that sending multiple headers may cause issues, so if the headers you're specifying are actually necessary, this may not work.
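One way to check, without sending anything over the network, that passing a file object makes requests stream the body instead of loading it into memory is to build the request and inspect it. This is only a minimal sketch: the URL and the small temporary file are placeholders, and the helper name prepare_streaming_put is made up for illustration.

```python
import os
import tempfile

import requests


def prepare_streaming_put(url, path, headers=None):
    """Build (but do not send) a PUT request whose body is an open file object."""
    file_obj = open(path, 'rb')
    request = requests.Request('PUT', url, headers=headers, data=file_obj)
    return request.prepare(), file_obj


# A small temporary file stands in for the real multi-gigabyte attachment.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'x' * 4096)
    demo_path = tmp.name

# Hypothetical URL; nothing is sent, so it is never resolved.
prepared, body_file = prepare_streaming_put('http://example.invalid/upload', demo_path)

# requests derives the length from the file's size rather than reading its contents.
content_length = prepared.headers.get('Content-Length')
uses_chunked = 'Transfer-Encoding' in prepared.headers
# The prepared body is the file object itself, not a string copy of the data.
body_is_file = prepared.body is body_file

body_file.close()
os.remove(demo_path)
```

If the body were the result of .read(), the whole file would sit in memory as one string, which is what triggers the 2147483647-byte error in the question.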

Regarding chunk encoding: This should be your second choice. Your code was not specifying 'rb' as the mode for open(...), so changing that should probably make the code above work. If not, you could try this.

def read_in_chunks():
    # If you're going to chunk anyway, doesn't it seem like smaller ones than this would be a good idea?
    chunk_size = 30720 * 30720
    # I don't know how correct this is; if it doesn't work as expected, you'll need to debug
    with open(attachment_path, 'rb') as file_object:
        while True:
            data = file_object.read(chunk_size)
            if not data:
                break
            yield data

# Same request as above, just using the function to chunk explicitly; see the `data` param
r = requests.put(
    "{base}problems/{pid}/{atype}/{path}".format(
        base=self._baseurl,
        pid=problem_id,
        atype=attachment_type,
        path=urllib.quote(os.path.basename(attachment_path)),
    ),
    headers=headers,
    # Call the chunk function here and the request will be chunked as you specify
    data=read_in_chunks(),
    timeout=300,
)
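For what it's worth, here is a sketch of the same generator with the path and chunk size passed in as parameters, so it can be tried on a small file first. The 1 MiB default is an assumption on my part, not a recommendation from the requests project, and the temporary file is only there for demonstration.

```python
import os
import tempfile


def read_in_chunks(path, chunk_size=1024 * 1024):
    """Yield the file at `path` as byte strings of at most `chunk_size` bytes."""
    with open(path, 'rb') as file_object:
        while True:
            data = file_object.read(chunk_size)
            if not data:
                break
            yield data


# Small stand-in file; the sizes are chosen only to show a partial last chunk.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'a' * 2500)
    demo_path = tmp.name

chunk_lengths = [len(chunk) for chunk in read_in_chunks(demo_path, chunk_size=1024)]
reassembled = b''.join(read_in_chunks(demo_path, chunk_size=1024))
os.remove(demo_path)
```

The generator would then be passed once, as data=read_in_chunks(attachment_path), to a single requests.put call, not looped over with one request per chunk.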
I took your suggestion and tried this, but somehow it just hangs. I updated my code; what could be going wrong?
– carte blanche
Nov 1, 2018 at 19:11
It's not clear that you understand the suggested solution. You should not be iterating over chunks and creating separate requests for them; you should be passing your generator to a single request as the data parameter. I'll update with an example based on your original code and the links I provided.
– kungphu
Nov 2, 2018 at 0:40
I tried the second option and it does upload, but it throws the error ('Connection aborted.', error(32, 'Broken pipe'))
– carte blanche
Nov 5, 2018 at 4:15
No, this question is about getting a 'string too large' error. That appears to have been solved. What you're talking about seems to be the response you're getting not being JSON; that's a separate issue, and it's not something I can address in comments. If you accept this answer and open a new question, I will gladly take a look, and I'm sure others will as well. But do leave this one here as a reference for others who might have the same issue; that's a big part of the value of StackOverflow.
– kungphu
Nov 6, 2018 at 3:21
@user2125827 Your new error basically indicates that something is wrong on the server side; the original client-side error has been well addressed by this answer.
– georgexsh
Nov 9, 2018 at 6:31
