I am trying to write to and read from a stream without loading everything into memory at once. Here's what I imagined would work:
```python
import io

stream = io.BytesIO()

def process_stream(stream):
    while True:
        chunk = stream.read(5).decode('utf-8')
        if not chunk:
            return
        yield chunk

# this would be a separate thread, but here we just do it in serial:
for i in range(3):
    stream.write(b'asdf')

for chunk in process_stream(stream):
    print('I read', chunk)
```
But this actually doesn't print out anything.
I can get it working, but only with one of the following two changes. Each of them requires all the bytes to be held in memory at once:

- initializing `stream = io.BytesIO(b'asdf' * 3)` instead of writing incrementally, or
- using `stream.getvalue()` instead of reading incrementally.
I'm quite baffled. Incremental writing can only be read back by batch reading, and incremental reading only works after batch writing. How can I get a constant-memory solution working, assuming `process_stream` outpaces the writing?
`io.BytesIO` keeps a single stream position (the "seek") that both `write()` and `read()` use and advance. After the writes in the for loop, that position is at the end of the data:
```
asdfasdfasdf|
            ^ (Seek)
```
So when you try to read, there is nothing after the last byte, and `read()` returns an empty bytes object. One solution is to move the position back to the beginning of the stream with `stream.seek(0)` before reading:
```
|asdfasdfasdf
^ (Seek after calling stream.seek(0))
```
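You can check the shared position directly with `tell()`. The following is a small sketch, and the variable names are only for illustration. It also shows why the two workarounds from the question succeed. A `BytesIO` created with initial bytes starts at position 0, and `getvalue()` returns the whole buffer no matter where the position is.

```python
import io

stream = io.BytesIO()
for _ in range(3):
    stream.write(b'asdf')

# write() and read() share one position; after the writes it is at the end.
end_pos = stream.tell()          # 12
after_writes = stream.read(5)    # b'' because nothing follows position 12

stream.seek(0)                   # move the position back to the start
first = stream.read(5)           # b'asdfa'

# Why the question's two workarounds worked:
prefilled = io.BytesIO(b'asdf' * 3)
prefilled_pos = prefilled.tell()  # 0: initial bytes are read from the start
whole = stream.getvalue()         # b'asdfasdfasdf', regardless of the position
```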
Code:
```python
import io

stream = io.BytesIO()

def process_stream(stream, chunk_size=5):
    while True:
        chunk = stream.read(chunk_size).decode('utf-8')
        if not chunk:
            return
        yield chunk

# this would be a separate thread, but here we just do it in serial:
for i in range(3):
    stream.write(b'asdf')

stream.seek(0)  # Reset the seek so it is at the beginning

for chunk in process_stream(stream):
    print('I read', chunk)
```
Output:

```
I read asdfa
I read sdfas
I read df
```
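`seek(0)` only works here because all the writes happen before any reading. If writes and reads are interleaved, as in the question's producer/consumer scenario, one option is to remember the reader's position, seek to the end to append, and then seek back. This is a single-threaded sketch, and the helpers `write_chunk` and `read_available` are hypothetical names. The buffer still keeps every byte ever written, so this is not constant-memory. With real threads, both helpers would also need a shared lock, because they move the same position.

```python
import io
from typing import Iterator


def write_chunk(stream: io.BytesIO, data: bytes) -> None:
    # Hypothetical helper: append at the end, then restore the reader's position.
    read_pos = stream.tell()
    stream.seek(0, io.SEEK_END)
    stream.write(data)
    stream.seek(read_pos)


def read_available(stream: io.BytesIO, chunk_size: int = 5) -> Iterator[str]:
    # Hypothetical helper: yield whatever has been written but not yet read.
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return
        yield chunk.decode('utf-8')


stream = io.BytesIO()
received = []
for _ in range(3):
    write_chunk(stream, b'asdf')
    for chunk in read_available(stream):
        received.append(chunk)
        print('I read', chunk)
```

Each pass reads the 4 new bytes as soon as they are written, so the chunks follow the writes rather than fixed 5-byte boundaries. Decoding each chunk separately also assumes that no multi-byte UTF-8 character is split across chunks. For arbitrary text, an incremental decoder from `codecs.getincrementaldecoder('utf-8')` would be safer.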
More information: How the write(), read() and getvalue() methods of Python io.BytesIO work?
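`io.BytesIO` holds everything written to it in memory, so none of the approaches above is constant-memory. If that is the goal, a different technique (swapped in here, and not part of the answer above) is an OS pipe. `os.pipe()` returns separate read and write file descriptors backed by a bounded kernel buffer. The writer blocks when that buffer is full, so memory use stays bounded. Below is a minimal sketch that assumes a producer thread; the `producer` function and its `n` parameter are hypothetical.

```python
import os
import threading
from typing import BinaryIO, Iterator


def producer(write_fd: int, n: int = 3) -> None:
    # Hypothetical producer: writes b'asdf' n times into the pipe.
    # Closing the write end (leaving the with-block) is what signals EOF.
    with os.fdopen(write_fd, 'wb') as w:
        for _ in range(n):
            w.write(b'asdf')
            w.flush()  # hand the bytes to the pipe immediately


def process_stream(stream: BinaryIO, chunk_size: int = 5) -> Iterator[str]:
    # Same reader as in the question. On a blocking pipe, read() waits until
    # chunk_size bytes arrive or the writer closes its end, then b'' at EOF.
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return
        yield chunk.decode('utf-8')


read_fd, write_fd = os.pipe()
writer = threading.Thread(target=producer, args=(write_fd,))
writer.start()

received = []
with os.fdopen(read_fd, 'rb') as r:
    for chunk in process_stream(r):
        received.append(chunk)
        print('I read', chunk)

writer.join()
```

Unlike `BytesIO`, the pipe has independent read and write ends, so no seeking is needed. Bytes that have been read are gone from the buffer.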