python - Reading binary data from stdin


Is it possible to read stdin as binary data in Python 2.6? If so, how?

I see in the Python 3.1 documentation that this is fairly simple, but the facilities for doing this in 2.6 don't seem to be there.

If the methods described in 3.1 aren't available, is there a way to close stdin and reopen it in binary mode?

Just to be clear, I am using 'type' in an MS-DOS shell to pipe the contents of a binary file to my Python code. As far as I understand, this should be the equivalent of the Unix 'cat' command. But when I test this, I always get one byte fewer than the expected file size.

I'm going the Java/JAR/Jython route because one of my main external libraries is only available as a Java JAR. Unfortunately, I had started my work in Python. It might have been easier to convert my code to Java a while ago, but since this stuff was all supposed to be compatible, I figured I would try trucking through it and prove it could be done.

In case anyone was wondering, this is also related to this question I asked a few days ago.

Some of this was answered in this question.

So I'll try to update my original question with some notes on what I have figured out so far.

The standard streams are in text mode by default. To read or write binary data from or to them, use the underlying binary buffer. For example, to write bytes to stdout, use sys.stdout.buffer.write(b'abc').
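For reading, the same idea applies: in Python 3, sys.stdin.buffer is the binary stream under the text wrapper. A minimal sketch, with a hypothetical helper name binary_reader, using an in-memory io.TextIOWrapper as a stand-in for stdin so it runs anywhere. The getattr fallback returns objects without a .buffer attribute (such as Python 2 file objects) unchanged; on Windows those still need the msvcrt fix discussed in other answers.

```python
import io
import sys


def binary_reader(stream=None):
    """Return a stream whose read() yields bytes (hypothetical helper).

    On Python 3 a text wrapper such as sys.stdin exposes the binary
    stream underneath it as .buffer; objects without that attribute
    are returned unchanged.
    """
    if stream is None:
        stream = sys.stdin
    return getattr(stream, "buffer", stream)


if __name__ == "__main__":
    # Simulate stdin: a text wrapper around bytes that text mode would mangle.
    fake_stdin = io.TextIOWrapper(io.BytesIO(b"#\x1aE\r\n\x00"), encoding="utf-8")
    data = binary_reader(fake_stdin).read()
    print(len(data), data)
```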

But, as in the accepted answer, invoking python with -u is another option; it forces stdin, stdout and stderr to be totally unbuffered. See the python(1) manpage for details.

See the documentation on io for more information on text buffering, and use sys.stdin.detach() to strip the text layer and get the underlying binary buffer from within Python.
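A sketch of detach(), assuming Python 3 and using an in-memory wrapper in place of sys.stdin. detach() removes the text layer and hands back the binary stream; afterwards the original wrapper can no longer be used. For the real sys.stdin the returned object is normally an io.BufferedReader, which still buffers, so truly unbuffered reads would go through its .raw attribute or os.read() on the file descriptor. The helper name detach_to_binary is hypothetical.

```python
import io


def detach_to_binary(text_stream):
    """Strip the text layer and return the binary stream underneath.

    After detach() the original text wrapper is unusable: further
    operations on it raise ValueError.
    """
    return text_stream.detach()


if __name__ == "__main__":
    wrapper = io.TextIOWrapper(io.BytesIO(b"\r\n\x1a"), encoding="utf-8")
    binary = detach_to_binary(wrapper)
    print(binary.read())  # raw bytes, no newline translation or decoding
    try:
        wrapper.read()
    except ValueError as exc:
        print("wrapper is detached:", exc)
```

In a real script the equivalent would be sys.stdin = sys.stdin.detach(), after which sys.stdin.read() returns bytes.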

I've tried -u with Python v3.2.5, but it did nothing useful. Using sys.stdout.buffer works pretty well, though Python 2.7.8 has no such feature. – ony Sep 2, 2014 at 7:02

Here is the final cut of Linux/Windows, Python 2/3-compatible code for reading data from stdin without corruption:

import sys
PY3K = sys.version_info >= (3, 0)
if PY3K:
    source = sys.stdin.buffer
else:
    # Python 2 on Windows opens sys.stdin in text mode, and
    # binary data read from it becomes corrupted on \r\n
    if sys.platform == "win32":
        # set sys.stdin to binary mode
        import os, msvcrt
        msvcrt.setmode(sys.stdin.fileno(), os.O_BINARY)
    source = sys.stdin
b = source.read()
Use the -u command line switch to force Python 2 to treat stdin, stdout and stderr as binary unbuffered streams.
C:> type mydoc.txt | python.exe -u myscript.py
I have tested this with 'type' and it appears to work. That is, if I leave out the -u flag, I get one fewer character per line. – Dan Menes May 17, 2010 at 19:14
According to the docs, setting the PYTHONUNBUFFERED environment variable will have the same effect. Not sure if that helps. – Dan Menes May 17, 2010 at 19:32
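On Python 3, whether -u or PYTHONUNBUFFERED took effect shows up as sys.flags.unbuffered. The sketch below (the helper unbuffered_flag is hypothetical) starts a child interpreter each way and reads that flag back. Note that on Python 3 neither option turns sys.stdin into a binary stream; binary reads still go through sys.stdin.buffer.

```python
import os
import subprocess
import sys

# The child prints 0 or 1 depending on whether unbuffered mode is active.
CHILD = "import sys; print(int(sys.flags.unbuffered))"


def unbuffered_flag(use_switch=False, use_env=False):
    """Run a child interpreter and return its sys.flags.unbuffered value."""
    # Start from the current environment minus any inherited setting.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONUNBUFFERED"}
    if use_env:
        env["PYTHONUNBUFFERED"] = "1"
    cmd = [sys.executable]
    if use_switch:
        cmd.append("-u")
    cmd += ["-c", CHILD]
    result = subprocess.run(cmd, env=env, stdout=subprocess.PIPE, check=True)
    return int(result.stdout.strip())


if __name__ == "__main__":
    print("default:", unbuffered_flag())
    print("-u:", unbuffered_flag(use_switch=True))
    print("PYTHONUNBUFFERED=1:", unbuffered_flag(use_env=True))
```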
Even easier, it appears that all you need to do is: sys.stdin = os.fdopen(sys.stdin.fileno(), 'rb', 0). That will reopen the fd in unbuffered 'binary' mode. – danielshiplett May 17, 2010 at 19:57
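The same reopen in Python 3 terms, as a sketch: os.fdopen() forwards its arguments to open(), and binary mode with buffering=0 yields an unbuffered io.FileIO. The closefd=False argument is an addition not in the comment; it keeps the descriptor open when the wrapper is closed or garbage-collected. The helper name is hypothetical, and the demo uses a pipe rather than the real stdin. As the next comment reports, this approach reportedly did not help under CPython on Windows XP.

```python
import os


def reopen_binary_unbuffered(fd):
    """Wrap an existing file descriptor as an unbuffered binary stream.

    closefd=False (an assumption added here) leaves fd open after the
    returned object is closed.
    """
    return os.fdopen(fd, "rb", buffering=0, closefd=False)


if __name__ == "__main__":
    # Demo on a pipe instead of the real stdin; for stdin, pass sys.stdin.fileno().
    read_fd, write_fd = os.pipe()
    os.write(write_fd, b"#\x1aE\r\n")
    os.close(write_fd)
    with reopen_binary_unbuffered(read_fd) as stream:
        print(stream.read())
    os.close(read_fd)  # still open, because closefd=False
```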
@thebeav: Oddly enough, that doesn't work on my system. I don't know if that's because I'm using CPython instead of Jython, or because I'm running Windows XP Pro and "type" behaves differently, or because there is a magnetic anomaly in the Manassas area that makes computers do different things. FWIW, I tried a number of ways to get Python to change the file mode after the interpreter had started, including accessing the C runtime's "setmode" function via ctypes. Nothing works for me. – Dan Menes May 17, 2010 at 20:27
Uh-oh. I smell a portability issue. Thanks for the info. I guess I'm going to have to do some fairly rigorous testing on multiple platforms. I hope this doesn't have to do with the JVM in use. – danielshiplett May 18, 2010 at 13:02
If you still need this...
I've used this simple test to read a binary file that contains a 0x1A character in the middle:
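import os, sys, msvcrt
# Windows only: switch stdin (fd 0) from text mode to binary mode
msvcrt.setmode(sys.stdin.fileno(), os.O_BINARY)
s = sys.stdin.read()
print(len(s))  # number of bytes actually received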
My test file data was:
0x23, 0x1A, 0x45
Without setting stdin to binary mode, this test prints 1, because 0x1A is treated as EOF.
Of course, it works on Windows only, because it depends on the msvcrt module.
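The 0x23, 0x1A, 0x45 test can be scripted end to end by piping those bytes into a child interpreter and having it report how many it received. A sketch assuming Python 3.5+ for the parent (subprocess.run); the child applies the msvcrt fix only on Windows and reads from sys.stdin.buffer when that exists, so the count should be 3 on any platform. The helper name bytes_seen_by_child is hypothetical.

```python
import subprocess
import sys

# Child script: binary stdin on Windows (the fix from this answer),
# then report how many bytes arrived.
CHILD = r"""
import sys
if sys.platform == "win32":
    import os, msvcrt
    msvcrt.setmode(sys.stdin.fileno(), os.O_BINARY)
source = getattr(sys.stdin, "buffer", sys.stdin)
sys.stdout.write(str(len(source.read())))
"""


def bytes_seen_by_child(payload):
    """Pipe payload into a child interpreter and return its byte count."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        input=payload,
        stdout=subprocess.PIPE,
        check=True,
    )
    return int(result.stdout)


if __name__ == "__main__":
    test_data = bytes([0x23, 0x1A, 0x45])  # the answer's test file contents
    print(bytes_seen_by_child(test_data))
```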
But Windows is the only system where most people will run into a problem, so this should be an acceptable solution. – Mark Ransom Jan 22, 2014 at 5:04
This is the correct solution for Python 2 to retrieve the raw bytes from stdin on Windows. On Unix, there is no difference between binary and normal mode. See this thread: code.activestate.com/lists/python-list/20426 (re-opening stdin in raw (binary) mode?) – Dr. Jan-Philip Gehrcke Feb 20, 2014 at 15:29
I was getting a ValueError: insecure string pickle exception on Windows when trying to unpickle data that had been written to stdout in one process which was being piped into another. The solution turned out to be adding a msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY) in the process that wrote the data. – martineau Apr 11, 2016 at 18:52
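The writing side of the same problem, as a sketch: a child process pickles an object to its binary stdout and the parent unpickles whatever comes through the pipe. On Python 3 the child writes to sys.stdout.buffer; the msvcrt branch mirrors the fix in the comment and only runs on Python 2 under Windows. The payload and the helper name read_pickled_from_child are illustrative, and the parent assumes Python 3.5+.

```python
import pickle
import subprocess
import sys

# Writer process: pickles an object to the *binary* stdout.
WRITER = r"""
import pickle, sys
if sys.platform == "win32" and not hasattr(sys.stdout, "buffer"):
    # Python 2 on Windows: switch fd 1 to binary mode first.
    import os, msvcrt
    msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)
out = getattr(sys.stdout, "buffer", sys.stdout)
out.write(pickle.dumps({"crlf": b"\r\n", "ctrl_z": b"\x1a"}))
out.flush()
"""


def read_pickled_from_child():
    """Run the writer and unpickle the bytes it sent through the pipe."""
    result = subprocess.run(
        [sys.executable, "-c", WRITER], stdout=subprocess.PIPE, check=True
    )
    return pickle.loads(result.stdout)


if __name__ == "__main__":
    print(read_pickled_from_child())
```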
If I then call sys.stdin.read() with no parameter, it should read all the binary data that was piped in, correct? How then do I determine the length correctly? len(data) returns the incorrect value if the last byte of the data was a zero. How do you check and correct for this situation? – danielshiplett May 17, 2010 at 17:15
len counts the \x00 characters in the string. Python does not have null-terminated strings. len("Hello\x00") == 6 – Yann Ramin May 17, 2010 at 17:27
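A short check of that point (Python 3 syntax): a trailing NUL counts toward the length of both str and bytes, so a missing final byte must come from somewhere other than len().

```python
text = "Hello\x00"   # str with a trailing NUL character
data = b"Hello\x00"  # bytes with a trailing zero byte

# Length is stored explicitly; the NUL is just another element.
print(len(text), len(data))  # prints: 6 6
```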
I wonder then if it might be the 'type' command from the MS-DOS shell that is causing the loss of the final byte? I guess I will have to test the equivalent on Linux. Thanks. – danielshiplett May 17, 2010 at 17:52
I think this answer misses the point of the question: if the stream is in "text" mode, the results from read() might be different than if the stream is in "binary" mode. – Brent Bradburn Feb 5, 2013 at 19:34
It might corrupt the input stream on Windows, e.g. '\r\n' -> '\n'. Also, on Python 3 sys.stdin.read() returns Unicode strings, e.g. b'\xf0\x9f\x96\x96' -> '\U0001f596' (4 bytes -> 1 character). That is undesirable if the input is not text. – jfs Nov 16, 2013 at 1:01
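Both effects can be reproduced in memory, assuming Python 3, by using io.TextIOWrapper over io.BytesIO as a stand-in for sys.stdin and a plain io.BytesIO as a stand-in for sys.stdin.buffer. The sample bytes are illustrative.

```python
import io

raw = b"a\r\nb" + b"\xf0\x9f\x96\x96"  # CRLF plus one 4-byte UTF-8 character

# Text layer, like Python 3's sys.stdin: decodes and translates newlines.
text = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8").read()

# Binary layer, like sys.stdin.buffer: bytes come back untouched.
data = io.BytesIO(raw).read()

print(repr(text), len(text))
print(repr(data), len(data))
```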