works very slow
I wrote this module in C. I know C very poorly; I had never written anything in it before.
With this variant:

PyArg_ParseTuple (args, "s", &str))

everything works as expected, but I need to use s* instead of s because the elements can contain embedded nulls. If I change s to s*, Python crashes on the call:

PyArg_ParseTuple (args, "s*", &str)) // crash

Maybe some beginner like me will want to use my example as a starting point for writing something of their own, so I am including everything needed to build this example on Windows.
Parsing arguments and building values is documented at http://docs.python.org/dev/c-api/arg.html

test_xor.c

#include <Python.h>

static PyObject* fast_xor(PyObject* self, PyObject* args)
{
    const char* str ;
    int i;
    if (!PyArg_ParseTuple(args, "s", &str))
        return NULL;
    for(i=0;i<sizeof(str);i++) {str[i]^=55;};
    return Py_BuildValue("s", str);
}

static PyMethodDef fastxorMethods[] =
{
     {"fast_xor", fast_xor, METH_VARARGS, "fast_xor desc"},
     {NULL, NULL, 0, NULL}
};

PyMODINIT_FUNC
initfastxor(void)
{
     (void) Py_InitModule("fastxor", fastxorMethods);
}

test_xor.py

import fastxor
a=fastxor.fast_xor("World") # it works with s instead s*
print a
a=fastxor.fast_xor("Wo\0rld") # It does not work with s instead s*

compile.bat

rem use http://bellard.org/tcc/
tiny_impdef.exe C:\Python26\python26.dll
tcc -shared test_xor.c python26.def -IC:\Python26\include -LC:\Python26\libs -ofastxor.pyd
test_xor.py 
                Except that as it appears to be written, it's writing back to the array. Though there is an error in that too: str1=str1[i] ^ 55 should be str1[i]=str1[i] ^ 55, or more concisely str1[i] ^= 55.
– Henry Gomersall
                Mar 17, 2013 at 11:18

You don't need to build an extension module to do this quickly; you can use NumPy. But to answer your question, you need some C code like this:

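#define PY_SSIZE_T_CLEAN  /* make "s#" store the length as Py_ssize_t */
#include <Python.h>
#include <stdlib.h>

static PyObject * fast_xor(PyObject* self, PyObject* args)
{
    const char* str;
    char * buf;
    Py_ssize_t count;
    PyObject * result;
    Py_ssize_t i;
    if (!PyArg_ParseTuple(args, "s#", &str, &count))
        return NULL;
    /* +1 so that an empty input still gets a valid, non-NULL pointer */
    buf = (char *)malloc(count + 1);
    if (buf == NULL)
        return PyErr_NoMemory();
    /* write into a scratch buffer; the input string itself is read-only */
    for(i=0;i<count;i++)
        buf[i]=str[i] ^ 55;
    result = Py_BuildValue("s#", buf, count);
    free(buf);
    return result;
}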

You can't change the contents of a string object, because strings in Python are immutable. Use "s#" to get both the char * pointer and the buffer length, then build a new string from a copy.
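The immutability point can be checked from the Python side. This is a minimal sketch, assuming Python 3 (the thread itself uses Python 2, where str plays the role of bytes); `xor_copy` is a hypothetical helper, not part of the fastxor module. It builds a new object instead of modifying the input, and it treats an embedded NUL as an ordinary byte because the length is known explicitly.

```python
def xor_copy(data, key=55):
    """Return a new bytes object with every byte of data XORed with key."""
    return bytes(b ^ key for b in data)


original = b"Wo\0rld"            # the embedded NUL is just another byte
encoded = xor_copy(original)

print(len(encoded))              # same length as the input, NUL included
print(xor_copy(encoded) == original)   # XOR with the same key round-trips

try:
    original[0] ^= 55            # bytes objects reject item assignment
except TypeError as exc:
    print("immutable:", exc)
```

The C version above follows the same pattern: it reads from the input pointer and writes to a separate buffer.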

If you can use NumPy:

In [1]: import fastxor
In [2]: a = "abcdsafasf12q423\0sdfasdf"
In [3]: fastxor.fast_xor(a)
Out[3]: 'VUTSDVQVDQ\x06\x05F\x03\x05\x047DSQVDSQ'
In [5]: import numpy as np
In [6]: (np.frombuffer(a, np.int8)^55).tostring()
Out[6]: 'VUTSDVQVDQ\x06\x05F\x03\x05\x047DSQVDSQ'
In [7]: a = a*10000
In [8]: %timeit fastxor.fast_xor(a)
1000 loops, best of 3: 877 us per loop
In [15]: %timeit (np.frombuffer(a, np.int8)^55).tostring()
1000 loops, best of 3: 1.15 ms per loop
                Yes, it works. But how do I write a similar function in C for a bytearray? In Python it would be: b=bytearray('World'); def change(b): for i in range(len(b)): b[i]=b[i]^55
– Arty
                Mar 17, 2013 at 11:24
                You can use bytearray API: docs.python.org/2/c-api/bytearray.html, to get the length and the char * buffer pointer.
– HYRY
                Mar 17, 2013 at 12:08
                Sorted it out with s*: str must be declared as Py_buffer and parsed with PyArg_ParseTuple(args, "s*", &str); then str.len is the length of the buffer and str.buf is the data. Thank you all.
– Arty
                Mar 17, 2013 at 13:50
                Worked like a charm for me except I needed to change the output type to 'y#' due to a UnicodeDecodeError.  result = Py_BuildValue("y#", buf, count);
– David W.
                Feb 6, 2021 at 0:35

An alternative approach is to use PyObject_GetBuffer. The module below defines fast_xor for any object that supports the buffer protocol, and fast_xor_inplace for objects that have writable buffers, such as bytearray. The in-place version returns None. I also added a second unsigned char argument with a default value of 55.

Example:

>>> s = 'abc'
>>> b = bytearray(s)
>>> fast_xor(s), fast_xor(s, 0x20)
('VUT', 'ABC')
>>> fast_xor_inplace(b, 0x20)
>>> b
bytearray(b'ABC')
>>> fast_xor_inplace(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
BufferError: Object is not writable.
>>> fast_xor(b, 256)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: unsigned byte integer is greater than maximum
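The writable versus read-only distinction that PyObject_GetBuffer enforces can be observed from Python through memoryview. This is a rough sketch, assuming Python 3 and a one-dimensional byte buffer such as bytearray; `xor_inplace_sketch` is a hypothetical stand-in for illustration, not the C function from this answer.

```python
def xor_inplace_sketch(obj, key=55):
    """XOR every byte of a writable byte buffer with key, in place."""
    with memoryview(obj) as view:
        if view.readonly:
            # Mirrors what requesting PyBUF_WRITABLE does for read-only objects.
            raise BufferError("Object is not writable.")
        for i in range(len(view)):
            view[i] ^= key


b = bytearray(b"abc")
xor_inplace_sketch(b, 0x20)
print(b)                          # the bytearray itself was modified

try:
    xor_inplace_sketch(b"abc")    # bytes exposes a read-only buffer
except BufferError as exc:
    print("rejected:", exc)
```

The function returns None, like the in-place C version, so the result has to be read back from the object that was passed in.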

Source:

#include <Python.h>

static PyObject *fast_xor_inplace(PyObject *self, PyObject *args)
{
    PyObject *arg1;
    unsigned char arg2 = 55;
    Py_buffer buffer;
    char *buf;
    Py_ssize_t i;
    if (!PyArg_ParseTuple(args, "O|b:fast_xor_inplace", &arg1, &arg2))
        return NULL;
    /* raises BufferError if the object does not have a writable buffer */
    if (PyObject_GetBuffer(arg1, &buffer, PyBUF_WRITABLE) < 0)
        return NULL;
    buf = buffer.buf;
    for(i=0; i < buffer.len; i++)
        buf[i] ^= arg2;
    PyBuffer_Release(&buffer);
    Py_INCREF(Py_None);
    return Py_None;
}

static PyObject *fast_xor(PyObject *self, PyObject *args)
{
    PyObject *arg1;
    unsigned char arg2 = 55;
    PyObject *result;
    Py_buffer buffer;
    char *buf, *str;
    Py_ssize_t i;
    if (!PyArg_ParseTuple(args, "O|b:fast_xor", &arg1, &arg2))
        return NULL;
    if (PyObject_GetBuffer(arg1, &buffer, PyBUF_SIMPLE) < 0)
        return NULL;
    /* allocate an uninitialized string of the right size and fill it below */
    result = PyString_FromStringAndSize(NULL, buffer.len);
    if (result == NULL) {
        PyBuffer_Release(&buffer);  /* do not leak the buffer on failure */
        return NULL;
    }
    buf = buffer.buf;
    str = PyString_AS_STRING(result);
    for(i=0; i < buffer.len; i++)
        str[i] = buf[i] ^ arg2;
    PyBuffer_Release(&buffer);
    return result;
}

static PyMethodDef fastxorMethods[] =
{
     {"fast_xor", fast_xor, METH_VARARGS, "fast xor"},
     {"fast_xor_inplace", fast_xor_inplace, METH_VARARGS, "fast inplace xor"},
     {NULL, NULL, 0, NULL}
};

PyMODINIT_FUNC
initfastxor(void)
{
    Py_InitModule3("fastxor", fastxorMethods, "fast xor functions");
}
                Yes, it works perfectly, thanks. I am beginning to understand how to use Py_buffer. But strangely, if I run the example as example.py, everything works fine. If I run it step by step in the console, the step fastxor.fast_xor_inplace(b, 0x20) crashes Python.
– Arty
                Mar 17, 2013 at 15:34
                @Arty: If you use "s*", you'll have to check buffer.readonly and raise an exception manually. Otherwise you'll mutate an 'immutable' string.
– Eryk Sun
                Mar 17, 2013 at 16:11
                Sorry, I am only a beginner C programmer, so I probably won't be able to see it. I only see the crash event in the event log and a message box.
– Arty
                Mar 17, 2013 at 16:38

This can be done very quickly using numpy. You're unlikely to get substantially faster hand-rolling your own xor routine in C:

In [1]: import numpy
In [2]: data = numpy.uint8(numpy.random.randint(0, 256, 10000))
In [3]: timeit xor_data = numpy.bitwise_xor(data, 55)
100000 loops, best of 3: 17.4 us per loop

If you're using a big dataset (say 100 million points), it compares favourably with the times you have quoted for your code:

In [12]: data = numpy.uint8(numpy.random.randint(0, 256, 100000000))
In [13]: timeit xor_data = numpy.bitwise_xor(data, 55)
1 loops, best of 3: 198 ms per loop
                I do not know in advance the size of the bytearray that will be passed. I have to get the size with a PyArg_ParseTuple
– Arty
                Mar 17, 2013 at 11:07

Do not use a for-loop. Use a list comprehension instead; it is much faster:

In [1]: import random
In [2]: t = bytearray([random.randint(0,255) for i in xrange(10000)])
In [3]: u = bytearray([b^55 for b in t])

This is very fast:

In [11]: %timeit u = bytearray([b^55 for b in t])
1000 loops, best of 3: 1.36 ms per loop

That's not really slow. For 1 MB (10**6 bytes) it takes around 130 ms.

Of course using numpy as Henry Gomersall answered is the better solution.
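For a fixed key there is another standard-library option that this answer does not mention: a 256-entry lookup table applied with `translate`. This is a sketch only, assuming Python 3; `xor_comprehension` and `xor_translate` are hypothetical names, and the timings it prints depend on the machine, so none are quoted here.

```python
import random
import timeit

KEY = 55
# 256-entry lookup table: position i holds i ^ KEY.
TABLE = bytes(i ^ KEY for i in range(256))


def xor_comprehension(data):
    """The list-comprehension approach shown in this answer."""
    return bytearray([b ^ KEY for b in data])


def xor_translate(data):
    """Same result via a table lookup performed inside the interpreter."""
    return data.translate(TABLE)


t = bytearray(random.randint(0, 255) for _ in range(10000))
assert xor_translate(t) == xor_comprehension(t)

for fn in (xor_comprehension, xor_translate):
    seconds = timeit.timeit(lambda: fn(t), number=100)
    print(fn.__name__, seconds / 100, "s per call")
```

Both functions return a new object and leave the input untouched, so they are drop-in alternatives to each other for a constant key.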

                My data size is 10-100 MB; with your samples that takes 3-30 seconds in my case (0.6-6 seconds with tricks like struct.unpack and zlib). With my C example I reached 0.06-0.6 seconds. It's worth it.
– Arty
                Mar 17, 2013 at 11:02
                My original timeit included the setup every time. See revised answer. 10 MB would take 1.3 s. Is that still worth it?
– Roland Smith
                Mar 17, 2013 at 11:04
                You can remove the inner lists and it will be even faster. In [4]: %timeit u = bytearray(b^55 for b in bytearray(i for i in range(10000) if i < 255)) 1000 loops, best of 3: 605 us per loop
– Burhan Khalid
                Mar 17, 2013 at 11:25
                @BurhanKhalid: your inner bytearray is only 255 bytes, not 10000. I'm guessing that is why it is faster. I've tested with and without the list, and with the list is faster.
– Roland Smith
                Mar 17, 2013 at 11:28
                @Roland Smith, your version is the fastest of the simple ones. But the C sample takes 0.006 sec per MB, which is better than your result of 0.13 sec per MB. In pure Python my best result is 0.06 sec per MB, with struct.unpack.
– Arty
                Mar 17, 2013 at 12:22
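Roland Smith's objection about the size of the inner bytearray is easy to confirm. A small sketch, assuming Python 3 (range instead of xrange); the variable names are illustrative only.

```python
# The filter "if i < 255" keeps only the values 0..254 out of range(10000).
inner = bytearray(i for i in range(10000) if i < 255)
print(len(inner))

# A like-for-like input has to contain all 10000 bytes, for example:
full = bytearray(i % 256 for i in range(10000))
print(len(full))
```

So the 605 us figure measures a much smaller input than the 10000-byte benchmark in the answer, and the two timings are not directly comparable.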