I'm trying to interpolate a not-so-large (~10,000 samples) point cloud representing a 2D surface, using Scipy's Radial Basis Function (Rbf). I got some good results with it, but with my latest datasets I consistently get MemoryError, even though the error appears almost instantly during execution (so the RAM is clearly not being eaten up gradually).

I decided to hack a copy of Scipy's rbf.py file, starting by filling it with some print statements, which have been very useful. Decomposing the _euclidean_norm method line by line, like this:

def _euclidean_norm(self, x1, x2):
    d = x1 - x2           # broadcast difference between the two point sets
    s = d**2              # squared components
    su = s.sum(axis=0)    # sum over the coordinate (x, y) axis
    sq = sqrt(su)         # Euclidean distances
    return sq

I get the error on the very first line:

File "C:\MyRBF.py", line 68, in _euclidean_norm
    d = x1 - x2
MemoryError

That norm is called on an array X1 holding the point coordinates (shape (2, 10744): one row of x values and one row of y values), and on X2, which here is the same array; both are reshaped with extra axes by the following method inside the Rbf class, already hacked by me for debugging purposes:

def _call_norm(self, x1, x2):
    print x1.shape
    print x2.shape
    print
    if len(x1.shape) == 1:
        x1 = x1[newaxis, :]
    if len(x2.shape) == 1:
        x2 = x2[newaxis, :]
    x1 = x1[..., :, newaxis]   # (2, n) -> (2, n, 1)
    x2 = x2[..., newaxis, :]   # (2, n) -> (2, 1, n)
    print x1.shape
    print x2.shape
    print
    return self._euclidean_norm(x1, x2)

Please notice that I print the shapes of the inputs. With my current dataset, this is what I get (I added the comments manually):

(2, 10744)         ## Input array of 10744 x,y pairs
(2, 10744)         ## The same array, which is to be "reshaped/transposed"
(2, 10744, 1)      ## The first "reshaped/transposed" form of the array
(2, 1, 10744)      ## The second "reshaped/transposed" form of the array

The rationale, according to the documentation, is to get "a matrix of the distances from each point in x1 to each point in x2", which means, since the two arrays are the same, a matrix of the distances between every pair of points in the input array (which holds the X and Y coordinates).

I tested the operation manually with much smaller arrays (shapes (2,5,1) and (2,1,5), for example) and the subtraction works.
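
Roughly, that manual test looked like the following (the array contents and names here are placeholders, not my real data or the actual rbf.py internals):

import numpy as np

pts = np.random.rand(2, 5)            # 5 points: one row of x values, one row of y values
a = pts[..., :, np.newaxis]           # shape (2, 5, 1)
b = pts[..., np.newaxis, :]           # shape (2, 1, 5)

d = a - b                             # broadcasts to shape (2, 5, 5) without trouble
dist = np.sqrt((d ** 2).sum(axis=0))  # pairwise distances, shape (5, 5), zeros on the diagonal
print(dist.shape)                     # (5, 5)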

How can I find out why it is not working with my dataset? Is there any other obvious error? Should I check for some form of ill-conditioning in my dataset, or perform some pre-processing on it? I think it is well-conditioned, since I can plot it in 3D and the point cloud is visually very well formed.

Any help would be very much appreciated.

Thanks for reading.

I tried downsampling the point cloud, and it worked (25% of the points, via a stepped slice). Anyway, besides wanting to use all the points, it would be useful to know why I get the error... – heltonbiker Aug 8, 2012 at 14:16
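
(For an array in the (2, n) layout above, that kind of downsampling is just a stepped slice along the second axis; the exact step used is not important:)

pts_small = pts[:, ::4]    # keep every 4th point, i.e. roughly 25% of them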

Your dataset should be fine: the error appears because you don't have enough RAM to store the result of the subtraction.

According to the broadcasting rules, the result will have shape

 (2, 10744,     1)
-(2,     1, 10744)
------------------
 (2, 10744, 10744)

Assuming these are arrays of dtype float64, you need 2*10744**2*8 = 1.72 GiB of free memory. If there isn't enough free memory, numpy won't be able to allocate the output array and will immediately fail with the error you see.
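
A quick way to reproduce that figure:

n = 10744
bytes_needed = 2 * n**2 * 8        # result has shape (2, n, n), float64 = 8 bytes per element
print(bytes_needed / 2.0**30)      # ~1.72 (GiB)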

Two "more" doubts: 1) When I run Task Manager > Performance (Windows), there never seems to happen some massive memory allocation when I run the script with the same dataset downsample to, say, one third of the size; 2) For half the size, I still get MemoryError, but then on the su = s.sum(axis=0). How could I calculate needed available memory to perform array.sum()? 3) At last, would it be possible, within reasonable speeds, to perform those operations in a more lazy way? Thank you so much for your help! – heltonbiker Aug 8, 2012 at 19:26 1) If the size is one third, the needed memory is 9x less (because memory scales with the square of the dataset size) 2) Maybe you can do del d before su = s.sum(...), or do all calculations in one line to avoid intermediate results. 3) If your data does not fit in RAM, I think you can still use np.memap or maybe PyTables, but I'm not familiar with them. – jorgeca Aug 8, 2012 at 20:18
