More progress was made on NumPy in the past month. On the compatibility front, we now pass ~130 more tests from NumPy's suite than we did at the end of January. Currently, we pass 2336 of the 3265 tests run, with many of the failures representing portions of NumPy that we don't plan to implement in the near future (object dtypes, unicode, etc.). There are still some failures that do represent issues, such as special indexing cases and failures to respect subclassed ndarrays in return values, which we do plan to resolve. Some components and ufuncs, such as nditer and mtrand, remain unimplemented, and we hope to add them. Overall, the most common array functionality should be working.
Additionally, I began to take a look at some of the loops generated by our code. One widely used loop is dot, and we were running about 5x slower than NumPy's C version. I was able to optimize the dot loop and also the general array iterator to get us to ~1.5x NumPy C time on dot operations of various sizes. Further progress in this area could be made by using CFFI to tie into BLAS libraries, when available. Also, work remains in examining traces generated for our other loops and checking for potential optimizations.
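As a rough illustration of what tying into BLAS through CFFI could look like, here is a minimal sketch that calls CBLAS's ddot routine. The library name ("libopenblas.so") and the use of .ctypes.data to obtain the array pointer are assumptions made for the example, not part of the work described above.

# Minimal sketch: calling a BLAS dot routine through CFFI.
# The library name and pointer-extraction method are assumptions;
# the PyPy NumPy fork may expose array memory differently.
import cffi
import numpy as np

ffi = cffi.FFI()
ffi.cdef("""
    double cblas_ddot(int n, const double *x, int incx,
                      const double *y, int incy);
""")
blas = ffi.dlopen("libopenblas.so")  # assumed library name

def blas_dot(a, b):
    # Assumes contiguous 1-D float64 arrays; other dtypes or strides
    # would need a fallback path.
    xa = ffi.cast("double *", a.ctypes.data)
    xb = ffi.cast("double *", b.ctypes.data)
    return blas.cblas_ddot(len(a), xa, 1, xb, 1)

a = np.arange(4, dtype=np.float64)
b = np.ones(4, dtype=np.float64)
print(blas_dot(a, b))  # 6.0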
To try out PyPy + NumPy, grab a nightly PyPy and install our NumPy fork. Feel free to report comments/issues on IRC, our mailing list, or the bug tracker. Thanks to the contributors to the NumPy on PyPy proposal for supporting this work.
Cheers,
Brian
Thanks! It would be easier to repost this if the title contained pypy: "numpy in pypy - progress in February"
It would be great if the first performance optimizations were actually wrappers to BLAS; there is an outstanding BSD-licensed BLAS at https://github.com/xianyi/OpenBLAS
I believe the "performance optimizations" mentioned in the blog post are unrelated to BLAS. BLAS is about calling an external library. You can't optimize that; you merely call it. The performance optimizations are about things like computing the matrix expression "a + b + c", which can be done without computing the intermediate result "a + b".
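(As a concrete illustration of that point, here is a small plain-NumPy sketch of evaluating "a + b + c" with and without the intermediate array; it only shows the idea, not how PyPy actually implements it.)

# Naive vs. fused evaluation of a + b + c (illustration only).
import numpy as np

a = np.arange(5, dtype=np.float64)
b = np.ones(5, dtype=np.float64)
c = np.full(5, 2.0)

# Naive evaluation: allocates a temporary array for (a + b).
tmp = a + b
naive = tmp + c

# Fused evaluation: a single pass, no intermediate array.
fused = np.empty_like(a)
for i in range(len(a)):
    fused[i] = a[i] + b[i] + c[i]

assert np.array_equal(naive, fused)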
Armin, I agree with you. What I'm trying to say is that making the BLAS interface is probably going to be very easy, will give great performance, and people will use it most of the time if you bundle it.