Thursday, July 7, 2011

Realtime image processing in Python

Image processing is notoriously a CPU-intensive task. To do it in realtime, you need to implement your algorithm in a fast language, so trying to do it in Python is foolish: Python is clearly not fast enough for this task. Or is it? :-)
Actually, it turns out that the PyPy JIT compiler produces code which is fast enough to do realtime video processing using two simple algorithms implemented by Håkan Ardö.
sobel.py implements a classical way of locating edges in images, the Sobel operator. It is an approximation of the magnitude of the image gradient. The processing time is spent on two convolutions between the image and 3x3 kernels.
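To give a feeling for the kind of code involved, here is a minimal pure-Python sketch of the idea; it is illustrative only and is not the actual sobel.py (the image is assumed to be a plain list of rows of grayscale values):

    import math

    def sobel_magnitude(img):
        # Illustrative sketch of the Sobel operator, not the demo code.
        # img is assumed to be a list of rows of grayscale values.
        height, width = len(img), len(img[0])
        out = [[0.0] * width for _ in range(height)]
        for y in range(1, height - 1):
            for x in range(1, width - 1):
                # Horizontal and vertical 3x3 convolutions (the two kernels).
                dx = (-1 * img[y - 1][x - 1] + 1 * img[y - 1][x + 1]
                      - 2 * img[y][x - 1] + 2 * img[y][x + 1]
                      - 1 * img[y + 1][x - 1] + 1 * img[y + 1][x + 1])
                dy = (-1 * img[y - 1][x - 1] - 2 * img[y - 1][x] - 1 * img[y - 1][x + 1]
                      + 1 * img[y + 1][x - 1] + 2 * img[y + 1][x] + 1 * img[y + 1][x + 1])
                out[y][x] = math.sqrt(dx * dx + dy * dy)
        return out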
magnify.py implements a pixel coordinate transformation that rearranges the pixels in the image to produce a magnifying effect in the center. It consists of a single loop over the pixels in the output image, copying pixels from the input image.
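Again purely as a sketch of the structure, not the actual magnify.py (the get_pixel helper and the radius parameter are invented for illustration), the transformation boils down to a loop like this:

    import math

    def magnify(src, get_pixel, width, height, radius=100.0):
        # Illustrative sketch of a center-magnifying coordinate transform,
        # not the demo code.  get_pixel(src, x, y) is a hypothetical accessor
        # that could do nearest-neighbor or bilinear interpolation.
        cx, cy = width / 2.0, height / 2.0
        out = [[0] * width for _ in range(height)]
        for y in range(height):
            for x in range(width):
                dx, dy = x - cx, y - cy
                r = math.hypot(dx, dy)
                if r < radius:
                    # Inside the lens: sample closer to the center, which
                    # magnifies that region in the output.
                    scale = r / radius
                    sx, sy = cx + dx * scale, cy + dy * scale
                else:
                    sx, sy = float(x), float(y)
                out[y][x] = get_pixel(src, sx, sy)
        return out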
You can try it yourself by downloading the appropriate demo:
To run the demos, you need to have mplayer installed on your system. The demos have been tested only on Linux; they might (or might not) work on other systems:
$ pypy pypy-image-demo/sobel.py

$ pypy pypy-image-demo/magnify.py
By default, the two demos use an example AVI file. To have more fun, you can use your webcam by passing the appropriate mplayer parameters to the scripts, e.g.:
$ pypy demo/sobel.py tv://
By default magnify.py uses nearest-neighbor interpolation. If you add the option -b, bilinear interpolation is used instead, which gives a smoother result:
$ pypy demo/magnify.py -b
There is only a single implementation of the algorithm in magnify.py. The two interpolation methods are implemented by subclassing the class used to represent images and embedding the interpolation within the pixel access method. PyPy achieves good performance with this kind of abstraction because it can inline the pixel access method and specialize the implementation of the algorithm for each concrete class. In C++, that kind of pixel access method would be virtual, and you would need templates to get the same effect without incurring runtime overhead.
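To make the pattern concrete, here is a sketch of what such an abstraction can look like; the class and method names are invented for illustration and do not come from magnify.py, and edge handling is omitted for brevity:

    class NearestImage(object):
        def __init__(self, width, height, pixels):
            self.width, self.height, self.pixels = width, height, pixels

        def get(self, x, y):
            # Nearest-neighbor: round to the closest integer coordinate.
            return self.pixels[int(y + 0.5)][int(x + 0.5)]

    class BilinearImage(NearestImage):
        def get(self, x, y):
            # Bilinear: weight the four surrounding pixels by distance.
            x0, y0 = int(x), int(y)
            fx, fy = x - x0, y - y0
            p = self.pixels
            top = p[y0][x0] * (1 - fx) + p[y0][x0 + 1] * fx
            bottom = p[y0 + 1][x0] * (1 - fx) + p[y0 + 1][x0 + 1] * fx
            return top * (1 - fy) + bottom * fy

The algorithm itself is written once against get(x, y); the JIT then inlines and specializes whichever implementation the image object provides, so the abstraction adds essentially no runtime overhead.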
The video above shows PyPy and CPython running sobel.py side by side (PyPy taking input from the webcam, CPython from the test file). Alternatively, to get a feeling for how much faster PyPy is than CPython, try running the demo with the latter. These are the average fps (frames per second) that I get on my machine (Ubuntu 64 bit, Intel i7 920, 4GB RAM) when processing the default test.avi video and using the prebuilt PyPy binary found in the full tarball linked above. For sobel.py:
  • PyPy: ~47.23 fps
  • CPython: ~0.08 fps
For magnify.py:
  • PyPy: ~26.92 fps
  • CPython: ~1.78 fps
This means that on sobel.py, PyPy is about 590 times faster than CPython. On magnify.py the difference is much less dramatic and the speedup is "only" 15x.
It must be noted that this is an extreme example of what PyPy can do. In particular, you cannot expect (yet :-)) PyPy to be fast enough to run an arbitrary video processing algorithm in real time, but the demo still proves that PyPy has the potential to get there.

25 comments:

  1. Pypy is awesome!

  2. I have a n00b problem: On Mac OS X 10.5.8, the precompiled pypy binary crashes with this message:
    dyld: Library not loaded: /usr/lib/libssl.0.9.8.dylib

    What's up with this? Thanks, and sorry for being offtopic.

  3. I saw this demo recently when Dan Roberts presented at Baypiggies. We broke into spontaneous applause when the pypy runtime ran at a watchable speed after cpython ran at less than 1 frame/second. Very impressive!

  4. Anonymous, can you read?

    "prebuilt PyPy binaries for linux 32 and 64 bits"
    "The demo has been tested only on linux, it might (or not) work also on other systems"

    Mac OS X is not Linux.

  5. Perhaps add a comment to sobel.py explaining what "pypyjit.set_param(trace_limit=200000)" does?

  6. The only change I'd like to see in this project is its name... Trying to gather news from Twitter, for example, makes me search amongst thousands of comments in Japanese (pypy means "boobies" in Japanese), other incomprehensible comments in Malay, and hundreds of music fans of Look-Ka PYPY (WTF??)

  7. Other Anonymous: Yes, I can read. I should have given a bit more context, but I was offtopic anyway. My goal was not running the demo, my goal was running pypy. I used the OS X binary from pypy.org. For those who are really good at reading, this was probably clear from the fact that my binary only crashed at library loading time.

  8. @Anonymous: most probably, the prebuilt PyPy for Mac OS X was built on a system different (older?) than yours.

    For a quick workaround, you can try to do "ln -s /usr/lib/libssl-XXX.dylib /usr/lib/libssl.0.9.8.dylib". This should at least make it work, but of course it might break if you actually use libssl.

    The proper fix is to recompile PyPy by yourself.

  9. @schmichael

    to avoid the potential problem of infinite tracing, the JIT bails out if it traces "too much", depending on the trace_limit.
    In this case, the default trace_limit is not enough to fully optimize the whole algorithm, hence we need to help the JIT by telling it to trace a bit more than usual.
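    Concretely, the hint is the one-line call quoted above, placed near the top of the demo script. The try/except guard in the following sketch is just illustrative, so the snippet also runs on CPython, where the pypyjit module does not exist:

        try:
            import pypyjit
            # Let the JIT trace a larger chunk of code before bailing out,
            # so the whole algorithm can be optimized as a single trace.
            pypyjit.set_param(trace_limit=200000)
        except ImportError:
            pass  # not running on PyPy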

    I agree that having to mess with the internal parameters of the JIT is suboptimal. I plan to address this issue in the next few weeks.

  10. How does it perform against python-opencv?

  11. Antonio: Thanks for the quick reply. Unfortunately pypy can't be misled with the symlink hack: "Reason: Incompatible library version: pypy requires version 0.9.8 or later, but libssl.0.9.8.dylib provides version 0.9.7"

    It seems like the prebuilt was created on a 10.6 system, and it does not work on vanilla 10.5 systems. Not a big deal, but it is good to know.

  12. Thanks for posting this. pypy is great. I'm trying to figure out how to write modules in RPython. I was sad that I missed the Baypiggies presentation.

  13. Hello,

    it's lovely that pypy can do this. This result is amazing, wonderful, and is very kittens. pypy is fast at running python code (*happy dance*).

    But.

    It also makes kittens cry when you compare to CPython in such a way.

    The reality is that CPython users would do this using a library like numpy, opencv, pygame, scipy, pyopengl, freej (the list of real time video processing python libraries is very large, so I won't list them all here).

    Of course python can do this task well, and has for more than 10 years.

    This code does not take advantage of vectorization through efficient SIMD, multiple cores or graphics hardware, and isn't careful with reusing memory - so is not within an order of magnitude of the speed of CPython code with libraries doing real time video processing.

    Anyone within the field would ask about using these features.

    Another question they would ask is about pauses. How does the JIT affect pauses in animation? What are the rules for when the JIT warms up, and how can you tell when the code will start running fast? How does the GC affect pauses? Is there a way to turn off the GC, or to reuse memory in some way such that the GC won't cause the program to fail? (Remember that in realtime a pause is a program failure.) Does the GC pool memory of similar-size objects automatically? Does the GC work well with 256MB-1GB-16GB sized objects? In a 16GB system, can you use 15GB of objects, and then delete those objects to then use another 15GB of different objects? Or will the program swap, or fragment memory causing pauses?

    Please don't make kittens cry. Be realistic with CPython comparisons.


    At the moment the python implementation is not as elegant as a vector style implementation. A numpy/matlab/CUDA/OpenCL approach looks really nice for this type of code. One speed up might be to reuse memory, or act in place where possible. For example, not copying the image... unless the GC magically takes care of that for you.

  14. @illume: More or less everyone knows that you can speed up your code by writing or using an extension library. Unfortunately this introduces a dependency on the library (for instance libssl mentioned in the comment thread) and it usually increases the complexity of your code.

    Using PyPy you can solve computationally intensive problems in plain Python. Writing in Python saves development time. This is what the comparison is all about.

  15. hi @jacob: below is code which runs multi-core, vectorised with SIMD, or on a GPU if you like. You'll notice that it is way shorter and more elegant than the 'pure python' code.

    def sobelEdgeDetect(im=DImage, p=Position):
        wX = outerproduct([1,2,1],[-1,0,1])
        wY = transpose(wX)

        Gx = convolve(wX,im,p)
        Gy = convolve(wY,im,p)

        return sqrt(Gx**2 + Gy**2)

    If pypy is 5x slower than C, and SIMD is 5x faster than C... and using multiple cores is 8x faster than a single core you can see this python code is (5 * 5 * 8) 200x faster than the pypy code. This is just comparing CPU based code. Obviously GPU code for real time image processing is very fast compared to CPU based code.

    Things like numpy, pyopengl etc come packaged with various OSes - but choosing those dependencies compared to depending on pypy is a separate issue I guess (but many cpython packaged libraries are packaged for more platforms than pypy).

    Of course using tested and debugged existing code written in python will save you development time: for example, using sobel written with the scipy library:
    http://docs.scipy.org/doc/scipy/reference/generated/scipy.ndimage.filters.sobel.html

    The fact is CPython is fast enough, more elegant, and will save you time for realtime image processing - unless you ignore the reality that people use CPython libraries for these tasks.

    Finally, the given code does not prove that the frames are all processed in realtime; it gives an average time over all of the frames. Realtime video requires that you meet your target speed for every frame. The code would need to be extended to measure each frame to make sure that each one is within the required time budget.

  16. @illume: I think you completely missed the point of the blog post. This is not about "you should use pypy to do video processing", it's about "pypy runs pure python code very fast".

  17. @Antonio Cuni, I'm saying the post reads like cpython can not do "realtime image processing in python" and that pypy can.

  18. @illume:
    This example shows pure python code and compares its execution time in cpython and pypy. Nothing else. Writing graphics code in pure python that does not run dreadfully slowly has, to my knowledge, never been shown before.
    If enough people understand the potential of this technique and put their time into it, we will hopefully come closer to your (5 * 5 * 8) acceleration in pypy, too.
    I will for sure work on this.

  19. SIMD instructions and multi-core execution are things PyPy has the potential to support, given time and funding.

  20. The typical optimization path here would be implementing the necessary numpy array operations for the algorithms described. I wonder how a proper numpy implementation would compare.

  21. I think you are still missing the point of the post. It was not "use pure Python to write your video processing algos". That's of course nonsense, given the amount and quality of existing C extension modules to do that.

    The point is that when you want to experiment with writing a new algorithm of any kind, it is now possible to do it in pure Python instead of, say, C code. If later your project needs to move past the experimentation phase, you will have to decide if you want to keep that Python code, rewrite it in C, or (if applicable) use SIMD instructions from Python or from C, or whatever.

    The real point of this demo is to show that PyPy makes Python fast enough as an early experimentation platform for almost any kind of algorithm. If you can write in Python instead of in C, you'll save 50% of your time (random estimate); and then for the 5% of projects that go past the experimentation phase and where Python is not enough (other random estimate), spend more time learning other techniques and using them. The result is still in your favor, and it's only going to be more so as PyPy continues to improve.

  22. I was hoping to experiment with this amazing demo on my Windows-based computers. Any advice for how I would start making the required changes?

    Jacob

  23. Unfortunately the server died :( I'm not sure where exactly the packaged demos are, but they can be run from:

    https://bitbucket.org/pypy/extradoc/src/extradoc/talk/iwtc11/benchmarks/image


  24. The Python code for this now seems to be here:
    https://bitbucket.org/pypy/extradoc/src/talk/dls2012/demo

  25. The scripts can be found here:

    https://bitbucket.org/pypy/extradoc/src/153804ce4fc3/talk/dls2012/demo


See also PyPy's IRC channel: #pypy at freenode.net, or the pypy-dev mailing list.
If the blog post is old, it is pointless to ask questions here about it---you're unlikely to get an answer.