Thursday, March 12, 2009

PyPy talk at OpenBossa 09

Yesterday i gave my PyPy status/mobile perspectives at OpenBossa, Nokia's developer conference for embedded platforms in Brazil. Found it a bit of a tough task to do that in 50 minutes. I had some 50, later more developers attending the talk and was happy with the questions and the feedback. Guess it's a good sign if the number of people grows during a talk :) It was the first time i tried to work more with pictures and actually used some devianart photos from Marikaz to mark section transitions. I summarize/highlight some key points here in the post.

After intro and 2.5 compatibility status, i talked about our measurements of PyPy's Python on Nokia's N810 internet tablet. The best bit is that for almost all Python data structures PyPy has smaller memory representations than CPython. Particularly good are class instances which often score at 50% of CPython's sizes. Startup time is also often better and can be improved. On the bad side, PyPy's quite large base interpreter size and its bytecode execution is often worse. In the talk i also outline ideas for "perfect PYC files" for minimizing module import times and maximizing sharing across interpreter processes. I also briefly discussed the PyPy situation with extension modules and regarding C++ libs. Most of these ideas arose from sprint discussions last year. In the morning i also had some good talk with Stefan Seefeld about Boost Python and the new QT4 bindings. Maybe to use Boost Python is also a good opportunity - but PyPy does not currently have a C-level or C++ level API.

In subsequent lunch discussions people agreed that PyPy has three main interesting areas currently:

  • the Python Just-In-Time Compiler
  • a virtualized, sandboxed Python interpreter
  • an efficient Python interpreter for small devices

I think our upcoming 1.1 release will be a good point in time for many people to look some more into PyPy. I hope we are crossing the chasm soon. It's been a while since the project started :) Getting some more sponsoring to sustain and increase our current efforts probably wouldn't hurt.

Now i am off to spend my last day in Recife / Brazil, fly back to Germany in the evening and then spend time on preparing for Pycon 2009. And I guess i am going to enjoy some naturally cold air - at least my two jogging sessions at Brazillian beaches, at a sustained 30 degrees celsius, were tough. I guess i shouldn't complain, though :)

Was great meeting all the brazillian guys and the few women - just had breakfeast with Kate Alhola, kernel hacker and working on the new "Freemantle" graphical platform. Many thanks go to Marcio Marcedo and the Python team at INDT who invited me here. Hope to come again next year and eventually talk more about the Zone VM :)

If you are interested in some more not so pypy-specific bits about the conference and what i experienced, you might head over to my tetamap blog.

holger

10 comments:

Mikko Ohtamaa said...

Hi Holger,

About start up times: We have researched them a lot when developing few applications on Nokia's PyS60.

Our conclusion, for now, is that its imports and module body bytecode execution (loading modules, classes, creating functions) which takes the most of the time during the start up. Unfortunately there is no real way to speed up this process, except lazily trying to load all your code.

We have experienced with unexec() like solution. Unexec() was an old Emacs trick where a.out binary code and data segments are dumped to disk. When the application is loaded for the next time, this dump is just blitted to memory and execution continues. Kind of application level hibernation. You wouldn't actually need to distribute .pyc files at all for embedded devices, you could just give out a target specific binary dump containing ready memory layout.

Of course, it is not so straightforward on modern system with DLLs and other funny pointers. We got some tests working with CPython and PyS60 emulator - but there would be tons of work to make it actually usable (patching all system libs and DLL loads to be suspend friendly).

Some discussion here:

http://mail.python.org/pipermail/python-dev/2008-December/084466.html

Please reply at mikko (at) redinnovation (dot) com if you are interested in to hear more.

Mikko Ohtamaa said...

God I hate too small edit boxes.

Anonymous said...

Do you have a link for those Qt4 bindings?

Paddy3118 said...

On too small edit boxes: I use: It's All Text! for Firefox, together with Vim.

- Paddy.

holger krekel said...

Hi Mikko!

thanks a lot for your comment and the pointer to the python-dev thread!

Like Martin von Loewis i'd be very interested to know more numbers regarding how the time for python imports is usually spent - i'd suspect the major bit comes from unmarshalling and the involved malloc/copying of data work. If that is true then what i presented in the talk as "perfect pyc files" is probably a good idea. It's basically what Martin suggested.

I find the unexec ideas interesting, especially on platforms where fork does not exist. PyPy could probably have a very compact interpreter state representation if we perform garbage collection before writing to disk. When using moving GCs those objects would map very compactly into the oldest-generation memory and thus be mostly avoided by subsequent GC collects.

Of course, there also is time consumed for linking DLLs - only forking is efficient in avoiding this overhead. But it doesn't exist on Symbian, right?

If you have any more info on the exact numbers on import times, i'd be very curious. We might also have some numbers from PyPy - need to check.

I am also available on holger.krekel at gmail com. You are also very welcome to post to pypy-dev (http://codespeak.net/mailman/listinfo/pypy-dev)

cheers,
holger

holger krekel said...

anoymous: pypy does not have qt4 bindings yet.

paddy318: thanks! i'll check this out once i get firefox upgrade on the ubuntu machine i am currently using. (why are we in 2009 still having this concept of "installing" apps/plugins, let alone finding a matching one?)

Anonymous said...

I was referring to this:
"In the morning i also had some good talk with Stefan Seefeld about Boost Python and the new QT4 bindings."

From that it sounds like there are new Qt4 bindings for CPython somewhere, using Boost. I have tried searching, but was not able to find anything.

holger krekel said...

Anonymous, i also only found the 2005 announcement. I mailed Stefan to find out some more. Maybe it's just existing in some developers repository as of yet. I'll let you know if i find out something more actual.

holger krekel said...

Anonymous: ok, seems like recent bindings use SIP, see http://www.riverbankcomputing.co.uk/software/pyqt/download

not sure about the status of boost cpython based qt bindings.
holger

arman said...

I wonder if an unexec like functionality can be developed by developing a mechanism for pickling the current interpreter state (e.g. loaded modules).