Friday, July 13, 2012

Prototype PHP interpreter using the PyPy toolchain - Hippy VM

Hello everyone.

I'm proud to release the result of a Facebook-sponsored study on the feasibility of using the RPython toolchain to produce a PHP interpreter. The rules were simple: two months; one person; get as close to PHP as possible, implementing enough warts and corner cases to be reasonably sure that it answers hard problems in the PHP language. The outcome is called Hippy VM and implements most of the PHP 1.0 language (functions, arrays, ints, floats and strings). This should be considered an alpha release.

The resulting interpreter is obviously incomplete – it does not support all modern PHP constructs (classes are completely unimplemented), builtin functions, grammar productions, web server integration, builtin libraries etc., etc.. It's just complete enough for me to reasonably be able to say that – given some engineering effort – it's possible to provide a rock-solid and fast PHP VM using PyPy technologies.

The result is available in a Bitbucket repo and is released under the MIT license.

Performance

The table below shows a few benchmarks comparing Hippy VM to Zend (a standard PHP interpreter available in Linux distributions) and HipHop VM (a PHP-to-C++ optimizing compiler developed by Facebook). The versions used were Zend 5.3.2 (Zend Engine v2.3.0) and HipHop VM heads/vm-0-ga4fbb08028493df0f5e44f2bf7c042e859e245ab (note that you need to check out the vm branch to get the newest version).

The run was performed on 64-bit Linux running on a Xeon W3580 with 8M of L2 cache, which was otherwise unoccupied.

Unfortunately, I was not able to run it on the JITted version of HHVM, the new effort by Facebook, but people involved with the project told me it's usually slower or comparable with the compiled HipHop. Their JITted VM is still alpha software, so I'll update it as soon as I have the info.

benchmark Zend HipHop VM Hippy VM Hippy / Zend Hippy / HipHop
arr 2.771 0.508+-0% 0.274+-0% 10.1x 1.8x
fannkuch 21.239 7.248+-0% 1.377+-0% 15.4x 5.3x
heapsort 1.739 0.507+-0% 0.192+-0% 9.1x 2.6x
binary_trees 3.223 0.641+-0% 0.460+-0% 7.0x 1.4x
cache_get_scb 3.350 0.614+-0% 0.267+-2% 12.6x 2.3x
fib 2.357 0.497+-0% 0.021+-0% 111.6x 23.5x
fasta 1.499 0.233+-4% 0.177+-0% 8.5x 1.3x

The PyPy compiler toolchain provides a way to implement a dynamic language interpreter in a high-level language called RPython. This is a language which is lower-level than Python, but still higher-level than C or C++: for example, RPython is a garbage-collected language. The killer feature is that the toolchain will generate a JIT for your interpreter which will be able to leverage most of the work that has been done on speeding up Python in the PyPy project. The resulting JIT is generated for your interpreter, and is not Python-specific. This was one of the toolchain's original design decisions – in contrast to e.g. the JVM, which was initially only used to interpret Java and later adjusted to serve as a platform for dynamic languages.

Another important difference is that there is no common bytecode to which you compile both your language and Python, so you don't inherit problems presented when implementing language X on top of, say, Parrot VM or the JVM. The PyPy toolchain does not impose constraints on the semantics of your language, whereas the benefits of the JVM only apply to languages that map well onto Java concepts.

To read more about creating your own interpreters using the PyPy toolchain, read more blog posts or an excellent article by Laurence Tratt.

PHP deviations

The project's biggest deviation from the PHP specification is probably that GC is no longer reference counting. That means that the object finalizer, when implemented, will not be called directly at the moment of object death, but at some later point. There are possible future developments to alleviate that problem, by providing "refcounted" objects when leaving the current scope. Research has to be done in order to achieve that.

Assessment

The RPython toolchain seems to be a cost-effective choice for writing dynamic language VMs. It both provides a fast JIT and gives you access to low-level primitives when you need them. A good example is in the directory hippy/rpython which contains the implementation of an ordered dictionary. An ordered dictionary is not a primitive that RPython provides – it's not necessary for the goal of implementing Python. Now, implementing it on top of a normal dictionary is possible, but inefficient. RPython provides a way to work directly at a lower level, if you desire to do so.

Things that require improvements in RPython:

  • Lack of mutable strings on the RPython level ended up being a problem. I ended up using lists of characters; which are efficient, but inconvenient, since they don't support any string methods.
  • Frame handling is too conservative and too Python-specific, especially around the calls. It's possible to implement less general, but simpler and faster frame handling implementation in RPython.

Status of the implementation

Don't use it! It's a research prototype intended to assess the feasibility of using RPython to create dynamic language VMs. The most notable feature that's missing is reasonable error reporting. That said, I'm confident it implements enough of the PHP language to prove that the full implementation will present the same performance characteristics.

Benchmarks

The benchmarks are a selection of computer language shootout benchmarks, as well as cache_get_scb, which is a part of old Facebook code. All benchmarks other than this one (which is not open source, but definitely the most interesting :( ) are available in the bench directory. The Python program to run them is called runner.py and is in the same directory. It runs them 10 times, cutting off the first 3 runs (to ignore the JIT warm-up time) and averaging the rest. As you can see the standard deviation is fairly minimal for all interpreters and runs; if it's omitted it means it's below 0.5%.

The benchmarks were not selected for their ease of optimization – the optimizations in the interpreter were written specifically for this set of benchmarks. No special JIT optimizations were added, and barring what's mentioned below a vanilla PyPy 1.9 checkout was used for compilation.

So, how fast will my website run if this is completed?

The truth is that I lack the benchmarks to be able to answer that right now. The core of the PHP language is implemented up to the point where I'm confident that the performance will not change as we get more of the PHP going.

How do I run it?

Get a PyPy checkout, apply the diff if you want to squeeze out the last bits of performance and run pypy-checkout/pypy/bin/rpython targethippy.py to get an executable that resembles a PHP interpreter. You can also directly run python targethippy.py file.php, but this will be about 2000x slower.

RPython modifications

There was a modification that I did to the PyPy source code; the diff is available. It's trivial, and should simply be made optional in the RPython JIT generator, but it was easier just to do it, given the very constrained time frame.

  • gen_store_back_in_virtualizable was disabled. This feature is necessary for Python frames but not for PHP frames. PHP frames do not have to be kept alive after we exit a function.

Future

Hippy is a cool prototype that presents a very interesting path towards a fast PHP VM. However, at the moment I have too many other open source commitments to take on the task of completing it in my spare time. I do think that this project has a lot of potential, but I will not commit to any further development at this time. If you send pull requests I'll try to review them. I'm also open to having further development on this project funded, so if you're interested in this project and the potential of a fast PHP interpreter, please get in touch.

Cheers,
fijal

EDIT: Fixed the path to the rpython binary

23 comments:

Anonymous said...

it's cool. Next on the list Javascript to Python/PyPy converter...

Maciej Fijalkowski said...

please read the blog post first. It's *not* PHP to Python converter. There is also a started JS implementation on in https://bitbucket.org/pypy/lang-js, but JS is kind of useless without a browser.

Anonymous said...

JS to pypy would be useful when time comes to running all those node based apps in prod ;)

Also, Java to PyPy would be a cool experiment too - jvm's way too bloated...

Christian Heimes said...

Do I read the numbers correctly? The fibonacci test runs more than 110 times faster in your experimental, 2 months old VM than in the default Zend VM? That's amazing!

It took me a while to figure out the meaning of the numbers. Please add units and explain that small is faster.

Christian

Unknown said...

Nice, Python surprising when

Konstantine Rybnikov said...

Cool. When will your pypy converter convert my c++ programs to python? Can't wait until that happens! Anyway, nice work!

p.s.: sarcasm

Benedikt Morbach said...

Hey there, nice work.

Do you have any numbers or estimates how memory consumption compares?

Ole Laursen said...

I hope you get funding for researching the refcount thing. Being able to predict when something gets whacked is just really convenient and something PyPy Python can benefit from too.

While GC may be more efficient, the unpredictable nature of it do become a problem in production in some cases.

For instance, for a webapp written with Django and CPython, when a request is over I know that the stuff that was allocated is now gone unless I put something in a global data structure. I suspect many applications have similar patterns where you perform a big operation after which it's natural to have a clean up.

Inactive Account said...

Wow, this is wonderful.
You rock.

I surely hope you get funding.

If I didn't live in Brazil, and our currency wasn't so weak, and my income wasn't so low, I would definitely donate some dozens of dollars.

Keep the good work

Tom said...

I would like to see how this compares to the Phalanger project. Which runs PHP in the .NET runtime.

Maciej Fijalkowski said...

About phalanger: the short answer is that I don't have windows and comparisons on mono would be a bit ingenuine. The longer answer is that I don't expect phalanger to particularly excel compared to Zend.

For example compare the performance of IronPython and CPython. The same reasons apply as they do towards JVM or Parrot - this is IMO nto the right way for dynamic lanaguages.

Anonymous said...

Does the Zend test include APC as well? That's the current standard way to run php scripts...

Maciej Fijalkowski said...

Yes, although APC does not change anything in *this* set of benchmarks, precisely because you run everything in-process (within the same interpreter instance even).

Reini Urban said...

Love this effort and esp. the benchmarks! Great work

Referring to your mentioning of JVM and parrot:

You consider as disadvantage to be tied to an existing set of VM opcodes to implement many languages. You were talking about .NET (which had to add Iron-style dynamic reflection later) or the JVM.

parrot already has all the functionality the JVM or .NET was missing and even more (e.g. dynamic types loadable as plugins) and considers it as advantage to share opcodes and bytecode libraries across different languages.

But parrot cannot compete with your speed yet.

SM said...

Very interesting project. It would be nice if you used a recent version of PHP for comparisons - 5.3.2 is over 2 years old and one version behind. Try something like 5.4.4.

Reinis I. said...

> JS is kind of useless without a browser

This would have been more true before Node.js, but now it's false.

Arne Babenhauserheide said...

Wow, 1.5x to 20x faster than a PHP-compiler and 7x to 100x faster than PHP itself… congrats!

Anonymous said...

Offtopic: not trying to sound offensive or pushy, but what happened to numpypy development? I'm regularly checking http://buildbot.pypy.org/numpy-status/latest.html, and it looks like its development is stale for several months.

Maciej Fijalkowski said...

@Anonymous not much. I'll write a non-progress blog post some time soon.

Anonymous said...

@Fijal
Thank you!

Dima Tisnek said...

Awesome proof of concept!

Can you post memory footprint comparison, please?

And perhaps a quick overview what these test cases cover, arithmetic, function call overhead, dynamic language features?

Thanks for your hard work, without likes of you OSS would never exist!

Anonymous said...

Just in case anyone *is* interested in implementing PHP on the Parrot Virtual Machine, you don't have to tie yourself to the PVM bytecodes.

You can write your PHP compiler entirely in NQP (Not Quite Perl) which in turn produces parrot bytecode for you.

This is important for two reasons:

First, NQP is a mid level language, and is relatively easy to write in, and doesn't require you to know anything at all about the PVM.

Second, although NQP *presently* only targets PVM, there's an in-progress backend which targets the Java Virtual Machine! Early benchmarks suggest that it is already faster than perl5, and there are many optimizations and speedups to come.

Thus, if you were to write a PHP compiler in NQP, you could target either the Parrot Virtual machine, or (in the future) the Java virtual machine.

Unknown said...

Just in case anyone *is* interested in implementing PHP on the Parrot Virtual Machine, you don't have to tie yourself to the PVM bytecodes.

You can write your PHP compiler entirely in NQP (Not Quite Perl) which in turn produces parrot bytecode for you.

This is important for two reasons:

First, NQP is a mid level language, and is relatively easy to write in, and doesn't require you to know anything at all about the PVM.

Second, although NQP *presently* only targets PVM, there's an in-progress backend which targets the Java Virtual Machine! Early benchmarks suggest that it is already faster than perl5, and there are many optimizations and speedups to come.

Thus, if you were to write a PHP compiler in NQP, you could target either the Parrot Virtual machine, or (in the future) the Java virtual machine.