Saturday, January 26, 2013

NumPyPy 2013 Developer Position

Introduction

Proposed herein is a part-time fellowship for developing NumPy in PyPy. The work will initially consist of 100 hours with the possibility of extension, until the funds run out. Development and improvement of PyPy's NumPyPy (as with most Open Source and Free Software) is done as a collaborative process between volunteer, paid, and academic contributors. Due to a successful funding drive but a lack of contributors willing to work directly for PyPy, we find ourselves in the enviable situation of being able to offer this position.

Background

PyPy's developers make all PyPy software available to the public without charge, under PyPy's Open Source copyright license, the permissive MIT License. PyPy's license assures that PyPy is equally available to everyone freely on terms that allow both non-commercial and commercial activity. This license allows academics, for-profit software developers, volunteers and enthusiasts alike to collaborate to make a better Python implementation for everyone.

NumPy support for PyPy is licensed similarly, and therefore it can directly help researchers and developers who want to do numeric computing in an easier programming language than Fortran or C, which are typically used for these applications. Being licensed freely to the general public means that the opportunity to use, improve, and learn how NumPy in PyPy works will be available to everyone.

The Need for a Part-Time Developer

The NumPy project in PyPy has seen slow but steady progress since we started working on it about a year ago. On one hand, what we have delivered with the effort undertaken is impressive; on the other hand, we would like to see development accelerate.

PyPy has strict coding, testing, documentation, and review standards, which ensure excellent code quality, continually improving documentation and test coverage, and minimal regressions. A part-time developer will be able to bring us closer to the goals of a full NumPy API implementation and improved speed.

Work Plan

The current proposal is split into two parts:

  • Compatibility:

    This part covers the core NumPy Python API. We'll implement most NumPy APIs that are officially documented and we'll pass most of NumPy's tests that cover documented APIs and are not implementation details. Specifically, we don't plan to:

    • implement NumPy's C API
    • implement other scientific libraries, like SciPy, matplotlib or biopython
    • implement details that PyPy developers agree by consensus have no place in PyPy's implementation of NumPy, or that are agreed with the NumPy community to be implementation details
  • Speed:

    This part will cover significant speed improvements in the JIT that would make numeric computations faster. This includes, but is not necessarily limited to:

    • writing a set of benchmarks covering various use cases
    • teaching the JIT backend (or multiple backends) how to deal with vector operations, such as SSE
    • experimenting with automatic parallelization using multiple threads, akin to numexpr
    • improving the JIT register allocator, which should make a difference especially for tight loops

    As with all speed improvements, it is relatively hard to predict exactly how this will turn out; however, we expect the results to be within an order of magnitude of the handwritten C equivalent.

Position Candidate

We would like people who are proficient in NumPy and PyPy (but don't have to be core developers of either) to step up. The developer will be selected by consensus of the PyPy core developers, in consultation with the Software Freedom Conservancy to ensure there is no conflict of interest. The main criterion will be past contributions to the PyPy project, though these don't have to be significant in size.

A candidate for the Developer position will demonstrate the following:

  • The ability to write clear, stable, suitable and tested code.
  • The ability to understand and extend the JIT capabilities used in NumPyPy.
  • A positive presence in PyPy's online community on IRC and the mailing list.

Ideally the Developer will also:

  • Have familiarity with the infrastructure of the PyPy project (including bug tracker and buildbot).
  • Have worked to provide education or outreach on PyPy in other forums such as workshops, conferences, and user groups.

Conservancy and PyPy are excited to announce the Developer Position. Remuneration for the position will be at the rate of 60 USD per hour, through the Software Freedom Conservancy.

The PyPy community promises to provide the necessary guidance and help with the current codebase; however, we expect a successful candidate to be able to review code and incorporate external patches within two months of the starting date of the contract.

Candidates should submit their proposal (including their CV) to:

pypy-z@python.org

The deadline for this initial round of proposals is February 1, 2013.


11 comments:

  1. I was wondering, why is PyPy so eager to support NumPy of all things? Surely there are things more interesting to a general python/pypy user base. Can someone clarify that for me?

  2. There was a numpy fundraiser due to popular demand. Feel free to suggest a different fundraiser if you want something else. I would even be willing to do a survey.

  3. The thing is, the most interesting use of Python is in science, IMHO at least. And the absolute majority of Python scientific libraries use numpy as their base. So it would be awesome to have a fast and robust numpy-compatible library running on pypy.

  4. The deadline seems too tight: it's next Friday.

  5. It's been said before, but as a long-time NumPy and SciPy user, please please please don't call this project NumPy. It's great for PyPy to have an nd-array lib, and for sure NumPy has some of the best semantics and user API for that, so by all means make it compatible, but giving it the same name just causes tremendous confusion for users. For scientific users, without the C-API, which enables most of the widely used scientific extensions, it is simply not "numpy".

  6. This comment has been removed by the author.

  7. @201301261931

    As NumPyPy intends to implement NumPy APIs, as a non-contributor, I feel like NumPyPy is a good name.

    So then the package names would be:

    * http://pypi.python.org/pypi/numpy
    * http://pypi.python.org/pypi/numpypy

    @201301261237

    IMHO, this is not the forum for discussing what sort of pony you would like?

  8. FWIW I think that getting numpypy to work is hugely important for the acceptance of pypy. Simple things like using matplotlib are crucial to lots of people who aren't using much of the rest of scipy, for example.

  9. You can post it on http://jobs.pythonweekly.com/ and it will be included in Python Weekly newsletter too.

  10. I am following each of your announcements with great interest.
    JIT optimization of array manipulations would enormously benefit my daily work.

    Even though I am trying hard to follow the discussion, I have difficulty understanding the issues at hand, and what numpypy is going to be when it is finished.

    Probably I am not the only one, considering the sometimes controversial discussion.

    My current understanding is this:
    All python code in numpy will run much better under pypy.

    The problem is the external libraries. Depending on the type, there will be different approaches.

    I assume that you will rewrite a large part of the C part of numpy directly in Python, and then make use of the JIT optimizer. That would be the approach for all of the algorithms that are currently written in C but could easily be reimplemented in Python.
    Something like ufunc_object.c could probably be rewritten in Python without a loss of speed.
    Of course, even though this would still run under normal Python, it would be far too slow.

    Then you have external dlls, like BLAS. I assume you will call them differently (ctypes?), and not as extension modules. If you use ctypes, it will still run under normal python, maybe a bit slower.

    Then you have parts that are currently written in c, but that you can neither re-implement in python, nor call as a dll. Will you re-write those in c, using a different c-api? Or re-write them, so that they can be called using ctypes?


    Maybe you give a short general overview about the issues with the c-api and what you are doing?

    Something like. "Currently the function numpy.dot is written as a c-extension. It makes extensive use of PyArray_GETITEM. This limits the optimizer. We are therefore completely rewriting the function in python"

    What is the best approach for a user like me, who makes heavy use of numpy, but also scipy and my own extension modules, cython and f2py?

    Should I preferably write future modules as dlls, so that they can be called with ctypes (or cffi or something else), instead of making extension modules?

    Do you think it will be possible at all to use scipy, which makes much more use of non-python libraries, or do you think that scipy will have to be re-written?

  11. Just a question: the donation figures on the homepage seem to have been the same for the last 6 months or so. Have there really been no donations, or are they no longer updated?

See also PyPy's IRC channel: #pypy at freenode.net, or the pypy-dev mailing list.
If the blog post is old, it is pointless to ask questions here about it; you're unlikely to get an answer.