Saturday, March 26, 2011

Unladen Swallow Retrospective

I wrote this while I was at PyCon, but I kept revising it. Anyway, here goes. :)
As is apparent by now, no one has really been doing any work directly on Unladen Swallow or in porting it to py3k. Why not?

Lack of Sponsor Interest

The primary reason is that we weren't able to generate enough internal customers at Google. There are a few reasons for that:
  1. Most Python code at Google isn't performance critical. It's used mainly for tools and prototyping, and most user-facing applications are written in Java and C++.
  2. For those internal customers who were interested, deployment was too difficult: being a replacement was not enough. Any new Python implementation had to be turn-key. We thought building on CPython instead of starting fresh would sidestep this problem because C extensions and SWIG code would just work. However, simply changing from the previous version of CPython to a 2.6-based Python was too difficult.
  3. Our potential customers eventually found other ways of solving their performance problems that they felt more comfortable deploying.
After my internship was over, I tried to make Unladen the focus of my Master's thesis at MIT, but my advisor felt that the gains so far were insufficient to have big impact and the techniques I wanted to implement are all no longer considered novel. Most feedback techniques were implemented for Smalltalk by Urs Hölzle and tracing for Java by Andreas Gal. Not to say there aren't novel techniques to be discovered, but I didn't have any ideas at the time.

Lack of Personal Interest

Most of this was decided around Q1 of 2010. We still could have chosen to pursue it in our spare time, but at that point things looked a little different.
First of all, it's a lot less fun to work on a project by yourself than with other people, especially if it's unclear if you'll even have users.
Secondly, a large part of the motiviation for the project was that we felt like PyPy would never try to support CPython C extension modules or SWIG wrapped code. We were very surprised to see PyPy take steps in that direction. That somewhat obviated the need to build a bolt-on JIT for CPython. Also, when the project was launched, PyPy didn't have x86_64 support, but in the meantime they have added it.
Finally, the signals we were getting from python-dev were not good. There was an assumption that if Unladen Swallow were landed in py3k, Google would be there to maintain it, which was no longer the case. If the merge were to have gone through, it is likely that it would have been disabled by default and ripped out a year later after bitrot. Only a few developers seemed excited about the new JIT. We never finished the merge, but our hope was that if we had, we could entice CPython developers do hack on the JIT.
So, with all that said for why none of us are working on Unladen anymore, what have we learned?

Lessons About LLVM

First of all, we explored a lot of pros and cons of using LLVM for the JIT code generator. The initial choice to use LLVM was made because at the time none of us had significant experience with x86 assmebly, and we really wanted to support x86 and x86_64 and potentially ARM down the road. There were also some investigations of beefing up psyco, which I beleive were frusturated by the need for a good understanding of x86.
Unfortunately, LLVM in its current state is really designed as a static compiler optimizer and back end. LLVM code generation and optimization is good but expensive. The optimizations are all designed to work on IR generated by static C-like languages. Most of the important optimizations for optimizing Python require high-level knowledge of how the program executed on previous iterations, and LLVM didn't help us do that.
An example of needing to apply high-level knowledge to code generation is optimizing the Python stack access. LLVM will not fold loads from the Python stack across calls to external functions (ie the CPython runtime, so all the time). We eventually wrote an alias analysis to solve this problem, but it's an example of what you have to do if you don't roll your own code generator.
LLVM also comes with other constraints. For example, LLVM doesn't really support back-patching, which PyPy uses for fixing up their guard side exits. It's a fairly large dependency with high memory usage, but I would argue that based on the work Steven Noonan did for his GSOC that it could be reduced, especially considering that PyPy's memory usage had been higher.
I also spent the summer adding an interface between LLVM's JIT and gdb. This wasn't necessary, but it was a nice tool. I'm not sure what the state of debugging PyPy is, but we may be able to take some of the lessons from that experience and apply it to PyPy.

Take Aways

Personally, before working on this project, I had taken a compiler class and OS class, but this experience really brought together a lot of systems programming skills for me. I'm now quite experienced using gdb, having hacked on it and run it under itself. I also know a lot more about x86, compiler optimization techniques, and JIT tricks, which I'm using extensively in my Master's thesis work.
I'm also proud of our macro-benchmark suite of real world Python applications which lives on and PyPy uses it for In all the performance work I've done before and after Unladen, I have to say that our macro benchmark suite was the most useful. Every performance change was easy to check with a before and after text snippet.
We also did a fair amount of good work contributing to LLVM, which other LLVM JIT projects, such as Parrot and Rubinius, can benefit from. For example, there used to be a 16 MB limitation on JITed code size, which I helped to fix. LLVM's JIT also has a gdb interface. Jeff also did a lot of work towards being able to inline C runtime functions into the JITed code, as well as fixing memory leaks and adding the TypeBuilder template for building C types in LLVM.
So, while I wish there were more resources and the project could live on, it was a great experience for me, and we did make some material contributions to LLVM and the benchmark suite that live on.