Wednesday, March 11, 2009

"Blocks," or "lambdas in a non-sexp language"

Last week there was a lot of discussion around tav's proposal to add Ruby blocks to Python. Eventually, the proposal went to python-ideas, and got ground into the dust by all of the objections. The idea has been proposed before, and has always met with strong resistance. The argument against anonymous pseudo-expression functions relies on the idea that Python already has powerful syntax for doing 90% of the things that Ruby and Scheme use blocks and lambdas for, and Guido seems to prefer it that way. In the end, I think Guido's right about blocks in Python, but in another language with different scoping rules, blocks might be a great idea.

Consider the Ruby blocks examples on the c2 wiki. The first three examples, which are iteration, callback registration, and resource management, all have distinct syntaxes in Python, while in Ruby they all use blocks:

Iteration:
# Ruby:
collection.each do |element|
  ...
end

# Python:
for x in collection:
    ...

# Ruby:
numbers = [1,2,3,4]
squares = numbers.map {|n| n*n }

# Python:
numbers = [1, 2, 3, 4]
squares = [n * n for n in numbers]

Callback registration:
# Ruby:
button.on_click do |event|
  ...callback code...
end

# Python:
# (decorators are more general, but this is a common use case
# exemplified by Django filters and tags.)
@button.on_click
def raise_dialog(event):
    ...callback code...

Resource management:
# Ruby:
File.open(filename) do |file|
  ...read from file...
end

# Python:
with open(filename) as file:
    ...read from file...

The consensus on the python-ideas list was that the dedicated for-loop and context-manager syntax is more readable: no matter what object you're working with, a big fat keyword at the start of the line tells you how the following code block is going to be executed, instead of one syntax stretching to cover multiple unrelated use cases. The verdict could also be read as another instance of Python's "there should be one obvious way to do it" philosophy. Currently, def is the only way to create a function that can contain statements, and decorators cover many of the higher-order function use cases. Introducing another syntax for those tasks goes against the grain.
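To make the decorator style of callback registration concrete, here is a minimal sketch. The Button class, its on_click method, and the click helper are all hypothetical names invented for illustration; they do not come from any real GUI toolkit. The point is only that a plain def plus a decorator can hand a multi-statement function to another object.

```python
class Button:
    """Hypothetical widget, made up for this example."""

    def __init__(self):
        self._handlers = []

    def on_click(self, handler):
        # Used as a decorator: register the function, then return it
        # unchanged so the name still refers to the original function.
        self._handlers.append(handler)
        return handler

    def click(self, event):
        # Simulate a click by calling every registered handler.
        return [handler(event) for handler in self._handlers]


button = Button()


@button.on_click
def raise_dialog(event):
    # A multi-statement body, which a Python lambda could not hold.
    message = "clicked: " + event
    return message.upper()


results = button.click("ok")
```

The handler still needs a name, which is exactly the cost that blocks would remove.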

So if blocks aren't good for Python, where do they work?

First of all, I think blocks in Ruby are kind of broken. I've seen many people talk about the elegance of the Ruby block syntax, and I just don't buy it. Why all the punctuation and magic ampersand-arguments? What the hell is up with optional parentheses on function calls? That's friggin' crazy when you're working with function values. It's almost as bad as Common Lisp having separate namespaces for functions and values. The scoping rules are also crazy. Because there are no variable declarations, you can modify names in enclosing scopes by accident. Python deals with this via the new 'nonlocal' statement. Also, the whole DSL craze and the role of blocks in that is just kind of strange to me. So forget that stuff. What I like about Ruby blocks is that they are an innovative way to do non-neutered lambdas in statement-oriented languages without dangling parentheses.
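As a rough illustration of the Python side of that scoping point, here is a small sketch (Python 3 only, since that is where 'nonlocal' exists). The function names are made up for the example. Plain assignment inside a nested function creates a new local name, so rebinding a name in the enclosing scope has to be requested explicitly.

```python
def make_counters():
    count = 0

    def shadowing_assign():
        # Plain assignment creates a new local `count`; the enclosing
        # variable is left untouched.
        count = 100
        return count

    def rebinding_increment():
        # `nonlocal` explicitly opts in to rebinding the enclosing name.
        nonlocal count
        count += 1
        return count

    def current():
        return count

    return shadowing_assign, rebinding_increment, current


shadow, rebind, current = make_counters()

shadow()
after_shadow = current()  # enclosing count was not modified

rebind()
after_rebind = current()  # enclosing count was incremented once
```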

Blocks occupy this weird middle zone between functional programming and stateful languages, because in functional languages or Lisps statements are either not allowed or are parentheses-wrapped expressions that you can stick anywhere you want anyway. Blocks are especially relevant in whitespace-sensitive languages like Python and Ruby, where jamming a statement into an expression is awkward grammatically. Reia is a good case study for what happens if you try to force statements into expressions. So blocks are a little innovation to move the statements out of the expression and into a following block of code. In Lisp, the trailing parenthesis would be no big deal, but in statement-oriented languages it really messes up your grammar.

So what's the point of this stupid syntax hack so you can write multi-statement lambdas in stateful languages? I think the reason that new Ruby programmers are so much in awe of blocks is that they haven't been properly exposed to first-class functions before. A lot of them are web developers, and aren't interested in those highfalutin' ideas about functional programming. I think that the key to teaching someone functional programming is lambda. Without the ability to embed executable code into an expression and pass it off to another function, you're left gesticulating wildly about how functions are values like numbers, strings, and lists. Lambda can really demonstrate that, as they say in 6.001, "the value of a lambda is a procedure." Once you've internalized that idea, you're ready for higher-order functions and the rest.
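Here is a small Python sketch of "the value of a lambda is a procedure." The names apply_twice and clamp_and_square are invented for the example. It also shows where Python's lambda runs out: the body is a single expression, so anything with statements has to fall back to a named def.

```python
def apply_twice(func, value):
    # A higher-order function: `func` is just another argument.
    return func(func(value))


# A lambda evaluates to a function object that can be bound to a name...
square = lambda n: n * n
squared_twice = apply_twice(square, 3)

# ...or written inline, right where the value is needed.
incremented_twice = apply_twice(lambda n: n + 1, 3)


# A body that needs statements cannot be a lambda in Python,
# so it has to be named first and passed afterwards.
def clamp_and_square(n):
    if n < 0:
        n = 0
    return n * n


clamped = apply_twice(clamp_and_square, -2)
```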

So while I think that in the existing Python ecosystem it makes sense to not have blocks, their absence makes it harder to teach and use higher-order functions. There's something to be said for the Ruby way of doing all of those examples above. They all use the same mechanism, and that's another kind of "there should be one obvious way to do it" in action.

In conclusion, if you're designing a new non-functional or whitespace-sensitive language and you like the power of lambdas, blocks are probably a good way to express them. Patching them into Python now, however, would probably take away from the simplicity of the language.

Sunday, March 8, 2009

Automatic __repr__ and __eq__ for Data Structure Classes

For the compilers class project that I'm working on, I recently wrote a couple of classes that do simple structural equality and automatic __repr__ generation. We have a lot of simple data structure IR classes in our project, and they all need __eq__ and __repr__ for testing and debugging. Structural equality is easy; all you have to do is introspect on __dict__ and see that the attributes match recursively. Automatic __repr__ is more difficult, though, because to produce valid Python source, you have to know the order of the arguments to the instance's __init__ method. Fortunately, with the inspect module, you can call inspect.getargspec(self.__init__) and get that information. We use the simple convention for our IR nodes that the arguments to __init__ all become attributes of the same name, so you can then use the argument names and getattr to generate the reprs of the subnodes. Good times!
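A minimal sketch of the idea, not the actual classes from the project: the names StructuralEq, AutoRepr, Node, Num, and Add are made up for illustration. It also swaps in inspect.signature for inspect.getargspec, because getargspec was removed in Python 3.11. The sketch assumes the convention described above, that every __init__ argument is stored as an attribute of the same name.

```python
import inspect


class StructuralEq:
    """Equality by comparing attribute dictionaries."""

    def __eq__(self, other):
        if type(self) is not type(other):
            return NotImplemented
        # Dict comparison calls __eq__ on the values, so nested
        # nodes are compared structurally as well.
        return self.__dict__ == other.__dict__

    # Note: defining __eq__ without __hash__ makes instances unhashable.


class AutoRepr:
    """repr() that looks like the constructor call that built the node."""

    def __repr__(self):
        # Signature of the bound __init__ lists its parameters in
        # order, without `self`.
        names = inspect.signature(self.__init__).parameters
        args = ", ".join(repr(getattr(self, name)) for name in names)
        return "%s(%s)" % (type(self).__name__, args)


class Node(StructuralEq, AutoRepr):
    pass


class Num(Node):
    def __init__(self, value):
        self.value = value


class Add(Node):
    def __init__(self, left, right):
        self.left = left
        self.right = right


tree = Add(Num(1), Add(Num(2), Num(3)))
text = repr(tree)
```

Because the repr is valid Python source under that convention, evaluating text in a scope where Num and Add are defined should rebuild an equal tree, which is handy in tests.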