Sunday, October 26, 2008

Why explicit self has to stay

Bruce Eckel has blogged about a proposal to remove 'self' from the formal parameter list of methods. I'm going to explain why this proposal can't fly.

Bruce's Proposal

Bruce understands that we still need a way to distinguish references to instance variables from references to other variables, so he proposes to make 'self' a keyword instead. Consider a typical class with one method, for example:
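A minimal sketch of such a class, assuming a class `C` with a method `meth` (the names used later in this post; the method body is an illustrative assumption). The variant under Bruce's proposal appears only in comments because it is not valid Python today:

```python
class C:
    def meth(self, arg):
        # 'self' is an ordinary parameter; attributes are reached through it.
        self.val = arg
        return self.val


# Under Bruce's proposal (hypothetical syntax), 'self' would become a
# keyword and disappear from the parameter list:
#
#     class C:
#         def meth(arg):
#             self.val = arg
#             return self.val

c = C()
print(c.meth(42))
```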

That's a saving of 6 characters per method. However, I don't believe Bruce proposes this so that he has to type less. I think he's more concerned about the time wasted by programmers (presumably coming from other languages where the 'self' parameter doesn't need to be specified) who occasionally forget it (even though they know better -- habit is a powerful force). It's true that omitting 'self' from the parameter list tends to lead to more obscure error messages than forgetting to type 'self.' in front of an instance variable or method reference. Perhaps even worse (as Bruce mentions) is the error message you get when the method is declared correctly but the call has the wrong number of arguments, like in this example given by Bruce:
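A hypothetical stand-in for that kind of call (the class `A` and the method `m2` are assumed names, not necessarily Bruce's code). The exact wording of the message differs between Python versions, so this sketch only prints whatever the interpreter reports:

```python
class A:
    def m2(self, a, b):
        return a + b


obj = A()
try:
    obj.m2(1)  # one argument short
except TypeError as exc:
    # Python 2 counted the implicit 'self' in its message, which is what
    # made the reported argument counts look off by one.
    message = str(exc)
    print(message)
```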

I agree that this is confusing, but I would rather fix this error message without changing the language.

Why Bruce's Proposal Can't Work

Let me first bring up a few typical arguments that are raised against Bruce's proposal.

There's a pretty good argument to make that requiring explicit 'self' in the parameter list reinforces the theoretical equivalency between these two ways of calling a method, given that 'foo' is an instance of 'C':

foo.meth(arg) == C.meth(foo, arg)
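A short sketch of that equivalence, assuming a throwaway class `C` whose method simply returns its arguments so both call styles can be compared:

```python
class C:
    def meth(self, arg):
        # Return what was received so the two call styles can be compared.
        return (self, arg)


foo = C()
bound_result = foo.meth("x")        # instance supplied implicitly
explicit_result = C.meth(foo, "x")  # instance supplied by hand
print(bound_result == explicit_result)
```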

Another argument for keeping explicit 'self' in the parameter list is the ability to dynamically modify a class by poking a function into it, which creates a corresponding method. For example, we could create a class that is completely equivalent to 'C' above as follows:
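A sketch of what this could look like; the names `meth`, `myself`, `early`, and `late` are illustrative assumptions:

```python
# Define an empty class.
class C:
    pass


early = C()  # created before the method exists


# Define a plain module-level function.
def meth(myself, arg):
    myself.val = arg
    return myself.val


# Poke the function into the class; it now behaves as a method.
C.meth = meth

late = C()
print(early.meth(1), late.meth(2))
```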

Note that I renamed the 'self' parameter to 'myself' to emphasize that (syntactically) we're not defining a method here. Now instances of C have a method with one argument named 'meth' that works exactly as before. It even works for instances of C that were created before the method was poked into the class.

I suppose that Bruce doesn't particularly care about the former equivalency. I agree that it's more of theoretical importance. The only exception I can think of is the old idiom for calling a super method. However, this idiom is pretty error-prone (exactly due to the requirement to explicitly pass 'self'), and that's why in Python 3000 I'm recommending the use of 'super()' in all cases.
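For illustration, here are the two styles of calling a super method side by side. The class names are hypothetical, and the zero-argument form of `super()` assumes Python 3:

```python
class Base:
    def greet(self, name):
        return "hello " + name


class OldStyleCall(Base):
    def greet(self, name):
        # Old idiom: name the base class and pass 'self' by hand.
        return Base.greet(self, name).upper()


class SuperCall(Base):
    def greet(self, name):
        # Python 3: zero-argument super() supplies the instance.
        return super().greet(name).upper()


print(OldStyleCall().greet("bob"), SuperCall().greet("bob"))
```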

Bruce can probably think of a way to make the second equivalency work -- there are some use cases where this is really important. I don't know how much time Bruce spent thinking about how to implement his proposal, but I suppose he is thinking along the lines of automatically adding an extra formal parameter named 'self' to all methods defined directly inside a class (I have to add 'directly' so that functions nested inside methods are exempted from this automatism). This way the first equivalency can still be made to hold.

However, there's one situation that I don't think Bruce can fix without adding some kind of ESP to the compiler: decorators. This I believe is the ultimate downfall of Bruce's proposal.

When a method definition is decorated, we don't know whether to automatically give it a 'self' parameter or not: the decorator could turn the function into a static method (which has no 'self'), or a class method (which has a funny kind of self that refers to a class instead of an instance), or it could do something completely different (it's trivial to write a decorator that implements '@classmethod' or '@staticmethod' in pure Python). Without knowing what the decorator does, there's no way to decide whether to endow the method being defined with an implicit 'self' argument or not.
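To make the decorator point concrete, here is a rough pure-Python sketch built on the descriptor protocol. The names `my_staticmethod` and `my_classmethod` are hypothetical, and the real built-ins handle more cases than this. Nothing in the 'def' lines tells a compiler which of these functions should receive an implicit 'self':

```python
import functools


class my_staticmethod:
    """Hand the function back untouched: no instance, no class."""

    def __init__(self, func):
        self.func = func

    def __get__(self, instance, owner=None):
        return self.func


class my_classmethod:
    """Bind the function to the class instead of the instance."""

    def __init__(self, func):
        self.func = func

    def __get__(self, instance, owner=None):
        if owner is None:
            owner = type(instance)
        return functools.partial(self.func, owner)


class C:
    @my_staticmethod
    def add(a, b):  # no 'self' at all
        return a + b

    @my_classmethod
    def name(cls):  # first argument is the class
        return cls.__name__


print(C.add(1, 2), C().add(1, 2), C.name(), C().name())
```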

I reject hacks like special-casing '@classmethod' and '@staticmethod'. I also don't think it would be a good idea to automagically decide whether something is supposed to be a class method, instance method, or static method from inspection of the body alone (as someone proposed in the comments on Bruce's proposal): this makes it harder to tell how it should be called from the 'def' heading alone.

In the comments I saw some pretty extreme proposals to save Bruce's proposal, but generally at the cost of making the rules harder to follow, or requiring deeper changes elsewhere to the language -- making it infinitely harder to accept the proposal as something we could do in Python 3.1. For 3.1, by the way, the rule will be once again that new features are only acceptable if they remain backwards compatible.

The one proposal that has something going for it (and which can trivially be made backwards compatible) is to simply accept

def self.foo(arg): ...

inside a class as syntactic sugar for

def foo(self, arg): ...

I see no reason with this proposal to make 'self' a reserved word or to require that the prefix name be exactly 'self'. It would be easy enough to allow this for class methods as well:

@classmethod
def cls.foo(arg): ...

Now, I'm not saying that I like this better than the status quo. But I like it a lot better than Bruce's proposal or the more extreme proposals brought up in the comments to his blog, and it has the great advantage that it is backward compatible, and can be evolved into a PEP with a reference implementation without too much effort. (I think Bruce would have found out the flaws in his own proposal if he had actually gone through the effort of writing a solid PEP for it or trying to implement it.)

I could go on more, but it's a nice sunny Sunday morning, and I have other plans... :-)

One thing that might be useful is to throw out a warning if a method is defined and the first parameter to it isn't called self. I know that "self" is only a convention, but *everyone* does it. The one or two people who decide to maliciously call their "self" parameter "this" can just block the warning, no? And warnings aren't fatal anyway. It'd annoy about 9 people and it'd save me about fifty times a day when I forget to put self at the beginning of a method's parameter list...

@stuart: Since people are free to call the argument whatever they like, Python would be enforcing a convention, and it has never done that. That's what tools like pylint are for. Pylint produces an error message 'Method should have "self" as first argument' for this case in its default configuration.
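For those who want such a check without a separate tool, here is a rough sketch of an opt-in version at the library level. The class decorator `check_self` is a hypothetical helper, not part of Python or pylint:

```python
import inspect
import warnings


def check_self(cls):
    """Warn about plain functions in the class body whose first parameter is not 'self'."""
    for name, attr in vars(cls).items():
        if not inspect.isfunction(attr):
            continue  # staticmethod/classmethod objects and other attributes are skipped
        params = list(inspect.signature(attr).parameters)
        if not params or params[0] != "self":
            warnings.warn(
                f"{cls.__name__}.{name}: first parameter is not 'self'",
                stacklevel=2,
            )
    return cls


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")

    @check_self
    class Demo:
        def good(self):
            return 1

        def bad(this):
            return 2

        @staticmethod
        def util(x):
            return x

for w in caught:
    print(w.message)
```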

I think it would be detrimental to have a fixed 'self' for methods (or to encourage that name through warnings) because it hampers nesting of classes within other class definitions. I had trouble thinking of an example of where you'd want to do this that isn't evil and may have failed:
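One hypothetical illustration of the nesting concern (not the commenter's original example): a class defined inside a method, where the inner method picks a different name for its instance so that it can still reach the outer 'self':

```python
class Outer:
    def __init__(self, label):
        self.label = label

    def make_inner(self):
        class Inner:
            def describe(inner):
                # 'self' is still the Outer instance from the enclosing
                # method; 'inner' is the Inner instance.
                return "inner of " + self.label

        return Inner()


print(Outer("a").make_inner().describe())
```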

Personally I do not especially appreciate "self". My bigger complaint, though, is that it can be an arbitrary name, and at least one project (conary) uses "r" instead of "self", I assume because it is shorter to type. And this is my main complaint: why is it possible to use any name one wants for it? Of course most will use self, but Python enforces a rather strict non-ambiguity "there should be one obvious and easy way" ruleset, and I believe that if it is open to change in this case, changes could be considered in other cases as well. To me there is no big conceptual difference between a parser treating something as an error and a "convention" which we could change at our own discretion. But let's face it: in the case of self, I claim that about 95% of Python writers will call it "self".

(Note though - I don't really complain per se; it will be much more interesting once most people are using "Python 3000" or whatever name it ends up with.)

The "def self.foo(arg):" syntax reminds me of the method definition syntax of Prothon (a dead Python-like prototype-based language). By the way, someone could resurrect this idea. A language with Python's syntax and Io's (iolanguage.com) prototype-based semantics would be interesting to see.

Personally I like self. At first, when moving from Java, I didn't like it, as it doesn't make any sense from a Java perspective, but now that I've gotten used to it I like it.

I think that such stylistic measures once taken in a language should be stuck to as they define the language and the new features.

For example, Java has its style of explicitly defining things in order to prevent people from doing bad things. If you don't want that, you shouldn't use Java.

Likewise, Python to me is defined around a simple parser, so I expect things to happen simply, without any complex or especially clever logic. Removing self might be clever, but it would also be confusing, as the simple philosophy becomes clouded with things that are not really necessary.

redditrasberry: I guess it's an aesthetic thing and therefore hard to reason about (I completely agree that there's no gigantic practical import to it).

I think this is a case where once you use the language enough you no longer notice the blemish, but if you are an occasional user (as I am) it stands out like a sore thumb.

To make a bad analogy, it's like a stain on your carpet in your front hallway. If you live with it for long enough you won't even see it any more. But to visitors coming to your house it's the most obvious thing. And it's particularly noticeable because the rest of Python is so nice - it's like I'm visiting an art gallery and everything is beautiful and pristine, but there on the carpet at the entrance is this huge stain that nobody has ever cleaned up.

The explicit self is wonderful. Instead of wondering why you have to type this in Python, I've always wondered why you don't have to in other languages. It takes away the implicit "this" magic. Self makes perfect sense.

After having programmed in Python for quite a while, I actually miss the explicitness of self in other languages.

It really has strong "say what you mean" semantics -- taking a superficial glance at the code, you can at once see whether instance or non-instance variables are accessed. Compare this with using markup (prefixing _ to member vars) to relay the meaning -- you have to know the conventions to grasp the difference, i.e. there is an extra level in comprehension (aha, _foo denotes this.foo). It's not uncommon to see people use explicit this in Java and C++ as well for that particular reason.

I'd say most people who grok Python at the "idiomatic" level instead of using it as an easier way to write Java would be seriously upset if self is abandoned.

I like the explicit self, and don't think `self' should be mandated, e.g. `my' avoids some clutter. BTW, it seems the code that's colourising your Python is treating `self' as a reserved word! Did Bruce write it? :-)

Personally, I like explicit self. The general idea I get reading comments about it here and elsewhere is that people that are new to python generally don't like it and people that are used to python generally do.

I always loved the explicit nature of self, yet I remember being partial to a "def self.foo(arg)" syntax in the beginning.

I think what disturbs people the most is the visual 'verbosity' it causes. You have your normal arguments in one place, and you don't want this special argument mixed in with them (e.g. when counting your arguments, it doesn't feel nice).

But now I simply add two spaces after my self's comma, just to visually separate it a little from the other args, and I think it's the best solution overall.

My apologies if this has already been stated, but chromakode on Reddit.com had some very good comments that I think nicely summarize how explicit self helps you out.

Here is the entire comment (minus a couple of sentences that pertained to a reply to an earlier comment):

"""Python's use of 'self' as an explicit argument is a slight syntactic trick that extremely cleverly glues together the bound/unbound programming experience. Having programmed in many OOP languages where 'this'/'self' are implicit, I have to say that I greatly prefer Python's way of doing it. It answers the following questions very elegantly:

How did self get defined locally in my method?

Explicit: you specified it as an argument, either via instance.meth(args) or class.method(instance, args).

Implicit: I put it there for you automatically because you called a class instance method.

So how do I specify 'self' myself?

Explicit: you pass it as an argument.

Implicit: you use a language construct such as method.apply(instance, args).

How do I pass around bound methods (or more general: closures) for callbacks?

I think it makes sense to have an explicit self when grafting random methods onto classes since they presumably are not in the lexical scope of the class. When they are contained in a class block I don't see why you would need to define self.

It seems like self should always be bound to the instance of the surrounding class the method is defined in. So a global def has no implicit self but a method does.

To keep things from getting lame like JavaScript, closures would include self. The only problem is nested classes: how do you refer to an outer self?

I'd rather not have 'def self.meth(args)' because it would close off the possibility of generalizing the def statement to allow an arbitrary lvalue in place of the name. This would be useful for things like setting up a dictionary of functions:
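As a sketch of the idea: the generalized form appears only in a comment because it is hypothetical syntax, followed by one way to get a similar effect in today's Python. The `register` helper and the handler names are assumptions made for illustration:

```python
handlers = {}

# Hypothetical generalized def (not valid Python):
#
#     def handlers["start"](job):
#         return "starting " + job


def register(table, key):
    """Decorator factory that stores the function under table[key]."""

    def decorator(func):
        table[key] = func
        return func

    return decorator


@register(handlers, "start")
def _start(job):
    return "starting " + job


@register(handlers, "stop")
def _stop(job):
    return "stopping " + job


print(handlers["start"]("backup"), handlers["stop"]("backup"))
```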

Coming from the PHP "language", we have far worse problems than the explicit self required in Python.

Compared to Groovy, and especially Ruby, the so-called elegant language, Python is the absolute natural choice for a sick-of-PHP developer.

Nonetheless, this one small issue will drive me nuts, an entire application overflowing with an apparently optional self method param -- uggghhh, if only, please, we the numberless want-to-convert-to-python beings beg of you, get rid of the explicit self method param or I'll...learn Ruby, even @ and @@ is better than explicit self.

Seriously, everyone is still using Python 2, just sneak out the explicit self in 3.2 stable and nobody will notice.

Alright, alright, money, money talks, how much, in Euros, will it take to get explicit self removed??

Coming from using Java for the past year or two, I've found that the only reason I would have needed to use self-references is when I wrote redundant variable names. I've literally never required the use of such a thing in Java because I don't reuse variable names in the same scope.

The larger issue I have with people getting snooty about stuff like this, though, is that we're talking about an interpreted language which abstracts most of the low-level stuff away from the programmer anyway. It seems to me that being anal-retentive about a self-reference is splitting hairs somewhat. Also, Python doesn't support overloading, so let's step back for a moment:

A: Python doesn't support overloading, so you can't pull a Java and call self(args) from an overloaded constructor.

B: Using self allows you to re-use variable names, which -- and this is my opinion of course -- I think is a bad practice to begin with.

I like python, but I personally chalk up explicitly passed "self" references on the "weird idiosyncratic stuff programmers think is a good idea" board where Python is concerned -- right next to the "why the hell did Sun think it was a good idea to make everyone write System.out.print() for every console print statement" entry.

I don't like explicit self. I've never seen a case where it was to my advantage to specify "self", but on the contrary have had several instances where it was a pain when I forgot it.

I also don't buy "explicit is better than implicit". There are plenty of things the language does implicitly that we don't complain about and would be horrified if we had to be explicit. Some cases in point:

You can declare a local variable right now without a keyword. You don't have to say local foo = "bar", the language infers that you want a local variable from scope.

I love the cleanliness of Python. No braces, no semicolons and very few parentheses. No need to explicitly specify a variable's type or the return value of a function. It is beautiful. Like a calm pond reflecting the sunset. Suddenly, into this pond is tossed a rock called 'self'. The beautiful sunset is spoiled by ripple after ripple, slowly expanding to disrupt every corner of the once tranquil pond.

My 11 year old son has been learning Python as his first language. Tonight he asked me, "Daddy, why is all my code 'self', 'self', 'self'"? I decided that was a good question, so I am making a post. Here is a method from one of his classes. Keep in mind that this code is written by an 11 year old.

Nearly every line contains the word 'self'. Even worse, he keeps muttering curses under his breath as he alternately forgets to use 'self' and then accidentally uses it where he didn't mean to, wasting countless hours. My son is drowning in 'self's. It appears that I will have to go searching for another first language. So sad because Python is so good except for this incredibly painful wart.

From my admittedly limited experience with Python, I believe the most pythonic design would have been to make 'self' a keyword that is only required when the scope requires clarification, just like 'global'. At the top of a method, one would simply state 'self x', and for the rest of that method, all references to 'x' refer to a field on the class. No need for an explicit 'self' method parameter and no need to sprinkle 'self.' throughout the code.

I am not addressing non-variables such as decorators, because they are not nearly as important as fields. However, I am sure a suitable solution could have been found -- even just requiring the 'self.' prefix on each decorator would be fine because their use is relatively rare.

Because I don't like making a criticism without also making a constructive suggestion, here is something that can be done to mitigate the egregious method pollution by "self." without breaking backward compatibility:

1. 'self' is not a keyword (no change)
2. methods still require an explicit 'self' parameter whose name can be chosen by the programmer (no change)
3. the explicit first parameter name chosen by the programmer can be used to scope variables as follows:

class Foo:
    def do_something(self, param):
        self x  # declares that 'x' references are scoped to 'self' (similar to 'global')
        x = 5   # same as self.x = 5

    def do_something_else(this, param):
        this x  # declares that 'x' is a member of 'this' (note that any name may be chosen)
        x = 5   # same as this.x = 5

    def do_yet_another_thing(its, param):
        its.x = 5  # still works just like it used to

I submit this proposal for consideration in a future version of Python 3.x. It is a small change to the language that provides a huge improvement in readability and is fully backward compatible with existing Python code.