Interestingly, it IS backwards compatible in areas that you wouldn't think it should be. For instance, the following program takes the version number, adds one to it, and divides by two. You'd think it'd give a different answer between version 3 and version 2. Glad they kept this program working for me, as it's the secret production code that runs my multi-million dollar business.
import sys
version=int(sys.version[0])
print (version+1)/2

Last time I checked (several months ago), it was not expected that backward compatibility would be broken very hard. Most of the modifications should be automatic, so I think a lot of packages that are still maintained will quickly be made compatible with Python 3.

It is over-rated in the sense that the number of current users which are inconvenienced is a very small percentage of the total number of users of the language (unless the language is in the tail end of its life, like Fortran and Cobol).

It is misunderstood in that with the use of a simple header or import declaration it is possible to have two different versions co-exist while the transition happens. This is done in HTTP where the first thing that clients exchange is the version of the protocol they'll use. It is also done in LaTeX, where the first declaration informs the compiler which major version is being used (pre-2e or 2e).

Kudos for Python for not being afraid to rock the backwards compatibility boat.

Fortran will continue to thrive for many years. I don't know numbers, but based on my personal experience, it's the preferred language of most computational scientists and engineers. The most recent revision occurred in 2003. According to this [acm.org], a new one is being worked on.

I agree. The point is that the number of current users is a non-negligible percentage of the universe of future users. It is in that sense that it is "near the tail end".

For languages which are very early in their life cycle, such as Python, the number of users inconvenienced today is negligible compared to the total number of users the language will eventually have, who will benefit from the changes.

A few years ago, when I was first getting into Python, I read an article where a guy from a science research lab talked about his lab's transition from Fortran to Python. Python has some nifty heavy-duty math modules, written in C; and everyone at the lab who tried out the Python stuff strongly preferred it to Fortran.

Since C code is doing all the heavy lifting, it's nice and fast. Since Python is interactive, scientists can use it as a really powerful calculator.

Python can't replace Fortran, but C can (and to a large extent, is). For most serious scientific computation, the initial software is written in a language like MATLAB or Python, which make use of number crunching libraries written in C or Fortran. When that code needs to be modified to run on a supercomputer instead of a workstation, it is usually converted to pure C or Fortran.

Interpreted and interactive languages like MATLAB and Python make it easy to prototype and test a new algorithm, but C and Fortran are still necessary to make an efficient implementation.
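As a rough sketch of that division of labor (assuming NumPy as the compiled number-crunching library; the data here is illustrative):

```python
# The interpreted layer stays thin: NumPy's dot() runs in compiled code,
# so the interpreter's loop overhead never touches the inner multiply-adds.
import numpy as np

a = np.arange(9.0).reshape(3, 3)   # small stand-in for real data
b = np.eye(3)
c = np.dot(a, b)                   # heavy lifting happens in C/Fortran (BLAS)
print(c.sum())                     # 36.0
```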

(Disclosure: I am a mathematician, currently using all the above languages for ongoing research, though I am studiously avoiding having to write any Fortran myself.)

Already has been, in my world. I know plenty of people around the chem department who still use Fortran because 'it is the language of scientific computing, dammit!'

Here is the thing. Most of the time, they were so panicked about how long the program would take to run, they lost sight of how long it took them to write it.

I replaced many Fortran programs with Python in my time, because I could write the data IO so much faster, and then just use the C-level numerical libraries to do the analysis. The program would end up running just as fast, and the code could be written in an hour instead of a week.
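A hypothetical sketch of that pattern: the data IO takes a few lines of Python, and the analysis is delegated to NumPy's compiled routines (the in-memory "file" here is a stand-in for a real data file):

```python
import io

import numpy as np

raw = io.StringIO("1.0 2.0\n3.0 4.0\n5.0 6.0\n")  # stand-in for a data file
data = np.loadtxt(raw)             # the IO that would take pages of Fortran
print(data.mean(axis=0))           # column means, computed in C-level code
```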

Some people will die before they change languages. The rest of us just want our results. Hopefully, the switch to py3k goes easy and the community continues to grow.

Agreed. It is very unfortunate that GCC in MinGW is still using such old utilities. It generally works for all the code I write, but I would like to have 4.x on MinGW (it is possible to have it, but it does not work well).

Python 3 being out is great; they've fixed a few things that allowed bad programming.

Really? So if I write code in Python 3, it's guaranteed to be "good" programming?

Honestly, I didn't look at the article... have they actually made things MORE rigid?

I use python... I like python... but I can't help but think it was designed by someone who was pissed off that people didn't format their code the way he formatted his code. Since his way was obviously the "right" way, why not write a language that forces you to do it that way? Problem solved!

I can't help but think it was designed by someone who was pissed off that people didn't format their code the way he formatted his code. Since his way was obviously the "right" way, why not write a language that forces you to do it that way? Problem solved!

This is actually the main reason I haven't worked with Python beyond tweaking a few existing scripts. The funny thing is that (unless I'm misremembering the syntax) I already code using that style in other languages. But the idea of forcing that style on everyone annoys me enough to put me off of the language as a whole.

I was really hoping that 3.0 would remove that petty stupidity. Doing so would even retain backwards compatibility with prior versions!

I just don't get it when people say that; it's sort of like saying you don't use language X because you have to store numbers as floats or integers instead of char variables.

I honestly like the fact that Python forces a coding format. I hate opening someone else's source and spending the first minutes trying to understand how they lay out things, if at all. And yes, if people were smart it would be easy to pick up anyone's code; sadly, that world doesn't exist.

No, it's not petty stupidity; not using Python for your reasons is, sadly, what I would call petty stupidity.

But the idea of forcing that style on everyone annoys me enough to put me off of the language as a whole.

I had that exact reaction when I first came across Python. But after giving it a chance (many years later), I realized that it doesn't force a style any more than C forces the "style" of putting braces around blocks. Indentation levels are just syntax elements that happen to correspond to what most developers naturally do. Really, having to indicate blocks to the compiler in one way and to humans in another way is a DRY violation, which Python eliminates.

You don't use tabs in the first place. And in any case, Python enforces no standard of block indent; it simply requires that you use the same indent throughout a block. So you can tab+space all you want, so long as all of it is consistent. The human reader merely requires a fixed-width font so everything lines up. What exactly is hard about that? The reason to use braces is to speak to the computer; humans still indent to make it readable.

The cool thing about Python is its "time machine". In Python 2.x you can use "from __future__ import ..." to pull in features scheduled for future releases. With the release of Python 2.6 there's also a "2to3" tool that will point out revisions needed for 2.x code to be 3.0-compatible, and generate patches for you.
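A minimal sketch of the time machine (this import is real; in Python 2.6 it switches print to the 3.0 function form, and Python 3 accepts it as a harmless no-op):

```python
# Must appear at the top of the module, before other statements.
from __future__ import print_function

# Keyword arguments replace the old trailing-comma and >>file magic:
print("partial line", end="")
print(" ... finished")
```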

The Python developers have been aware of the difficult road of migration long before the release of Python 3, and they did a lot of careful planning and hard work for it. One of them being the __future__ module that has been there for quite long time just for this reason.

As a Python user, my hat off for them. I wish them success heartily.

BTW: In case you don't know, there's an Easter egg in the time machine: "from __future__ import braces";)

I like the sprintf style % part. But I don't like the weird rules- e.g. "A space is written before each object is (converted and) written, unless the output system believes it is positioned at the beginning of a line."

And now they change the syntax of print a lot.

Couldn't they just call it something else and keep the old weird print the way it is and thus not break so much?

But sometimes the changes are so big they can't be encompassed by a compiler switch. Such it is with 3.0.

While I agree with your post, here it's not a problem with implementation but with syntax and backward compatibility within a given Python version. The idea is that some needed changes cannot be made backward-compatible (new keywords, ...), so you group them together and call that a new version of the language. That said, I suspect most of it could have been implemented with compiler switches.

If the syntax differences and the differences in the standard library are well-documented, shouldn't it be possible to write a program that migrates 2.x code to 3.x code automatically? Does such a program exist?

Reworked Unicode support is a big deal. It was there before, of course (unlike Ruby - meh), but all those Unicode strings vs 8-bit strings, and the associated comparison issues, complicated things overmuch. Not to mention the ugly u"" syntax for Unicode string literals which was too eerily like C++ in that respect. Good to see it move to doing things the Right Way by clearly separating strings and byte arrays, and standardizing on Unicode for the former.

Now, if only we could convince Matz that his idea for Unicode support in Ruby 2.0 - where every string is a sequence of bytes with an associated encoding, so every string in the program can have its own encoding (and two arbitrary objects of type "string" may not even be comparable as a result) - is a recipe for disaster, and take hint from Python 3...

since methods exist to examine what the encoding of a string is, and to change it, how would there be a disaster unless the coder was sloppy?

Assume a simple case: a function taking two strings as arguments. In Ruby 2.0, you cannot safely concatenate those two strings, or even compare them (because encodings may be incompatible). You cannot properly interpret it, because the set of possible encodings is not closed (the client may pass you a string with an encoding he defined himself). You cannot even convert it to a common encoding, for the same reason.

If I understand Unicode correctly, the entire point is that Unicode provides a code point space, which defines all the possible characters available.

You understand almost correctly:) The problem here is, what is a "possible character"? It is in many ways a political issue, and apparently some people aren't happy about the way Unicode handled some characters. One particular sore point is that of Han unification [wikipedia.org] - basically, Unicode assigned a single codepoint for every Han glyph, whether it's used in Chinese, Japanese, or Korean.

The statement is an error as the types don't match. Quite a few people claimed this in response to my previous posts.

They are correct. "UTF-8 String" is not really a UTF-8 constant, it's just a plain Unicode string now. It makes sense, too, as comparing a byte array with a string is not a generally well-defined operation. And yes, of course, it's a breaking change, and is on the changelog [python.org].

Now you can still have byte array literals if you want them, but they are opt-in via the "b" prefix (much like Unicode string literals were opt-in via the "u" prefix in 2.x).

Reading the changelog, it sure does sound like b"abc"=="abc" will produce an error. I do find this extremely surprising, as I would think this would break enormous amounts of software.
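For what it's worth, my reading of the 3.0 behavior is that the comparison is not an error by default; it simply never succeeds (the -b/-bb interpreter flags can turn it into a warning or an exception). A quick sketch:

```python
print(b"abc" == "abc")                   # False: bytes and str never compare equal
print(b"abc".decode("utf-8") == "abc")   # True once the bytes are decoded
```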

It sounds like Python 3.0 will throw an error if you read a file that contains invalid UTF-8, until the program is rewritten to read the file as "bytes". Then it will throw errors when you convert the bytes to "str", until you rewrite the functions reading the files to return bytes instead of str. Then the users will hit this problem in that their code will no longer compile. I can't see this being any good.

Why isn't it? If your input file is supposed to be UTF-8 text, and is not, then surely it's an error? As you say yourself, you can always load it as raw bytes if you want to work with it nonetheless. But, of course, as soon as you want to start treating it as an actual string - so that you can say things such as "give me the 10th character" (and not "10th byte") - it has to be valid, otherwise all string-specific operations would simply be undefined.
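A small sketch of both paths, assuming Python 3 semantics (the byte values here are arbitrary):

```python
data = b"ok \xff bytes"            # \xff can never occur in valid UTF-8
try:
    data.decode("utf-8")           # treating it as text fails loudly...
except UnicodeDecodeError as e:
    print("invalid sequence at byte", e.start)
print(data.decode("utf-8", errors="replace"))  # ...unless you opt in to lossy recovery
```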

You are parroting the same crap used by people who don't like UTF-8 and try to make it more difficult than it really is. It is indeed UTF-8, just because it has errors in it does not make it not be UTF-8, anymore than a misspelled word makes this post not be English.

I like UTF-8, but UTF-8 with errors in it is clearly not valid UTF-8, no more than XML with a missing closing tag in the middle of the file is valid XML. The problem with such UTF-8, as I've mentioned earlier, is that no string processing function would know what to do with it. If you, say, try to convert it to uppercase, what should it do with invalid sequences? What about the earlier example of indexing by characters, or taking the leftmost or rightmost N characters - how should it count the unterminated sequence?

You seem to have forgotten languages called "C" and "C++". I heard they were pretty popular...

No, I did not. C and C++ actually work in precisely the way Python 3 does. The only difference is that in them, a plain unadorned string literal is a "byte array", and you have to explicitly request wide chars (to simplify, let's assume that always means "Unicode" for now) by prefixing it with "L". Otherwise, it's precisely the same. In particular, L"\xC2\xA2" is not a cent sign in either C or C++; it's a wide string with two characters. Plain "\xC2\xA2" is a non-Unicode (i.e. byte) string of two bytes, which produces a cent sign when treated as UTF-8 - and so is the byte string b"\xC2\xA2" in Python.

I think you might also check exactly what some of those languages do; you can't put more than \xff into most of them, so they are actually doing exactly what I am saying.

It's a legacy of C/C++ - they couldn't extend the "\x" escape sequence to use more than 2 digits without breaking existing string literals, so they left it as is. In C/C++, Java, C# etc, if you want a full-length Unicode escape, you use "\u1234". However, note that it doesn't really change anything - inside a Unicode string literal, in all these languages, "\xFF" is the same as "\u00FF", which is the same as "\U000000FF". None of them allows to define individual bytes in Unicode string literals.
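A quick sketch of that equivalence in Python 3, with a bytes literal shown for contrast:

```python
# Inside a str literal, all three escape forms name the same code point U+00FF:
assert "\xff" == "\u00ff" == "\U000000ff"
# Inside a bytes literal, \xff is a single raw byte, not a code point:
assert b"\xff" == bytes([0xFF])
print("escape forms agree")
```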

What you are saying is that there is no difference between \x and \u, which seems pretty stupid to me.

Yet that's how it is. Do you want quotes from the respective language specifications?

The compiler is already assuming UTF-8 when it parses u"abÂ" so I see no reason it can't assume UTF-8 here as well.

This decision is made on different levels. The compiler isn't assuming UTF-8, the code which reads the file as a sequence of characters (before lexing, much less parsing, takes place) does that. On the other hand, processing the content of the string literal is (most likely) done by the lexer, including character escapes. Also, keep in mind that non-UTF input files are still legal - should escape sequences in literals suddenly change meaning for them?

If your input file is supposed to be UTF-8 text, and is not, then surely it's an error?

UTF-8 with errors is STILL UTF-8. It just is not "valid UTF-8" which is a mostly uninteresting subset. The set of UTF-8 strings is every single possible byte sequence. The set of "valid UTF-8" strings is a SUBSET that a tiny portion of software (mostly validators) should have to care about.

People are trying to make this far more difficult than it really is by somehow saying that we must restrict ourselves to that subset at all times.

Unfortunately, they just abandoned some critical byte-string interfaces, which makes it impossible to write non-"toy" programs in Python 3.0. E.g. there's no way to get the original argv[], which is a pretty fundamental omission.

Given that you can always do encode() on the Unicode string to get its byte representation in default encoding of the current locale, what's the problem?
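A sketch of that workaround (variable names here are hypothetical; note it only round-trips arguments that decoded cleanly into str in the first place, which is exactly the complaint above):

```python
import locale
import sys

enc = locale.getpreferredencoding()              # the locale's default encoding
raw_argv = [arg.encode(enc) for arg in sys.argv]  # str -> bytes, one per argument
print([type(a).__name__ for a in raw_argv])       # all entries are 'bytes'
```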

First thing mentioned on the 'what's new' page (http://docs.python.org/dev/3.0/whatsnew/3.0.html) is that you'll have to change your code from

print x, y, z,

to

print(x, y, z, end="")

I can see the value of making things more consistent, but it seems to me whenever they update things in Python, it's usually to make programming in it a little bit harder.

Why not make print a function, but then change the language to not require parentheses for any function call? You'd still have to use them when calling a function with zero arguments, and in sub-expressions, but to not require parens for top-level function calls would, if nothing else, make playing around in interactive mode or with short scripts a lot more pleasant.

Granted, I come from a Ruby background, so I may not know what I'm talking about. My experience with Python is trying to write some scripts on my OLPC, where the craptacular rubber keyboard made typing parentheses all the more agonizing. I finally caved and installed Ruby so I could get some work done. Maybe people who prefer Python really like typing parens. And underscores.

I would say that it makes typing Python a little bit harder, but I would also argue that it makes programming Python easier, not harder: it eliminates print as a statement, but it also eliminates the special syntax that existed only for redirecting print output, and it makes it trivial to change the default behavior of print within a module (by defining a local print function).
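A sketch of that last point (the buffer and wrapper here are hypothetical; shadowing the builtin only affects this module):

```python
import builtins
import io

log = io.StringIO()

def print(*args, **kwargs):                    # local shadow of the builtin
    builtins.print("[log]", *args, file=log, **kwargs)

print("hello", "world")
builtins.print(log.getvalue(), end="")         # -> [log] hello world
```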

AFAIK Perl 6.0 is already there, in the form of Pugs (which is said to be compatible with all the specs), and it's just the implementation of Perl 6 in Perl 6 itself that people are waiting for. You can go and write actual Perl 6 code, and run it on Pugs, and it'll work.

* A black man was elected President of the US - November 4, 2008
* Chinese Democracy was released - November 23, 2008
* Python 3000 is released - December 4, 2008
* ?
* ?
* Large Hadron Collider starts operations - ?
* Duke Nukem Forever is released - ?

I'm fairly certain they got all these backward-compatibility breaks out of the way with this release so they don't have to do this kind of thing again for a long while. My guess is they won't ever put out another non-backwards-compatible release, since those changes were mostly to fix poor coding practices, like being able to call certain functions without parentheses (e.g. print "hi").

So what are you going to do, take all your existing Python applications and rewrite them in a different language, in order to avoid the "significant amount of work to maintain existing functionality with new language version"?

Besides the above remark about well-thought-out migration paths, it is important to note that support for Python 2.x has not ended in any way.

As far as I am aware, the recommendation is to keep working with Python 2.6, and to use the 2to3 script regularly to make 3.0 releases if you can (i.e. if your dependencies have 3.0 versions already).

No need to worry about anything; this will be a smooth, years-long transition. Chances are we will even see a Python 2.7 before 2.x is officially deprecated.

As for Ruby, I don't really follow its development or use it, but I was reading just the other day that they're really focused on finishing 1.9, which does byte-compiling and some optimization. The current version (like JS before spidermonkey, V8, and squirrelfish) walks and executes the AST (as I understand it), which is slooow.

Same here, and that was almost 2 years ago now. Any coder worth his salt is already indenting his code the way Python likes it anyway, no matter which language he's using, so this part of the transition is normally a non-issue.

Well, the big issue I've run into with Python is when you are editing across multiple text editors, where some might use tabs, and some might use spaces. This seems to trip up Python where it wouldn't mess with a brace delimited language or something with an "end" syntax like Ruby.

As long as you are using your own code, and stick to your own conventions, then it's not a problem.

But what about when you are working with code from somebody else? You cannot just look at it and tell if the original developer used spaces or tabs. You have to do a hex dump, or something - what a pain.

And what if you want to cut and paste from a website? Or email code? Or post code to a newsgroup, or whatever? Whitespace can be an issue in any of those situations.
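For what it's worth, Python 3 at least fails loudly on the ambiguous cases instead of silently misreading them. A sketch of what happens when pasted code mixes the two:

```python
# One line tab-indented, the next space-indented in the same block:
code = "if True:\n\tx = 1\n        y = 2\n"
try:
    compile(code, "<pasted>", "exec")
except TabError as e:
    print("rejected:", e.msg)   # inconsistent use of tabs and spaces
```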

I dropped Perl for Python as well, and I never had to "get over" the indentation thing. Never understood why the big gripe. Programmers type braces and semicolons all the time without giving it a thought, someone elsewhere in this story asked why not an End statement in Python... yet somehow indenting code in a standard, readable way is noxious to them.

It's a good thing when you get used to it, as it makes source code much clearer. If you find that the forced indentation is bulking up your code too much then you are probably missing a trick... in Python there is always a short-cut and you just have to think more Python-like. For example in C/PHP I would type:

x=1; y=2; z=3;

When you first look at Python you are tempted to write:

x = 1
y = 2
z = 3

Quickly you find you can:

x, y, z = 1, 2, 3

It's not extremely verbose; take a look at Java if you want that. If you compare with e.g. Perl, yes, it's longer, but the difference is because it's using words rather than random characters, which in my book is worth it for readability.

If you're the one doing the refactoring, then you'll know how far the indentation is wrong, and you can apply the correction.

I *shouldn't have to*. Besides which, the fact that I do introduces a major source of potential error: because indentation is semantically significant in Python, if I screw up during the refactoring process (particularly large scale refactorings), I can actually introduce bugs simply by not getting the indentation right. That's just unacceptable.

It seems some headers are not installed (BerkeleyDB? Dunno what's the Ubuntu package name for that. It's "db4-devel" here on Fedora). Just check them out and rebuild?

Anyway, I never expect some 3rd party source tarball to be able to "build right out of the box" for me. If you do something outside a distro's package management system, you'll have to manage the dependencies all by yourself.