That doesn't work on real data. You are assuming there's nothing else in the string.
–
tchrist Nov 2 '10 at 13:58

5

The date is a field I extract from a CSV record. There is nothing else in the field. It works perfectly for my requirements.
–
FunLovinCoder Nov 2 '10 at 14:16

13

@tchrist: the question is quite unambiguous: I need to convert from dd/mm/yyyy to yyyy-mm-dd format.
–
SilentGhost Nov 2 '10 at 14:53

6

I am with SilentGhost on this one: don't solve problems that do not exist. Why assume the data was not validated in its original form?
–
frankc Nov 2 '10 at 16:35

4

@FrankC, I solved the problem as it was posed. It wasn't really completely specified, so I gave multiple answers, showing which ones worked where, and which ones failed where. I try hard not to assume things that aren't in the original question, especially if it might get used as cargo-cult programming. All the split() solutions, including mine, break on dates in text files; they work only on isolated instances, which is NOT what the question asked about. It may have been what he wanted, but he didn’t ask for that. Hence the multiple answers with differing provisos and approaches.
–
tchrist Nov 2 '10 at 17:50

Now letting $date be a foreach iterator through that array, we get this output:

Original is: 17/01/2010
First method: 2010-01-17
Second method: 2010-01-17
Third method: 2010-01-17
Fourth method: 2010-01-17
Original is: this 17/01/2010 and that 17/01/2010 there
First method: 2010 there-01-2010 and that 17-01-this 17
Second method: this 2010-01-17 and that 2010-01-17 there
Third method: this 2010-01-17 and that 2010-01-17 there
Fourth method: this 2010-01-17 and that 2010-01-17 there
Original is: from “17/01/2010” through 17/01/2010
First method: 2010-01-2010” through 17-01-from “17
Second method: from “2010-01-17” through 2010-01-17
Third method: from “2010-01-17” through 2010-01-17
Fourth method: from “2010-01-17” through 2010-01-17
Original is: 𐅹17/01/2010–𑂽17/01/2010
First method: 2010-01-2010–𑂽17-01-𐅹17
Second method: 𐅹2010-01-17–𑂽2010-01-17
Third method: 𐅹2010-01-17–𑂽2010-01-17
Fourth method: 𐅹2010-01-17–𑂽2010-01-17
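
The contrast in this output can be reproduced with a short Python sketch. It is an illustration under stated assumptions, not the code that produced the output above: `split_method` and `regex_method` are hypothetical names, and the pattern assumes a two-digit day and month, a four-digit year, and ASCII digits only.

```python
import re

# dd/mm/yyyy that is not directly adjacent to other digits.
# re.ASCII restricts \d to 0-9 (an assumption made for this sketch).
DATE_RE = re.compile(r"(?<!\d)(\d{2})/(\d{2})/(\d{4})(?!\d)", re.ASCII)

def split_method(text):
    """Split on '/' and reverse the pieces; only safe for an isolated date."""
    return "-".join(reversed(text.split("/")))

def regex_method(text):
    """Rewrite every dd/mm/yyyy found anywhere in the text."""
    return DATE_RE.sub(r"\3-\2-\1", text)

for sample in ("17/01/2010", "this 17/01/2010 and that 17/01/2010 there"):
    print("Original is:", sample)
    print("Split method:", split_method(sample))
    print("Regex method:", regex_method(sample))
```

On an isolated field the two functions agree. Once the date sits inside surrounding text, the split-based version reorders that text as well, while the substitution only touches the matched dates.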

Now let’s suppose that you actually do want to match non-ASCII digits. For example:
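
As a rough illustration only (a Python 3 sketch, not the author's own example), decimal digits from other scripts can be matched with the standard `re` module, because `\d` in a `str` pattern covers every Unicode decimal digit unless `re.ASCII` is given. The name `to_iso` is hypothetical, and the sketch handles only decimal digits, not other numeric characters.

```python
import re

# In Python 3, \d in a str pattern matches any Unicode decimal digit
# (category Nd), not just 0-9, unless re.ASCII is given.
ANY_DIGIT_DATE = re.compile(r"(?<!\d)(\d{2})/(\d{2})/(\d{4})(?!\d)")

def to_iso(text):
    """Rewrite dd/mm/yyyy as yyyy-mm-dd, normalizing the digits to ASCII."""
    def repl(match):
        # int() accepts Unicode decimal digits, so it also normalizes them.
        day, month, year = (int(group) for group in match.groups())
        return f"{year:04d}-{month:02d}-{day:02d}"
    return ANY_DIGIT_DATE.sub(repl, text)

# Devanagari digits for 17/01/2010, written as escapes to avoid font issues.
devanagari = "\u0967\u096d/\u0966\u0967/\u0968\u0966\u0967\u0966"
print(to_iso(devanagari))
print(to_iso("on 17/01/2010"))
```

This sketch does not reject dates that mix digits from different scripts, and it performs no calendar validation.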

I think you’ll find that Python has a pretty brain‐damaged Unicode model whose lack of support for abstract characters and strings irrespective of content makes it ridiculously difficult to write things like this.

It’s also tough to write legible regular expressions in Python where you decouple the declaration of the subexpressions from their execution, since (?(DEFINE)...) blocks are not supported there. Heck, Python doesn’t even support Unicode properties. It’s just not suitable for Unicode regex work because of this.

But hey, if you think that’s bad in Python compared to Perl (and it certainly is), just try any other language. I haven’t found one that isn’t still worse for this sort of work.

As you see, you run into real problems when you ask for regex solutions from multiple languages. First, the solutions are difficult to compare because of the different regex flavors. Second, no other language can compare with Perl for power, expressivity, and maintainability in its regular expressions. This may become even more obvious once arbitrary Unicode enters the picture.

So if you just wanted Python, you should have asked for only that. Otherwise it’s a terribly unfair contest that Python will nearly always lose; it’s just too messy to get things like this correct in Python, let alone both correct and clean. That’s asking more of it than it can produce.

@Zaid, @Frost: You’re very welcome. Unicode regexes are very much in my head these days, and I’m trying to teach people that you really can write clean regexes that are simultaneously portable, legible, and maintainable. I’m trying to put down the "regexes are inscrutable" myth. Of course if you can’t use comments, whitespace, alphabetic identifiers, or decouple your declarations from their execution, it is completely hopeless. So don’t do that: use all those techniques in regexes, just as in any other programming language.
–
tchrist Nov 2 '10 at 17:46

I'm always worried about these sorts of solutions because they only convert the data that are valid dates and won't handle the dirty data. It would be nice if all data were clean, but I find they often aren't.
–
brian d foy Jul 6 '11 at 19:16

What invalid dates are you thinking of? Most would cause strptime to throw an error.
–
MkV Jul 8 '11 at 8:58

Yes, so what do you put in place when you get an error?
–
brian d foy Jul 17 '11 at 3:45
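
One possible way to handle that error, sketched in Python under stated assumptions: the thread does not say which `strptime` is meant, so this uses Python's `datetime.strptime`, and the helper name `convert_or_none` and the choice to return `None` for bad fields are hypothetical. What to substitute for a rejected value remains a decision for whoever owns the data.

```python
from datetime import datetime

def convert_or_none(field):
    """Return yyyy-mm-dd for a dd/mm/yyyy field, or None if it is not a valid date."""
    try:
        parsed = datetime.strptime(field.strip(), "%d/%m/%Y")
    except ValueError:
        # Not a real calendar date (or not in dd/mm/yyyy form at all).
        return None
    return parsed.strftime("%Y-%m-%d")

fields = ["17/01/2010", "31/02/2010", "not a date"]
converted = [(field, convert_or_none(field)) for field in fields]

# Keep the rejected inputs so they can be reported instead of silently dropped.
rejected = [field for field, iso in converted if iso is None]
print(converted)
print(rejected)
```

Collecting the rejected fields separately keeps dirty records visible, so they can be logged or reviewed by hand.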

That only works for super simple input strings, not arbitrary text files.
–
tchrist Nov 2 '10 at 14:57

2

@tchrist: Who says it has to work for arbitrary text? Don't solve problems that don't exist. The data might have been validated in the existing format. We don't know one way or the other, so why make any assumptions?
–
frankc Nov 2 '10 at 16:33

In fact, the author has stated elsewhere that arbitrary text is not required, as these are well-known text strings from a CSV file. tchrist made this same comment on other code elsewhere with just as little justification.
–
Sorpigal Nov 2 '10 at 18:38

1

The problem spec just mentioned that such date strings occurred in text files. It at no point stated that the strings to be changed were not really occurring in text files; in fact, he said they were. Therefore solutions that pretend the strings exist in complete isolation get all kinds of things wrong. You cannot ding people for answering the question that was asked. If the questioner had been more careful in his spec, people wouldn't have had to guess.
–
tchrist Nov 2 '10 at 18:47

I cannot fault you for your answer, which was extremely precise and correct, but this doesn't make any of the less-precise answers incorrect. You claim that the off-hand reference to "in text files" means "in arbitrary streams of text requiring the dates to first be parsed out and then converted" whereas I, and others, interpreted this question correctly as being about converting a string representation of a date once that string is obtained and not about parsing it out of a text file. You cannot ding people for answering the question that was asked and not the broader question it implied.
–
Sorpigal Nov 3 '10 at 10:27