Anyway, I'm going to close again. We thought of special casing "real" leading zeros (those at the very beginning of the string, not those within the string when the string before is the same for both strings). But we decided on following the original implementation for now. It gives consistent results. It gives consistent behavior with other software using the algorithm.

And I think sorting
001
002
01
02
isn't a bug afterall. There's a good chance that this is intention. Also, imagine this are decimal numbers.
1.001
1.002
1.01
1.02
It would be even more correct for this case.
Or this case:
007_tomorrow-never-dies
24_S02E11
This sorting makes more sense to me too.

Due to the latest mailing list discussion, I've reopened this task.
As of now, the general consensus seems that this task is a bug, and should be fixed.

Thus, I'm attaching a patch which resolves this issue.

Before looping through the chars, strnatcmp will go through leading zeros, increasing the index. After that is finished, the index is decreased by 1 if there the current char is a non-digit, so that the leading zero counts for non-digits.

Here's something I've been working on. It's based on the pre-revert version. It only deals with leading zeros. It does nothing with spaces.

I had to introduce a bias variable, since there's some kind of inconsistency on how the nat comp and strcmp handle the number of leading zeroes. For one file it favored more zeroes and in another it would favor less zeroes.

Also there's ifdef alternative code for a version that removes leading zeroes for number and letters. so [00, 1, 0b, 002] sorts as [00, 1, 002, 0b] instead of [00, 0b, 1, 002]

Sorry, I wasn't try to remove compare_left again or replace existing code. I was only trying to illustrate a different take. Basically, a run of numbers is compared as a integer, everything else is a separator. I used pre-revert code for this since it's closer to that assumption and easier to illustrate. I didn't need two functions since all number compares are treated the same.

For my code 1.001 is treated as 1[seperator]1 not One thousand and One or 1 point 1/1000th. Or whatever.

Personally, I think this is the easiest to implement and easiest to explain to non techie users.

As for the strcmp fallback, Yeah I know that's what causing it. I pointed that out in a previous comment. But, you just know somebody is going to file that as a bug, so I thought I'd give you a heads up. :D

So, your patch breaks decimal numbers (or any number after a non-digit separator).

The inconsitency you noted is only when the series of digits is the same. They sorted correctly if the series is different. (different by the means of: 001.txt and 01.txt are different, 000.txt and 00.txt are not).
I rather not break decimal numbers. And your inconsistency is at least consistently appearing, so my wishes to fix that are limited (and your version breaks more than it fixes).

I'm not sure what you're trying to get at. They sort differently as to be expected. Different assumptions, different code.

Mine
1.01 -> 1[sep]1
1.002 -> 1[sep]2

Yours and current
1.002 -> 1 and 2/1000
1.01 -> 1 and 10/1000

If your saying 1.002 should be sorted first because it less than 1.01 because it can be interpreted as a real number. Mine is wrong, current is right.

If those are version numbers both are right and both are wrong. .01 can be interpreted as a 1 or a 10. Depending on who you ask.

If your using another countries rules they could be interpreted as 1.01 and 1002. In which case mine is wrong, but sorts right for the wrong reasons and current code is wrong.

I apologize if I'm misunderstanding you. This is obviously an easy subject for that to happen. Just look at the irc and dev list. But trying to argue what 1.002 vs 1.01 'means' just leads down a path where nobody can agree and everybody is frustrated.

I'm not suggesting my patch be included in RB. I was trying to implement the minimal version of 'sort by number' and figured somebody would be interested in looking at it. Not deal with spaces, not guess what somebody meant by add leading zeros, underscore, or whatever. Just if it's 0-9 treat it as a number.

I thought we didn't want to interpret real numbers because their interpretation depends on what char is used as a decimal point, which is country dependent. Hence I think that the logic to interpret "1.002" as "1 sep 2" is to be used.

It's not about interpreting floats. It's about treating any number within the string (i.e. with non-digits before) as fractional (this of course includes floats also).

Anyway, we agreed on mimic'ing major browsers, and I just double checked windows explorer and nautilus. And they do NOT interpret floats, i.e. they sort: 1.1 < 1.03 < 1.4 < 1.007. So, I guess I'm ok with it.

I might have missed something, but the discussion on the mailing list mostly was about treating leading zeros. If you consider any string a separator, how would files like "2_episode5", "2_episode_10" get sorted? I'm completely with Daniel here.

It would still sort 2_episode5 < 2_episode10. There's no float involved. strnatcmp identifies floats/fractional by a leading zero (leading for this particular series of digits, not for the whole string).