A list of bugs and issues I've found in Microsoft products, and programming frameworks, with solutions where possible, as well as random thoughts.

Wednesday, August 5, 2009

Path.Combine is essentially useless

One of the great things about .NET, coming from C++, is all the stuff that is built in. Need to send an email? Sure thing. Want to use regular expressions? Go for it. It took me a while to learn that the things I expected to write myself in C++ were often already there in the library, if I looked for them.

The System.IO namespace has a Path class, which is used to manipulate file paths. Methods like GetFileNameWithoutExtension are very, very useful. Some things are a little counterintuitive, such as Path.GetDirectoryName walking up the directory tree if the string you have is already a directory, but overall it saves a lot of work.

One of the things I use the most is Path.Combine, which takes two fragments and merges them to make a path. In the past, I'd be checking if one string had a trailing slash, if the other had a leading slash, etc. Path.Combine takes care of that for you. Right? Not quite.

There are a couple of quirks here. To illustrate, the following table has three columns. The first two are the arguments passed into Path.Combine, the third is the result.

First argument    Second argument    Result
c:\path\          dir\file.txt       c:\path\dir\file.txt
c:\path\          \dir\file.txt      \dir\file.txt
c:\path           dir\file.txt       c:\path\dir\file.txt
c:\path           \dir\file.txt      \dir\file.txt
c:                dir\file.txt       c:dir\file.txt
c:                \dir\file.txt      \dir\file.txt
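For anyone who wants to check the rows above on their own machine, here is a minimal sketch that feeds the same six argument pairs to Path.Combine and prints what comes back. The class and method names (CombineDemo, CombineAll) are made up for illustration. It assumes Windows; on other platforms \ is not a directory separator, so the results will differ from the table.

```csharp
using System;
using System.IO;

// Hypothetical demo class; only Path.Combine itself comes from the framework.
public static class CombineDemo
{
    // The same argument pairs as the rows listed in the post.
    private static readonly string[,] Inputs =
    {
        { @"c:\path\", @"dir\file.txt"  },
        { @"c:\path\", @"\dir\file.txt" },
        { @"c:\path",  @"dir\file.txt"  },
        { @"c:\path",  @"\dir\file.txt" },
        { @"c:",       @"dir\file.txt"  },
        { @"c:",       @"\dir\file.txt" },
    };

    // Returns one Path.Combine result per input row.
    public static string[] CombineAll()
    {
        var results = new string[Inputs.GetLength(0)];
        for (int i = 0; i < results.Length; i++)
        {
            results[i] = Path.Combine(Inputs[i, 0], Inputs[i, 1]);
        }
        return results;
    }

    public static void Main()
    {
        string[] results = CombineAll();
        for (int i = 0; i < results.Length; i++)
        {
            Console.WriteLine("{0} + {1} = {2}", Inputs[i, 0], Inputs[i, 1], results[i]);
        }
    }
}
```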

The first thing to notice is that if the second string starts with a \, you get the second string back verbatim. This is the issue that has bitten me in the past. I assumed the method existed so that, no matter what slashes happened to be in the two strings, they would be joined into a single path. As you can see, this is not so. I assume there's a specific case for which this behaviour is desirable, but it's not the most obvious one to me. If there is a reason for it, the method could surely have an overload or, better yet, there could be a separate method whose name reflects the reasoning for not combining the two strings (a method called Combine is one I call to combine strings, not to SOMETIMES combine them).

The second one is more interesting. If my first string is a drive letter with no slash after it, then no slash is added. I just ran a test: I have a file called c:\procs.txt. File.Exists(@"c:procs.txt") returns false, while File.Exists(@"c:\procs.txt") returns true. So it seems to me that the slash is needed, but Path.Combine does not add it.

Overall, this method is basically broken as far as I am concerned, and I have rolled my own version to use instead. It makes sure the first string has a \ at the end and the second doesn't have one at the start, then calls Path.Combine, just out of spite (given that at this point I could just concatenate the two strings and be done with it).
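A rough sketch of what such a replacement might look like follows. This is not my actual code, and the names (PathHelper, Join) are invented for the example. It takes the concatenation route mentioned above rather than handing the cleaned-up strings back to Path.Combine, and it assumes Windows-style paths with \ as the separator to insert.

```csharp
using System;

// Hypothetical helper; not part of System.IO.
public static class PathHelper
{
    private static readonly char[] Separators = { '\\', '/' };

    // Always joins the two fragments with exactly one backslash between them,
    // regardless of trailing or leading slashes on either side.
    // Caveat: a second fragment that really is a UNC path (\\server\share)
    // loses its leading slashes too, so this is only for "base + tail" joins.
    public static string Join(string first, string second)
    {
        if (first == null) throw new ArgumentNullException(nameof(first));
        if (second == null) throw new ArgumentNullException(nameof(second));

        // Nothing to join: return the other fragment untouched.
        if (first.Length == 0) return second;
        if (second.Length == 0) return first;

        string head = first.TrimEnd(Separators);    // "c:\path\" -> "c:\path", "c:" stays "c:"
        string tail = second.TrimStart(Separators); // "\dir\file.txt" -> "dir\file.txt"
        return head + "\\" + tail;
    }

    public static void Main()
    {
        Console.WriteLine(Join(@"c:\path\", @"\dir\file.txt")); // c:\path\dir\file.txt
        Console.WriteLine(Join(@"c:", @"dir\file.txt"));        // c:\dir\file.txt
    }
}
```

With this, every row in the table above comes out as either c:\path\dir\file.txt or c:\dir\file.txt, which is what I expected from Path.Combine in the first place.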

10 comments:

IMHO, this is a fundamental problem with the "framework" (vs. "library") mindset: if you aren't writing the app the designers built the framework for, you'll run into bizarre scenarios where what looks like a general-purpose manipulation routine in fact exists for a very specific purpose (one not aligned with your own...)

As near as I can tell, Path.Combine() is intended to produce a sane union of a user-specified path and an application-defined "base path". Think: typing a path into the "file name" entry field in the standard "File Open" dialog.

In this specific scenario, several assumptions are made with regard to the potential nature of both paths, and the desired outcome.

For the user path:
- it could be a relative path (including a simple file name) - in this case, it should be combined with the base path.
- it could be an absolute path - in this case, the base path should be ignored.
- it could be a "drive-relative" path (a path relative to the root directory of the Current Drive) - in this case, the base path is ignored, and the user path is left alone.

Note that the last case is one almost never actually desired in modern Windows applications (where the notions of Current Drive and Current Working Directory that mattered so much under DOS make little sense), but still supported (presumably for legacy reasons).

For the base path:
- it could be a full drive+directory path
- it could be a drive-relative path
- it could be a drive specification only

All of these are combined ONLY with path-relative user paths...
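To make those categories concrete, here is a small sketch that sorts a Windows-style path string into them by looking only at its first few characters. The type and member names (PathKind, PathClassifier, Classify) are invented for this example, and the rules are a simplification of how Windows interprets paths, not the framework's own logic. As far as I can tell, Path.Combine on Windows returns the second argument unchanged for every kind except Relative.

```csharp
using System;

// Hypothetical categories, named after the cases described above.
public enum PathKind
{
    Relative,       // dir\file.txt    (no drive, no leading slash)
    RootRelative,   // \dir\file.txt   (root of the current drive)
    DriveRelative,  // c: or c:dir     (current directory on drive c:)
    Absolute        // c:\dir\file.txt or \\server\share\file.txt
}

public static class PathClassifier
{
    private static bool IsSeparator(char c)
    {
        return c == '\\' || c == '/';
    }

    // Classifies by inspecting the string only; it never touches the file system.
    public static PathKind Classify(string path)
    {
        if (string.IsNullOrEmpty(path)) return PathKind.Relative;

        bool hasDrive = path.Length >= 2 && char.IsLetter(path[0]) && path[1] == ':';
        if (hasDrive)
        {
            bool slashAfterDrive = path.Length >= 3 && IsSeparator(path[2]);
            return slashAfterDrive ? PathKind.Absolute : PathKind.DriveRelative;
        }

        // Two leading separators: treat as a UNC path.
        if (path.Length >= 2 && IsSeparator(path[0]) && IsSeparator(path[1]))
        {
            return PathKind.Absolute;
        }

        return IsSeparator(path[0]) ? PathKind.RootRelative : PathKind.Relative;
    }

    public static void Main()
    {
        foreach (string p in new[] { @"dir\file.txt", @"\dir\file.txt", @"c:", @"c:dir\file.txt", @"c:\dir\file.txt" })
        {
            Console.WriteLine("{0} -> {1}", p, Classify(p));
        }
    }
}
```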

By now, we're pretty far into options that no one using Windows GUI apps has cared about in well over a decade. Again, we have the notion of drive-relative paths, and also stand-alone drive specifications. It helps to think back to that "File Open" dialog, and that ancient option to change the CWD in response to user actions. But it doesn't help much. In real life, users specifying drive-relative paths... or apps specifying only drive-specifiers for base paths... are less features, more frustrating bugs waiting to happen.

Presumably, this all made sense to whoever wrote it. Perhaps he'd been working with the Windows file system so long that it seemed an obvious way to behave. But for those of us *not* planning to do clean-room implementations of Explorer in .NET, a straightforward, separator-intelligent, application-agnostic Path.Concat() would have been far more useful.

Actually, I just hit another problem, with a trailing /. If the folder path ends with /, Path.Combine won't add a \ anymore.
c:\Test with / + file.txt = c:\Test with /file.txt
The correct one should be c:\Test with/\file.txt

This always drives me crazy and I feel exactly the same as you do about it. In anger I typed "Path.Combine is useless" into my address bar and I'm pleased that I got a result that says pretty much exactly what I was thinking. (I have the same problem with constructing Urls, too; if you chain the Uri constructor that takes a Uri and a string, it's vulnerable to leading slashes in the string too. *stab*stab*stab*)
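The Uri case mentioned above can be seen with a short sketch like this one. The class name and the example.com URLs are placeholders chosen for illustration; the Uri(Uri, string) constructor and the AbsoluteUri property are the real framework members. A leading slash in the relative part discards the path of the base, much as Path.Combine discards its first argument.

```csharp
using System;

// Hypothetical demo class; example.com is a placeholder host.
public static class UriCombineDemo
{
    // Resolves a relative reference against a base URL using the
    // Uri(Uri, string) constructor.
    public static string Resolve(string baseUrl, string relative)
    {
        return new Uri(new Uri(baseUrl), relative).AbsoluteUri;
    }

    public static void Main()
    {
        const string baseUrl = "http://example.com/app/data/";

        // No leading slash: appended to the base path.
        Console.WriteLine(Resolve(baseUrl, "file.txt"));  // http://example.com/app/data/file.txt

        // Leading slash: resolved from the site root, base path dropped.
        Console.WriteLine(Resolve(baseUrl, "/file.txt")); // http://example.com/file.txt
    }
}
```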

Think of it as doing a DOS cd command on the first path and then on the second one, and all of the cases above give the expected behavior. For the drive with no slash, think of it as just typing that drive at the DOS prompt without the 'cd' - it just changes the current drive and doesn't automatically take you to the root of that drive. A path with a leading slash means ignore what path you have already and start at the root. Granted, that behavior is not what most people would find the most useful, but it does make sense.

If your second argument starts with a slash, you're saying start from the root of the current drive (I know this isn't Linux, but it means the same thing, root directory). So yes, when your second argument starts with a slash (root dir) it will give you your second argument verbatim because it is in fact the FULL path from the root directory.

@bLOGGER: Windows will use either forward slashes / or back slashes \ for its directory separator. So it is actually giving you the correct path, just with mixed slashes. And you can't have a file or folder name with a slash in it anyway, so you're essentially giving it an invalid path to begin with. Although it is valid, because Windows accepts either slash as a path separator.

1.) C: + tail == C:tail is correct if annoying. C:\tail is a path starting at the root. C:tail is a path relative to the current directory on C:. This ugliness is from DOS. It is part of the reason that CD and PUSHD do not do the same thing. DOS has this silly notion of a current directory on each volume. If you only ever have C: you would never know.
2.) Forward slashes are tolerated almost everywhere (on the command line as well as the API). The exception was some early DOS commands & programs (*.com) which used built-in switch processing. The loader parsed /A, /B etc. into a bit array. It was possible to use SwitchChar to make "-" be the delimiter. Since some apps (almost all) do their own parsing without respecting the SwitchChar, it was just confusing. Using forward slash for flags led to using backslash for path delimiters in DOS 2.0, when directories were first introduced. Too bad the word has not gotten out that forward slash works everywhere, because I hate seeing \ or \\ in strings. It also drives me crazy when some uneducated announcer tells you to "logon" to www.something.com "backslash" free offer.
3.) I totally agree that Path.Combine("c:\\", "\\tail") is broken. I have seen it used, as someone else mentioned, to provide a "default" base path. But it is absolutely horrible to name the method Combine, given its semantics.
4.) What is good to know is that you do not have to use Path.Combine at all! Extra slashes, forward & back, are quietly ignored in a path. The one place that adding a backslash matters is between C: + tail. See #1.

The only reason I use Path.Combine is so I don’t have to explain why you don’t have to use it to co-workers.