Git Unite – Fix Case Sensitive File Paths on Windows

Git Unite is a utility that fixes case sensitive file paths present in a git repository index on Windows. Since Windows is not case sensitive, the git index case sensitivity issue does not manifest itself until browsing the code repository on GitHub or cloning the repository to a case sensitive file system on Linux.

Introducing case sensitive file paths into the git index on a case insensitive operating system like Windows is easier than you think. A simple ‘git mv .\Where\Waldo where\is\Waldo’ is all you need to create two separate paths in the git index, but the Windows working directory will only report one. There might be git config settings that help avoid this problem, but controlling the settings and behavior of 20+ contributors on a project team is nearly impossible.
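To make the "two separate paths" idea concrete, here is a minimal sketch (not part of Git Unite) that scans a list of index paths for directories recorded under more than one spelling. The function name find_case_collisions and the file Where/Other.txt are made up for illustration, and Python's casefold() only approximates how Windows compares names.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def find_case_collisions(index_paths):
    """Return groups of directory spellings that differ only by case.

    index_paths: iterable of '/'-separated paths as stored in the git index.
    """
    spellings = defaultdict(set)
    for path in index_paths:
        # Record every parent directory of the tracked file.
        for parent in PurePosixPath(path).parents:
            text = str(parent)
            if text != ".":
                spellings[text.casefold()].add(text)
    # Keep only directories that appear with more than one spelling.
    return sorted(sorted(group) for group in spellings.values() if len(group) > 1)


# Hypothetical index contents after the 'git mv' above, assuming another
# file (Where/Other.txt) was still tracked under the original folder.
paths = ["Where/Other.txt", "where/is/Waldo"]
print(find_case_collisions(paths))  # prints [['Where', 'where']]
```

On Windows both spellings resolve to the same folder, which is why the working directory looks fine while the index holds two.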

The problem is exacerbated when hundreds of files are moved during a repository layout reorganization. If the user moving the files is not careful, these case sensitive path names will pollute the git index but appear fine in the working directory. Cleaning up these case sensitive file path issues on Windows is tedious, and this is where Git Unite helps out.

Git Unite will search the git repository index for file paths that do not match the same case that Windows is using. For each git index path case mismatch found, Git Unite will update the git index entry with the case reported by the Windows file system.
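The comparison described above can be sketched roughly as follows. This is an illustration in Python under my own assumptions, not the actual Git Unite code (which is a .NET tool built on LibGit2Sharp); plan_case_fixes and the sample file names are hypothetical, and casefold() is only an approximation of Windows case-insensitive matching.

```python
def plan_case_fixes(index_paths, disk_paths):
    """Pair each index path with the on-disk spelling when only case differs.

    Both arguments are iterables of '/'-separated relative paths.
    Returns a list of (index_path, disk_path) tuples, like a dry run.
    """
    # Look up on-disk spellings by their case-folded form.
    on_disk = {p.casefold(): p for p in disk_paths}
    fixes = []
    for path in index_paths:
        actual = on_disk.get(path.casefold())
        # Skip paths missing on disk and paths that already match exactly.
        if actual is not None and actual != path:
            fixes.append((path, actual))
    return fixes


# Hypothetical data: the index says 'etl', the file system says 'Etl'.
index = ["etl/Some/Dir/Path/Load.dtsx", "README.md"]
disk = ["Etl/Some/Dir/Path/Load.dtsx", "README.md"]
for old, new in plan_case_fixes(index, disk):
    print(f"{old} -> {new}")
# prints etl/Some/Dir/Path/Load.dtsx -> Etl/Some/Dir/Path/Load.dtsx
```

The sketch only reports what would change; updating the index entries themselves is the part the real tool handles.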

Usage

Usage: Git.Unite [OPTIONS]+ repository
Unite the git repository index file paths with current Windows case usage.
If no repository path is specified, the current directory is used.

Options:
      --dry-run    dry run without making changes
  -h, --help       show this message and exit

History

I work on a project that has one particular git repository tracking over 7,000 files. The repository contains a mixture of ASP.NET MVC3 code, SQL Server SSIS ETL packages, and PowerShell scripts. It all started one day when an ETL developer could not locate the package she developed on the GitHub web site.

I took a look at the git repository on her machine and the ETL package was clearly there under an Etl\Some\Dir\Path folder. The repository reported being up to date with origin/master, but it took several minutes before I noticed an etl and Etl folder on the GitHub web site.

It turns out that the ETL team was in the process of reorganizing the ETL packages into a new directory structure layout. I booted up a VM running Ubuntu and cloned the repository down to a case sensitive file system. I found 694 ETL files that were tracked in the git index with a directory path case different than the one reported by the Windows file system.

I fixed the problem by using a combination of find, sort, and awk to build a bash script to run the 694 git mv commands. This was a painful process that I did not want to repeat so I decided to build a tool anyone on the team could use on Windows to fix the problem.

In fact, two months later the same issue appeared again in a different repository. This time I was able to install the Git Unite utility on the user’s machine and fix the issue in a couple of minutes. We tracked down the source of the problem to a developer who hand-typed the target directory of a git mv command in all lowercase.

Example Scenario

Here is a representative example, using Posh-Git on Windows 7, of how someone can introduce case sensitive file paths on a case insensitive file system.

Git Unite clears up the confusion by reconciling the git index file path with the same case Windows is using.

When I go back and look at the repository on GitHub, there is only one place Where Waldo could be.

As far as Windows was concerned, Waldo was here the whole time.

Todd, this turned out to be a *very* timely post for us. I’ve pulled down the source, built it, and can perform a dry run, but running it I get an error (sorry for the wall of text):

Unhandled Exception: System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation. ---> System.NullReferenceException: Object reference not set to an instance of an object.
   at Git.Unite.Program.c__DisplayClass4.b__3(String p) in c:\Projects\git-unite\src\Git.Unite\Program.cs:line 44
   at System.Collections.Generic.List`1.ForEach(Action`1 action)
   at Git.Unite.Program.Main(String[] args) in c:\Projects\git-unite\src\Git.Unite\Program.cs:line 44

Any insight you could provide would be greatly appreciated!

http://www.woodcp.com/ Todd A. Wood

What command line invocation are you using to specify the working dir? I will clone down the repo and try to reproduce.

Thanks.

mikethescott

Hi. You’re correct. After a dry run picked out the changes it would make, I ran it from the bin/Debug directory with no parameters other than the directory containing the repository:

C:\Projects\git-unite\src\Git.Unite\bin\Debug> git.unite C:\Projects\matlab

and got the stack trace above. Sorry if that wasn’t clear from my description.

http://www.woodcp.com/ Todd A. Wood

It looks like there might be one or more files where simply changing the case runs into an unintended limitation. I pushed up a change to wrap the LibGit2Sharp remove/add to index calls in a try/catch. Pull down the changes and see if it identifies the file(s) causing problems with a simple rename.

error changing: third_party\application_components\MATLAB\pmtk3-1nov12\demos\catFAdemoAuto.m~ -> Third_Party\Application_Components\MATLAB\pmtk3-1nov12\demos\catFAdemoAuto.m~ [Exception has been thrown by the target of an invocation.]

… and several (dozen) more…

They appear to be artifacts of one of my cow-orkers using emacs or the like to edit some of their files.

The rest of the changes applied, and all of the simple moves were successful. I should be able to manually remove these stragglers on one of our linux machines.

http://www.woodcp.com/ Todd A. Wood

Excellent. Glad I could help you out, and I will try to track down that edge case. By chance, does the catFAdemoAuto.m~ file exist in both directory locations?

mikethescott

It does indeed exist in both locations.

Thanks for the help!

http://www.woodcp.com/ Todd A. Wood

The error ‘at LibGitUnite.GitUnite.Process(String path, Boolean dryrun) in c:\Projects\git-unite\src\LibGitUnite\GitUnite.cs:line 64’ suggests it was not a --dry-run.

I cloned it down, did a ‘build.bat’ and ran a dry run against the repo itself i.e.

It would be even better for me if unite had the ability to determine casing from a branch in git instead of the local filesystem.

In my workplace our only “official” repository is TFS. I use git-tf (from Microsoft) to initially clone TFS repositories to git, and fetch periodically to keep my git repositories up to date.

The latest TFS shows up as a remote branch origin_tfs/tfs.

I do all of my work on a different branch and periodically merge from origin_tfs/tfs into my local branch.

When I have changes to go back to TFS, I push to a shelveset in TFS, then merge to TFS from there.

It causes grief when casing is changed in the TFS repository.

If I could base casing on the origin_tfs/tfs branch it should eliminate most if not all of my issues. I’ve been looking at the git-unite sources (well done) and may take it on myself, but I have a bit of a learning curve on libgit2 to get over first.

http://www.woodcp.com/ Todd A. Wood

Git.Unite is basically comparing the git index casing with the Windows OS file system casing. You could try a --dry-run on the git repository with the origin_tfs/tfs branch checked out. Git maintains the index with each branch and changes it when you switch between branches.
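For anyone who wants to experiment with branch-based casing outside of Git.Unite, the idea might be sketched like this. It is not a Git.Unite feature, just a rough Python illustration that reads the index with 'git ls-files' and a branch's tree with 'git ls-tree -r --name-only', then pairs up paths that differ only by case. The helper names git_paths, pair_by_case, and case_diffs_against_ref are hypothetical, and casefold() is only an approximation of Windows name matching.

```python
import subprocess


def git_paths(repo, *args):
    """Run a git listing command with NUL-separated output and return the paths."""
    result = subprocess.run(
        ["git", "-C", repo, *args],
        check=True, capture_output=True, text=True,
    )
    return [p for p in result.stdout.split("\0") if p]


def pair_by_case(index_paths, ref_paths):
    """Return (index_path, ref_path) pairs that differ only by case."""
    wanted = {p.casefold(): p for p in ref_paths}
    pairs = []
    for path in index_paths:
        target = wanted.get(path.casefold())
        if target is not None and target != path:
            pairs.append((path, target))
    return pairs


def case_diffs_against_ref(repo, ref):
    """Compare the current index of repo with the tree of ref (a branch name)."""
    index_paths = git_paths(repo, "ls-files", "-z")
    ref_paths = git_paths(repo, "ls-tree", "-r", "--name-only", "-z", ref)
    return pair_by_case(index_paths, ref_paths)
```

Calling case_diffs_against_ref(".", "origin_tfs/tfs") from a local working branch would then list the index entries whose casing disagrees with the TFS branch, which works as a dry run before deciding how to rename them.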