I once tried using git to manage my data. The problem is that I frequently have thousands of files and gigabytes of data, and git just does not handle that well.[1]

Another time, I even tried building a git repo that held nothing but the history of PDB snapshots. The PDB is updated frequently, and I have run into many cases where a paper analyzed a structure 3 years ago, but the structure has been revised since then, so the paper made no sense until I thought to look at the history of changes to the structure. Unfortunately, git could not handle this at all when I tried it: the repo took days to construct and was then unbearably slow to use.

Git would probably work well for storing the data used by most bench scientists, but for a computational chemist puking up gigabytes of data weekly on a single project, it is sadly horrible at handling the history of your data.