DevOps Stack Exchange is a question and answer site for software engineers working on automated testing, continuous delivery, service integration and monitoring, and building SDLC infrastructure.

If you manage a lot of Git repositories (or repositories in another VCS, but Git is a specific and common technology), usage profiles can vary widely between users. At the worst end you can face broken repositories (unplanned work, risk) and wasted storage (which is claimed to be cheap but still has to be accounted for).

Who is responsible for imposing minimum acceptance criteria for "quality of usage" (I can't find a better term for now), and what realistic, quantifiable measures could you put in place and link to concrete actions?

Examples of anti-patterns and related risks you might be dealing with:

Folks upload files, including binaries, that are neither part of the software nor test documents

Binary libraries are committed instead of being pulled in through dependency management

Repos become too big (over 15 GB, as stated here [1])

Repos are below 15 GB, but the history alone is many gigabytes

Other?
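For the size-related items above, the raw numbers are easy to collect; a minimal sketch, run inside any local clone (what counts as "too big" is up to you):

```shell
# Rough sizing of a clone; run from the repository's top-level directory.
git count-objects -vH   # "size-pack" approximates the packed history size
du -sh .git             # total on-disk size of the repository database
```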

Or, should anybody care about these things at all? (Formal objections, not opinions, on why not to care.)

2 Answers

Who is responsible for checking that people don't check in commits that have large files? The same people who are responsible for checking that the commits aren't bad in other ways: everyone.
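One way to make that shared responsibility concrete is a local pre-commit hook that catches the most common mistake (oversized files) before review even starts. A sketch, assuming bash and an arbitrary 5 MB limit; the function name and threshold are illustrative, not a standard:

```shell
#!/bin/bash
# Hypothetical .git/hooks/pre-commit sketch: reject staged files over 5 MB.
check_staged_sizes() {
  local limit=$((5 * 1024 * 1024)) fail=0 f size
  # -z handles filenames with spaces; AM = added or modified files only.
  while IFS= read -r -d '' f; do
    # ":$f" names the blob as staged in the index.
    size=$(git cat-file -s ":$f" 2>/dev/null) || continue
    if (( size > limit )); then
      echo "ERROR: $f is $size bytes (limit $limit). Use the artifact repository instead." >&2
      fail=1
    fi
  done < <(git diff --cached --name-only --diff-filter=AM -z)
  return $fail
}
check_staged_sizes
```

The hook's exit status is the last command's, so a failing check blocks the commit.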

If git is new in your company, make sure that the git training includes this sort of thing. After that, it's up to code reviewers to notice mistakes and have them corrected.

If it becomes a recurring problem, then whoever is in charge of developer tooling can write some checks that run nightly and let people know that something slipped through. But that seems like overkill in most situations.
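If you do go the tooling route, the scan itself is small. A sketch that lists the ten largest blobs anywhere in a repository's history (run it per repo; the cutoff of ten is arbitrary):

```shell
# List the 10 largest blobs in the full history of the current repository.
# rev-list emits "<sha> <path>"; cat-file reads the sha and keeps the path in %(rest).
git rev-list --objects --all |
  git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
  awk '$1 == "blob" { print $2, $3 }' |
  sort -rn |
  head -10
```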

Having spent lots of time on GitHub and on various private corporate repos on GitLab and the like, I find there are certain things that make a huge difference in how good a repo seems to me:

Does the README clearly state what the software does? Hopefully it also provides some code snippets, usage examples, and links to the preferred support forum. Any build or runtime prerequisites should be clearly spelled out. Are the primary authors clearly identified?

Are there releases? If nobody has tagged a release, it is easy to assume that the repo is still in early development and not ready for others to consume.

Is there a LICENSE? If I don't know whether I can legally use the software, it is much harder to decide to invest time in it. On a private corporate server I would assume that everything is valid for internal use, but in some cases even that might not hold.

Is there a link to a CI environment? If the repository doesn't show even the minimal effort of providing tests and a CI environment that regularly runs them, that is a strong vote against it.

Are the build instructions for a new contributor clear? Some people may have this containerized, but the build instructions for doing it manually should be viable as well.

In a corporate environment I would expect that you could impose other restrictions semi-automatically. Does the code fit the code formatting guidelines? Are binaries excluded (a good idea if you have a central artifact repository for such things)? I wouldn't set an arbitrary size limit: it is easy enough to implement technically, but some projects legitimately grow that large, and a hard cap doesn't fix the organizational problem behind it.
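For the binary-exclusion check in particular, git's own diff machinery can do the detection: `git diff --numstat` reports `-` for added/deleted line counts on binary files. A sketch of a CI step; the function name and the range variable are placeholders for whatever your CI system provides (e.g. origin/main...HEAD in a merge-request build):

```shell
#!/bin/bash
# Hypothetical CI check: fail the job if the inspected range introduces binary files.
check_no_new_binaries() {
  local range="${1:-HEAD~1..HEAD}"   # default range is only an example
  local binaries
  # --numstat prints "-<TAB>-<TAB>path" for binary files.
  binaries=$(git diff --numstat "$range" | awk '$1 == "-" && $2 == "-" { print $3 }')
  if [ -n "$binaries" ]; then
    echo "Binary files detected; publish these via the artifact repository instead:" >&2
    echo "$binaries" >&2
    return 1
  fi
}
check_no_new_binaries "$RANGE"
```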

As for "who" does this, I'd say it ends up being a combination of people. To start, you need buy-in from engineering management. Then a DevOps person or an individual developer can implement it in a particular place.