Ryan Barrett's blog

distribution file statistics

I’ve recently packaged and released a few small programs, and I spent a little
time thinking about what files to include. If you’ve used any *nix OS before,
the following shell session will look very familiar:

If you haven’t used *nix much, this is a typical list of files and directories
that a program comes with. Most programs have a README file. Other common files
include CHANGELOG, NEWS, and AUTHORS. Also, some programs have different names
for the same type of file, such as LICENSE, COPYING, and COPYRIGHT.

I was curious to see how common each file is, so I looked at many of the
programs that ship with Red Hat Linux 9 and calculated some basic statistics.
Out of 412 programs total, here’s the frequency of each file, grouped by type:

Filename          Percent of projects    Percent of projects
                  with this file         with this type of file
----------------  ---------------------  ----------------------
README            73%                    75%
MANUAL            1%
USAGE             0%

COPYING           49%                    59%
LICENSE           5%
LICENCE           1%
License           0%
COPYRIGHT         3%
Copyright         2%

ChangeLog         41%                    56%
CHANGES           9%
Changelog         1%
CHANGELOG         1%
Changes           0%
changelog         0%
NOTES             1%
RELNOTES          1%
VERSION           1%
RELEASE           0%

NEWS              39%                    42%
ANNOUNCE          2%
WHATSNEW          0%
WhatsNew          0%
announce          0%

AUTHORS           33%                    42%
THANKS            5%
CREDITS           3%
MAINTAINERS       0%

TODO              24%                    24%
ToDo              0%

INSTALL           12%
Install           0%

BUGS              5%                     7%
PROBLEMS          1%
Problems          0%
TROUBLESHOOTING   0%

FAQ               4%                     4%

HACKING           2%                     2%

HISTORY           1%                     1%

PROJECTS          1%                     1%

It’s not surprising that README is by far the most common file. I was surprised,
though, by how many different names are used for the same type of file,
especially for licenses and changelogs. Still, it’s reassuring that the most
common names, COPYING and ChangeLog respectively, are used roughly 90% and 80%
of the time. For license files specifically, COPYING is the GNU standard.
(Personally, I prefer the more straightforward LICENSE.)
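
The post doesn't say exactly how the 90% and 80% figures were computed, so the
sketch below is only one possible reading. It takes the rounded percentages
from the table and computes each dominant name's share two ways: against the
spelling variants of the same name (an assumed grouping, not the author's), and
against the total for the whole type. The dictionary and function names are
made up for illustration.

```python
# Rounded percentages copied from the table. Which names count as
# "variants" of each other is an assumption made for this sketch.
license_variants = {"COPYING": 49, "LICENSE": 5, "LICENCE": 1, "License": 0}
changelog_variants = {
    "ChangeLog": 41, "CHANGES": 9, "Changelog": 1,
    "CHANGELOG": 1, "Changes": 0, "changelog": 0,
}

# Percent of projects with any file of the type, also from the table.
type_totals = {"license": 59, "changelog": 56}


def share_among_variants(top_name, variants):
    """Share of the top name among the listed variants, as a whole percent."""
    return round(100 * variants[top_name] / sum(variants.values()))


def share_of_type(percent_with_name, percent_with_type):
    """Share of the top name among all projects with that type of file."""
    return round(100 * percent_with_name / percent_with_type)


copying_vs_variants = share_among_variants("COPYING", license_variants)
changelog_vs_variants = share_among_variants("ChangeLog", changelog_variants)
copying_vs_type = share_of_type(49, type_totals["license"])
changelog_vs_type = share_of_type(41, type_totals["changelog"])

print(copying_vs_variants, changelog_vs_variants)  # 89 79
print(copying_vs_type, changelog_vs_type)          # 83 73
```

The first reading lands close to the 90% and 80% quoted above. The second is
lower because files like COPYRIGHT and NOTES are counted in the type totals.
Either way, the inputs are already rounded, so the results are approximate.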

Judging from this lineup, a de facto standard set of files would include README,
COPYING, ChangeLog, NEWS, and for larger projects, AUTHORS.

Also, note that the percentage for each type of file doesn’t always equal the
sum of the percentages for its individual files. This is due to rounding.
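
For anyone who wants to run a similar survey, here is a minimal sketch of how
the two columns could be computed. It is not the script used for the numbers
above. It assumes each program has been unpacked into its own directory under
a common root, and it only looks at top-level files. The names `FILE_TYPES`,
`top_level_files`, and `file_statistics` are hypothetical, and the grouping is
just a subset of the table.

```python
from collections import Counter
from pathlib import Path

# Assumed grouping of file names into types (a subset of the table).
FILE_TYPES = {
    "readme": ["README", "MANUAL", "USAGE"],
    "license": ["COPYING", "LICENSE", "LICENCE", "License",
                "COPYRIGHT", "Copyright"],
    "changelog": ["ChangeLog", "CHANGES", "Changelog", "CHANGELOG",
                  "Changes", "changelog"],
}


def top_level_files(root):
    """Map each project directory under root to its top-level file names."""
    return {
        project.name: {e.name for e in project.iterdir() if e.is_file()}
        for project in Path(root).iterdir()
        if project.is_dir()
    }


def file_statistics(projects, file_types=FILE_TYPES):
    """Return (per_name, per_type) percentages rounded to whole numbers.

    projects maps a project name to the set of file names it ships.
    """
    total = len(projects)
    if total == 0:
        return {}, {}
    name_counts = Counter()
    type_counts = Counter()
    for files in projects.values():
        for type_name, names in file_types.items():
            present = [name for name in names if name in files]
            name_counts.update(present)
            if present:
                # A project counts once per type, however many
                # files of that type it ships.
                type_counts[type_name] += 1
    per_name = {
        name: round(100 * name_counts[name] / total)
        for names in file_types.values()
        for name in names
    }
    per_type = {t: round(100 * type_counts[t] / total) for t in file_types}
    return per_name, per_type


# Tiny made-up sample, only to show the shape of the input and output.
sample = {
    "a": {"README", "COPYING"},
    "b": {"README", "COPYING", "LICENSE"},
    "c": {"ChangeLog"},
    "d": set(),
}
per_name, per_type = file_statistics(sample)
print(per_name["README"], per_type["license"])  # 50 50
```

On a real tree, `file_statistics(top_level_files("/path/to/unpacked"))` would
produce both columns. Because each project is counted once per type, a type's
percentage can be lower than the sum of its rows when projects ship more than
one file of that type, and rounding can push it the other way.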