GitHub README Analyzer

An experiment to algorithmically improve your GitHub README

Think you've got a great README?

We know writing a README can be a challenge. And how do you know what makes a README any good?

Instead of guessing, we tried to train a model using natural language processing and machine learning. This data science experiment uses the 10,000 most-starred GitHub repositories across the 10 most popular programming languages.

Your README Report Card

Your README's Overall Grade

Your overall score is calculated as an average of your README's header, code sample, text, and image scores. Each section provides insights and suggestions for improving your README relative to the 10,000 popular repositories we've analyzed.
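As a rough illustration, the overall grade can be pictured as a plain mean of the four section scores. This is a minimal sketch that assumes equal weights and a 0 to 100 scale. Neither detail is specified here, and `overall_score` is a hypothetical name.

```python
def overall_score(headers: float, code_samples: float, text: float, images: float) -> float:
    """Unweighted mean of the four section scores (equal weights are an assumption)."""
    scores = (headers, code_samples, text, images)
    return sum(scores) / len(scores)


print(overall_score(80, 60, 90, 70))  # 75.0
```

If the real grader weights some sections more heavily, as the correlation findings below might suggest, the mean would become a weighted sum. The structure stays the same.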

These grades are not definitive. Rather, they're the result of machine learning and are provided on a "best effort" basis. We recognize that the model doesn't account for all the complexity and nuance of a README. Ultimately, you should use your own judgement about what to include, remove, and ignore. Inevitably, some results won't make sense. tl;dr: data science is hard.

Feel free to contact us @Algorithmia or by email if a result seems particularly egregious.

Model Assumptions

Popular repositories probably have a good, well-documented README

Popular repositories have more stars than bad repositories

Each programming language has unique characteristics

In general, we found a higher correlation between a README's quality and the specific headers and text used throughout. Conversely, we found a lower correlation between quality and the number of code samples or images in the README.

To correct for this, we excluded from our model any repository with zero images or code snippets, because these are helpful, additive features.
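The filtering step might look something like the following sketch. It assumes each repository has already been reduced to a hypothetical `RepoStats` record with image and code-sample counts. It also reads "zero images or code snippets" as excluding any repository that lacks either one, which is an interpretation rather than a documented rule.

```python
from dataclasses import dataclass


@dataclass
class RepoStats:
    # Hypothetical per-repository record; field names are illustrative only.
    name: str
    stars: int
    image_count: int
    code_sample_count: int


def filter_training_repos(repos: list[RepoStats]) -> list[RepoStats]:
    """Keep only repositories that have at least one image AND at least one code sample."""
    return [r for r in repos if r.image_count > 0 and r.code_sample_count > 0]


repos = [
    RepoStats("a", 5000, 2, 4),
    RepoStats("b", 4200, 0, 7),  # no images: dropped
    RepoStats("c", 3900, 1, 0),  # no code samples: dropped
]
print([r.name for r in filter_training_repos(repos)])  # ['a']
```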

Note: If the README you analyze belongs to a project outside the top 10 languages (i.e., JavaScript, Java, Ruby, Python, PHP, HTML, CSS, C++, C, or C#), we default to a model trained on all of the languages.
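The fallback rule amounts to a lookup with a default. This sketch assumes per-language models are selected by a language key. `select_model_key` and the `"all"` key are hypothetical names, not the analyzer's actual API.

```python
from typing import Optional

# The ten languages listed above, each assumed to have its own trained model.
TOP_LANGUAGES = {"JavaScript", "Java", "Ruby", "Python", "PHP", "HTML", "CSS", "C++", "C", "C#"}


def select_model_key(language: Optional[str]) -> str:
    """Return the per-language model key, or the hypothetical "all" key as a fallback."""
    if language in TOP_LANGUAGES:
        return language
    return "all"  # model trained on all languages


print(select_model_key("Python"))  # Python
print(select_model_key("Go"))      # all
```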

Section Headers

Having clear section headers helps users quickly find what they're looking for. Our recommendations provide guidance on which sections you should consider adding, changing, or removing. In many cases, section headers come in multiple forms, such as Install, Installing, or Installation. In these situations, we simply pick one and recommend it. Feel free to pick your own flavor.
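One way to treat Install, Installing, and Installation as the same section is to extract the headers and map each known variant to a single canonical form. This is a minimal sketch that assumes a small hand-written synonym table (`CANONICAL_HEADERS` is hypothetical) and ATX-style Markdown headers only.

```python
import re

# Hypothetical synonym table: each variant maps to the single canonical header we'd recommend.
CANONICAL_HEADERS = {
    "install": "Installation",
    "installing": "Installation",
    "installation": "Installation",
    "usage": "Usage",
    "license": "License",
}

# ATX-style header: 1-6 '#' characters, the header text, then an optional closing '#' run.
ATX_HEADER = re.compile(r"^#{1,6}[ \t]+(.+?)(?:[ \t]+#+)?[ \t]*$", re.MULTILINE)


def extract_headers(markdown: str) -> list[str]:
    """Return ATX header text. Simplified: ignores setext headers and doesn't skip code fences."""
    return [m.group(1).strip() for m in ATX_HEADER.finditer(markdown)]


def normalize_header(header: str) -> str:
    """Map a known variant to its canonical form; unknown headers come back unchanged."""
    return CANONICAL_HEADERS.get(header.strip().lower(), header.strip())


readme = "# MyLib\n\n## Installing\n\n## Usage\n"
print([normalize_header(h) for h in extract_headers(readme)])
# ['MyLib', 'Installation', 'Usage']
```

Normalizing before comparison means a README with "Installing" isn't told it is missing an "Installation" section.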

Code Samples

It should go without saying that code samples are extremely helpful. Many developers jump straight to code examples rather than reading the documentation. Here, we attempt to make recommendations relative to the average number of code samples in the most popular repositories.
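A recommendation like this first needs a count of the code samples in a README. This sketch counts fenced code blocks and compares the result against an average. The `popular_average` value is a hypothetical input, because the actual averages aren't given here, and the fence handling is deliberately simplified.

```python
FENCES = ("`" * 3, "~" * 3)  # three backticks, three tildes


def count_fenced_code_blocks(markdown: str) -> int:
    """Count fenced code blocks opened with three backticks or three tildes.

    Simplified: a block closes at the next line starting with the same fence
    characters; fence length, info strings, and indented code blocks are ignored.
    """
    count = 0
    open_fence = None  # fence string of the block we're currently inside, if any
    for line in markdown.splitlines():
        stripped = line.lstrip()
        for fence in FENCES:
            if stripped.startswith(fence):
                if open_fence is None:
                    open_fence = fence  # opening fence: start a new block
                    count += 1
                elif open_fence == fence:
                    open_fence = None  # matching closing fence
                break
    return count


def code_sample_feedback(markdown: str, popular_average: float) -> str:
    """Compare this README's block count with a (hypothetical) popular-repo average."""
    n = count_fenced_code_blocks(markdown)
    if n < popular_average:
        return f"{n} code samples vs. an average of {popular_average:g}; consider adding more."
    return f"{n} code samples, at or above the average of {popular_average:g}."


readme = "Intro\n" + FENCES[0] + "python\nprint(1)\n" + FENCES[0] + "\n~~~\nls\n~~~\n"
print(code_sample_feedback(readme, 3))
# 2 code samples vs. an average of 3; consider adding more.
```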

README Text

The text of your README is important, because it explains what your project is about, how it works, why someone would need it, and more. Your README should be readable, coherent, and clear.

Our model analyzes every word used throughout your README to suggest keywords commonly used in popular READMEs. For instance, if we recommend that you include a word like "globals," perhaps you should include a sentence or two describing the role of globals in your project.
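The model itself isn't described in enough detail to reproduce, so this sketch substitutes a much simpler technique: document frequency. It suggests words that appear in a large share of popular READMEs but are missing from yours. The function names, the `min_share` threshold, and the crude tokenizer are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text: str) -> set[str]:
    """Lowercased words of three or more letters; a deliberately crude tokenizer."""
    return set(re.findall(r"[a-z]{3,}", text.lower()))


def suggest_keywords(readme: str, popular_readmes: list[str],
                     min_share: float = 0.5, limit: int = 10) -> list[str]:
    """Words found in at least `min_share` of popular READMEs but missing from `readme`."""
    doc_freq = Counter()  # word -> number of popular READMEs containing it
    for doc in popular_readmes:
        doc_freq.update(tokenize(doc))
    present = tokenize(readme)
    threshold = min_share * len(popular_readmes)
    candidates = [(w, df) for w, df in doc_freq.items() if df >= threshold and w not in present]
    candidates.sort(key=lambda item: (-item[1], item[0]))  # most widespread first, then alphabetical
    return [w for w, _ in candidates[:limit]]


popular = [
    "Install the package. Globals are configured at startup.",
    "Usage: set your globals, then install.",
    "This project has tests.",
]
print(suggest_keywords("My project. Install it.", popular))  # ['globals']
```

A real system would likely weight words per language and filter stop words. The basic idea of comparing your vocabulary against that of popular READMEs is the same.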

README Images

Popular repositories on GitHub often have images or badges in their READMEs. The badges indicate things like continuous integration, build status, or package manager inclusion. Other types of images, such as screenshots or GIFs, can also be useful in conveying information about the output or how the code works.
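To score images, an analyzer first has to find them. This sketch counts Markdown image links and HTML `<img>` tags, then separates badges from other images. The badge check is a hypothetical URL heuristic (`BADGE_HINTS`), not a definitive rule.

```python
import re

# Markdown image syntax ![alt](url "optional title") and raw HTML <img ... src="..."> tags.
MD_IMAGE = re.compile(r"!\[[^\]]*\]\(([^)\s]+)[^)]*\)")
HTML_IMAGE = re.compile(r"<img\b[^>]*\bsrc=[\"']([^\"']+)[\"']", re.IGNORECASE)

# Hypothetical heuristic: URL fragments that usually indicate a status badge.
BADGE_HINTS = ("img.shields.io", "travis-ci", "badge")


def count_images(markdown: str) -> dict[str, int]:
    """Count images in a README and split them into badges and other images."""
    urls = MD_IMAGE.findall(markdown) + HTML_IMAGE.findall(markdown)
    badges = sum(1 for url in urls if any(hint in url.lower() for hint in BADGE_HINTS))
    return {"images": len(urls), "badges": badges, "other": len(urls) - badges}


readme = (
    "[![Build](https://travis-ci.org/user/repo.svg?branch=master)](https://travis-ci.org/user/repo)\n"
    '![Demo](docs/demo.gif "Demo")\n'
    '<img src="screenshot.png" width="300">\n'
)
print(count_images(readme))  # {'images': 3, 'badges': 1, 'other': 2}
```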