As far as I know, there is no way to do a reverse lookup, where you just provide a number of programming languages and get back all the associated repositories.

So I thought, let’s give up on it and enjoy the coffee I had on my desk.

I closed the question tab and started enjoying the coffee, but then I opened it again because I felt there might be a chance if I used the GitHub Archive project.

GitHub Archive is a project to record the public GitHub timeline, archive it, and make it easily accessible for further analysis.

This change of plan was not enough either, as GitHub Archive gives only the major programming language for a project.
Now it was time to shift to coffee full time, but nope! I had another idea.

The idea was to collect the good and known projects on GitHub and rank them by the number of programming languages in them. This sounds like brute force, but it was the only way left.

I headed to the GitHub API for it, but found that it gives you only the first 1,000 search results.

So I finally moved to GitHub Archive and downloaded a .csv file of all the repositories with 500 or more stars. Now, I think it covers most of your good and known projects, happy now?

I Vimed the .csv file and found that some URLs were missing the owner’s name.

Vim Screenshot

:g/https:\/\/github.com\/\//d

I removed all the corrupted lines using this command.
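If you would rather script this cleanup than do it in Vim, here is a minimal Python sketch of the same filter. The function name `drop_corrupted` and the sample rows are made up for illustration, and I am assuming each line of the .csv holds one repository URL.

```python
import re

# Same idea as the Vim command: a URL with an empty owner segment
# looks like "https://github.com//repo".
MISSING_OWNER = re.compile(r"https://github\.com//")


def drop_corrupted(lines):
    """Return only the lines whose GitHub URL still has an owner."""
    return [line for line in lines if not MISSING_OWNER.search(line)]


# Made-up sample rows, not taken from the real .csv file.
sample = [
    "https://github.com/example-owner/example-repo",
    "https://github.com//orphaned-repo",
]
print(drop_corrupted(sample))
```

Only the first sample row survives, because the second one has nothing between the two slashes where the owner should be.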

If you’re thinking that removing these lines means we won’t consider those repositories, take a deep breath and don’t worry. We will consider them, keep reading.

Then I wrote a Python module to interact with the GitHub API and named it postman.
postman uses an authentication token from GitHub to avoid the tight rate limit on unauthenticated requests.
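I am not reproducing the real postman here, but a rough sketch of what such a module could look like follows. It assumes the GitHub REST endpoint `/repos/{owner}/{repo}/languages`, which returns a JSON object mapping each language to its byte count. The function names and the `User-Agent` value are my own placeholders.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def build_languages_request(owner, repo, token):
    """Build an authenticated request for a repository's language breakdown."""
    url = f"{API_ROOT}/repos/{owner}/{repo}/languages"
    headers = {
        # The token raises the rate limit compared to anonymous requests.
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
        "User-Agent": "postman-sketch",  # placeholder name
    }
    return urllib.request.Request(url, headers=headers)


def count_languages(response_body):
    """Count the keys of the JSON object (one key per language)."""
    return len(json.loads(response_body))


def fetch_language_count(owner, repo, token):
    """Do the actual network call; this needs a real token to run."""
    request = build_languages_request(owner, repo, token)
    with urllib.request.urlopen(request) as response:
        return count_languages(response.read())
```

Calling `fetch_language_count("twbs", "bootstrap", token)` with a valid token would return the number of languages GitHub reports for that repository. Keeping the request building and the counting in separate functions makes both easy to check without touching the network.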

I don’t know how long postman took to finish its task, as I was attending my classes then.
But when I came back from class in the evening, I saw that postman was done with the task. Wow, an obedient postman.

Right now, if you search GitHub for repositories with 500 or more stars, you will see around 7,200 results. So we have been doing this right so far, since GitHub Archive doesn’t have all the data.

Once all the data is available, it’s time to rock ‘n’ roll. RStudio is always there to help us.

Plotting the distribution of stars and languages over all the repositories, we get this graph.

Stars and Language distribution

You can see that most of our good and known projects are in the bottom-left corner. Almost 90% of our repositories have fewer than 20,000 stars and fewer than 20 languages.
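The analysis itself was done in RStudio, but if you want to check a share like this yourself, a small Python sketch could look as follows. The record layout (`stars` and `languages` keys) and the sample numbers are invented for illustration and are not the real dataset.

```python
def share_below(repos, max_stars=20_000, max_languages=20):
    """Fraction of repositories under both thresholds."""
    if not repos:
        return 0.0
    inside = sum(
        1
        for repo in repos
        if repo["stars"] < max_stars and repo["languages"] < max_languages
    )
    return inside / len(repos)


# Invented sample data, only to show the calculation.
sample_repos = [
    {"name": "a/one", "stars": 700, "languages": 3},
    {"name": "b/two", "stars": 5_000, "languages": 12},
    {"name": "c/three", "stars": 25_000, "languages": 8},
    {"name": "d/four", "stars": 900, "languages": 2},
]
print(share_below(sample_repos))
```

Three of the four invented repositories sit under both thresholds, so the sketch reports a share of 0.75.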

Only one repository has more than 40,000 stars: twbs/bootstrap.

OK, let’s come back to the question. I think ordering repositories by their number of programming languages will work.
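The ordering step can be sketched in a few lines of Python. The record layout and the sample values are invented, and breaking ties by star count is my own assumption; it is not something the data requires.

```python
def rank_by_language_count(repos):
    """Sort repositories by language count, most languages first.

    Ties are broken by star count (an assumption made for this sketch).
    """
    return sorted(repos, key=lambda repo: (-repo["languages"], -repo["stars"]))


# Invented sample data, only to show the ordering.
sample_repos = [
    {"name": "a/one", "stars": 700, "languages": 3},
    {"name": "b/two", "stars": 5_000, "languages": 12},
    {"name": "c/three", "stars": 900, "languages": 12},
]
for repo in rank_by_language_count(sample_repos):
    print(repo["name"], repo["languages"], repo["stars"])
```

With this invented sample, `b/two` comes first, then `c/three` (same language count, fewer stars), then `a/one`.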