New data on open source: Reinventing the wheel every day

New data from open source reveals the story of a simple JavaScript function. One line of code was reinvented over 100 times and duplicated over 1,000 times across GitHub’s top 10,000 JavaScript repositories. This is only a symptom of a much deeper problem.

Imagine that every time you wanted to drive a car, you had to build new wheels. People would probably still be riding horses to work. Elegant, some might say, but a terrible waste of time and effort. New data shows this is exactly what is happening in 2017. If you are a developer, you might be reinventing the smallest pieces of functionality across repositories and microservices every day.

Code components are the fundamental building blocks of any application, and the atomic units of our technological future. Different functionalities can and should be reused across different applications, repositories, and projects. In practice, this rarely happens. Instead, people often reinvent or duplicate the same code over and over again. The overhead of creating and maintaining hundreds of tiny repositories and micro-packages simply isn’t practical.

To see how deep and how far the phenomenon goes, we took a close look at open source code on GitHub.

The story of “isString”

We used a semantic code identification technology to analyze the top 10,000 JavaScript repositories on GitHub. Our scanners counted how many times people reinvented one simple functionality: checking if a variable is a string. Normally, this can be done with one to four lines of code. Here are the results:

This simple functionality was written in more than 100 different ways across only 10,000 repositories. The top 10 implementations were duplicated over 1,000 times. Given that GitHub hosts 55 million repositories, the same function has likely been duplicated millions of times. Here are a few examples from top open source projects:
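As a rough sketch of how one small check ends up written in different ways, consider the three variants below. They are hypothetical illustrations with made-up function names, not code quoted from any of the scanned repositories. Note that they do not even agree with each other on every input.

```javascript
// Illustrative variants only; none is quoted from a specific repository.

// Variant 1: typeof check. Accepts primitive strings only.
function isStringTypeof(value) {
  return typeof value === 'string';
}

// Variant 2: also accepts String wrapper objects created with `new String()`.
function isStringInstanceof(value) {
  return typeof value === 'string' || value instanceof String;
}

// Variant 3: inspects the built-in tag via Object.prototype.toString.
function isStringTag(value) {
  return Object.prototype.toString.call(value) === '[object String]';
}

// Compare the three variants on a few sample inputs.
const samples = ['abc', new String('abc'), 42, null];
for (const sample of samples) {
  console.log(
    isStringTypeof(sample),
    isStringInstanceof(sample),
    isStringTag(sample)
  );
}
```

All three return true for 'abc' and false for 42 and null, but the first variant rejects new String('abc') while the other two accept it. Small differences like this are one reason copies of the "same" function are hard to consolidate later.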

Although it is true that change is necessary for evolution, these numbers mean bad news for everyone, for two main reasons:

First, constantly reinventing small pieces of code takes time and effort. Not only is it wasteful, but it actually holds back innovation. Reinvention competes for the same time and resources that could have been invested in building new things.

Second, duplicated code is hard to maintain. Fixing a bug that has been copied into dozens of places is hard, takes a large amount of time, and is likely to break things along the way. The larger the code base and the more repositories you have, the worse it becomes.

Why is it happening?

The obvious solution would be to make code components reusable across repositories. Much has been said about code reusability. Renowned community members post about designing reusable pieces of code. Others debate and struggle to force small components into their own repositories and packages. Most agree that three major problems prevent us from building an arsenal of hundreds of small reusable components:

Creation Overhead: Creating a new repository and a package for every small component will take a lifetime. There is simply too much configuration overhead required to make this process practical at scale.

Maintenance: Maintaining dozens or hundreds of tiny repositories and packages is no joke. Neither is modifying a small package, which means going through multiple demanding steps every time (cloning, linking, debugging, etc.). This may very well end up taking more time and effort than it saves.

Discoverability: Packages are hard to find. No one can say for sure what’s really out there, or what to trust and use (we all remember the left-pad story). Organizing hundreds of micro-packages and quickly finding the right one to use is no easy task.

The bottom line: very few people create and maintain such an arsenal of micro-packages.

Write code once, use it anywhere

So, how can we change things? A good place to start would be dealing with the three problems: making reusable components quick to create, simple to maintain and easy to find.

To do exactly that, a new open source project called Bit was recently released on GitHub. Bit is a virtualized code component repository. It enables developers to build a set of reusable components and use them anywhere they are needed.

Somewhat like what Docker did for VMs, although in a different way, Bit adds a virtualized level of abstraction. It allows developers to create reusable components with almost no overhead and use them as a dynamic API. This means your application includes nothing but the code it actually uses.

Bit addresses all three of the problems mentioned above using a virtual repository called a “Scope.” A Scope allows you to create and model components without the overhead we know today. Developers can then find and use them with an NLP-based semantic search engine. Scopes are distributed, which brings advantages similar to those of a distributed Git repository. They can be created anywhere, and even connected to create a distributed network. A contained and reusable environment helps each component run and build anywhere. Scopes also help when collaborating as a team.

And in conclusion…

Code duplication (or reinvention) is a serious problem, and the data drawn from GitHub shows how widespread it really is. This is happening mainly because there isn’t a practical alternative that makes it possible to create a growing set of reusable components. Open source projects such as Bit can help solve this problem, saving valuable time and effort.

Bit is language-agnostic by design, and uses special drivers to work with different languages. In the not-so-distant future, we could all work with virtual code bases, composing pieces of code together to build anything (as described in the Unix philosophy). Meanwhile, using Bit or finding new ways to reuse atomic components would be a good place to start.
