CodeCarbonCopy: System that Allows Automatic Code Reuse

“CodeCarbonCopy enables one of the holy grails of software engineering: automatic code reuse,” says Stelios Sidiroglou-Douskos, a research scientist at CSAIL.

MIT’s CSAIL scientists have devised a new system that lets programmers reuse code automatically. A programmer can select code from one program and transplant it into another, and the system makes the required modifications, such as changing variable names, to integrate the code into its new context.

The newly developed system, called CodeCarbonCopy, can also translate between the ‘data representations’ used by the donor and recipient programs.

Consider an image-processing program. It must handle files in a range of formats, such as JPEG, TIFF, or PNG, but internally it represents all such images using a single standardized scheme. That scheme depends on the program, however: different programs use different internal schemes.

The system automatically maps the donor program’s scheme onto that of the recipient so that code can be imported seamlessly.
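
The difference between internal schemes can be made concrete with a small sketch. The two layouts below (an interleaved pixel list for the donor, separate channel lists for the recipient) are assumptions chosen for illustration, not the schemes of any program the researchers studied, and the translation function is written by hand rather than generated by CodeCarbonCopy.

```python
# Hypothetical illustration: the same four-pixel image held under two
# different internal schemes. Neither layout is taken from a real program.

def to_interleaved(pixels):
    """Assumed donor layout: one flat list, R, G, B repeated per pixel."""
    flat = []
    for r, g, b in pixels:
        flat.extend([r, g, b])
    return flat


def to_planar(pixels):
    """Assumed recipient layout: one separate list per color channel."""
    return {
        "r": [p[0] for p in pixels],
        "g": [p[1] for p in pixels],
        "b": [p[2] for p in pixels],
    }


def interleaved_to_planar(flat):
    """Translate the donor layout into the recipient layout."""
    return {"r": flat[0::3], "g": flat[1::3], "b": flat[2::3]}


pixels = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (10, 20, 30)]
donor_view = to_interleaved(pixels)
recipient_view = to_planar(pixels)

# Code written against the donor layout only makes sense in the recipient
# after its data has been translated like this.
translated = interleaved_to_planar(donor_view)
```

A mapping of this kind is what has to exist before donor code can operate on the recipient's data; the article describes a system that derives it automatically instead of requiring a programmer to write it.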

Stelios Sidiroglou-Douskos, a research scientist at CSAIL, said, “CodeCarbonCopy enables one of the holy grails of software engineering: automatic code reuse. It’s another step toward automating the human away from the development cycle. Our view is that perhaps we have written most of the software that we’ll ever need, we now just need to reuse it.”

The researchers ran eight experiments in which they used CodeCarbonCopy to transplant code between six popular open-source image-processing programs. In seven of the eight, the recipient program executed properly with its new functionality.

To transplant code, the system first feeds the same input file to both programs and compares how they process it. When it finds variables in the two programs that correspond, it presents them to the user. It also shows the variables for which it found no correspondence, so the user can flag them as unnecessary. The system then automatically excises any operations that use the flagged variables from the transplanted code.
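
The matching and excision steps can be sketched roughly as follows. This is a toy model under loud assumptions: the two "programs" are stand-in functions that report the values their variables take, variables are matched only by equal values on one input, and every name (`donor_trace`, `recipient_trace`, `log_level`, and so on) is hypothetical. It is not how CodeCarbonCopy itself is implemented.

```python
import re


def donor_trace(image):
    """Hypothetical donor program: the values its variables take on an input."""
    return {
        "cols": image["w"],
        "rows": image["h"],
        "n_pixels": image["w"] * image["h"],
        "log_level": 7,  # unrelated to the image, so it should find no match
    }


def recipient_trace(image):
    """Hypothetical recipient program run on the same input."""
    return {
        "img_width": image["w"],
        "img_height": image["h"],
        "total": image["w"] * image["h"],
    }


def match_variables(donor, recipient):
    """Pair donor and recipient variables that held the same value."""
    matches, unmatched = {}, []
    for d_name, d_val in donor.items():
        partners = [r for r, r_val in recipient.items() if r_val == d_val]
        if len(partners) == 1:
            matches[d_name] = partners[0]
        else:
            unmatched.append(d_name)  # shown to the user for review
    return matches, unmatched


def transplant(ops, matches, flagged):
    """Drop operations that use flagged variables, then rename the rest."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, matches)) + r")\b")
    kept = [stmt for stmt, used in ops if not (used & flagged)]
    return [pattern.sub(lambda m: matches[m.group(1)], stmt) for stmt in kept]


image = {"w": 4, "h": 3}  # the same input is given to both programs
matches, unmatched = match_variables(donor_trace(image), recipient_trace(image))

# Donor operations, each listed with the variables it uses (assumed format).
donor_ops = [
    ("n_pixels = cols * rows", {"n_pixels", "cols", "rows"}),
    ("print_debug(log_level)", {"log_level"}),
]
# Here the user flags every unmatched variable as unnecessary.
transplanted = transplant(donor_ops, matches, flagged=set(unmatched))
```

Matching on a single input can produce coincidental pairs (had `log_level` been 3, it would have collided with the image height), which is one reason a review step by the user is useful in a scheme like this.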

Its next task is to map the data representations. The system does this by analyzing the precise values that both programs store in memory. Once it finds a correlation between the values stored by the two programs, it generates a set of operations for translating between the representations.
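
One simple way to picture this step is to infer an index mapping between two arrays that hold the same values in different orders. The sketch below assumes a donor that stores a 2x3 image in row-major order and a recipient that stores it in column-major order, with every pixel value distinct; these layouts and the helper names are illustrative assumptions, and the real system's analysis is more general than this value lookup.

```python
def infer_index_map(donor_mem, recipient_mem):
    """For each recipient slot, find the donor slot holding the same value."""
    positions = {}
    for i, value in enumerate(donor_mem):
        positions.setdefault(value, []).append(i)

    index_map = []
    for value in recipient_mem:
        candidates = positions.get(value, [])
        if len(candidates) != 1:
            # Repeated or missing values make the correspondence ambiguous.
            raise ValueError(f"no unique donor position for value {value!r}")
        index_map.append(candidates[0])
    return index_map


def translate(donor_data, index_map):
    """Reorder donor data into the recipient's layout."""
    return [donor_data[i] for i in index_map]


# Memory contents after both programs read the same 2x3 image (assumed).
donor_mem = [10, 11, 12, 13, 14, 15]      # row-major
recipient_mem = [10, 13, 11, 14, 12, 15]  # column-major

index_map = infer_index_map(donor_mem, recipient_mem)

# The inferred mapping now translates other donor data into recipient order.
converted = translate([1, 2, 3, 4, 5, 6], index_map)
```

The inferred mapping acts as the "set of operations" in miniature: once derived from one shared input, it can be applied to any data the donor code produces in its own layout.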

In trials, the system worked well with file formats, such as images, whose data is rigidly organized, and with programs, such as image processors, that store data representations in arrays. The researchers now hope to extend it to file formats that permit more flexible data organization and to programs that use data structures other than arrays, such as trees or linked lists.

Professor Vitaly Shmatikov said, “In general, code quoting is where a lot of problems in software come from. Both bugs and security vulnerabilities, a lot of them occur when there is functionality in one place, and someone tries to either cut and paste or reimplement this functionality in another place.”

“They make a small mistake, and that’s how things break. So having an automated way of moving code from one place to another would be a huge, huge deal, and this is a very solid step toward having it.”

“Recognizing irrelevant code that’s not important for the functionality that they’re quoting, that’s another technical innovation that’s important. That’s the kind of thing that was an obstacle for a lot of previous approaches, that you know the right code is there, but it’s mixed up with a lot of code that is not relevant to what you’re trying to do. So being able to separate that out is a fairly significant technical contribution.”