I don't mean autocomplete or the automatic code snippets inserted by modern editors, or polymorphic code. What is the state of the art in programs that can take given inputs, types, and information about the desired outputs, and produce a valid piece of code in a language of choice? I am aware of Genetic Programming and Gene Expression Programming, but I don't know of any other efforts, and googling doesn't turn up much.

Is anyone aware of any advancements on this front?

Edit: When I say "output a valid piece of code", I mean an AI or something similar working out the logic and the flow of control and implementing it in an imperative language. Imperative languages only, since that's the tough part. Nevertheless, if you know of any new languages being developed to support this kind of idea, please do mention them, as our current languages may not be suitable for the kind of early AI we might first chance upon.


@kumar Fourth-generation (4GL) and fifth-generation languages (5GL) are worth looking into (don't go by the terminology alone). There isn't a really close 4GL, though DSLs are thought to be. A big leap would be possible only with more advanced AI that depends on cognition, speech and vision recognition, unsupervised machine learning, advanced pattern recognition, natural language processing and more. The present state of programming in enterprises wouldn't allow that. This is too big a subject for me to answer.
– Ubermensch, Feb 29 '12 at 8:16


"An AI or something similar working out the logic and the flow of control and implement it in an imperative language": very ambitious. The problem is not yet solved for natural intelligence.
– mouviciel, Feb 29 '12 at 8:18


You will always have to give the computer some rules to work with. But the more those rules are defined in a manner specific to their domain, the less input there will have to be.

Domain-specific languages that target web development require less coding than languages that are more generic. Domain-specific languages that target testing require less coding than languages that don't. Domain-specific languages that target genetics require less coding than languages that don't. And so on.

Now, here comes the big question: When does a domain become big enough to justify writing a domain-specific language for it? Web development and testing are things that at least half of the development world is working on. It was inevitable that frameworks would spring up, reducing the amount of boilerplate code for these things (which is, essentially, a domain-specific language).

But how about your company's business domain? Is it worth focussing on the things that are commonly mentioned in your company and making it so that you can reference those things easily in code? I don't think we've really found that balance yet, although domain-driven design is about answering that question.

@pdr Maybe kumar can expand on his question, but he specifically mentioned genetic programming; DSLs are in a different category. I believe we are a long way from self-writing programs, but at the rate technology progresses, who can predict what will happen in 25 years?
– Garrett Hall, Feb 28 '12 at 20:16


@kumar: "Coming up with a legal set of statements, written in the syntax of a language" is easy, that's my point. The difficult part (the part that requires some sort of intelligence, artificial or otherwise, the part that programmers do now) is translating the input. How do you propose that something should be input to an "AI (not in its strict sense)" for it to translate for a non-intelligent computer?
– pdr, Feb 28 '12 at 20:58

All 4GLs are designed to reduce programming effort, the time it takes to develop software, and the cost of software development. They are not always successful in this task, sometimes resulting in inelegant and unmaintainable code. However, given the right problem, the use of an appropriate 4GL can be spectacularly successful.

...

A number of different types of 4GLs exist:

Table-driven (codeless) programming, usually running with a runtime framework and libraries. Instead of using code, the developer defines his logic by selecting an operation from a pre-defined list of memory or data table manipulation commands. In other words, instead of coding, the developer uses table-driven algorithm programming (see also control tables, which can be used for this purpose). A good example of this type of 4GL is PowerBuilder. These tools can be used for business application development, usually consisting of a package that allows for both business data manipulation and reporting; therefore they come with GUI screens and report editors. They usually offer integration with lower-level DLLs generated from a typical 3GL for when the need arises for more hardware/OS-specific operations.

Report-generator programming languages take a description of the data format and the report to generate, and from that they either generate the required report directly or generate a program to generate the report. See also RPG.

Similarly, forms generators manage online interactions with the application system users or generate programs to do so.

More ambitious 4GLs (sometimes termed fourth generation environments) attempt to automatically generate whole systems from the outputs of CASE tools, specifications of screens and reports, and possibly also the specification of some additional processing logic.

Data management 4GLs such as SAS, SPSS and Stata provide sophisticated coding commands for data manipulation, file reshaping, case selection and data documentation in the preparation of data for statistical analysis and reporting.
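To make the table-driven idea concrete, here is a minimal Python sketch, assuming a hypothetical table format of (operation, field, argument) rows; it is not how PowerBuilder or any real 4GL stores its logic. The "program" is data, and a small generic runtime interprets it.

```python
from typing import Any, Callable

Record = dict[str, Any]

# The generic runtime: a fixed, pre-defined set of operations (hypothetical).
def op_set(record: Record, field: str, arg: Any) -> None:
    record[field] = arg

def op_multiply(record: Record, field: str, arg: Any) -> None:
    record[field] = record[field] * arg

def op_uppercase(record: Record, field: str, arg: Any) -> None:
    record[field] = record[field].upper()

OPERATIONS: dict[str, Callable[[Record, str, Any], None]] = {
    "set": op_set,
    "multiply": op_multiply,
    "uppercase": op_uppercase,
}

# The "program": rows selected from the list of operations, not code.
CONTROL_TABLE: list[tuple[str, str, Any]] = [
    ("uppercase", "name", None),
    ("multiply", "price", 2),
    ("set", "status", "priced"),
]

def run_table(record: Record, table: list[tuple[str, str, Any]]) -> Record:
    """Apply each row of the control table, in order, to a copy of the record."""
    result = dict(record)
    for operation, field, arg in table:
        OPERATIONS[operation](result, field, arg)
    return result

print(run_table({"name": "widget", "price": 10}, CONTROL_TABLE))
# prints {'name': 'WIDGET', 'price': 20, 'status': 'priced'}
```

Changing the behaviour means editing rows of `CONTROL_TABLE`, which is what lets such tools expose the logic through GUI editors rather than source code.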

A fifth-generation programming language (abbreviated 5GL) is a programming language based around solving problems using constraints given to the program, rather than using an algorithm written by a programmer. Most constraint-based and logic programming languages and some declarative languages are fifth-generation languages.

While fourth-generation programming languages are designed to build specific programs, fifth-generation languages are designed to make the computer solve a given problem without the programmer. This way, the programmer only needs to worry about what problems need to be solved and what conditions need to be met, without worrying about how to implement a routine or algorithm to solve them. Fifth-generation languages are used mainly in artificial intelligence research. Prolog, OPS5, and Mercury are examples of fifth-generation languages.
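Prolog and similar systems search for solutions internally. The Python sketch below only imitates the idea with a naive generate-and-test solver (real constraint solvers prune the search far more cleverly); the map-colouring problem and all names are illustrative. The problem is stated purely as constraints, and the solver is generic and knows nothing about maps.

```python
from itertools import product
from typing import Callable, Optional

Assignment = dict[str, str]

def solve(variables: list[str], domain: list[str],
          constraints: list[Callable[[Assignment], bool]]) -> Optional[Assignment]:
    """Generic solver: try every assignment and return the first one that
    satisfies all constraints, or None if there is no solution."""
    for values in product(domain, repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if all(check(assignment) for check in constraints):
            return assignment
    return None

# The problem description: which regions border each other. No algorithm here.
REGIONS = ["WA", "NT", "SA", "Q"]
BORDERS = [("WA", "NT"), ("WA", "SA"), ("NT", "SA"), ("NT", "Q"), ("SA", "Q")]
# Default arguments bind each pair at definition time (avoids late binding).
CONSTRAINTS = [lambda a, x=x, y=y: a[x] != a[y] for x, y in BORDERS]

print(solve(REGIONS, ["red", "green", "blue"], CONSTRAINTS))
# prints {'WA': 'red', 'NT': 'green', 'SA': 'blue', 'Q': 'red'}
print(solve(REGIONS, ["red", "green"], CONSTRAINTS))
# prints None: WA, NT and SA all border each other, so two colours cannot work
```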

Ultimately, even if you don't 'program' the computer, someone still has to explain your requirements to the computer.

I do agree that a human (for now) has to explain the inputs and outputs, but after that, is there any known way for a program to write the code that fits the problem at hand? The 5GLs were a good pointer, and it will take me a few days to get fully aware of their current state, which is what I will do now. Thanks for chipping in.
– kumar, Feb 28 '12 at 21:01

But then we ditched all that and realized Lisp has been around since 1960. Or at least started using Ruby.
– Jason Lewis, Feb 29 '12 at 0:28

@kumar You can use rule-based programming or similar to search for and generate programs once you explain the inputs and outputs. But in a 5GL, in the purest sense, you just give a description of the problem and it gives you the solution. Simply put, machines getting more intelligent. Together with voice, vision, machine learning and huge data analysis, it would take at least three decades to get to that point. But by then, we wouldn't be here answering your questions; computers would.
– Ubermensch, Feb 29 '12 at 8:22

One of the most discussed approaches to automated code generation is MDA, a.k.a. Model-Driven Architecture. Mostly (but not necessarily), one builds UML models in a visual GUI editor, from which the relevant classes are generated.

While I think generating fully functional code may still be far off, there are reasonably good systems that generate complete skeletons.
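As a rough illustration of skeleton generation, here is a Python sketch that emits class source code from a hypothetical dictionary-based "model" standing in for a UML class diagram. Real MDA tools work from full UML models and are far richer; this only shows the shape of the idea: structure is generated, behaviour is left as stubs.

```python
# Hypothetical model: class name -> {attribute name: type name}.
MODEL: dict[str, dict[str, str]] = {
    "Customer": {"name": "str", "email": "str"},
    "Order": {"order_id": "int", "total": "float"},
}

def generate_class(name: str, attributes: dict[str, str]) -> str:
    """Emit Python source for a class skeleton: a typed constructor plus a
    stub method where hand-written business logic would go."""
    params = ", ".join(f"{attr}: {type_name}" for attr, type_name in attributes.items())
    lines = [f"class {name}:", f"    def __init__(self, {params}) -> None:"]
    lines += [f"        self.{attr} = {attr}" for attr in attributes]
    lines += [
        "",
        "    def validate(self) -> None:",
        "        raise NotImplementedError  # to be written by hand",
    ]
    return "\n".join(lines)

source = "\n\n\n".join(generate_class(n, attrs) for n, attrs in MODEL.items())
print(source)

# Compile the generated code and use one of the generated classes.
namespace: dict = {}
exec(source, namespace)
order = namespace["Order"](order_id=1, total=9.5)
print(order.order_id, order.total)  # prints 1 9.5
```

Calling `order.validate()` raises `NotImplementedError`: the generator produced the skeleton, but the actual logic is still the programmer's job.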

@DipanMehta I'm not sure, but I don't think a UML executable is possible (just giving the specifications as UML and generating the software). I'm also doubtful about how UML adapts to concurrent and parallel computing, functional programming paradigms and research software.
– Ubermensch, Feb 29 '12 at 8:29

I've written many code generators for Java and C# that produce working code for various tasks. There are packages like JAXB, which analyzes an XML schema and produces corresponding Java classes and marshalling/unmarshalling code to do the translation, and Entity Framework, which produces DTO classes for marshalling data to/from a database. There are also tools like Rational XDE (or whatever it's called now) which do round-trip code generation between a class diagram and Java.

If you're looking for something that can take business requirements or a functional spec and turn it into code, I haven't seen much progress in that area. I know OMG is working on some kind of "executable UML", but aside from some DoD prototypes I don't know of any practical implementations.

In declarative programming, you just describe what the program should accomplish, or what conditions the results should satisfy. Then you query the system, and get results (or "no solutions").

Of course under the hood, there's a program running, but you never see the code.

Unfortunately, declarative programming is not a silver bullet: beyond elementary cases, describing the declarative goal precisely enough still requires considerable effort and skill, not to mention that in order to get decent performance, you have to take into account various imperfections of the actual, under-the-hood implementation (e.g. understanding the role of SQL indexes or of tail calls in recursive definitions).
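SQL is the most familiar declarative example. In this Python sketch using the standard-library `sqlite3` module (the table and data are made up), the query states only which rows are wanted; the second half shows the under-the-hood concern mentioned above: after an index is added, SQLite's `EXPLAIN QUERY PLAN` reports whether the engine will use it. The exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Ada", 36), ("Alan", 41), ("Grace", 36)],
)

# The "what": a description of the result, with no loops or access paths.
query = "SELECT name FROM people WHERE age = ? ORDER BY name"
print(conn.execute(query, (36,)).fetchall())  # prints [('Ada',), ('Grace',)]

# The "how" still leaks through for performance: add an index, then ask the
# engine how it now intends to run the same declarative query.
conn.execute("CREATE INDEX idx_people_age ON people(age)")
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query, (36,))]
print(plan)  # the human-readable detail column of each plan step
```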

Depending on the type of the problem, it could actually be easier to just solve the "how" than to precisely describe the "what". For humans, or most average programmers at least, thinking about "how" seems to come more naturally, and "what exactly" requires more mental gymnastics.

@JoonasPulakka Since all machine code (low level) is imperative, I meant automated methods of writing programs composed of statements following the syntax of an imperative language (high or low level). Nevertheless, your doubt is a great addition and I will add an edit to the question.
– kumar, Feb 28 '12 at 20:40

+1 again for "it could actually be easier to solve the how than precisely describe the what"
– MarkJ, Feb 28 '12 at 22:58

@kumar: Yes, all code is imperative under the hood. Furthermore, for example GNU Prolog can compile Prolog code to executables, and there's no reason why it couldn't be compiled to e.g. C, as described here. So there it is - it would create imperative statements, following the syntax of an imperative language (C), straight from the problem definition. There's hardly a practical reason to do that intermediate C step though, as the resulting C code would likely be quite incomprehensible.
– Joonas Pulakka, Feb 29 '12 at 7:06

I disagree with your statement that "Imperative language... that's the tough part". That's the easy part, although it is considerably easier in some languages than others. Figuring out what the users really want, and organizing all that information is the hard part. The "Imperative language" part looks hard because that is when all the real work gets done. That's when the detailed questions about requirements appear, and when all the answers have to be organized into an executable system definition.

There is no programming without programming. Someone has to translate imprecise human wants into a precise specification of a computation. That specification can be in assembly language, or Java, or LISP, some diagrammatic system, or a language yet to be invented. But until computers are capable of deep communication with humans, someone is going to have to talk to the users and precisely define the system.

We're already there! All we need is a language with what is today called a homoiconic character, and what decades earlier was called "code is data". Define your own environment by bottom-up programming instead of designing top-down. You could, for instance, build your own DSLs inside Lisp. With this stacking approach, you can put as many DSLs (layers) on top of each other as you need for your specific problem. This approach takes you from a very low-level representation of S-expressions up to the most complex data abstraction you can think of.
So, what is automatic code writing, if not stacking one language on another?
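Lisp does this with macros over S-expressions. The Python sketch below only mimics the idea, assuming nested lists as S-expressions and a hypothetical `MACROS` table: each layer is defined by rewriting its forms into the layer below, and the evaluator only ever executes the bottom layer.

```python
import operator
from typing import Any

# Higher layers are just rewrite rules that produce lower-layer code (lists).
MACROS = {
    # layer 2: "unless" is defined in terms of the core "if"
    "unless": lambda test, body, otherwise: ["if", test, otherwise, body],
    # layer 3: "clamp" is defined in terms of layer 2 plus plain calls
    "clamp": lambda x, lo, hi: ["unless", ["<", x, lo], ["min", x, hi], lo],
}

def evaluate(expr: Any, env: dict[str, Any]) -> Any:
    """Evaluate a nested-list S-expression; only "if" and calls are built in."""
    if isinstance(expr, str):            # a symbol: look it up
        return env[expr]
    if not isinstance(expr, list):       # a literal such as a number
        return expr
    head, *args = expr
    if head in MACROS:                   # expand one layer down, then evaluate
        return evaluate(MACROS[head](*args), env)
    if head == "if":
        test, then, otherwise = args
        return evaluate(then if evaluate(test, env) else otherwise, env)
    function = env[head]
    return function(*(evaluate(arg, env) for arg in args))

env = {"<": operator.lt, "min": min, "x": 15}
print(evaluate(["clamp", "x", 0, 10], env))  # prints 10
```

Because the program is plain data, each new layer is just another rewrite rule; nothing in `evaluate` changes when a layer is added.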

State of the art in automating code writing? There is no "state of the art". But there is a state of perpetual failure. There are no successful attempts so far. Most likely there will never be any successful implementation of this other than a few examples that are very limited in scope.

That may be a good thing since it would put us out of a job.

BTW, to people reading: don't confuse algorithm creation with trivial CRUD generators like Ruby on Rails. CRUD generation is the execution of a predefined algorithm, not the creation of an algorithm to solve a problem.
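For contrast, here is what "creating" an algorithm from inputs and outputs looks like at its very simplest: a brute-force enumerative search (a textbook program-synthesis technique, not any particular tool) over a hypothetical three-operator grammar. It finds an expression consistent with the examples, and it also illustrates why such attempts stay very limited in scope: the search space explodes with depth.

```python
from itertools import product
from typing import Iterator, Optional

LEAVES = ["x", "1", "2"]      # hypothetical grammar: one variable, two constants
OPERATORS = ["+", "-", "*"]

def expressions(depth: int) -> Iterator[str]:
    """Yield expression strings built from the grammar, shallower ones first."""
    if depth == 0:
        yield from LEAVES
        return
    smaller = list(expressions(depth - 1))
    yield from smaller
    for left, op, right in product(smaller, OPERATORS, smaller):
        yield f"({left} {op} {right})"

def synthesize(examples: list[tuple[int, int]], max_depth: int = 2) -> Optional[str]:
    """Return the first expression that maps every input to its expected output."""
    for expr in expressions(max_depth):
        # eval is acceptable here only because expr comes from our own grammar.
        if all(eval(expr, {"__builtins__": {}}, {"x": x}) == y for x, y in examples):
            return expr
    return None

program = synthesize([(0, 1), (1, 3), (2, 5), (5, 11)])
print(program)  # an expression equivalent to 2*x + 1
```

Depth 2 already means a few thousand candidates; every extra level multiplies that, which is why this style only works for toy problems.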

Wouldn't any program that could create an algorithm to solve a problem based on desired inputs and outputs simply be execution of a predefined algorithm?
– Dunk, Feb 29 '12 at 18:35

@Dunk It would be both, actually. It would be executing a predefined algorithm but also "creating" a "new" algorithm. The "new" and "creation" are the key parts that are hard to do.
– Lord Tydus, Feb 29 '12 at 21:11

It'll keep being redefined by humans as computers can do more and more. A car that can park itself would have seemed like magic AI 30 years ago.
– Michael Durrant, Mar 11 '12 at 16:26

@Michael Durrant Execution of a parking algorithm is impressive, but it is not the "creation" of a "new" algorithm. Now if the software itself dynamically created the parking algorithm on the fly, that would be it. There are lots of neat things algorithms do, but algorithms that create algorithms are a whole different ballpark.
– Lord Tydus, Mar 11 '12 at 17:26

There are several tools where you can do things without writing any code (MS Access, FileMaker). Some generate code in the background that can be altered. This works well with business apps and database front ends, until the logic gets too complex: the user hits a wall and eventually hires a programmer. I've seen web apps that create a form that populates a table. This is great until you need a parent form with a child form handling multiple records; none of them offer this.

I'm trying to imagine how this would work if I wanted to automate (code) the altering of image, video or sound files. As with a database GUI, someone could build tools for these that generate code instead of just manipulating the file.

Spreadsheets handle everything from simple math to statistics fairly well. Record a macro and a script is created.
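The macro-recorder idea fits in a few lines. In this Python sketch, a hypothetical `RecordingSheet` class applies each user action and also appends the equivalent line of code to a script, which can later be replayed on another sheet; real spreadsheet recorders (such as Excel's, which emits VBA) are far more elaborate.

```python
class RecordingSheet:
    """A toy spreadsheet that records every action as a line of Python code."""

    def __init__(self) -> None:
        self.cells: dict[str, float] = {}
        self.script: list[str] = []

    def set(self, cell: str, value: float) -> None:
        self.cells[cell] = value
        self.script.append(f"sheet.set({cell!r}, {value!r})")

    def set_sum(self, target: str, *sources: str) -> None:
        self.cells[target] = sum(self.cells[s] for s in sources)
        args = ", ".join(repr(s) for s in (target, *sources))
        self.script.append(f"sheet.set_sum({args})")

    def macro(self) -> str:
        return "\n".join(self.script)

# The user "works" on one sheet...
original = RecordingSheet()
original.set("A1", 2)
original.set("A2", 3)
original.set_sum("A3", "A1", "A2")
print(original.macro())

# ...and the recorded script replays the same steps on a fresh sheet.
replayed = RecordingSheet()
exec(original.macro(), {"sheet": replayed})
print(replayed.cells == original.cells)  # prints True
```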

With your question I believe you are asking how much future development will be able to minimize the amount of work a software developer will have to do. Even if you have an AI that can write your whole program, you still have to tell it what to do, just like for an automatic car builder, you still have to give it a blueprint, and that blueprint requires some work.

And if you have an AI, you still have to teach it and it will have to learn through several projects. Therefore, I don't think that an AI is suitable for this kind of work, but rather a more deterministic approach, using code generators. These code generators can become very complex, but need not necessarily employ machine learning.

That said, there is already research in the areas called Feature-Oriented Software Development and Aspect-Oriented Software Development. These deal with assembling software applications by selecting the features they should have, and then generating code for them. The goal is to have implementations of several features that appear repeatedly in a particular domain and to assemble them like building blocks, as suitable for your particular application. For web development, for example, features would include transactions, statistics, scalability, logging and whatever else you can think of as recurring characteristics of different web apps.

Features and aspects are different from components, as they are usually cross-cutting concerns. Take logging, for example. You can't just take a library, include it in your application and say you've got logging now. You have to spread your logging calls all over your code, and that is where code generators are handy. I recently heard about all this in this two-part interview on Software Engineering Radio.
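Python has no built-in aspect weaver, but decorators give a small runtime analogue of the logging example. In this sketch (the `logged` aspect, the `with_aspect` helper and the `OrderService` class are all hypothetical), logging is written once and applied to every public method, instead of being spread by hand through the business code. AOP tools such as AspectJ perform this weaving at compile or load time instead.

```python
import functools
import inspect
import logging
from typing import Any, Callable

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("aspects")

def logged(func: Callable[..., Any]) -> Callable[..., Any]:
    """The logging aspect: record each call and its result."""
    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        # args[1:] skips self; this aspect is only applied to methods here.
        log.info("calling %s%r", func.__name__, args[1:])
        result = func(*args, **kwargs)
        log.info("%s returned %r", func.__name__, result)
        return result
    return wrapper

def with_aspect(aspect: Callable[[Callable[..., Any]], Callable[..., Any]]):
    """Class decorator that applies an aspect to every public method."""
    def decorate(cls: type) -> type:
        for name, member in list(vars(cls).items()):
            if inspect.isfunction(member) and not name.startswith("_"):
                setattr(cls, name, aspect(member))
        return cls
    return decorate

@with_aspect(logged)
class OrderService:
    # Business code only: no logging calls anywhere in here.
    def add_item(self, items: list[str], item: str) -> list[str]:
        return items + [item]

    def total(self, prices: list[float]) -> float:
        return sum(prices)

service = OrderService()
service.add_item(["pen"], "ink")
service.total([1.5, 2.5])
```

Adding a second cross-cutting concern (timing, say) would mean writing one more aspect and applying it in one place, rather than editing every method.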

It seems that this kind of research is quite trendy in Europe, and Germany in particular, even in the industry, as I can say from personal experience. Code generation can be useful for generating the necessary infrastructure code, so that the developer can focus exclusively on implementing the specific behaviour of his application and not bother with the same side-issues on every different project.

It remains to be seen how much that application-specific code can be narrowed down. It certainly can't be eliminated completely, only reduced to some sort of blueprint, as I mentioned in the beginning.