Most programming languages are Turing complete, which means that any task that can be solved in one language can be solved in another one, or even on a Turing machine. Then why aren't there automatic translators that can convert programs from any given language to any other language? I've seen a couple of attempts for two languages, but they always work only on a limited subset of a language and can hardly be used for converting real projects.

Is it possible, at least in theory, to write a 100% correct translator between all languages? What are the challenges in practice? Are there any existing translators that work?


There are some. C to Pascal and Pascal to C translators were quite common at one point. As answers below suggest, the output usually wasn't that readable without at least some manual tidying up. And these are relatively simple languages with relatively simple libraries; doing the job well for, e.g., C++ to Haskell or vice versa would probably be impossible.
–
Steve314, Sep 23 '11 at 0:35

Check out Roslyn, the .NET "compiler as a service", which has the ability to translate C# to VB and vice versa.
–
Daniel Little, Nov 6 '12 at 6:14


All compilers translate one PL to another; they just don't guarantee that the code in the target PL is easy to read.
–
jk., Nov 6 '12 at 7:31

12 Answers

The biggest problem is not the actual translation of program code, but the porting of the platform API.

Consider a PHP to Java translator. The only feasible way to do that without embedding part of the PHP binary is to reimplement all of PHP's modules and APIs in Java. This involves implementing over 10,000 functions. Compared to that, the job of actually translating the syntax is easy as pie. And even after all that work you wouldn't have Java code; you'd have some sort of monstrosity that happens to run on the Java platform but is structured like PHP on the inside.
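To make the point concrete, here is a minimal Python sketch of the kind of runtime shim a hypothetical PHP-to-Python translator would have to ship. The `php_*` names are invented for illustration and only a scalar-argument subset of each function is covered. Even `strlen` cannot simply become the target's `len`, because PHP counts bytes rather than characters (UTF-8 strings are assumed here).

```python
from typing import List, Optional

# Hypothetical runtime shim for a PHP-to-Python translator (names invented).
# Only scalar arguments are handled; PHP also accepts arrays in str_replace().


def php_strlen(s: str) -> int:
    """PHP's strlen() counts bytes, not characters (UTF-8 strings assumed)."""
    return len(s.encode("utf-8"))


def php_str_replace(search: str, replace: str, subject: str) -> str:
    """PHP takes (search, replace, subject); Python spells it subject.replace(old, new)."""
    return subject.replace(search, replace)


def php_explode(separator: str, string: str, limit: Optional[int] = None) -> List[str]:
    """Follows PHP's explode() limit rules for positive, zero and negative limits."""
    if separator == "":
        raise ValueError("explode(): separator cannot be empty")
    if limit is None:
        return string.split(separator)
    if limit == 0:
        limit = 1  # PHP treats a zero limit as 1
    if limit > 0:
        # At most `limit` pieces; the last piece keeps the rest of the string.
        return string.split(separator, limit - 1)
    # Negative limit: every piece except the last -limit ones.
    return string.split(separator)[:limit]


# PHP source:   $n = strlen(str_replace("a", "é", "banana"));
# Emitted code: a call chain into the shim rather than idiomatic Python.
n = php_strlen(php_str_replace("a", "é", "banana"))
print(n)  # 9 (Python's len() on the same string would give 6)
```

Multiply this by every function in PHP's standard modules and the scale of the porting job becomes clear; the translated program also ends up calling `php_strlen` everywhere, which is exactly the "PHP on the inside" structure described above.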

This is why the only such tools that come to mind are all about translating code to deploy it, not to maintain it afterwards. Google's GWT "compiles" Java to JavaScript. Facebook's HipHop compiles PHP into C++.

If you have an intermediate format, then you could implement something that translates a program in Language X to that format, and also from that format to Language Y. Implement those conversions for all the languages you're interested in and you're done, right?

Well, you know what? Such a format already exists: assembly. The compiler already does the "Language X to assembly" conversion, and disassemblers do the "assembly to Language Y" conversion.
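The hub arrangement described here can be sketched in a few lines of Python. This is a toy under heavy assumptions: the shared format is a tuple-based expression tree invented for the example, the only front end reuses Python's own `ast` module to read arithmetic expressions, and the two back ends emit C-style infix and Lisp-style prefix notation. A real intermediate format would also have to cover statements, types and library calls.

```python
import ast

# Hypothetical shared format: ("num", value) | ("var", name) | ("bin", op, left, right)
_OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*"}


def from_python(source: str):
    """Front end: Python expression text -> shared tree."""
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return ("num", node.value)
        if isinstance(node, ast.Name):
            return ("var", node.id)
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return ("bin", _OPS[type(node.op)], walk(node.left), walk(node.right))
        raise NotImplementedError(ast.dump(node))
    return walk(ast.parse(source, mode="eval").body)


def to_c(tree) -> str:
    """Back end: shared tree -> fully parenthesised C-style infix."""
    if tree[0] == "num":
        return repr(tree[1])
    if tree[0] == "var":
        return tree[1]
    _, op, left, right = tree
    return f"({to_c(left)} {op} {to_c(right)})"


def to_lisp(tree) -> str:
    """Back end: shared tree -> Lisp-style prefix notation."""
    if tree[0] == "num":
        return repr(tree[1])
    if tree[0] == "var":
        return tree[1]
    _, op, left, right = tree
    return f"({op} {to_lisp(left)} {to_lisp(right)})"


tree = from_python("a * (b + 2)")
print(to_c(tree))     # (a * (b + 2))
print(to_lisp(tree))  # (* a (+ b 2))
```

Adding a language then means writing one front end and one back end against the shared tree instead of one translator per language pair. Even in this toy, the C output carries parentheses the source never had, a small taste of the fidelity problem discussed below.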

Now, assembly is not that great a language for doing the reverse conversion, but MSIL is actually not that bad. Download Reflector and you'll see it has options to disassemble a .NET assembly into a bunch of different languages (and plugins provide even more). So it's quite possible to take a program in C#, compile it to a DLL (that is, MSIL), then use Reflector to disassemble it into VB, C++/CLI, F#, and a whole bunch of others. Of course, all the other conversions work, too: take an F# file, compile it to a DLL, and use Reflector to convert it to C#.

Of course, the two big problems that you'll find are:

1. The code is basically unreadable. MSIL (even with debugging information) removes a lot of information from the original source, so the translated version doesn't have 100% fidelity (theoretically doing a C#->MSIL->C# conversion should give you back the original code, but it won't).

2. Many .NET languages have their own custom libraries (e.g. the VB runtime library, F# library and so on). These would need to be included (or converted) when you do your conversion as well.

There's really no way around #2, but you could probably get around #1 with some additional annotations in the MSIL (via attributes, maybe). That would be additional work, of course.
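The fidelity loss in point #1 is easy to reproduce even without MSIL. As a rough analogy, assuming Python 3.9+ (for `ast.unparse`), the sketch below round-trips source through Python's own syntax tree: the program still means the same thing, but the comment, the redundant parentheses and the quote style do not survive, because the tree never recorded them.

```python
import ast

original = 'greeting = ("hi")  # say hello\n'

# Source -> tree -> source, with no optimisation in between.
round_tripped = ast.unparse(ast.parse(original))
print(round_tripped)  # greeting = 'hi'

# The meaning is preserved even though the text is not.
before, after = {}, {}
exec(original, before)
exec(round_tripped, after)
assert before["greeting"] == after["greeting"]
```

A compiled format such as MSIL sits even further from the source than a syntax tree does, so a decompiler has correspondingly less to reconstruct the original text from.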

A lot of the metadata from the original source is included in the MSIL (including XML comments and the original method, property and member names), so I don't think the conversion to C# is as unreadable as you say it is. Try disassembling parts of the .NET framework; it is very readable. Of course, the situation could be different for an F# to C# conversion.
–
Robert Harvey, Dec 20 '10 at 18:03

@Robert: XML comments aren't included in the MSIL. If you look in Microsoft.NET\Framework\v2.0.50727\en, for example, you can see all of the XML documentation for the system libraries. This is what Reflector (et al.) uses to display the comments. The conversion is not unreadable; all I was saying is that it doesn't have the 100% fidelity you might expect from a source-level translation.
–
Dean Harding, Dec 20 '10 at 21:54


A disassembler converts the machine-executable binary back into assembler for that particular processor type (not all the world is x86). You really mean a decompiler, which takes the compiled code back to the source. This is a horrifically difficult task, as each compiler, from each manufacturer, at each optimisation level, will convert the source lines into a different output binary form.
–
ʎəʞo uɐן, Sep 22 '11 at 14:22

You hit the nail on the head. Try reading the code that comes out of LLVM's C backend. It's technically legal C code but It Ain't Pretty (TM).
–
dsimcha, Sep 22 '11 at 19:55


@dsimcha: Readability aside, the output of that C backend is still much easier to follow than a debugger session or a disassembly. I'm so glad they brought that backend back after it went unmaintained for a short while.
–
TechZilla, Jun 1 '13 at 23:20

It's theoretically possible but mostly useless. Almost any combination of source and target languages is possible, but in most cases nobody would ever want to look at or use the result.

A fair number of compilers do target C, simply because C compilers are available for almost every platform in existence (and there are automatic compiler generators that will let you design a processor, and automatically generate a C compiler that targets your new processor). There are also, of course, a fair number of implementations that target the languages used by various virtual machines such as .NET, JVM, C--, and LLVM.

The key point, however, is that it's really only useful if you treat the target as basically an assembly language that's only used as a step in the compilation process. In particular, you generally do not want a normal programmer to read or work with that result; it usually won't be very readable.
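To illustrate why such output is rarely pleasant to read, here is a toy Python code generator that treats C as an assembly language: every operation gets its own temporary, the way a naive back end might emit it. The `t0, t1, …` naming scheme and the `long` type are arbitrary choices for this sketch, not how any particular compiler does it.

```python
import ast
from itertools import count

_C_OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*"}


def emit_three_address_c(expr: str, result: str = "result") -> str:
    """Compile a Python arithmetic expression into C statements, one temporary per operation."""
    lines = []
    temps = count()

    def gen(node) -> str:
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant) and isinstance(node.value, int):
            return str(node.value)
        if isinstance(node, ast.BinOp) and type(node.op) in _C_OPS:
            left, right = gen(node.left), gen(node.right)
            tmp = f"t{next(temps)}"
            lines.append(f"long {tmp} = {left} {_C_OPS[type(node.op)]} {right};")
            return tmp
        raise NotImplementedError(ast.dump(node))

    final = gen(ast.parse(expr, mode="eval").body)
    lines.append(f"long {result} = {final};")
    return "\n".join(lines)


print(emit_three_address_c("(a + b) * (a - c) + 1"))
# long t0 = a + b;
# long t1 = a - c;
# long t2 = t0 * t1;
# long t3 = t2 + 1;
# long result = t3;
```

This is perfectly valid C that any C compiler will optimise well, but it is written for the compiler, not for a maintainer, which is the trade-off described above.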

Both languages, the source and the target, are compiled to (virtual) machine code anyway*, so for technical reasons there's no need for a compiler to another high-level language.

Languages are for humans. So the implicit requirement of your question is "why is there no translator that generates readable code?", and the answer would be (IMHO): if two languages are sufficiently different, readable code is written so differently in each that translation would require not just converting the algorithms but choosing different ones.

For example, compare a typical iteration in C with one in Lisp, or Python's "one best way" with idiomatic Ruby.

Here the same problems appear that you have with natural languages. When translating "It's raining cats and dogs" from English to German, you render it as something meaning "It's pouring as if from buckets"; you can't translate word by word anymore, you have to look for the meaning.
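A small Python sketch of the gap between transliterating and translating: the first function is roughly what a statement-by-statement translator might produce from a C-style indexed loop, the second is what a Python programmer would actually write. Both compute the same sum of squares; getting from the first to the second requires recognising what the loop is for, not just what each statement does.

```python
def sum_of_squares_transliterated(xs):
    # Mirrors the C loop: long total = 0; for (i = 0; i < n; i++) total += xs[i] * xs[i];
    total = 0
    i = 0
    while i < len(xs):
        total = total + xs[i] * xs[i]
        i = i + 1
    return total


def sum_of_squares_idiomatic(xs):
    # What a Python programmer would write: iterate over values, not indices.
    return sum(x * x for x in xs)


data = [1, 2, 3, 4]
print(sum_of_squares_transliterated(data), sum_of_squares_idiomatic(data))  # 30 30
```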

Good answer. One could add that if two languages had precisely the same set of features and idioms, it would be possible to translate one language to another fairly efficiently, but most languages are designed for the purpose of supporting features and idioms which their creators feel are not adequately supported in other languages. Mechanical translation of maintainable code is sometimes workable when the features and idioms in the target language are a superset of those in the source language, but such situations are not terribly common.
–
supercat, Aug 13 '12 at 15:34

FWIW, there is a translator from Java to D. It's called TioPort and was used in a fairly serious attempt to port SWT to D. The main problem it ran into was that it would have been necessary to port massive portions of the Java standard library.

While it's not code translation per se, the concept of language workbenches shows how something akin to a 100% correct translator between all languages could be implemented.

In our current approach, the source code is stored in a textual format. During compilation, those human-readable text files are parsed into an abstract syntax tree representation, which in turn is used to generate either bytecode or machine code. This abstract representation, however, is temporary and internal to the compiler.

In the language workbench approach, a similar abstract syntax tree representation is the permanent, stored artifact. Both the machine code and the textual "source" code are generated from this abstract representation. One consequence of such a method is that the abstract representation of the program is language-agnostic and can be used to generate textual code in any implemented language. This means that one person can work on different aspects of the system freely, using whichever language they see as most appropriate, or that each member of the team can work on the shared project in the language they are most familiar with.
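A rough Python sketch of the workbench idea, assuming the stored artifact is a JSON-serialised syntax tree. The tree schema and the two surface syntaxes (Python-like and JavaScript-like) are invented for illustration and are not taken from any existing workbench; the point is only that the textual code in each language is a projection generated from the same stored tree.

```python
import json

# The persistent artifact: a language-agnostic tree (hypothetical schema).
STORED = json.dumps({
    "kind": "function",
    "name": "area",
    "params": ["w", "h"],
    "body": {"kind": "return",
             "value": {"kind": "binop", "op": "*",
                       "left": {"kind": "var", "name": "w"},
                       "right": {"kind": "var", "name": "h"}}},
})


def expr(node) -> str:
    """Render an expression node; both projections share the same infix form here."""
    if node["kind"] == "var":
        return node["name"]
    if node["kind"] == "binop":
        return f"{expr(node['left'])} {node['op']} {expr(node['right'])}"
    raise ValueError(node["kind"])


def project_python(fn) -> str:
    """Render the stored tree as Python-style text."""
    params = ", ".join(fn["params"])
    return f"def {fn['name']}({params}):\n    return {expr(fn['body']['value'])}"


def project_javascript(fn) -> str:
    """Render the same tree as JavaScript-style text."""
    params = ", ".join(fn["params"])
    return f"function {fn['name']}({params}) {{\n  return {expr(fn['body']['value'])};\n}}"


program = json.loads(STORED)
print(project_python(program))
print(project_javascript(program))
```

The hard part a real workbench faces, and this sketch ignores, is the reverse direction: turning an edit made in one textual view back into a change to the stored tree.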

As far as I know, the technology is still far from being usable in mainstream development; however, there are several groups working on it independently. It's hard to tell whether any of them will live up to their promises, but it would be interesting to see that happen.

There are some automatic translators. If your goal is to produce compilable code rather than readable code, it is quite possible and occasionally useful, just not very often. Famously, the first C++ compiler was not actually a compiler, but translated C++ into (really complicated) C source that was then compiled by the C compiler. Many compilers can generate assembly code on request, but instead of spitting out assembly text and then translating it to machine code, they normally generate machine code directly.

Given a complete specification of language A, it is not that hard in principle to write a program that expresses its directives in some language B. But usually anyone who goes to the trouble will choose something really low-level for "language B": machine code or, these days, bytecode. Jython is an implementation of Python that generates Java bytecode, which is run by the Java VM. No need to bother writing out and compiling Java class hierarchies!

Every compiler translates the "primary language," like C++, to the machine's native assembly language or architecture-independent bytecode in the case of interpreted languages.

I imagine that's not what you're talking about, though. You probably want a translator that converts C++ to something like Java or Python. What's the point of that, though? At best, the end result will have the exact same efficiency as the original source. (Practically, it'll be much worse.)

If you just want code to be translated so you can read it as a language you understand, such a translator would have the opposite of the desired effect. You will be left with a slew of cryptic, unintuitive and unreadable code.

This is because only the most trivial things translate directly from one language to another. Often, what's simple in one language requires massive libraries for another - or might be impossible altogether. Therefore:

If the program is trivial, you might get a decent result. But then, if it's that simple, what's even the point of running it through a translator?

If the program is nontrivial, the code will be of low quality.

In the end, the only way to write good code is to actually write it. Computers simply can't - at least not yet - match humans on matters of readability, best-practices and elegant solutions.

Your analogy would then also apply to normal compilation, and we know empirically that it does not! Computers do "generate" (not write) good-quality code. What they often do badly is readability/maintainability. If someone did need such a process, which, believe me, people occasionally do, neither problem is a showstopper. If either one is, well then, obviously the translation was never important in the first place.
–
TechZilla, Jun 1 '13 at 23:24

There are no language translators for programming languages because programming languages are incredibly complex. While it is hypothetically possible, there are many challenges.

The first challenge is merely in the acceptable practices of the language. Converting between two object-oriented languages like Java and C++ is incredibly complex, even though both are C-based. The translator would have to have perfect knowledge of the standard libraries for both languages and know the differences in behavior. You would have to create a massive dictionary, and even then, the differences in programming style from programmer to programmer would mean that it would have to guess at how to perform some changes.
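Differences in behavior reach below the libraries into the operators themselves. As one concrete case from a different language pair, C (since C99) truncates integer division toward zero, while Python's `//` rounds toward negative infinity, so a hypothetical C-to-Python translator could not emit `a // b` for `a / b`. A minimal sketch of the helpers it would need follows; the names `c_div` and `c_mod` are made up, and C's fixed-width overflow is not modelled.

```python
def c_div(a: int, b: int) -> int:
    """Integer division as C99 defines it: the quotient is truncated toward zero."""
    q = abs(a) // abs(b)  # raises ZeroDivisionError for b == 0 (undefined behaviour in C)
    return q if (a < 0) == (b < 0) else -q


def c_mod(a: int, b: int) -> int:
    """C99 remainder, chosen so that a == c_div(a, b) * b + c_mod(a, b)."""
    return a - b * c_div(a, b)


# C: -7 / 2 == -3 and -7 % 2 == -1;  Python: -7 // 2 == -4 and -7 % 2 == 1
print(c_div(-7, 2), c_mod(-7, 2), -7 // 2, -7 % 2)  # -3 -1 -4 1
```

Every `/` and `%` on signed integers in the source would have to go through helpers like these unless the translator can prove the operands are non-negative, which is one more reason mechanically translated code reads poorly.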

Once you've gotten the syntax translation down, you then have to figure out how to convert a construct in the first language to a construct in the second language. This is fine if you're going from an object in C++ to an object in Java (comparatively easy, that is), but what do you do with your C++ structs? Or the functions outside of C++ classes? Deciding how to handle this can be tricky, as it can lead to another problem, namely the creation of a Blob object, which is a fairly common antipattern.

This isn't a complete list of the problems, but those are just two and they're big ones. One of my professors mentioned that someone convinced his employer they could make one from machine code to C in the 80s, but it didn't work then. I doubt there will ever be one that works fully.

I think there is no need to know existing libraries; it can just translate libraries as it goes (assuming their sources are available).
–
serg, Oct 17 '10 at 2:49


That actually increases the complexity of the second problem then. And that's assuming you have access to the source code to translate it. Either way, it's still rather infeasible.
–
indyK1ng, Oct 17 '10 at 4:18

+1, the point about libs is totally valid, and there are ALWAYS libs.
–
Yar, Mar 23 '12 at 17:39

The point of compiling is to get something useful for the computer, i.e., something that can run. Why compile to something that may even be higher level than what you wrote it in?

I like the strategy of .NET better: compile everything to a common language. This gives the benefit of the languages being able to communicate without needing to create (N^2)-N cross-language compilers.

For example, if you had 10 programming languages, you would only need to write 10 compilers under the .NET model, and they could all communicate with each other. If you made all possible cross-language compilers, you would need to write 90 compilers. That's a lot of extra work for little benefit.
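The counts above follow from simple arithmetic. Writing $N$ for the number of languages, one direct translator per ordered pair of distinct languages, compared with one compiler per language into the shared format, gives:

```latex
\text{pairwise: } N^2 - N = N(N-1) \;\overset{N=10}{=}\; 10 \cdot 9 = 90,
\qquad
\text{shared format: } N \;\overset{N=10}{=}\; 10
```

If the goal were to get readable source back out in every language, and not only interoperability, each language would also need a back end from the shared format, so the hub would cost $2N = 20$ components; that is still far below 90.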

I think Haxe might be a reasonable solution to this problem: it is a programming language that is designed to be compiled to as many target languages as possible. It currently (at the time this question was written) supports JavaScript, PHP, C++, Java, and Flash, as well as a couple of other languages.

It is possible to translate between various programming languages using a combination of source-to-source compilers: