It makes perfect sense that the compiler remains generic for the respective VM. However, the implementation of the VM may vary depending on the machine it will be installed on, i.e. (*nix, Windows, Mac) x (32-bit, 64-bit).

My question is: instead of writing a VM for each machine, why isn't the compiler written for that specific machine? That way, instead of downloading the respective VM you would download the respective compiler, and that compiler would take care of producing machine code for that specific machine and OS. The end result: execution of native code on any machine. Certainly, each source code base would need compilation for each specific machine, but nowadays automated systems and SCM builds can do this for us.

Are my reasons for being confused valid, or am I missing some technicalities here?

Edit:

PORTABILITY:

Yes, it is one reason, but is portability a big issue with today's automated systems? How often do we actually benefit from not having to compile for other machines? Code compiled for the native machine would give much better performance. Take Java, for instance: you can't do low-level programming on Windows, and you have to resort to JNI.

Take automated systems like TeamCity or Jenkins. We could have such a system set up so that code submitted through version control results in executables.

You answered your own question in the first paragraph.
–
ratchet freak Apr 18 '12 at 23:18

'twasn't new with Java. UCSD Pascal (for only one example) did the same.
–
Jerry Coffin Apr 18 '12 at 23:28

@EmAe And I would point out the irony that you picked two CI build systems that run on the JVM.
–
Andrew Finnell Apr 18 '12 at 23:58

"instead of downloading respective VM you download the respective compiler and that compiler will take care of the machine-code+OS for that specific machine" - we have that, it's called C. It's the very essence of how software was distributed on Linux, for example (prior to package managers). I'm sorry, but I have to vote to close.
–
GrandmasterB Apr 19 '12 at 5:14

Every compiler has just a sequence of virtual machines inside. You can stop the compilation at some point, serialise that intermediate representation and then call the rest of the compiler a "virtual machine interpreter". It won't change anything.
–
SK-logic Apr 19 '12 at 15:39

9 Answers

My question is, instead of writing VM for respective machines, why isn't the compiler written for that specific machine?

Because then you would no longer have a portable executable; you would have one executable for each platform. The user would have to either download a specific executable for their platform, or compile the code themselves, on their specific platform.

With a VM, each platform just needs its platform-specific VM, and then you can distribute the same executable to every platform.
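To make this concrete, here is a minimal sketch (my own illustration, not from the answer). Compile this once with `javac`, and the resulting `.class` file is platform-neutral bytecode that runs unchanged on any JVM — Windows, Linux, macOS, Solaris:

```java
// Portable.java - compile once (`javac Portable.java`); the resulting
// Portable.class is platform-neutral bytecode and runs on any JVM
// without recompilation.
public class Portable {
    // Pure computation: the bytecode for this method is identical on
    // every platform; only the JIT-generated machine code differs.
    static long factorial(int n) {
        long result = 1;
        for (int i = 2; i <= n; i++) result *= i;
        return result;
    }

    public static void main(String[] args) {
        System.out.println("10! = " + factorial(10));
    }
}
```

The developer ships exactly one artifact; the platform-specific work moves into the VM that each user already has.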

Today's VMs have better performance than you give them credit for. Using a VM gives you all of the managed goodies like a type system, a library framework and garbage collection for free.
–
Robert Harvey Apr 18 '12 at 23:30

@Robert: Keep in mind, not all native code is C or C++. They're not slow to write programs in because they're native and therefore "deal with a lot of low-level intricacies;" they're slow to write programs in because they're poorly-designed languages that don't do a good job of dealing with the low-level stuff. But you can get C-level performance out of Delphi with the same ease of development that you find in managed languages.
–
Mason Wheeler Apr 18 '12 at 23:43

@Andrew: People can say Java is fast according to some benchmark that tests some algorithm in some isolated case. But then you go and run an actual Java program, like OpenOffice, and it takes 30 seconds to open the freaking Save dialog, compared to about 1-2 seconds on native code, and they come to the conclusion that all that theoretical speed in the benchmarks means nothing; Java is still slow.
–
Mason Wheeler Apr 18 '12 at 23:51

@MasonWheeler Since you deleted your comment and expanded it into a new one: the abundance of people who can get away with "programming" in the Java language is significantly higher because the barrier to entry for C and C++ is prohibitive, hence the really bad programs. Java is not slow, and anecdotal evidence of one horrific application suite does not prove otherwise.
–
Andrew Finnell Apr 18 '12 at 23:54

You are missing something huge with these VMs. They do exactly what you say, but automatically. It's called a Just-In-Time compiler, and it is why .NET (on Windows) and Java are extremely close to the speed of natively compiled C++ code.

The Java/C# source code is converted into byte code. This byte code is then compiled into machine code on the machine it is currently running on. For the most part, the VM will run the native code instead of reprocessing the byte code.
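That pipeline can be observed directly. The sketch below (my own illustration, not from the answer) calls a small method enough times that HotSpot considers it "hot" and compiles it to native code; the exact invocation threshold is a VM-tuned implementation detail:

```java
// JitDemo.java - the pipeline the answer describes:
// source -> bytecode (javac) -> machine code (JIT, at run time).
// HotSpot initially interprets the bytecode, then compiles sum()
// to native code once it becomes "hot".
public class JitDemo {
    static long sum(int n) {
        long s = 0;
        for (int i = 1; i <= n; i++) s += i;
        return s;
    }

    public static void main(String[] args) {
        // Call the method many times so the JIT considers it hot.
        long result = 0;
        for (int i = 0; i < 20_000; i++) result = sum(1_000);
        System.out.println("sum(1000) = " + result);
        // Run with `java -XX:+PrintCompilation JitDemo` to watch
        // HotSpot report when it compiles JitDemo::sum.
    }
}
```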

I have skipped over quite a bit and oversimplified the process, but VMs do a substantial amount of work.

You aren't getting my question. I have no doubts about the performance of code when it is run on a specific VM. The question is why use a VM at all when we could have a compiler compile the code for each specific machine.
–
Em Ae Apr 18 '12 at 23:49

@EmAe The VM DOES compile it for the specific machine. It's called a JIT. I highly suggest you read up on it.
–
Andrew Finnell Apr 18 '12 at 23:49

Okay, I am confused here. Correct me and help me out. So the compiler compiles for a specific machine even when the code is executed on a VM? If that compiled code (let's say Java, for instance) is ported over to a VM on UNIX/Mac/Ubuntu, does it need to be recompiled? No, because for the VM the native format is byte code, and that byte code is then converted using the JIT. Program -> compiled to byte code -> JIT for the machine -> execution. Four steps are involved. I am asking why we use these four steps when we could instead do: Program -> machine-specific compiler -> execute.
–
Em Ae Apr 18 '12 at 23:52

The same reason you don't rewrite all your code over and over again in assembly. Reuse.
–
Andrew Finnell Apr 18 '12 at 23:56

Plus, no one says you have to take those four steps, you can always use an AOT compiler. gcc.gnu.org/java
–
TomJ Apr 19 '12 at 0:13

Like many concepts in computer science, the VM provides an abstraction layer. You write your code against an 'interface' (byte code / intermediate language) and the abstraction layer (the VM) deals with the implementation details (compiling it for the target machine, and other tasks.) The abstraction layer provides you with a set of services whose implementation details you no longer need concern yourself with. In this case, you no longer need to worry about the specifics of the underlying hardware, given that the abstraction layer services (VM) are present. The client benefits in a similar manner - they don't need to know the details about their platform, just that they can use the abstraction layer.
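The same principle fits in a few lines of code. In this sketch (all names hypothetical, my own illustration) the client method depends only on an interface, and the "platform-specific" detail lives behind it — exactly how bytecode targets the VM rather than the hardware:

```java
// An abstraction layer in miniature: client code targets an interface;
// the implementation detail is swapped out without touching the client.
interface Clock {
    long nowMillis();
}

// "Real platform" implementation.
class SystemClock implements Clock {
    public long nowMillis() { return System.currentTimeMillis(); }
}

// Alternate implementation, e.g. for deterministic tests.
class FixedClock implements Clock {
    public long nowMillis() { return 42L; }
}

public class AbstractionDemo {
    // Client code: depends only on the interface, never the implementation.
    static long readTime(Clock clock) { return clock.nowMillis(); }

    public static void main(String[] args) {
        System.out.println(readTime(new FixedClock()));  // 42
        System.out.println(readTime(new SystemClock()) > 0);
    }
}
```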

Of course, there are trade-offs with any abstraction. You lose fine-grained control over the details on the other side of the abstraction, and have to rely upon that abstraction to employ a sensible implementation. You need to weigh the potential gains of an abstraction against the trade-offs. In certain applications the drawbacks may outweigh the benefits - you may need that high level of control on each and every platform.

Your proposal to use an automated system for compilation to different platforms is exactly what the .NET and Java abstraction layers already do. One of the benefits of JIT compilation is that the compiler has specific details about the machine it is running on, which opens up optimizations that would not be possible when performing a release build on a developer's machine.

If you are concerned about run-time slowdown because native code generation and program execution occur together, you have the option to generate the native code at installation time rather than when the program is first executed. .NET provides NGen for this purpose; it performs native code generation at installation time, and the CLR then uses the cached native code instead of JIT-compiling.

Another major advantage of the VM/JIT approach is binary compatibility. For example, say you have an assembly/JAR/DLL that contains class A, and another assembly containing class B, which derives from class A. What happens if you change class A, e.g. add a private member? Adding a private member shouldn't affect class B at all, but if the assembly containing B had already been compiled to native code, you would probably get weird bugs and hard-to-reproduce crashes: the native code was compiled against the old memory layout of A, so it would reserve too little memory for A's fields, and A's members and the members B added would end up mapped to the same memory locations.

If you're using a VM on the other hand, all these problems simply vanish, because when the code is compiled to native code, the layout of all classes is known.
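Here is the scenario in miniature (names hypothetical, my own sketch). In a real setup A and B would live in separate JARs compiled at different times; because the JVM resolves field layout at class-load time, adding a private field to A never invalidates B's already-compiled bytecode:

```java
// A gained a private field after B was compiled. B's bytecode refers to
// A's members symbolically, not by offset, so the JVM lays everything
// out correctly at class-load time - no recompilation of B needed.
class A {
    private int addedLater = 7;   // new private member: B is unaffected
    int base() { return 1; }
}

class B extends A {
    int derived() { return base() + 1; }
}

public class BinaryCompat {
    public static void main(String[] args) {
        System.out.println(new B().derived());  // 2
    }
}
```

With ahead-of-time native compilation, B's machine code would have baked in A's old field offsets, producing exactly the memory-corruption bugs the answer describes.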

My question is, instead of writing VM for respective machines, why isn't the compiler written for that specific machine?

Let's imagine for a second that this is the way it works. I write a Java application, which I then compile - the compiler then generates an executable for every supported platform. I now have the following executables:

Solaris x64

Solaris x86

Solaris SPARC

Windows x86

Windows x64

Linux x86

Linux x64

MacOS

BSD

etc etc.

I then upload all of these to my website and provide a download link for each one. I then submit all of these to download.com, tucows.com and softpedia.com, submitting each one individually.

This seems like a major pain to me! When I update the software to version 1.1 I then need to repeat this process.

And what happens when a new platform emerges? Now my software isn't available for that platform unless I update my developer tools to support it, recompile, and re-release my software for the new platform.

And what about the user? They visit my site and then have to select exactly the right version of my software for their platform from a list. Most users do not know whether they run x86 or x64, so they will probably end up downloading the wrong version and receiving some kind of error. And if they then switch from Windows to a Mac, they need to go back and re-download my software.

All of this is a major pain, to both the developer and the user, and all of this can be avoided by just compiling to a bytecode then running through a VM, exactly as Java currently does.
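With the VM approach, the single JAR can itself discover (and adapt to) whatever platform it lands on — the user never picks an x86 vs. x64 download. A minimal sketch (my own illustration):

```java
// One artifact, every platform: the JVM exposes the host details via
// standard system properties, so the program can adapt at run time.
public class WhereAmI {
    static String describe() {
        return System.getProperty("os.name") + " / "
             + System.getProperty("os.arch");
    }

    public static void main(String[] args) {
        System.out.println("Running on: " + describe());
    }
}
```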

If performance is almost identical to running native code, the question is not so much why not compile to native code, but rather why bother compiling to native code?

In the edit of your post, you question the usefulness of portability with respect to VMs.

In my professional life, portability has been the most important factor - my development platform is very rarely the same platform that I deploy on, so I am able to develop and test on Windows, pass to my QA department, who test on Linux, and deploy to our production environment on Solaris.

This is not just some isolated and contrived example - this has pretty much been the bread-and-butter of my professional existence for probably the past 12 years of my multi-decade career. I'm sure there are many others with the same or similar experiences.

If we used different (cross-)compilers on the same source code to produce different binaries, you could not feel confident in the quality of each binary unless each one was tested on its own OS/architecture.

The wonderful open source FreePascal Compiler project has a tag line of "write-once, compile anywhere" - while this is true, I would not expect to see any reputable software houses doing so, and releasing the applications without thorough testing on all platforms.

I would still question the validity of testing on one platform and deploying on another, even when using a VM. You are deploying on two different VM implementations, one might contain a system specific bug that is not present in the other. The chances are slim, but it is still good practice to test and deploy on identical systems.
–
Gavin Coates Apr 19 '12 at 10:10

TBH, I actually agree, but is it easier to test a VM, than a compiler? (ie. Is a VM more reliable, since it's easier to test, or is it just as risky?) At least with VMs, you know the code has been used by thousands (millions?) of others, whereas only you (and a handful of others) have compiled your application.
–
Crollster Apr 19 '12 at 10:14

Crollster: I agree. In the real world you would use identical hardware and software, but this is sometimes not possible. Using a different platform may be acceptable in most cases, but it is worth bearing in mind that there is a risk of error, albeit a relatively small one.
–
Gavin Coates Apr 19 '12 at 10:22

You are oversimplifying this a lot; cross-platform is more than just compiling for a specific native instruction set. More often than not, the same C/C++ code cannot simply be recompiled on different platforms, because the APIs and libraries are different. For instance, GUI development is wildly different on Windows and Ubuntu Linux, socket communication is probably not identical on Unix and IBM z/OS, and so on.

So, to achieve your "write once, compile (and link) anywhere", you would have to create an abstraction layer of some sort, with a common API for all OS-level services. You would then need to implement and distribute this layer for all the platforms you want to support.

Also, while memory models are more and more similar today, there are still some differences between platforms, so you need to abstract memory allocation as well. The easiest approach here is perhaps to build your own "virtual" memory manager on top of the native one.

While you're at it, file systems and access control (ACLs) are handled quite differently between, say, Unix and Windows, so you'll need to abstract those things as well - maybe even create your own "File" wrapper over the underlying implementations.

Then we have threading. The threading APIs differ as well (POSIX threads on Unix-like systems versus the Win32 threading API on Windows), so we need to abstract this somehow too.

Well, at this point you have pretty much built a virtual machine. The point is that while compiling some arbitrary code for different platforms is fairly easy, building working software of any complexity that can run on any platform is far from trivial.
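And indeed, the JVM's standard library is exactly that abstraction layer built out in full. A small sketch (my own illustration, assumes Java 11+ for `Files.writeString`/`readString`): the same calls work over NTFS or ext4, and the same `Thread` API is mapped to native threads on every OS:

```java
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// The standard library as abstraction layer: no platform-specific
// path separators, file APIs, or threading primitives in sight.
public class VmAbstractions {
    // File-system abstraction: write and read back a temp file with
    // portable calls; the platform details live below the API.
    static String roundTrip(String s) {
        try {
            Path tmp = Files.createTempFile("demo", ".txt");
            Files.writeString(tmp, s);
            String back = Files.readString(tmp);
            Files.delete(tmp);
            return back;
        } catch (java.io.IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("hello"));

        // Threading abstraction: one API, mapped by the VM to whatever
        // native threading facility the OS provides.
        Thread t = new Thread(() -> System.out.println("worker done"));
        t.start();
        t.join();
    }
}
```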