but nowadays some interpreted languages are actually JITed, so they can also do the same...

Yes, I know they can do some pretty clever stuff and loops perform well.

So why is the execution time so long for these languages (74 minutes for Python)?

Show me any interpreted language that can do this in two machine instructions - it is not possible.

Because Python is Python? I doubt it's even JITed at all lol... and I don't even know if there's any shittier language than that...

For other languages, once the code is JITed, I don't see why it couldn't be 2 instructions. JITing takes time, but in most applications it doesn't really matter that much... once the initial JITing is done, it can run fast enough to just JIT everything else during idle cycles. Of course, I'm not saying these languages should be used for performance-critical number crunching, because then the JIT, GC and so on are just a waste of computing power.

In your benchmark, as count approaches max it will overflow and go negative.
At least it will in C where "int" on the Pi is a 32-bit signed integer.

This is undefined behavior - a big no no.
The compiler will assume it does not happen in a correct program - probably another reason to elide the entire thing...

As before, count += 10; count -= 9 will not fool a compiler!
Do something with a side effect such as print a result at the end that depends on count.
And reduce max a little to say 2147483627 which will avoid the disastrous overflow.

For other languages, once the code is JITed, I don't see why it couldn't be 2 instructions. JITing takes time, but in most applications it doesn't really matter that much... once the initial JITing is done, it can run fast enough to just JIT everything else during idle cycles.

Yes, agreed. But don't forget the interpreter itself must be loaded.
You are right that Python is a bad example, but I see the python interpreter is 1090 disk blocks - plus the users program, and nine libraries (from ldd).
The C compiled program is just 2 disk blocks (which is annoyingly large for such a tiny program).

For other languages, once the code is JITed, I don't see why it couldn't be 2 instructions. JITing takes time, but in most applications it doesn't really matter that much... once the initial JITing is done, it can run fast enough to just JIT everything else during idle cycles.

Yes, agreed. But don't forget the interpreter itself must be loaded.
You are right that Python is a bad example, but I see the python interpreter is 1090 disk blocks - plus the users program, and nine libraries (from ldd).
The C compiled program is just 2 disk blocks (which is annoyingly large for such a tiny program).

I didn't forget, and that's why I said "in most applications". Nowadays most machines have at least 8GB of memory, so that extra 50MB or so for the runtime (I mean, 50MB is already a lot tbh) won't be noticed by end users. Especially if the runtime is shared between several applications.

I have absolutely no idea why I'm biting on this. I'll use it as an excuse to exercise my brain. So while I'll write as though I'm speaking to you, I'm really more interested in just writing down my thoughts. I also stopped reading all the comments when I realized that it would be a REALLY REALLY long read. So I'm about 1/4 down on the second page as it stands now.

I've been many things in life. I have been coding far more than long enough to actually have earned a living in COBOL before it was obsolete. I spent many long hours programming in languages where GOTO statements were the only real option. I've been a senior developer on one of the fastest web browsers ever made for years. I've been a programming language designer, an operating system developer, an old school demo coder and for a very large part a codec developer for one of the companies that more or less dominated the patent pool on a lot of H.26X standards.

I am also currently coding almost entirely in C# for many reasons... though not really the ones mentioned. And remember, I've spent much of my life counting cycles.

I am happy to see the discussion between C# and Javascript as these are two of my favorite playgrounds. I'm friends with some of the people in charge of C# at Microsoft as well as some of the performance oriented developers at Microsoft, Mozilla and Google. I've had lunch with at least one person from each of their Javascript teams in the past two months. My son is actually named for one of them.

I'm also glad that no one is debating firmware/kernel level languages like C and C++. I'm completely over those languages and have absolutely no possible reason to recommend either for anything since Rust came around... except maybe that C's ABI is very simplistic and can be trivial to implement.

You're both right and you're both wrong about very much of what you say. We have a Javascript fanatic and a C# fanatic. I know this isn't fair, but if you go back and read what you've both been writing from a 3rd person perspective, you might consider medication for that. It was getting a bit out of hand.

Let me start by saying, Javascript is faster. There's absolutely no doubt about this and there is no debating it. Using a simple counter program isn't testing language performance. You're comparing the AOT and caching of the engine. It's not realistic and it's not fair. It also doesn't do what real programs do. Real programs allocate and free memory... that's the key thing.

There are some cases where C# will rip the doors off of Javascript. In .NET Core 2.1, there is now support for Span<> and Memory<>, which are two of the best additions ever made to a modern language. I have a personal request into the .NET development team to add support for memory locking and alignment so that I would no longer have to drop to C++ to manage memory for codecs.

From a language perspective, Javascript is amazing and you should not ever use modern Javascript. Browsers don't like it and it's best to stick with something older. Javascript in version 6 and later is an incredible language. The introduction of classes has been a huge improvement. That said, Javascript as a language is pretty weak overall as a comparison.

Let's start by saying that Javascript was never really supposed to stick. Technically, it's one of the best and the worst languages ever made. The simple fact that something like 'strict' even exists is proof of this. You have basically two languages in one. One language is a shit box that says that pretty much anything you type will do something. The other one is a language where "best practices" are trying to be declared and enforced. But the language is basically a sandbox where pretty much anything goes.

There were days where we were sitting there writing a test for Javascript and said "wouldn't it be nice to have a new construct to make this easier for this test". Of course that keyword had absolutely no value to the language as a whole, but we said "screw it, let's stick it in" and before you knew it, the standards group over at Ecma wanted to kill us, because not only did it become part of the language as a de facto standard, but the bugs in our implementation did too. We spent all our days and nights fighting with the crappy parser rules we had to write because of Netscape's original implementation, and later Microsoft's half-baked alternative, which honestly we almost couldn't call a programming language parser as opposed to a natural language parser.

C# is a special language because unlike most other languages (Java as a notable example), the developers of that language are willing to rip things out and simply release a new version. This is possible because if you're using .NET there's no particular reason you can't have 5 different versions of .NET on your machine. It means that where Javascript has the 'strict' concept, C# actually makes breaking changes that force massive refactoring of the class libraries and things improve greatly. This is why C# probably has the nicest lambda and async implementations of any language today.

There are endless reasons why each language is better than the other. I can cite that C# looks like a language carefully engineered by language developers who eventually added such insane static code analysis to their compilers that it's difficult to write bad C# code anymore, since the compiler will go absolutely crazy and generate pages of warnings over it. Javascript has had probably collectively close to a billion dollars invested purely in a competition of performance which has yielded a runtime environment that made at least 3 separate Javascript engines that generally produce code that is far faster and/or better than any C compiler out there today (when considering real world memory usage).

C# doesn't ever try for the nitty gritty little things like fancy auto-vectorization engines. Javascript doesn't target it, but it's present. C# calls cleanly into native code, Javascript makes this really a gruesome task. As I said, pages and pages and pages of comments can be written on this topic. But in the end, when comparing CapEx (the time it takes to write and maintain code) vs OpEx (the cost of running that code) the two languages even out.

A good point to make as well is that poorly written Javascript in just the right place can actually be seriously detrimental to the environment. For example, imagine a Javascript library which is compiled client-side for a web interface - think in terms of something like Angular. A large download and high complexity would consume massive amounts of power worldwide if it were used on something like Google's home page. Simply minimizing the Javascript on Google's home page probably saves the world massive power costs, just because it reduces the computational complexity (think Big-O) of the lexical analysis of the received code.

The reason I use C# however has to do with class libraries. In the Javascript world, even the simple things depend on code which is provided by the community. If you were to look at the Linux kernel and the absolute trash heap of code that is (BTW, I love Linux, the code is horrifying though), it's an example of how the community can absolutely ruin the underlying structure of something otherwise beautiful. The Linux kernel has millions of lines of wasteful duplicated code because the kernel base libraries are insufficient for the job. The Linux world constantly creates, recreates and then recreates again the simplest of functions. The worst part is that there's no real central authority building solid tools for Javascript as a base library. We depend on things like NPM which provide toolkits from just too many places at once.

I hate coding in Javascript because the NPM toolbox is a total nightmare. To be honest, I almost never use community contributed libraries from Nuget. Nearly every time I've ever done so, I've regretted it. Instead, I use packages provided by vendors or roll it myself. The one exception to this has been in SSH support where I've been too busy to sit down and start a proper modern implementation of it.

I absolutely love the Microsoft stranglehold on C#. They have a good team working together to make a good product. There are days where I want to offer pull requests to them, but then I realize, filing an issue is more effective.

Let's just say again, you're both right and both wrong. There's no value to choosing one language/platform vs. the other if you're proficient in one or the other already. Unless you're padding your CV/resume it's best to simply program in whatever you like the most. They both have incredible merits.

If you compared either language to C or C++, it'd be totally different. Consider for instance the modern Meltdown/Spectre related CPU bugs. All the C/C++ code out there had to be recompiled, systems had to have scheduled downtime, there were reboots and verification testing, and then additional patches to fix performance on those patches, etc...

If the code were written in either Javascript or C#, a simple patch to the browser or to the CLR would have altered the way the code was compiled and put an end to those problems. Instead we all took performance hits when our firmwares were updated.

Then there's relocatable memory. I despise any code where memory is allocated by pointer rather than by reference. Referenced memory can be paged, compressed, relocated, etc... this means that the system MMU can have much shorter tables to traverse when performing memory reads and writes since it is possible to use unaligned memory as well as to defragment and garbage collect on idle cycles. The kernel could run a process that did nothing more than to decrease the complexity of the GDT and LDT while idle. That said, we can do that to a limited extent, but because of "C purists", defragmentation on application and kernel level is impossible.

Then there's performance. Using a tracing JIT (we don't do that anymore, but we do something similar) it is possible to compile out conditions for specific branches of execution during runtime that can adjust to provide maximum performance whether running on an Atmel processor or on a 28 core Xeon. And then there's NUMA. When working in a multi-socket server or HPC environment, code written in C has to be manually written to place code near memory. Labs spend hundreds of millions of dollars trying to build servers and storage systems to compensate for C, C++ and CUDA code which doesn't properly distribute active data sets to active computational nodes. Instead, the software just depends on massive local RAM and a ridiculously high performance fabric to distribute data as RDMA operations. If the code were written in Javascript and/or C#, code could be properly distributed to where the data is on the fly which would be much faster than moving the data. It would be little more than simply migrating the process from machine to machine... a few bytes in operation. Then there's core scheduling and cache coherency... all problems compiled languages have that JIT languages don't.

I'm heading to lunch now. Thanks for the opportunity to rant for a while.

That is a quite good rant there, I enjoyed reading it
But before saying that C# is "undoubtedly" slower than JS... have you seen this? https://medium.com/@chrisdaviesgeek/net ... a8fd2edff0
The "traditional" .NET Framework is indeed slow. That's OK, because it was very GUI-oriented and that part it does well. But .NET Core, on the other hand... it doesn't have the GUI part and it's fast. In those benchmarks, it's faster than Express.JS or very close behind, except for "Fortunes", whatever it is. The author of that article claims it's faster than JS and he would use it, if not for the lack of 3rd party libraries... but honestly, I think it's gotten much better since 2016.

Just to throw in some more random info not related to .NET Core or Win10IoT... I wondered what Cython might make of the random number generator test. And then I thought it might be more interesting to see how Rust coped. So, copy-pasting the code from p2, I got

Just to throw in some more random info not related to .NET Core or Win10IoT... I wondered what Cython might make of the random number generator test. And then I thought it might be more interesting to see how Rust coped. So, copy-pasting the code from p2, I got

Please could you try this C version on your laptop, the one that hasn't had the state variables made "volatile" (which makes it about 2.5x slower)?
Also what version of the C compiler are you using?
It is a C program, so perhaps use gcc not g++ (though it may not make much difference).
Here is the original, closer to your rust version:-

...and you should not ever use modern Javascript. Browsers don't like it and it's best to stick with something older.

All the browsers I care about handle recent standards of JS just fine. Node JS handles them just fine.

Javascript in version 6 and later is an incredible language.

This is in direct contradiction of your statement above: "should not ever use modern Javascript".

Which do you mean?

Javascript has had probably collectively close to a billion dollars invested purely in a competition of performance which has yielded a runtime environment that made at least 3 separate Javascript engines that generally produce code that is far faster and/or better than any C compiler out there today

I call BS on this statement. The performance of JS engines in recent times is very impressive but I challenge you to present an example of a JS program being faster than C or C++.

I hate coding in Javascript because the NPM toolbox is a total nightmare.

How so? I have been using npm and all kinds of node modules for years. It has worked very nicely. Of course there are a lot of junk packages out there, hardly npm's fault, don't use those.

Consider for instance the modern Meltdown/Spectre related CPU bugs. All the C/C++ code out there had to be recompiled, systems had to have scheduled downtime, there were reboots and verification testing, and then additional patches to fix performance on those patches, etc...

Yes, let's consider them... your statement is nonsense. Meltdown/Spectre style problems are independent of any programming language. Indeed, they were first demonstrated using Javascript!

Using a tracing JIT (we don't do that anymore, but we do something similar) it is possible to compile out conditions for specific branches of execution during runtime that can adjust to provide maximum performance whether running on an Atmel processor or on a 28 core Xeon.

I have never had an ATMEL processor where it was even possible to run C#, Java, JS etc.

Yes, that fixes it: 36s with C v. 45s Rust. gcc v g++ doesn't make any difference. Rust does have a fn rotate_left(), but it's basically identical code and the compiler seems to inline automatically, so #[inline] on rotl() has no effect. Also rotate_left() takes a u32 argument, so my usize was wrong for the DIY function. Seems to run OK on the RPi with either rotate_left() or rotl(x: u32, k: u32) -> u32. It took 8m50s on my RPi2B (but gets the right answer) - thought it had died!
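For reference, a minimal sketch of the two rotate variants being compared (this is not the benchmark code itself, just the rotl signature from the discussion next to the built-in):

```rust
/// Hand-rolled rotate-left as used in the RNG test discussed above.
/// (k must be in 1..=31 here; a shift by 32 on a u32 would panic in
/// debug builds, which the built-in handles for you.)
#[inline]
fn rotl(x: u32, k: u32) -> u32 {
    (x << k) | (x >> (32 - k))
}

fn main() {
    let x: u32 = 0x8000_0001;
    // The built-in rotate_left() compiles to the same single rotate
    // instruction, which is why #[inline] on rotl() makes no difference.
    assert_eq!(rotl(x, 1), x.rotate_left(1));
    println!("{:#010x}", x.rotate_left(1)); // 0x00000003
}
```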

The unstripped size seems to be only a bit bigger for quite complicated projects, maybe executable size as well as compile time are targets for improvements. I think the main selling point for rust is that it's almost impossible to make memory leaks, race conditions or unsafe code.


I think the main selling point for rust is that it's almost impossible to make memory leaks, race conditions or unsafe code.

That combined with competitive performance.
By the look of it, it is doing overflow checks too.
Impressive speed considering it's doing all that.

On ARM Rust's performance is worse compared to C than it was on Intel.
I speculate that the cost of overflow checks being much higher on ARM may possibly be the reason.
(ARM insns rarely set the overflow flag; importantly multiply does not, you have to do the check by hand which is slower).

@jahboater, yes, looking at the LLVM object code it has rather bizarrely converted one rotl() to rol but not the other! Comparing with the gcc version is a bit difficult, but it does look to have possibly reduced the number of mov instructions by doing things in a different order. In the tightest loop LLVM has 44 instructions cf. gcc at 51, but it has a couple of conditional jumps, presumably for safe-code checks, which the C code doesn't. Obviously the actual instructions used differ, which probably has an impact on speed; I notice Rust uses three imul whereas C uses two imul but one mul.

I see the ARM version does a "ror" (rotate right) instead of a rotate left!!

The Intel stuff is hard to predict for speed...
Register "mov" instructions on Intel are usually free (they are eliminated). Multiply is pretty fast - around 3 clocks or so. Test/cond-jmp pairs are fused into single operations, as now are the likes of add/cond-jump. Of course things like "sub r,r" or "xor r,r" have zero latency - they never enter the pipeline.

So, the JS is about 6 times slower than the C version. This is a new hand made JS version. The previous JS results were JS compiled from C with Emscripten which performed much better at only about half the speed of C!

Interestingly they both use about the same memory when running:

C version - 4.7% (Of my 4 GB machine)
JS version - 4.9%

On the other hand the stripped C binary is 6 times bigger than the JS !

Here is the current code, slightly tweaked from previous versions so results may differ:

In Javascript all numbers are 64 bit floating point quantities. IEEE 754 and all that. Which is good because you have a huge real number range. Also, if you are working in integers, you can get precise integer values up to 53 bits (2^53).

But that can cause problems. For example in my code I index an array with SIZE * SIZE / 3. Which is 10000 * 10000 / 3 in this case, or 33333333.333333332. Well, you can't index an array with a non-integer value. One could use Math.floor or some such to round it to an integer.

But, it turns out that if you perform logical operations on numbers in JS the result is always truncated to a 32 bit signed integer. Which makes sense as logical ops are going to get done using integer instructions and JS comes from a time of 32 bit machines.

So, a shorthand and performant way to truncate to an int is to use a logical operation, for example "|0". Which does nothing but produce a 32 bit result.

Modern day JS engines optimize this kind of integer JS code very well. Which is why Emscripten can transpile C into JS and the result is only a factor of 2 or 3 slower.

This trick is especially important here in the random number generator which is designed to work on unsigned 32 bit integers. That "|0" ensures the result of the multiply by 9 does not exceed 32 bits and wraps around as it should.

Note also that I used ">>>" instead of ">>" to ensure a logical shift right, not arithmetic.

Note also my use of a JS typed array, "new Int32Array(SIZE * SIZE)". This is a relatively new JS feature that also speeds things up and saves memory space.
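A few quick checks of these integer tricks (plain Node.js; the values are chosen just for illustration, not taken from the benchmark):

```javascript
// "|0" truncates toward zero to a 32-bit signed integer:
const SIZE = 10000;
console.log((SIZE * SIZE / 3) | 0);   // 33333333 (was 33333333.333...)

// ...and wraps results that exceed 32 bits, which the RNG relies on:
console.log((0x80000000 * 2) | 0);    // 0

// ">>>" is a logical shift, ">>" is arithmetic (sign-extending):
console.log(-8 >> 1);                 // -4
console.log(-8 >>> 1);                // 2147483644

// Typed arrays store genuine 32-bit ints, wrapping on overflow:
const a = new Int32Array(2);
a[0] = 2147483647 + 1;
console.log(a[0]);                    // -2147483648
```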

I think you should have left rotl() as a function.

I was wondering about that. I had this idea that JS would not inline such a thing and the function call overhead would kill performance. So I inlined it in the C on the way to creating the JS version.

If I have a minute I'll put rotl() back again and see what impact it has.

That's very interesting, I did wonder how all that sort of thing was done.

FYI, NEON can do shifts on 64-bit integers and a "double" can be treated as an integer in NEON of course.
Which means that Javascript's << etc should be pretty fast. Also NEON is great at floating-point rounding and conversions and can likely do the "|0" thing in a single instruction.

It seems an understanding of the bits and bytes is still needed, even in a higher level language.

Just for you I put the rotl() back into both the C and JS versions I posted above (the post is updated).

Amazingly JS performance did not suffer very much at all from introducing that function call.

However, I also made the state variable array into a JS typed array. Which shaved 7 seconds off its run time. So the code now is even faster!

I have no idea how well any of this works on a Pi. I don't have one to hand at the moment.

From what I understand of modern day JS engines they will try to optimize "hot" functions at run time. If a function only ever sees numbers for its parameters it will get optimized for numbers. If they are 32 bit numbers it will get optimized to use integer arithmetic. And so on.

As such the "|0" should end up not even being compiled into any code at all. After all it does nothing for 32 bit ints. It is only a message to the compiler telling it that is the type you want.

It seems an understanding of the bits and bytes is still needed, even in a higher level language.

Yep.

At least in JS you know what your numbers are, how many bits they have, and how operations will perform on any platform. As opposed to languages like C and C++, where so many things are "implementation defined" and it is a lottery.

Then there is the whole floating point fiasco. To quote the common example in JS: