
sheepweevil writes "IBM just released Milepost GCC, 'the world's first open source machine learning compiler.' The compiler analyses the software and uses machine learning techniques to determine which code optimizations will be most effective during compilation. Experiments carried out with the compiler achieved an average 18% performance improvement. The compiler is expected to significantly reduce time-to-market of new software, because lengthy manual optimization can now be carried out by the compiler. A new code tuning website has been launched to coincide with the compiler release. The website features collaborative performance tuning and sharing of interesting optimization cases."

I would argue. Either you get it without needing to watch the movie, simply by being surrounded by people who in their day-to-day existence simply follow instructions blindly "because they have to", or you won't get it, whether you watch the movie or not.

As for the GP, I believe he is mixing two things incorrectly:

fail to see how automation leads to lower IQ scores
and
lead to a lesser society

Lower IQ scores don't immediately mean a lesser society, but if you take the thinking out of a process and let a process/machine/program do all the thinking, your mind will inevitably get lazy and your work will suffer over time.

if you take the thinking out of a process and let a process/machine/program do all the thinking, your mind will inevitably get lazy and your work will suffer over time

I think that it could very well free your mind to think about better things. Build systems are a good example. If I had to manually compile each translation unit, I couldn't spend as much time thinking about the code.

Most programmers stopped "thinking about the code" long ago and just slap together a bunch of libraries. That's why my DVD/HD player takes about 30 seconds loading operating systems (I hope that last 's' is an exaggeration, but I fear it may even be correct) before it will even eject a disc.
As for the supposedly huge performance improvement of 18% (that's all?!), I have regularly hand-optimised code that ran more than twice as fast.

As for the supposedly huge performance improvement of 18% (that's all?!), I have regularly hand-optimised code that ran more than twice as fast.

True. I've read that a hand optimization of less than 50% sometimes isn't worth a developer's time, because users won't really notice it. (Obviously that doesn't apply to situations where a ton of small optimizations are needed and the application speeds up over time.)

Abstraction is one of the foundations of higher thinking. There is something to be said for being able to do lower-level tasks, but you don't concern yourself with their internals when you want to treat them as discrete objects. Nobody thinks about the construction of an AND gate when they're designing something that uses AND gates. Nobody thinks about the internal workings of a method or function when they simply want to call it. In every area, the process is the same: you first learn the basic components, then build on them without revisiting their internals.

I said "simply want to call a function". If you're debugging, then you look at it. If you're designing code, you just use the function. The very existence of a function instead of endlessly repeated code is an example of the principle of abstraction.

You probably aren't as good at mental arithmetic as someone who's had to do math without calculators though. The question is whether that is a problem. I think the non sequitur here is the subtle implication that not possessing certain skillsets, such as mental arithmetic, would lead to humans becoming lazy and eventually the downfall of society.

I'd say that by examining the average person's mastery of stabbing a sharp stick into a neighbourhood critter and then making food out of it vs. the "lesserness" of

That doesn't follow either; repeatedly performing some task familiarizes your brain with that particular task. While performance in similar tasks (say, remembering people's zip codes, for someone who has had to remember phone numbers as in your example) might be improved, it doesn't necessarily make one more creative in general.

Joke aside: as with any rather absolute statement, there is only a part of the truth in the statement and a part in its opposite.
Granted, automation does not necessarily lead to a lesser [level of intelligence in] society.
However, not everything that automation can do must necessarily be done by automated means.
For instance, calculators have automated calculation. I, for one, welcome our key-laden overlords of arctangent, because I rarely have to compute an arctangent and find it handy to have a calculator do it for me.

The main tenets in Idiocracy were that IQ is hereditary and those with less IQ spend more time procreating. Automation was merely allowing their society to function, barely. IOW, I don't see your point. Can you elaborate, please?

The compiler is expected to significantly reduce time-to-market of new software, because lengthy manual optimization can now be carried out by the compiler.

Oh, so new software takes too long to build because of lengthy manual optimization? That's news indeed. Even if it did, will the compiler find a better polygon intersection algorithm for me? Will it write a spatial hash? Will it find places where I am calculating something in a tight loop and move the code somewhere higher?

will the compiler find a better polygon intersection algorithm for me? Will it write a spatial hash? Will it find places where I am calculating something in a tight loop and move the code somewhere higher?

The real question in everybody's mind is: will it blend? [willitblend.com]

Highly optimized software does take a long time to build because of manual optimization. Plus, if anything changes, that optimization might need to be done again. And yes, a good compiler will move that loop for you.

That's news indeed. Even if it did, will the compiler find a better polygon intersection algorithm for me? Will it write a spatial hash?

I TA'ed a course called Contract-based Programming (which was about Hoare triples and JML, a Java extension that checks pre-/postconditions and invariants).

I noted that the lecturer had a book on his shelves titled "Algorithm recognition". I speculate that it might talk, for instance, about how to recognize bubble sort and replace it with quicksort. Or how sorted(list)[0] might be replaced by min(list), or how sorted(list)[4] might be replaced by quickselect(list, 4).

No, this 'learning' compiler only learns how to optimally translate C++ statements to machine level operations. It cannot choose high level algorithms for you. And the reason that such a learning compiler is useful is not to help lazy application programmers, but because developing new, optimised compilers for the many different processors and platforms out there (think computers, mobile phones, embedded systems, etc) is time consuming.

While the summary is wrong on this subject, I can tell you that, yes, manual optimization is part of our work and can slow down the release of our product. If we tell a customer that yes, we will be able to do VGA 30fps H.264 encode, code optimization on our custom core is still going to take some time and effort. I work in the embedded multimedia field.

Will it find places where I am calculating something in a tight loop and move the code somewhere higher?

Quite likely, yes.

Even a dumb optimizer will move loop-invariant code outside of a loop, and maybe partially unroll the loop to reduce the looping overhead. The latest gcc will even automatically vectorize the loop for you to execute a number of iterations in parallel using SSE/etc. instructions if it's a suitable candidate.

The compiler is expected to significantly reduce time-to-market of new software, because lengthy manual optimization can now be carried out by the compiler.

How about this: The coders take the time they would have used to "optimize" and instead better document, test, and debug the code. Instead of same quality, less money, make it better quality, same money? You know that the developer isn't going to charge less money for a new product because it took them less time to get it out the door.

Absolutely. This is not for the normal processors we all know and love, nor is it any good for javascript or python etc.
Compilers for C++, C#, Java etc. on normal CPUs all have pretty ferocious optimizers already. Though an attentive human programmer can usually still make much more of a difference.

No, lazy editors. The submission is what the submission is. It is up to the editors to select meaningful submissions that accurately reflect the story. Any failure of the submission to do that is a failure on the part of the editorial staff, be it through laziness or incompetence.

I'm not a programmer at all, but I have dabbled in a few different languages, as I find programming very interesting. (Got pretty good at mIRC scripting when I was younger, which led to Visual Basic, C++, and now C# dabbling that never leads to anything.) That said, I have a basic knowledge of programming in general. My question is: what things can a compiler do to your code to 'optimize' it for you? I would think the majority of any good optimizations might require rethinking whole methods of doing things and/or algorithms.

In brief, some of the more common ones are things like substituting known values for expressions (e.g. x = 3; y = x + 2; can be changed to x = 3; y = 5;), moving code that doesn't do anything when run repeatedly outside a loop, and architecture-specific optimizations like code scheduling and register allocation. (E.g. with no -O parameters, or -O0, for something like "y = x; z = x;" GCC will generate code that loads "x" from memory twice, once for each statement. With optimization, it will load it once and store it in a register for both instructions.)

If the compiler tries to do this, wouldn't it likely screw your code up?

There are cases where optimizations will screw something up. One example is as follows. It's considered good security practice to zero out memory that held sensitive information (e.g. passwords or cryptographic keys) to limit the lifetime of that data. So you might see something like "password = input(); check(password); zero_memory(password); delete password;". But the compiler might see that zero_memory writes into password, and those values are never read afterwards. Why write something if you never need it? So it removes the zero_memory call as useless code that can't affect anything. And your program no longer clears the sensitive memory.

I see, thanks for the detailed explanation, I hadn't thought of those. I like the first example, and the fact that the compiler can recognize your code and make that replacement confidently. Thats just cool.

I see, thanks for the detailed explanation, I hadn't thought of those. I like the first example, and the fact that the compiler can recognize your code and make that replacement confidently. Thats just cool.

Any compiler from the late '80s will do that, but that is not really a safe optimization. If your optimization is safe, optimized and unoptimized versions should produce the same output, even if they calculate it in different ways. However, in the given example there is no guarantee that when (non-optimized) y=x+2 is executed, x will still be 3 (and y will still be assigned 5). The optimized version always assigns 5. So the optimized and unoptimized versions might do different things. Even if two assignments follow each other in the source, x might be changed in between, e.g. by another thread or a signal handler.

There are cases where optimizations will screw something up. One example is as follows. It's considered good security practice to zero out memory that held sensitive information (e.g. passwords or cryptographic keys) to limit the lifetime of that data. So you might see something like "password = input(); check(password); zero_memory(password); delete password;". But the compiler might see that zero_memory writes into password, and those values are never read afterwards. Why write something if you never need it? So it removes the zero_memory call as useless code that can't affect anything. And your program no longer clears the sensitive memory.

And it was the crypto library's fault, not the compiler's fault. Most languages worthy of doing crypto programming in have a facility to say ~"don't optimize this". An example: in C, the keyword "volatile" instructs the compiler that the field may be changed at any time, and thus all reads/writes must take place and must do so atomically [unfortunately, the C spec doesn't specify "in order" for volatile fields, but I digress].

Another one comes up in embedded programming. Optimizing compilers assume that a variable that hasn't been written to inside a loop won't change inside that loop, so its evaluation is moved outside the loop to optimize for speed. But if that variable is changed by an external influence (an interrupt service routine, timer, input pin status, etc.) the optimized code will never see it. For example: while (DataNotReady) delay(); /* yes, there are better ways but this saves code space in memory-limited microcontrollers */

It optimizes the translation of your source code to assembly opcodes. When you code, the stuff you type is not what's in the binary that's compiled/assembled/linked.

I highly recommend you add a tiny amount of assembly programming dabbling to that list, and you will gain better understanding of how compiler optimization is not a simple affair. There are many ways to do the same thing.

As an example of a basic optimization method: removing dead code, i.e. code that is in there but never called from the main method.

Another one is vector optimization, where certain routines or parts of routines where it's suitable use the vector units of a cpu to speed things up a little.

Again, it didn't even cross my mind. Thanks a lot.
I would also be interested in taking a look at assembly, but all I hear about it is that it's very hard to grasp, and not necessary with all the languages out there. But I will definitely take a look into it to get a better idea.

In regards to learning assembly, if you run Linux, the best book I can recommend is Programming from the Ground Up [gnu.org]. It's licensed under the GNU Free Documentation License, and in my honest opinion it is likely the single best book for anyone who has no idea but wants to start. I already had some clue, so I skipped the first two thirds of the book, but read it for shits and giggles later and found it a very easy book to grasp.

To this day, if I forget minor details about things, I pick that back up and re-read it a bit :)

Assembly itself is not "hard". The language itself is simple. I'd argue that most of the "hardness" is due to its simplicity. There are almost none of the abstract structures and methods that high-level languages provide, and even for something as "simple" as calling a function, you'll have to manually push data onto a stack, jump to the new location, and then pop the data back afterwards, etc.

Might be unnecessary for those programmers who have no interest in understanding how the computer actually works, but it's worth a look.

Disclaimer: I've never really done any assembly programming, but only "dabbled" in it for a bit a few years ago.

I agree that it's a good idea to learn assembly / machine language to understand what a compiler is doing, but learning the assembly language of the computer you use at home is not as reasonable a suggestion today as it once was. Learning to code to a 6802 wasn't bad; it only has a few instructions, and it's very instructive (and fun) to find out how many things you can do with just those. I think trying to write for your home PC in assembly is now beyond a beginner exercise though.

The BBC Micro came with an in-line assembler in the BASIC that shipped with the machine. The manual that came with it had a full reference for BASIC and 6502 assembler. It was a great machine for learning about computers; lots of languages available, BASIC and assembler out of the box, and so many and varied I/O ports that it was a hardware hacker's dream as well. I remember the first time I patched in a routine that made the on-board sound generate "key ticks" for each keystroke and being thrilled.

"Anybody out there know a good emulator for teaching assembly programming?"

CorePy (www.corepy.org), while not an emulator, is probably the easiest way to learn assembly. It's a complete environment for assembly-level programming using Python and supports all the major platforms (x86[_64]/SSE, PPC/VMX, Cell SPU, ATI GPUs).

Instead of using inline assembly, CorePy represents all assembly instructions as Python objects, leading to a very natural syntax and also enabling some really interesting methods for generating code at run time.

Replace a mod (e.g. x % 32) with a bitwise-and (e.g. x & 31) when the divisor is a power of two.

Another very similar one, and one that comes up more commonly, is the replacement of a multiplication or division by a constant by a series of additions, subtractions, and bitshifts.

For instance, "x/4" is the same as "x>>2" for unsigned x, but the division at one point in time (and still with some compilers and no optimization) would produce code that ran slower. Some people still make this optimization by hand, but I wouldn't bother; compilers do it for you.

Another very similar one, and one that comes up more commonly, is the replacement of a multiplication or division by a constant by a series of additions, subtractions, and bitshifts.

ARGH! Mod parent down! Please, please, please don't ever repeat this again to people asking about optimisation. On most modern computers, shifts are slow. They are often even microcoded as multiplications, because they are incredibly rare in code outside segments where someone has decided to 'optimise'. Even when they're not, a typical CPU has more multiply units than shift units, and the extra operations needed by the shift-and-add sequence bloat i-cache usage and cause pipeline stalls by adding inter-instruction dependencies.

No that's not true. A shift instruction has a one cycle latency and 1/2 cycle throughput on the Core2 / Core2-Duo. An add instruction also has a one cycle latency and 1/3 cycle throughput on the Core2-Duo.

The integer multiplier on the Core2-Duo has a 4-cycle throughput and an 8-cycle latency. So in a "simple" case like x*9 = (x<<3)+x the optimisation would take 2 cycles, and the straight mul would take 8. In more complex cases the individual shifts will pipeline for more of a benefit. Only in cases where the expansion needs many dependent shifts and adds does the straight mul win.

Note that microbenchmarks here don't tell the whole story, because the increase in cache churn, register pressure, and inter-instruction dependencies also slow things down. When you issue a set of shift and add instructions, each one has to complete, in order, before the next can start. With a multiply, this can be overlapped with load and store operations. A well-designed microbenchmark will show this to some degree, but in code where the multiply is close to other instructions it becomes even more obvious.

This is absolutely and completely wrong. The dependencies between the shifts and adds mean that you get absolutely no benefit from pipelining; each one has to complete before the next can be issued. Worse, you cause a pipeline stall because you've just filled up the CPU's re-order buffer with shifts and adds and so you don't get any benefit from out-of-order execution either.

You've misunderstood what I said - there is a benefit from independent shifts being pipelined. So consider the case where I want to compute several independent results at once.

At which point we're talking about inline assembly at least, not a "simple" in-compiled-language optimization, e.g. C mul expr -> C shift/add expr, which is what the entire preceding thread has been talking about. And if you're talking about writing code in assembly, that also has no place in a thread about *compiler* optimizations. :)

Never mind that you're now using another register, which depending on the specific circumstances may be a bad thing, e.g. the compiler might find a better use for that register.

What you say is true for inlining low-level assembly optimisations. But that wasn't actually my point. I'm not writing code in a "high" level language like C with inline fragments - I'm writing a code generator for a compiler. So everything that you say about the compiler knowing the architecture better than the programmer applies. But I'm checking those assumptions to tune how the backend generates code.

The example that I mentioned comes up when writing multi-precision multiplication routines like the low-

Yea, I had a feeling what you were talking about was either in asm work itself, or most likely an optimization done *within* some kind of compiler or special-purpose context.

If I had looked at the posts below yours before responding, I would have realised you weren't the only one to go "offtopic", strictly speaking, and I probably wouldn't have bothered to say anything.

My comment was really aimed at readers like the OP, so they knew this was not an optimisation to apply by hand in general code.

On most modern computers, shifts are slow. They are often even microcoded as multiplications.

You're right to say that recoding a multiplication as a combination of adds and shifts is likely to be a loss since multiplication is so fast, and since the extra instruction fetches (memory accesses, decode overhead) are going to kill it!

However, you're wrong about shifts being slow. Ever since the early days, shifts have been implemented by a "barrel shifter" that can shift by an arbitrary N bits in a single clock cycle.

I assume you mean on x86 architecture. There are architectures where it's faster to do a shift, ARM being a very popular one in number of cores sold. On ARM, operands pass through a barrel shifter that allows them to be shifted almost any which way during instruction execution.

Thus, a lone shift operation actually wastes time on ARM because it's translated to a move instruction. But a shift+add can be done in one instruction (rather than 2) because the shift is done as part of the instruction's execution.

Before you get flamed to death by some idiot, you've got to realise that compilers translate a higher-level language into a lower-level one, typically into machine instructions (or, in the case of Java and .NET, virtual machine instructions), turning source code into executable form. Interpreters, on the other hand, execute each statement of the language directly (effectively forming a virtual machine for that language).

Naive compiler translations can be functionally correct but sub-optimal with respect to runtime performance, memory/disk footprint etc. Compiler optimisation is the effort to make this translation as optimal as possible with respect to some variable(s) e.g. performance, size

What you are thinking of sounds like source code optimization. There are various interpretations of this, but to my mind it means a combination of optimal algorithm selection and optimal algorithm implementation. Note that complex algorithms can be decomposed into smaller common algorithms, e.g. a sort routine may be part of some higher-level algorithm; the sort routine may be optimised independently of the higher-level routine.

Google Web Toolkit contains a Java to Javascript compiler. It is the automated translation by a program of one computer language into another, just like translating C to assembly language, or assembly language to machine code, or Java to Java Bytecode, or Java Bytecode to machine code. All of the programs which do those translations are "compilers"[0]. A program which takes Java and spits out Javascript is no different. It's just another compiler, albeit with a very unusual target.

The correct answer to this question is... it depends. No matter how advanced your compiler is it can't select the correct algorithm for you. If you're ordering your lists with a bubble sort instead of some kind of btree, there's nothing the compiler can do to help you except deliver the best O(n^2) sort it can. A truly artistic programmer can transcend all of the optimizations this compiler might achieve, by several orders of magnitude.

As a trivial example, one of the benchmarks I use for testing my Smalltalk compiler is a naive calculation of the Fibonacci sequence. With my first version, which did very little optimisation, it was much slower than GCC-compiled Objective-C (it's now about 50% slower). If, however, I compared a naive implementation in Objective-C to a more intelligent (O(n)) implementation in Smalltalk, the Smalltalk implementation was faster for all n greater than 30, and when you got closer to 40 it was several orders of magnitude faster.

Humans inevitably die anyway so there is no point in slowing down the code to prevent it.

In fact, think of how much of an optimization that is! I mean, suppose people were killed by our robot overlords at 25. That's 1/3 of 75 years old; that's a 3x improvement in the speed we go through life! In a world where a 20% speedup from a new optimization is very impressive, 3x is just great!

It seems like You're computing a spatial Hash! Would You like to use the fastest subroutine I know or use your own?

Seriously though... this post talks about machine learning optimization. Will it be like "the more stuff you compile, the better luck you have with the resulting machine code"?

It's like new GPS navigation software that's not only capable of route optimization but also capable of destination suggestions. "It sounds like you're going to a grocery store to buy pizza... there's a Pizza Hut round the corner!"

So if I'd compile a Linux from scratch with this new compiler, everything speeds up by 18% on average? That would be quite impressive, and possibly the best justification for Gentoo. Might be nice for my aging notebook...

>The compiler is expected to significantly reduce time-to-market of new software,
>because lengthy manual optimization can now be carried out by the compiler.

The time to *make a new compiler* for a certain processor is reduced, and the process of figuring out which optimizations should be in the compiler for that architecture is automated.

This is for the kind of research where they attempt to make many specialized processors on a single chip instead of a general monolithic one. In this case, you need many compilers, and tuning those is important. It's the time spent optimizing THOSE that is lowered, not the time spent writing the software that is compiled.

I see no real relevance to the "normal" desktop situation on that website.

How do you know? It seems entirely plausible to me that significantly better compiler optimization could reduce the level of manual optimization needed for embedded systems, and thus reduce the time-to-market.

The *summary* is bollocks because it doesn't mention the "embedded" part, nor does it mention that this app seems to be really for the compiler *authors*, not the compiler *users*. This is from their PDF doc:

Using MILEPOST GCC, after a few weeks training, we were able to learn a model that automatically improves the execution time of MiBench benchmark by 11% demonstrating the use of our machine learning based compiler.

A few WEEKS of training!?!?

More than likely, this could be used by the GCC folks to figure out the best default set of opts for a given -Ox level on a given arch by running it over some representative set of real-world code to find the best set of opts.

I always thought that testing and debugging were the lengthy manual steps

Not if you wrote the code well! ;-)

Seriously, as someone who's been doing this a long time (since '78, professionally since '82), and who is still at the top of his game, I nowadays spend *very* little time on debugging since it works first time - even the complicated multi-threaded, mutex type of stuff which is what I primarily write nowadays. After a while you stop making mistakes!

They kept hooking hardware into him--decision-action boxes to let him boss other computers, bank on bank of additional memories, more banks of associational neural nets, another tubful of twelve-digit random numbers, a greatly augmented temporary memory. Human brain has around ten-to-the-tenth neurons. By third year Mike had better than one and a half times that number of neuristors. And woke up.

Some logics get nervous breakdowns. Overloaded phone system behaves like frightened child.