
So when the RAID-4/5/6 subsystem uses SSE to calculate the Q syndrome, does that mean interrupts are disabled on that core for the whole time?
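
For context, the Q syndrome that the RAID-6 code vectorizes with SSE is a Reed-Solomon checksum over GF(2^8): Q = D0 + g·D1 + g²·D2 + ... with generator g = 2. Here is a scalar C sketch of just that math (the kernel's real implementation lives in lib/raid6/ and processes whole cache lines with SIMD; the function names below are made up):

```c
#include <stdint.h>
#include <stddef.h>

/* Multiply by the generator x (0x02) in GF(2^8) with the RAID-6
 * polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11d). */
static uint8_t gf2_mul2(uint8_t v)
{
    return (uint8_t)((v << 1) ^ ((v & 0x80) ? 0x1d : 0));
}

/* Compute P (plain XOR parity) and Q (Reed-Solomon syndrome)
 * over `ndisks` data buffers of `len` bytes each. */
static void raid6_pq(size_t ndisks, size_t len,
                     uint8_t **data, uint8_t *p, uint8_t *q)
{
    for (size_t i = 0; i < len; i++) {
        uint8_t pv = 0, qv = 0;
        /* Horner's rule: Q = D0 + g*(D1 + g*(D2 + ...)) */
        for (size_t d = ndisks; d-- > 0; ) {
            qv = gf2_mul2(qv) ^ data[d][i];
            pv ^= data[d][i];
        }
        p[i] = pv;
        q[i] = qv;
    }
}
```

The SSE versions do the same thing 16 bytes at a time, which is why they need the FPU/SSE register file at all.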

IIRC x86 has some trick that lets it save the extra context only when needed. Something about a flag in a control register, so that a task that tries to use SSE faults; the fault handler then saves the registers, marks the SSE registers to be saved on the next context switch, and continues.

Couldn't such a trick also work within the kernel?

You would need to implement it and prove to Linus that it is better to use it. The general consensus is that such a thing is less efficient.

With that said, all in-kernel SSE is done with interrupts disabled. The registers are saved and restored by the critical section. This is by no means exclusive to Linux. Other kernels do it too. Illumos and FreeBSD come to mind here. The Linux kernel functions for doing this are kernel_fpu_begin()/kernel_fpu_end().
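
For reference, the pattern looks roughly like this (a sketch, not actual kernel source; the header providing these functions has moved around between kernel versions, e.g. asm/i387.h in older trees vs. asm/fpu/api.h in newer ones, and xor_block_sse() is a hypothetical helper):

```c
#include <asm/fpu/api.h>   /* kernel_fpu_begin()/kernel_fpu_end() */

static void xor_blocks(void *dst, const void *src, size_t len)
{
    kernel_fpu_begin();             /* make FPU/SSE safe to use in kernel context */
    xor_block_sse(dst, src, len);   /* SSE loads, XORs, stores */
    kernel_fpu_end();               /* hand the FPU/SSE state back */
}
```

Everything between the begin/end pair is the critical section the post refers to.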

It turns out this is not strictly correct. Older Linux kernels use the TS bit in CR0 on x86 to try to avoid FPU state reloads as much as possible:
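
The TS ("task switched") mechanism being referred to works roughly like this: on a context switch the kernel sets CR0.TS instead of saving the FPU/SSE state; the first FPU instruction the new task executes then raises a device-not-available fault (#NM), and only that fault handler does the deferred save/restore. Here is a plain-C simulation of just the bookkeeping (all names are made up for illustration; the real mechanism uses CR0 and the #NM exception):

```c
#include <stdbool.h>
#include <stddef.h>

struct task {
    int fpu_state;              /* stands in for the FPU/SSE register file */
};

static int hw_fpu;                  /* the simulated hardware registers */
static struct task *fpu_owner;      /* task whose state is live in hw_fpu */
static struct task *current_task;
static bool cr0_ts;                 /* simulated CR0.TS bit */

/* Context switch: do NOT save the FPU state, just set TS if the
 * incoming task does not already own the registers. */
static void switch_to(struct task *next)
{
    current_task = next;
    cr0_ts = (fpu_owner != next);
}

/* An FPU instruction. If TS is set it traps (#NM); the handler
 * performs the deferred save/restore, clears TS, and the
 * instruction is retried. */
static void fpu_use(int value)
{
    if (cr0_ts) {                              /* #NM fault path */
        if (fpu_owner)
            fpu_owner->fpu_state = hw_fpu;     /* lazy save */
        hw_fpu = current_task->fpu_state;      /* lazy restore */
        fpu_owner = current_task;
        cr0_ts = false;
    }
    hw_fpu = value;                            /* the instruction itself */
}
```

A task that never touches the FPU never pays for a save or restore, which is the whole point of the trick.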

It is still basically what I described earlier, though. The kernel simply does not need floating-point arithmetic. Sometimes vector instructions are useful, but in those cases we use critical sections. Anyway, few kernel developers ever use these functions.

Just because HDDs are so big doesn't mean they should be filled up. The fact is that storage is the slowest part of most computers. You have a choice between waiting for storage and getting your work done.

I also use Windows 8.1 for gaming and Mac OS X, and I can tell you that the rest of the world doesn't care. Mac apps are huge. Windows games are huge. The binaries are huge. Nothing is well optimized. The Windows updater for Java doesn't delete old versions, so the C: drive can fill up with 50 versions of Java 5 and 60 versions of Java 6. Nobody cares. On top of that, they constantly run antivirus software, which slows down the machine.

I think this is being missed in all the fuss. This is such a basic and simple step in getting software from 'hobby code' to 'production worthy'. Given the scope of the changes, everything needs to be fully tested, and changes of this scope and potential complexity should be justified with compelling arguments rather than a vague and hand-wavy "it does stuff quicker/smaller".

I'd say LTO is broken if it makes changes that affect program semantics. Maybe C/C++ are bad languages, maybe it's the LTO algorithms. Anyway, these kinds of optimizations need to be safe to be useful. Another thing I don't get: if LTO support is just about modifying some Makefiles, why not support it partially where it doesn't conflict with non-LTO builds? You could still allow LTO while not actively supporting it.

To all the people who have to avoid sugar and are on periodic insulin shots (a.k.a. "diabetics") I say that I just downed a large Coke on top of an enormous piece of pie. And now I feel just fine.

Also, I could never understand all this talk about contraception.

I never wore or used protection and still managed to walk away without getting pregnant. Which in my case would automatically mean a caesarean section, since my penis would pose an impossible bottleneck.

Has it ever crossed your mind that we don't run the same hardware in comparable circumstances?

And BTW, just because you haven't seen any problems _yet_, it doesn't mean they are not lurking hidden.
That "hey, it works!" initial stage was my experience, too. Then problems kept cropping up, so after much head-scratching I moved to "there are problems, but I'm fine with the workarounds". And then I realized I had put many hours into it, with new problems still popping up and many things not working in consistent ways.
So, after more than a year, I ended up at "f**k it, it simply ain't worth it."

Give it time.

That's a ridiculous comparison...

You weren't using a recent enough toolchain. I said it isn't experimental ANYMORE. If you're using binutils 2.19 and GCC 4.7, you're not going to have as great of an experience with LTO as you would with binutils 2.23 and GCC 4.8.

I've been using LTO like this for... jeez, five months or so now, with at least a new build every week, used by roughly 1000 other people. No problems have arisen from LTO, and if they do after five months, they can't be extremely significant, and it's not as if bugs are impossible to fix.

I've used the freshest GCC I could find. The last few GCC bumps in Gentoo were mine, since I couldn't wait for an official ebuild. I recompiled everything many times over.

And even with gcc-4.8.2 and freshest binutils, my list of problematic packages was quite long and not shrinking much.

Worse yet, what compiled with LTO was not always repeatable. Some packages that compiled initially would suddenly fail to recompile later. It had something to do with the order in which a package was compiled relative to its dependencies, and much of that effect was RECURSIVE.

In theory, LTO is just great and practically functionally equivalent to the classic build process.

In reality, the compile would often fail at the final link, where:

- the linker would spew out errors that even Google has never seen

- it would complain that it had XY different definitions of function W

- it couldn't find function W to link to

- the function W it was looking for and the one it found were not compatible

And as for your proof-by-numbers of a WHOLE 1000 users for 5 months: it's pathetic.

Look at OpenSSL and the Heartbleed bug. How many people were using it? For how many years?

Well, I suppose the issue is more that LTO isn't meant to be used everywhere on a system where things are updated and changed constantly. For a case like that, it's probably best to use it for individual things rather than on big dependencies. With something like Android, everything is compiled all at once and isn't updated in pieces, etc.
For the kernel, though (which is what this thread is about), I don't see any reason not to support LTO. Even if the gains aren't extreme: if we turned away every patch that only offered a 3% speed improvement or a 4% size reduction, we'd have an extremely slow and stagnant kernel, in my opinion. Those 3 and 4 percent gains add up pretty fast.

WRT Proof: http://forum.xda-developers.com/gala...-2014-t2427087
I don't like bragging or boasting; the only reason I brought up the numbers was that they were relevant to my case. There are even builds that have accumulated nearly 2000 downloads. Lately the download numbers have dropped as the amount of time I've had for FML has dropped too, but I don't mind; I have a good attitude towards it: "If they aren't happy with FML, I would prefer they give other ROMs a shot until they find something they enjoy, and/or give me some feedback on what I could improve in FML."

Well, I suppose the issue is more that LTO isn't meant to be used everywhere on a system where things are updated and changed constantly.

Nope. That's exactly the environment it should do best in: many small things compiled into bigger ones, and those bigger ones recompiled now and then. Since LTO can see inside at least fat binaries, it should be able to do its magic.

For a case like that, it's probably best to use it for individual things rather than on big dependencies. With something like Android, everything is compiled all at once and isn't updated in pieces, etc.

My point was that if these things crop up in such cases, there are serious bugs and inconsistencies in the implementation, and at least for me the solution is not to use LTO in a way where I simply won't see them.
For my own tinkering that _might_ be acceptable, but not for something I intend to put into public use or give to customers.

For the kernel, though (which is what this thread is about), I don't see any reason not to support LTO. Even if the gains aren't extreme: if we turned away every patch that only offered a 3% speed improvement or a 4% size reduction, we'd have an extremely slow and stagnant kernel, in my opinion. Those 3 and 4 percent gains add up pretty fast.

And the WEEKS of time lost mopping up the cr*p after each such gain have cost me half a lifetime already. And that is adding up infinitely faster.
Doing it for testing is fine, but using this in the kernel for a 1% gain on some test irrelevant to 99.999% of the planet, while risking the thing going berserk and making fruit salad out of one's disk(s), is IMHO moronic.

1. LTO was experimented with before gcc-4.6, it was introduced in 4.6, and by the first 4.7 release it was declared pretty complete, practically ready.

From IIRC 4.6.3 to 4.8.2 I compiled every new revision of gcc and binutils in the hope that things would get better. I can't remember a single problem that this solved. Even the _few_ packages that I managed to recompile with LTO bit me in the ar*e later.

2. Have you noticed that even now, even here, no one actually knows even loosely WTF this is and how it is supposed to work? It is all more or less anecdotal, at least outside the circles of compiler developers.
Same with the flags and their effects. First I was told that LDFLAGS has to contain CFLAGS, so that the linker uses the same optimisations the compiler did, for the best end result.
Then someone came along and told me this is totally wrong, since one should NOT pass compiler flags to the linker, and that the linker will get the information about the flags from the object files themselves.

Now, however, is another day, and I am told here that this is wrong and the first version was correct all along.
And tomorrow is another day, so who knows?

Sometimes I get the feeling that for this crowd the journey (= fiddling with compiling) is more important than the destination (= getting code of the desired quality).