Perhaps we should really also address #10120, as more systems than originally reported seem to be affected, i.e. reduce (perhaps partially) optimization to -O1 to work around obvious bugs in GCC 4.4.1 on these platforms.

Did someone report this to the PARI guys? Perhaps they could provide a patch such that we don't have to maintain it (that selectively changes the compiler flags for only some files).

Unfortunately(?), not all people building on e.g. openSUSE 11.2 run into these problems, apparently.

Perhaps we should really also address #10120, as more systems than originally reported seem to be affected, i.e. reduce (perhaps partially) optimization to -O1 to work around obvious bugs in GCC 4.4.1 on these platforms.

Here's an idea: we first try to build with -O3 and when that doesn't work, fall back to -O2, then -O1, then -O0.

This way we don't have to find out exactly which versions of gcc are broken.
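A minimal sketch of such a fallback loop for spkg-install (hypothetical; the actual targets and flag handling would have to match PARI's build system):

for optflag in -O3 -O2 -O1 -O0; do
    if CFLAGS="$optflag" $MAKE gp; then
        break
    fi
    echo "Build with $optflag failed; retrying with less optimization..."
    $MAKE clean
done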

I think reporting this to PARI is pointless, because they can't help (and probably won't care about) a broken gcc.

Perhaps we should really also address #10120, as more systems than originally reported seem to be affected, i.e. reduce (perhaps partially) optimization to -O1 to work around obvious bugs in GCC 4.4.1 on these platforms.

Here's an idea: we first try to build with -O3 and when that doesn't work, fall back to -O2, then -O1, then -O0.

"For reference: OpenSuse? 11.2 (gcc (SUSE Linux) 4.4.1 [gcc-4_4-branch revision 150839]) has the same problem when building PARI: on a machine with 64GB of RAM, it eventually fails after all memory is exhausted (takes hours). [...]"

So I don't think that's the way to go. (Other machines might start swapping, which effectively "freezes" some systems.)

Or should we do something like

(ulimit -St 900; $MAKE)  # Which value is appropriate?

?

I think reporting this to PARI is pointless, because they can't help (and probably won't care about) a broken gcc.

They at least perhaps have a better idea of which files are most likely to trigger failures due to GCC bugs.

If we do "trial building" with some limit(s), we should also make sure that the build actually failed due to a resource limit before retrying with less optimization, e.g. check that the exit code was 152 (SIGXCPU + 128) if we use a CPU time limit.

If we do "trial building" with some limit(s), we should also make sure that the build actually failed due to a resource limit before retrying with less optimization, e.g. check that the exit code was 152 (SIGXCPU + 128) if we use a CPU time limit.

With ulimit -v I receive SIGKILL on exhausted memory, which isn't very specific...
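(A SIGKILLed child typically shows up as exit status 137 = 128 + SIGKILL, but since the kernel OOM killer and other causes produce the same status, a check like the following would be rather unreliable; the limit value is purely illustrative:

( ulimit -Sv 500000; $MAKE )
if [ $? -eq 137 ]; then
    echo "Build was killed, possibly due to the memory limit; retrying..."
fi

)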

If we do "trial building" with some limit(s), we should also make sure that the build actually failed due to a resource limit before retrying with less optimization

The build could fail for various reasons, including but not limited to allocating too much memory. There are various other tickets where a PARI build fails because of a broken gcc. All these should be caught, not only the cases where we run out of memory.

If we do "trial building" with some limit(s), we should also make sure that the build actually failed due to a resource limit before retrying with less optimization

The build could fail for various reasons, including but not limited to allocating too much memory. There are various other tickets where a PARI build fails because of a broken gcc. All these should be caught, not only the cases where we run out of memory.

Of course.

I wonder if we would then still get any reports of PARI build errors due to GCC bugs... ;-)

I wonder if we really need the make install-doc* patch (TeX usage) since apparently all errors are ignored... ;-)

I'd like to have ticket references also in SPKG.txt (Changelog).

Trial building with -O3...-O0 won't work if initial_CFLAGS already contain some (higher) optimization level (which I think isn't unlikely), since $optflag gets prepended.
I would start with CFLAGS as is, and then append -O2 etc. in case the previous build failed. Also, we IMHO shouldn't retry if configure failed. (We could simply keep the exit in these cases.)
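Roughly (illustrative; the variable names mirror those in the current spkg-install):

for optflag in "" -O2 -O1 -O0; do
    CFLAGS="$initial_CFLAGS $optflag" ./Configure || exit 1  # keep the exit on Configure errors
    if CFLAGS="$initial_CFLAGS $optflag" $MAKE gp; then
        break
    fi
done

With gcc, a later -O option overrides an earlier one, so appending works even if CFLAGS already contains an optimization level.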

I'm not sure if all platforms support ulimit -v, although Linuces (where GCC bugs showed up) certainly do. Perhaps we should test its exit status (and skip trial building if the platform does not).
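E.g. probe it in a subshell (a sketch; the probe value is arbitrary):

if ( ulimit -v 200000 ) 2>/dev/null; then
    echo "ulimit -v works here; trial building with a memory limit"
else
    echo "ulimit -v unsupported; skipping trial builds"
fi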

test ... -ne ... (etc.) is for numerical comparison, not for comparing strings.
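For the record:

[ "$status" -ne 0 ]    # numeric comparison
[ "$answer" != "yes" ] # string comparison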

I essentially agree with your comments, *except* for not retrying when Configure fails. It could very well be that some gcc bug causes Configure to fail and we should also catch that.

You're probably right that ulimit -v doesn't work everywhere (on OS X 10.4, the command succeeds but doesn't actually limit anything), but I don't think that's an issue. If it doesn't work, so be it...

I essentially agree with your comments, *except* for not retrying when Configure fails. It could very well be that some gcc bug causes Configure to fail and we should also catch that.

How or when would changing the -O level fix Configure errors? I can't imagine such a case, at least not with gcc...

You're probably right that ulimit -v doesn't work everywhere (on OS X 10.4, the command succeeds but doesn't actually limit anything), but I don't think that's an issue. If it doesn't work, so be it...

:-) Never mind, though we could also set some CPU time limit.

Should we report the broken parallel make install upstream?

I must admit I haven't looked closely at it; it failed with 8 jobs in the first place.

I wonder if we really need the make install-doc* patch (TeX usage) since apparently all errors are ignored... ;-)

True, but it also prevents tex from hanging (you know the major misfeature of tex when it prompts the user for input). We could probably solve this by redirecting the standard input from /dev/null, but I think not building the documentation is a cleaner solution.
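(The redirection would be a one-liner, e.g.:

$MAKE install-doc < /dev/null

assuming install-doc is the target in question.)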

I wonder if we really need the make install-doc* patch (TeX usage) since apparently all errors are ignored... ;-)

True, but it also prevents tex from hanging (you know the major misfeature of tex when it prompts the user for input). We could probably solve this by redirecting the standard input from /dev/null, but I think not building the documentation is a cleaner solution.

It won't prompt you if it isn't installed... ;-) Other issues?

In general, I think if [La]TeX is installed, we should use it, at least unless there's also some equivalent HTML documentation or alike.

Well this is a spectacular bandaid to work around compiler breakage :-)

Having hard-coded memory and time limits will just create problems down the road as pari is bound to get bigger, gcc is going to use more ram, and people try this on a wider (slower) range of hardware.

We apparently know that optimization of Pari is not working correctly on gcc 4.4.1 yet we still build it with optimization and hope for the best? What could possibly go wrong?

How about we disable optimization (or set a known-good value, maybe -O1) if the compiler is gcc-4.4.1, unless you set SAGE_PARI_tune=yes, in which case we'll still build it with all optimizations turned on? That way we are on the safe side, and the workaround will become unnecessary over time as people upgrade to newer gcc releases.
And if you know what you are doing, you can easily override it.
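A sketch of such a check (gcc -dumpversion prints the full dotted version on that release series; SAGE_PARI_tune as proposed above):

if [ "$(gcc -dumpversion)" = "4.4.1" ] && [ "$SAGE_PARI_tune" != "yes" ]; then
    echo "Detected gcc 4.4.1; building PARI with -O1 to work around known bugs."
    CFLAGS="$CFLAGS -O1"
fi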

Well this is a spectacular bandaid to work around compiler breakage :-)

Having hard-coded memory and time limits will just create problems down the road as pari is bound to get bigger, gcc is going to use more ram, and people try this on a wider (slower) range of hardware.

Well, the chosen limits are very conservative, so I doubt we will run into this problem any time soon.

We apparently know that optimization of Pari is not working correctly on gcc 4.4.1 yet we still build it with optimization and hope for the best? What could possibly go wrong?

There isn't just one single broken version, we want to catch all broken gcc's. See #9897 for example.

I totally agree with this by the way, but in my opinion there are two things we can do:

1. Use -O3 always and leave the user with a non-compiling Sage ("it's not our fault, it's gcc's fault").

2. Do the optimization fallback as in this spkg.

Show the error, then tell the user how to report the issue, and continue the build without optimization. I think all that's needed is

CFLAGS=-O0 ./sage -f spkg/standard/pari*
make

to build pari with no optimization and then build the rest?

I believe making a blacklist of versions known to fail is not a good solution because there will always be systems with a broken gcc that we don't know of.

But if we silently try some workarounds then we will never find out about broken compilers either. The user should file a trac ticket to document the issue so it can be fixed in Sage and reported upstream.

But if we silently try some workarounds then we will never find out about broken compilers either. The user should file a trac ticket to document the issue so it can be fixed in Sage and reported upstream.

I would prefer that users post the bug directly upstream (or not, gcc ignores most bug reports anyway). I certainly don't want to waste too much energy in making gcc bug reports for every user who can't compile Sage due to a gcc bug.

The brokenness of -O3 on Itanium is discussed on the gcc bug ticket and is definitely a compiler bug.

gcc-4.4.1 with >= -O2 just starts using all available memory until it dies.

My suggestion would be to just catch those two cases. And add a useful error message to the spkg in case pari fails to compile, explaining how to report this and how to work around it by compiling the pari spkg with different CFLAGS.
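E.g., a rough sketch of such a message in spkg-install (wording illustrative):

if ! $MAKE gp; then
    echo "Error building PARI. This is often caused by a broken gcc."
    echo "Please open a trac ticket with your gcc version, and try"
    echo "rebuilding without optimization, e.g.:"
    echo "    CFLAGS=-O0 ./sage -f spkg/standard/pari*"
    exit 1
fi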

I'm happy with the current version, so I'll give this ticket a positive review. If any compiler bugs are still preventing pari from being built on some hardware then this should be reported to the gcc wrapper.

That'll get rid of potential races in installation. Perhaps we should disable parallel make for all spkgs that don't use a proven build system like autotools or SCons. Chances are that any hand-rolled makefile has concurrency issues...
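E.g., forcing a serial install even when $MAKE already carries a -j option (with GNU make, the last -j wins):

$MAKE -j1 install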

I'll take it that you are going to commit the changes to the included repository before adding the spkg to the next Sage release, because right now they are not.