Created attachment 19
householder_1 valgrind output (relwithdebinfo)
I had a spare ten minutes to look at it. I get the same stack trace as Hauke. The Debug build does not segfault, but the RelWithDebInfo build does. I ran it through valgrind and attach the report. The first thing valgrind reports is a 'Use of uninitialised value of size 4' at the line where the program segfaults. Happy bug hunting!

The bug seems to be specific to gcc 4.3 (versions 4.1 and 4.4 work fine).
Bisection points to revision 1e25be802b3b as the problem (see http://bitbucket.org/eigen/eigen/changeset/1e25be802b3b ). This is a revision by Gael on 20 July changing the VERIFY_RAISES_ASSERTS macro. I'm not sure this revision is really the problem as it does not seem related. However, if I define EIGEN_DEBUG_ASSERTS then the problem disappears.

The error (a 'use of uninitialized value' according to valgrind) appears in line 62 in the function householder(). However, if I comment out everything from line 83 to the end of the function, the error disappears. This does not make any sense at all!
So, we have a weird error that only appears with gcc 4.3 and only for small matrices. That reminds me of the error discussed in the thread http://listengine.tuxfamily.org/lists.tuxfamily.org/eigen/2010/06/msg00489.html half a year ago. Indeed, if I try compiling with the same flags as normal for RelWithDebInfo, but with -fno-strict-aliasing instead of -fstrict-aliasing, then the error disappears again.
As far as I remember, we never really found out what was going on with the bug half a year ago. Gael suspects a GCC bug, and it was way over my head, so I don't know how to proceed.
If this were the only issue left, one possibility would be to release 3.0-beta3 with a health warning ("exposes a bug in some situations when compiled with gcc 4.3 only").

(In reply to comment #11)
> Indeed, if I try compiling with the same flags as normal for
> RelWithDebInfo, but with -fno-strict-aliasing instead of -fstrict-aliasing,
> then the error disappears again.
Ah ok, so could it be that we're infringing on strict aliasing rules, and that this only causes a crash with a particular optimization that gcc 4.3 does?
That would be our bug, but still not a blocker I guess.
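For anyone following along who is not familiar with the rule under suspicion, here is a minimal sketch of what infringing on strict aliasing looks like, next to the memcpy-based way of doing the same thing legally. This is not taken from Eigen's code. The function names are hypothetical, the sketch assumes a C++11 compiler, and it only illustrates the class of bug being discussed.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "sketch assumes a 32-bit float");

// Undefined behaviour under -fstrict-aliasing: a float object is read
// through an lvalue of an unrelated type. The optimizer is allowed to
// assume that such accesses never alias and may reorder or drop loads.
// (Hypothetical helper, shown only as the pattern to avoid.)
inline std::uint32_t bits_via_cast(const float& x)
{
    return *reinterpret_cast<const std::uint32_t*>(&x);
}

// Well-defined alternative: copy the object representation byte by byte.
inline std::uint32_t bits_via_memcpy(const float& x)
{
    std::uint32_t u;
    std::memcpy(&u, &x, sizeof u);
    return u;
}

// Small self-check of the well-defined version only. The expected bit
// pattern assumes IEEE 754 single precision.
inline void self_check()
{
    assert(bits_via_memcpy(1.0f) == 0x3f800000u);
}
```

If code of the first kind exists somewhere in the failing path, compiling with -fno-strict-aliasing would be expected to hide the problem, which matches what is observed here, although it does not prove that this is the cause.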

Created attachment 96
Patch for fuzzy compare
Since 8 February, the following tests are failing on my computer with gcc 4.3: basicstuff_1, linearstructure_1, array_for_matrix_1, cholesky_1, jacobisvd_1, jacobisvd_3, eigen2support_1.
I looked a bit into the first one, basicstuff_1. It seems very similar at first sight: the failure is resolved when building in debug mode or with -fno-strict-aliasing. The test failure is caused by a segfault in the last line of basicStuff():
VERIFY_IS_APPROX(sm2,-sm1.transpose());
Here, sm1 and sm2 are of type Matrix<float,1,1>. The segfault disappears if we replace the line by the equivalent
VERIFY_IS_APPROX(-sm2,sm1.transpose());
When rebuilding with RelWithDebInfo, the stack trace is as follows:
#0 Eigen::DenseBase<Eigen::Matrix<float, 1, 1, 1, 1, 1> >::isApprox<Eigen::CwiseUnaryOp<Eigen::internal::scalar_multiple_op<float>, Eigen::Matrix<float, 1, 1, 1, 1, 1> const> > (
this=0xbfd38ee8, other=@0xbfd38e88, prec=0.00100000005)
at /home/amsta/jitse/scratch/eigen-official/Eigen/src/Core/Functors.h:188
#1 0x0804f353 in basicStuff<Eigen::Matrix<float, 1, 1, 1, 1, 1> > (m=@0xbfd38f68)
at /home/amsta/jitse/scratch/eigen-official/test/main.h:350
#2 0x0804c1ea in test_basicstuff ()
at /home/amsta/jitse/scratch/eigen-official/test/basicstuff.cpp:216
#3 0x0804c939 in main (argc=1, argv=0xbfd39064)
at /home/amsta/jitse/scratch/eigen-official/test/main.h:533
Bisection says that the first bad revision is 5e6b790649c4 (fix fuzzy compares for integer types, using a selector). Interestingly, like last time this is a change to the fuzzy compares, though I cannot see why the change triggers a bug.
Nevertheless, I tried to rewrite the fuzzy compare; see patch. The resulting code evaluates the matrices being compared twice instead of using the ::Nested type (so it's less efficient). However, the patch does get rid of the basicstuff_1 failure, and also of the failing tests linearstructure_1, array_for_matrix_1, jacobisvd_1, jacobisvd_3. On the other hand, the tests cholesky_1, eigen2support_1, householder_1, householder_2 still fail after applying the patch, and additionally the test array_1 now fails while it passed beforehand. So the patch is not a solution, but it might hint at where to look.
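For context, the comparison in question boils down to the documented isApprox() criterion, ||a - b|| <= prec * min(||a||, ||b||). Below is a rough sketch of that criterion on plain std::vector, not the attached patch and not Eigen's implementation. The name is_approx is hypothetical, and because the inputs are already evaluated arrays, the ::Nested question does not arise here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a fuzzy compare on already-evaluated data.
// Compares squared norms so that no square root is needed:
//   ||a - b||^2 <= prec^2 * min(||a||^2, ||b||^2)
inline bool is_approx(const std::vector<double>& a,
                      const std::vector<double>& b,
                      double prec)
{
    assert(a.size() == b.size());
    double diff2 = 0.0, a2 = 0.0, b2 = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = a[i] - b[i];
        diff2 += d * d;       // squared norm of the difference
        a2 += a[i] * a[i];    // squared norm of a
        b2 += b[i] * b[i];    // squared norm of b
    }
    return diff2 <= prec * prec * std::min(a2, b2);
}
```

Negating both arguments leaves every squared term unchanged, which is why the two VERIFY_IS_APPROX lines above are mathematically equivalent and should not behave differently.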

Created attachment 100
test case
Here's a small test case. It runs fine by default, but segfaults if you compile with -DEIGEN_INTERNAL_DEBUGGING.
$ g++-4.3 a.cpp -I eigen -o a -O2 -DEIGEN_INTERNAL_DEBUGGING && ./a
Segmentation fault
More specifically, the segfault disappears if I define eigen_internal_assert(x) as empty.
So it is really eigen_internal_assert that is triggering the crash, and I now really believe that it is a plain GCC 4.3 bug, because I can't see how this can legitimately be blamed on strict aliasing.
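To make the experiment concrete, here is a sketch of the general pattern of an internal assert that is either forwarded to assert() or compiled out entirely. The macro and function names are hypothetical and this is not a copy of Eigen's Macros.h; it only shows why toggling the flag changes the code the optimizer gets to see.

```cpp
#include <cassert>

// Hypothetical names; mirrors the pattern discussed in this comment,
// not Eigen's actual macro definitions.
#ifdef SKETCH_INTERNAL_DEBUGGING
#define sketch_internal_assert(x) assert(x)
#else
#define sketch_internal_assert(x)   // expands to nothing
#endif

// Coefficient access with an optional bounds check. With the flag
// defined, the check is inlined into every caller; without it, the
// function body is just the load.
inline float coeff(const float* data, int size, int i)
{
    sketch_internal_assert(i >= 0 && i < size);
    (void)size;  // avoids an unused-parameter warning when the check is compiled out
    return data[i];
}
```

Building the same caller once with -DSKETCH_INTERNAL_DEBUGGING and once without gives two different inputs to the optimizer, which is the kind of difference the test case above is sensitive to.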

I had a look at the other, older compiler that's installed at uni and I'm getting similar issues there. The compiler version is
gcc (GCC) 4.1.2 20070925 (Red Hat 4.1.2-27)
and five tests fail with segfaults; see http://eigen.tuxfamily.org/CDash/viewTest.php?onlyfailed&buildid=4988 . I looked into one of them, product_small_4. Valgrind says the following (the segfault occurs soon afterward in the same line of code):
Use of uninitialised value of size 4
at 0x805AD1D: bool Eigen::DenseBase<Eigen::Transpose<Eigen::Matrix<double, 4, 1, 0, 4, 1> > >::isApprox<Eigen::CoeffBasedProduct<Eigen::Transpose<Eigen::Matrix<double, 4, 1, 0, 4, 1> >, Eigen::Matrix<double, 4, 4, 0, 4, 4> const&, 6> >(Eigen::DenseBase<Eigen::CoeffBasedProduct<Eigen::Transpose<Eigen::Matrix<double, 4, 1, 0, 4, 1> >, Eigen::Matrix<double, 4, 4, 0, 4, 4> const&, 6> > const&, double) const (MathFunctions.h:299)
by 0x80646FC: void product<Eigen::Matrix<double, 4, 4, 0, 4, 4> >(Eigen::Matrix<double, 4, 4, 0, 4, 4> const&) (main.h:350)
by 0x8053777: test_product_small() (product_small.cpp:34)
by 0x8053B91: main (main.h:533)
The test fails only if SSE2 is enabled and the build type is either Release or RelWithDebInfo. The failure is resolved when compiling with -fno-strict-aliasing. Disabling boolean redux unrolling per comment 24 does not make a difference, but disabling internal debugging per comment 25 does make the test pass. In both cases, I changed the patch so that it's also activated by gcc 4.1.

(In reply to comment #10)
> The bug seems to be specific to gcc 4.3 (versions 4.1 and 4.4 work fine).
>
> Bisection points to revision 1e25be802b3b as the problem (see
> http://bitbucket.org/eigen/eigen/changeset/1e25be802b3b ). This is a revision
> by Gael on 20 July changing the VERIFY_RAISES_ASSERTS macro. I'm not sure this
> revision is really the problem as it does not seem related. However, if I
> define EIGEN_DEBUG_ASSERTS then the problem disappears.
Oh, I had missed that. I confirm that undoing this change makes at least the householder and basicstuff tests pass, but not the array tests (which already have a work-around above).

OK, now I have checked with EIGEN_NO_ASSERTION_CHECKING so that our test suite doesn't fiddle with eigen_assert() anymore, so we are in the real-world conditions of our users, and I have changed the VERIFY macro to just
#define VERIFY(a) assert(a)
in order to simulate a user using assert() on Eigen expressions. I didn't get any crash.
--> So it really seems that the issue was purely with the asserts that we have inside of Eigen's code, and that the patch fixes it i.e. it's not just hiding it.

These tests are failing for me:
product_extra_3, product_symm_7, product_trmv_4, product_trsolve_3, stable_norm_1, (nullary_4)
nullary_4 failed only once, but passed all the other times I tried, even with r1000000.
stable_norm_1 is probably not critical - the test fails because norm() gives the correct result
product_extra_3 failure has a different feel to it so probably another bug.

But this is a separate bug from the one we were originally investigating here, which was about crashes and valgrind errors. Here, we have what looks like floating point errors, and valgrind reports no error. So, filing a new bug.