So I think the problem here is actually something that will be made worse by bug 403524 -- since bug 403524 will make this happen in quirks mode too.
The problem here is, I think, as follows:
The middle two <a> elements (assuming fixing the "<p") have, in nsLineLayout.cpp, the zeroEffectiveSpanBox flag set, so in nsLineLayout::VerticalAlignFrames we go through this code:
if (minY > 0) {
// shrink the content by moving its top down. This is tricky, since
// the top is the 0 for many coordinates, so what we do is
// move everything else up.
spanFramePFD->mAscent -= minY; // move the baseline up
spanFramePFD->mBounds.height -= minY; // move the bottom up
psd->mTopLeading += minY;
pfd = psd->mFirstFrame;
while (nsnull != pfd) {
pfd->mBounds.y -= minY; // move all the children back up
pfd->mFrame->SetRect(pfd->mBounds);
pfd = pfd->mNext;
}
maxY -= minY; // since minY is in the frame's own coordinate system
minY = 0;
}
However, we then forget about this adjustment we've made to the frame's ascent, since nsInlineFrame.cpp has:
nscoord
nsInlineFrame::GetBaseline() const
{
nscoord ascent = 0;
nsRefPtr<nsFontMetrics> fm;
nsLayoutUtils::GetFontMetricsForFrame(this, getter_AddRefs(fm));
if (fm) {
ascent = fm->MaxAscent();
}
return NS_MIN(mRect.height, ascent + GetUsedBorderAndPadding().top);
}
This bug will be fixed by somehow reflecting that adjustment in the ascent in the result of nsInlineFrame::GetBaseline.

And probably the simplest way to reftest this is to do a != test comparing:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<title>test for bug 223764</title>
<p style="overflow:hidden"><span style="font-size: 100px; text-decoration:underline"><span style="font-size: 20px">hello</span></span></p>
</body>
</html>
to the same thing without the underline (since with the bug, the underline is way outside the box and is cut off with overflow:hidden).
Probably best to have the test both as above (in almost-standards mode) and in quirks mode.