Commit Message

> > So SOC cancels out in the runtime check.> > I still think we need two formulas - one determining if vectorization is> > profitable, other specifying the threshold for scalar path at runtime (that> > will generally give lower values).> > True, we want two values. But part of the scalar path right now> is all the computation required for alias and alignment runtime checks> (because the way all the conditions are combined).> > I'm not much into the details of what we account for in SOC (I suppose> it's everything we insert in the preheader of the vector loop).
Yes, it seems contain everything we insert prior the loop in unfolded form.
> > + if (vect_print_dump_info (REPORT_UNVECTORIZED_LOCATIONS))> + fprintf (vect_dump, "not vectorized: estimated iteration count > too small.");> + if (vect_print_dump_info (REPORT_DETAILS))> + fprintf (vect_dump, "not vectorized: estimated iteration count > smaller than "> + "user specified loop bound parameter or minimum "> + "profitable iterations (whichever is more > conservative).");> > this won't work anymore btw - dumping infrastructure changed.
Ah, will update that.
> > I suppose your patch is a step in the right direction, but to really> make progress we need to re-organize the loop and predicate structure> produced by the vectorizer.
This reminds me what I did for string functions on x86. It gets very hard
to get all the paths right when one starts to be really cureful to not
output too much cruft on the short paths + do not consume too many registers.
In fact I want to re-think this for the SSE string ops patch, so I may try to
look into that incrementally.
> > So, please update your patch, re-test and then it's ok.
Thanks.
> > I tested enabling loop_ch in early passes with -fprofile-feedback and it is SPEC> > neutral. Given that it improves loop count estimates, I would still like mainline> > doing that. I do not like these quite important estimates to be wrong most of time.> > I agree. It also helps getting rid of once rolling loops I think.
I am attaching the patch for early-ch. Will commit it tomorrow.
Concerning jump threading, it would help to make some of it during early passes
so the profile estiamte do not get invalided. I tried to move VRP early but now it
makes compiler to hang during bootstrap. I will debug that.
> > > > > > > Btw, I added a "similar" check in vect_analyze_loop_operations:> > > > > > if ((LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)> > > && (LOOP_VINFO_INT_NITERS (loop_vinfo) < vectorization_factor))> > > || ((max_niter = max_stmt_executions_int (loop)) != -1> > > && (unsigned HOST_WIDE_INT) max_niter < vectorization_factor))> > > {> > > if (dump_kind_p (MSG_MISSED_OPTIMIZATION))> > > dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,> > > "not vectorized: iteration count too small.");> > > if (dump_kind_p (MSG_MISSED_OPTIMIZATION))> > > dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,> > > "not vectorized: iteration count smaller than "> > > "vectorization factor.");> > > return false;> > > }> > > > > > maybe you simply need to update that to also consider the profile?> > > > Hmm, I am still getting familiar wth the code. Later we later have> > if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)> > && LOOP_VINFO_INT_NITERS (loop_vinfo) <= th)> > {> > if (vect_print_dump_info (REPORT_UNVECTORIZED_LOCATIONS))> > fprintf (vect_dump, "not vectorized: vectorization not "> > "profitable.");> > if (vect_print_dump_info (REPORT_DETAILS))> > fprintf (vect_dump, "not vectorized: iteration count smaller than "> > "user specified loop bound parameter or minimum "> > "profitable iterations (whichever is more conservative).");> > return false;> > }> > > > where th is always greater or equal than vectorization_factor from the cost model.> > So this test seems redundant if the max_stmt_executions_int was pushed down> > to the second conditoinal?> > Yes, sort of. The new check was supposed to be crystal clear, and> even with the cost model disabled we want to not vectorize in this> case. But yes, the whole cost-model stuff needs TLC.
Ah yes, without cost model we would skip it. I suppose we do not need to
brother witht he profile estiamte in the case anyway. They are kind of aprt of
the cost models.
* passes.c (init_optimization_passes): Schedule early CH.
* tree-pass.h (pass_early_ch): Declare it.
* tree-ssa-loop-ch.c (gate_early_ch): New function.
(pass_early_ch): New pass.

> On Sat, Oct 6, 2012 at 11:34 AM, Jan Hubicka <hubicka@ucw.cz> wrote:> > Hi,> > I benchmarked the patch moving loop header copying and it is quite noticeable win.> >> > Some testsuite updating is needed. In many cases it is just because the> > optimizations are now happening earlier.> > There are however few testusite failures I have torubles to deal with:> > ./testsuite/gcc/gcc.sum:FAIL: gcc.dg/tree-ssa/pr21559.c scan-tree-dump-times vrp1 "Threaded jump" 3> > ./testsuite/gcc/gcc.sum:FAIL: gcc.dg/tree-ssa/ssa-dom-thread-2.c scan-tree-dump-times vrp1 "Jumps threaded: 1" 1> > ./testsuite/gcc/gcc.sum:FAIL: gcc.dg/vect/O3-slp-reduc-10.c scan-tree-dump-times vect "vectorized 1 loops" 2> > ./testsuite/g++/g++.sum:FAIL: g++.dg/tree-ssa/pr18178.C -std=gnu++98 scan-tree-dump-times vrp1 "if " 1> > ./testsuite/g++/g++.sum:FAIL: g++.dg/tree-ssa/pr18178.C -std=gnu++11 scan-tree-dump-times vrp1 "if " 1> >> > This is mostly about VRP losing its ability to thread some jumps from the> > duplicated loop header out of the loop across the loopback edge. This seems to> > be due to loop updating logic. Do we care about these?> > Yes, I think so. At least we care that the optimized result is the same.
it is not, we really lose optimization in those testcases.
The ones that are still optimized well I updated in the patch bellow.
> > Can you elaborate on "due to loop updating logic"?
The problem is:
/* We do not allow VRP information to be used for jump threading
across a back edge in the CFG. Otherwise it becomes too
difficult to avoid eliminating loop exit tests. Of course
EDGE_DFS_BACK is not accurate at this time so we have to
recompute it. */
mark_dfs_back_edges ();
/* Do not thread across edges we are about to remove. Just marking
them as EDGE_DFS_BACK will do. */
FOR_EACH_VEC_ELT (edge, to_remove_edges, i, e)
e->flags |= EDGE_DFS_BACK;
Loop header copying puts some conditional before loop and we want to thread
up to exit out of the loop (that I think it rather important optimization).
But it no longer happens before back edge is in the way. At least that was
the case in the tree-ssa failures I analyzed.
> > Can you elaborate on the def_split_header_continue_p change? Which probably> should be tested and installed separately?
Yes, that one is latent bug. The code is expecting that loop exit is recognized
by loop depth decreasing that is not true.
It reproduces as ICE during bootstrap with the patch.
I will regtest/bootstrap and commit it today.
Honza