I like your idea of having both the result-we-currently-expect and the result-we-think-may-be-more-correct to be checked in. I still prefer Dirk's naming scheme though.
I get the notion that "expected" always means literally what it seems to mean from the standpoint of whether the tooling is silent for the test (actual == expected) or has something to say.
But I think that if the tooling is behaving right, your concern that "a test would fail if it did *not* match the "failing" result" would be addressed: the tooling could be silent for actual == failing (if a failing file exists) but notify you of an "unexpected pass" if actual == expected.
-F
On Aug 18, 2012, at 1:02 AM, Maciej Stachowiak <mjs at apple.com> wrote:
>> On Aug 17, 2012, at 8:02 PM, Filip Pizlo <fpizlo at apple.com> wrote:
>>> So this is down to expected/failing and expected/previous?
>> The difference is not just one of names, but also of whether you keep both around. One could imagine a proposal where you add -failing.txt and keep -expected.txt. But that would be confusing because
>> I forgot to mention another benefit of keeping the old expectation around, which is that tools can tell you if you start producing it again. Though I don't know how common it is for a test to be accidentally fixed.
>> Yet another is that we'd avoid the slowdown of having to look for two different filenames for the current expected result.
>>>>> I must say that expected/failing feels less confusing. Easier to remember if I have to quickly recall what it means.
>> I don't think it's really that obvious what it means. A "failing" result would, if checked in, be what is expected, and a test would fail if it did *not* match the "failing" result. That is hardly intuitive. You'd also have to sometimes label things as "failing" when you don't actually know that it's failing, just that it's different than the previous result.
>> I don't know if any of my suggestions for "a result from before that may or may not be better than the current result" are that great. But I would much prefer to always use -expected to refer to the result that is currently expected.
>> Regards,
> Maciej
>>>>>> -Filip
>>>> On Aug 17, 2012, at 7:36 PM, Dirk Pranke <dpranke at chromium.org> wrote:
>>>>> I'm not sure if I like this idea or not. A couple of observations/questions ...
>>>>>> 1) I wouldn't want to call it '-correct' unless we were sure it was
>>> correct; '-previous' is better in that regard
>>>>>> 2) the issue with keeping a '-correct' in the tree is that it's quite
>>> possible for a previous correct expectation to need to change to a
>>> different expectation to still be correct. i.e., they get stale. I
>>> fear that this could quickly become more confusing than it's worth.
>>> It's also not clear to me when -previous gets updated or removed?
>>>>>> 3) It also feels like '-previous' is something that we can just as
>>> easily get from SVN/Git/whatever, in a completely foolproof and
>>> automatic way. I grant that it's easier to just do a side by side
>>> compare, but "diff against previous" isn't that hard to do and we
>>> could easily write a wrapper to make it trivial ...
>>>>>> 4) I'd want to work through the various branches in the workflow more
>>> before I felt comfortable with this. When I was coming up with my
>>> original proposal I originally wanted to allow -passing and -expected
>>> to live side-by-side, but all sorts of complications arose, so I'd be
>>> worried that we'd have them here, too.
>>>>>> -- Dirk
>>>>>> On Fri, Aug 17, 2012 at 6:51 PM, Ojan Vafai <ojan at chromium.org> wrote:
>>>> That matches my understanding. You proposed modification sounds fine to me.
>>>>>>>> On Fri, Aug 17, 2012 at 6:40 PM, Maciej Stachowiak <mjs at apple.com> wrote:
>>>>>>>>>>>>>>> My understanding of the current proposal is this:
>>>>>>>>>> 1) This applies to tests that fail deterministically, for reasons other
>>>>> than a crash or hang.
>>>>> 2) If the test has a new result that you're confident is a progression (or
>>>>> neither better or worse), you simply update the -expected.txt file.
>>>>> 3) If the test has a new result that you're confident is a regression, you
>>>>> delete the -expected.txt file and check in the new results as -failing.txt.
>>>>> 4) Ditto points 2 and 3 with respect to -expected.png, for image diffs.
>>>>> 5) We would stop using all other ways of marking tests that fail
>>>>> deterministically, including Skipped and the many things you could enter in
>>>>> TestExpectations.
>>>>>>>>>> Is that correct?
>>>>>>>>>> If so, I'd like to suggest a minor modification. In place of point 3
>>>>> above, I propose the following:
>>>>>>>>>> 3) If the test has a new result that you're confident is a regression, you
>>>>> rename the -expected.txt file to -previous.txt (or maybe -correct.txt or
>>>>> -pre-expected.txt or something) and check in the new results as
>>>>> -expected.txt (unless there is already a -previous.txt, in which case just
>>>>> update -expected and leave -previous).
>>>>>>>>>> I propose this for the following reasons:
>>>>>>>>>> - It maintains the longstanding approach that -expected.txt reflects what
>>>>> is currently *expected* to happen, not whether it is right or wrong in some
>>>>> abstract sense. It is an expectation, not a reference.
>>>>> - It still leaves a clear indication of tests that somebody needs to look
>>>>> at further, to determine if a regression occurred.
>>>>> - It leaves both old and new result in place for easy comparison by an
>>>>> expert.
>>>>>>>>>> Regards,
>>>>> Maciej
>>>>>>>>>>>>>>> On Aug 17, 2012, at 6:06 PM, Ojan Vafai <ojan at chromium.org> wrote:
>>>>>>>>>> +1
>>>>>>>>>>>>>>> On Fri, Aug 17, 2012 at 5:36 PM, Ryosuke Niwa <rniwa at webkit.org> wrote:
>>>>>>>>>>>> That's my expectation although we probably can't do that for flaky tests
>>>>>> :(
>>>>>>>>>>>> e.g. sometimes fails with image diff.
>>>>>>>>>>>> On Fri, Aug 17, 2012 at 5:35 PM, Filip Pizlo <fpizlo at apple.com> wrote:
>>>>>>>>>>>>>> +1, contingent upon the following: are we agreeing that all current uses
>>>>>>> of TEXT, IMAGE, and so forth in TestExpectations should be in the *very near
>>>>>>> term* following Dirk's change be turned into -failing files?
>>>>>>>>>>>>>> -Filip
>>>>>>>>>>>>>>>>>>>>> On Aug 17, 2012, at 5:01 PM, Ryosuke Niwa <rniwa at webkit.org> wrote:
>>>>>>>>>>>>>> On Fri, Aug 17, 2012 at 4:55 PM, Ojan Vafai <ojan at chromium.org> wrote:
>>>>>>>>>>>>>>>> Asserting a test case is 100% correct is nearly impossible for a large
>>>>>>>> percentage of tests. The main advantage it gives us is the ability to have
>>>>>>>> -expected mean "unsure".
>>>>>>>>>>>>>>>> Lets instead only add -failing (i.e. no -passing). Leaving -expected to
>>>>>>>> mean roughly what it does today to Chromium folk (roughly, as best we can
>>>>>>>> tell this test is passing). -failing means it's *probably* an incorrect
>>>>>>>> result but needs an expert to look at it to either mark it correct (i.e.
>>>>>>>> rename it to -expected) or figure out how the root cause of the bug.
>>>>>>>>>>>>>>>> This actually matches exactly what Chromium gardeners do today, except
>>>>>>>> instead of putting a line in TestExpectations/Skipped to look at later, they
>>>>>>>> checkin the -failing file to look at later, which has all the advantages
>>>>>>>> Dirk listed in the other thread.
>>>>>>>>>>>>>>>>>>>>> I'm much more comfortable with this proposal.
>>>>>>>>>>>>>> - Ryosuke
>>>>>>>>>>>>>> _______________________________________________
>>>>>>> webkit-dev mailing list
>>>>>>>webkit-dev at lists.webkit.org>>>>>>>http://lists.webkit.org/mailman/listinfo/webkit-dev>>>>>>>>>> _______________________________________________
>>>>> webkit-dev mailing list
>>>>>webkit-dev at lists.webkit.org>>>>>http://lists.webkit.org/mailman/listinfo/webkit-dev>>> _______________________________________________
>>> webkit-dev mailing list
>>>webkit-dev at lists.webkit.org>>>http://lists.webkit.org/mailman/listinfo/webkit-dev>> _______________________________________________
>> webkit-dev mailing list
>>webkit-dev at lists.webkit.org>>http://lists.webkit.org/mailman/listinfo/webkit-dev>