We need to get minotaur running as part of the buildbot/tinderbox set up. This is the tracking bug for this effort.
I'm not entirely sure how this needs to work w.r.t. buildbot requirements, but here is my take on what needs to happen:
1. Acquire 3 machines/VMs to run Minotaur on (it could run in a VM just as easily as on a real box). We need a Linux, Mac, and Windows machine. I'd recommend OS X 10.4 and WinXP to start with, just to keep the configuration headaches that we often encounter with OS X 10.5 and Vista to a minimum.
2. Integrate Minotaur into the Build Bot setup
3. Produce results that are parseable/understandable by the buildbot/tinderbox pages.
For step 2, here is how I imagine the process working:
1. The "L10N buildbot/tinderbox system" alerts us (or we discover) that a new build of locale AB-CD is available.
2. Our buildbot master(?) downloads the build to the slave box
3. Our buildbot slave installs the build
4. Our buildbot slave checks out the reference files from CVS or Hg
5. The buildbot slave runs Minotaur once via run-minotaur.sh, comparing the build to the checked-out reference files
6. The buildbot slave calls some code (as yet unwritten - this is step 3 above) to transform the Minotaur results into something usable for a waterfall page
7. The buildbot slave (or the results transformer) uploads the results to the waterfall page.
8. The buildbot slave uninstalls the AB-CD build, removes the profiles, removes the reference files.
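Steps 2-8 above could be sketched as a single per-locale function. Every name here is a placeholder for code that doesn't exist yet, not real Minotaur or buildbot API:

```python
def run_locale_cycle(locale, harness):
    """One pass of the cycle sketched above for a single AB-CD locale.

    `harness` supplies the real implementations; every method name here
    is a hypothetical placeholder, not actual Minotaur or buildbot API.
    """
    build = harness.download_build(locale)        # step 2: fetch the build
    harness.install_build(build)                  # step 3: install it
    refs = harness.checkout_reference(locale)     # step 4: CVS/Hg checkout
    results = harness.run_minotaur(build, refs)   # step 5: run-minotaur.sh once
    report = harness.transform_results(results)   # step 6: waterfall-friendly form
    harness.upload(report)                        # step 7: publish results
    harness.cleanup(build, refs)                  # step 8: uninstall, remove profiles/refs
    return report
```

The point of the sketch is just that the whole cycle is a pure sequence per locale, which matters later when we talk about running locales in parallel.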

Here are a couple of thoughts regarding major step 2, Integrate Minotaur into the Buildbot setup:
> For step 2, here is how I imagine the process working:
> 1. The "L10N buildbot/tinderbox system" alerts us (or we discover) that a new
> build of locale AB-CD is available.
We can make use of the way Talos finds builds currently and just poll the staging directories wherever the builds get pushed.
> 6. The buildbot slave calls some code (as yet unwritten - this is step 3 above)
> to transform the minotaur results into something useable for a waterfall page
This should be pretty easy. I expect we'll just want sections for bookmarks, settings, and so on, with rolled-up pass/fail counts for each section.
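A sketch of that roll-up, assuming a hypothetical `section: test PASS|FAIL` line format -- the real results.log format may well differ:

```python
from collections import defaultdict

def rollup(result_lines):
    """Roll Minotaur result lines up into per-section pass/fail counts.

    Assumes a hypothetical "section: TEST_NAME PASS|FAIL" line format;
    adjust the parsing to whatever results.log actually emits.
    """
    counts = defaultdict(lambda: {"pass": 0, "fail": 0})
    for line in result_lines:
        section, _, outcome = line.partition(":")
        key = "pass" if outcome.strip().endswith("PASS") else "fail"
        counts[section.strip()][key] += 1
    return dict(counts)
```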
> 7. The buildbot slave (or the results transformer) uploads the results to the
> waterfall page.
easy
> 8. The buildbot slave uninstalls the AB-CD build, removes the profiles, removes
> the reference files.
also easy!
So, really, we've built something like this already; it's just a matter of translating it to work with Minotaur and the large number of builds l10n produces. I expect much of the work will come from figuring out where all those builds are being pushed and pulling them in.
I don't really see any missing steps in what you've described above.

We need to get the L10n builds going also. This is a critical step that should be called out explicitly. This is in joduinn's court, right? Can we get an ETA on that?
We need to get the machines ordered. Clint, can you do that?
Is the waterfall page the right place for the results? With 40-50 locales reporting for each platform, shouldn't the waterfall just have a link to a directory of results or some such thing?

(In reply to comment #3)
> We need to get the L10n builds going also. This is a critical step that should
> be called out explicitly. This is in joduinn's court, right? Can we get an
> ETA on that?
I see the L10n builds part is covered by bug 422759. So ignore this.

(In reply to comment #0)
So, let's assign some of these tasks to people to get them done.
> 1. Acquire 3 machines/VMs to run Minotaur on (it could run in a machine just as
> easily as on a real box). We need a Linux, Mac, and Windows machine. I'd
> recommend 10.4 OS X and WinXP to start with, just to keep configuration
> headaches that we often encounter with OS X 10.5 and Vista to a minimum.
I'll take the lead in procuring these machines.
>
> 2. Integrate Minotaur into the Build Bot setup
see below
> 3. Have results that can be parse-able/understandable to the build
> bot/tinderbox pages.
see below
> For step 2-3, here is how I imagine the process working:
> 1. The "L10N buildbot/tinderbox system" alerts us (or we discover) that a new
> build of locale AB-CD is available.
Sounds like the infrastructure for this work is happening in bug 422759.
> 2. Our buildbot master(?) downloads the build to the slave box
> 3. Our buildbot slave installs the build
I'll take the lead in coding these; I'll probably leverage a bunch of the Talos code that Robcee already mentioned.
> 4. Our buildbot slave checks out the reference files from CVS or Hg
This should be pretty easy as this is a core feature of buildbot (getting stuff from a repository), I'll do that.
> 5. The buildbot slave runs minotaur via the run-minotaur.sh to run once,
> comparing it to the checked out reference files
Makes sense for me to do this too, since I know how minotaur works.
> 6. The buildbot slave calls some code (as yet unwritten - this is step 3 above)
> to transform the minotaur results into something useable for a waterfall page
Robcee: do you think you or someone from your team could work with me to take this one on? I have no idea how we go about uploading things to the tinderbox waterfalls or even to buildbot waterfalls. I can help them understand the minotaur results, but I'd need their expertise in how to package the data for those interfaces.
> 7. The buildbot slave (or the results transformer) uploads the results to the
> waterfall page.
I think this is part of the above step 6. Once we have a valid "results" package, I imagine the upload is probably very simple, again, something buildbot is made to do.
> 8. The buildbot slave uninstalls the AB-CD build, removes the profiles, removes
> the reference files.
I'll do this part.
How does this sound?

Some items Clint and I just talked about:
1) There are no special OS/toolset requirements for these VMs; ideally, they should be identical to the existing Build/RelEng VMs. This means that, as part of starting a Minotaur test run, the slave needs to bring down everything it needs to run cleanly. This may (or may not) run on the same slave that is used to generate the build.
2) It feels like overkill to run minotaur per checkin, but doing this once a day seems reasonable.
3) Minotaur takes under 1 min to run on each locale, but there are 40+ locales on each OS, so it currently takes about 40 minutes to complete. Ideally, we should be able to run each locale independently of the others, in parallel, for significant performance gains.
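The parallel idea in item 3 could be sketched like this; `run_one` stands in for whatever ends up running a single locale (a hypothetical name, nothing in Minotaur today):

```python
from concurrent.futures import ThreadPoolExecutor

def run_all_locales(locales, run_one, workers=8):
    """Run each locale's Minotaur pass in parallel.

    `run_one` is a placeholder for the per-locale runner; with ~40 locales
    at under a minute each, even modest parallelism cuts the ~40-minute
    serial wall time substantially (assuming the runs don't contend for
    the same profile directories or installed build).
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with locales
        return dict(zip(locales, pool.map(run_one, locales)))
```

The caveat in the comment is real: the per-locale runs would need isolated install/profile directories before this is safe.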

Actually, failures in Minotaur tests require immediate action, either to fix the reference or to back out a localizer check-in. Thus, it should really be run on check-in.
We can play a little in terms of platforms, though. It should be hard to create a failure on just one platform for a Minotaur test, so running them on, say, Linux only would probably benefit build times. Running Minotaur on all platforms is more of a fuzz test of the test execution; I'm not sure how often we should do that. The most likely scenario is something odd in the test profile creation, or an installer regression.

(In reply to comment #8)
> We can play a little in terms of platforms, though. It should be hard to create
> a failure on just on platform for a minotaur test, so just running then on,
> say, linux would probably benefit build times. Running Minotaur on all
> platforms is more of a fuzz test of the test executation, I'm not sure how
> often we should do that. Most likely scenario is something odd in the test
> profile creation, or an installer regression.
I'd agree with Axel here. It really makes sense to run Minotaur on one platform, especially when you come to the problem of storing the output for the test. If we store the output in CVS, that's quite a lot of files for one Minotaur run, and it multiplies by the number of platforms we have.
Optimizing the platform in terms of build speed sounds like a fine idea. But we do have existing baselines for Windows and Mac checked in to CVS already, which makes me lean toward using one of those platforms instead. What is the delta in build times among the three platforms? If it's significant, then it would make sense to re-run a Linux baseline, check that in, and cvs remove the other two.

John, Can you or Rob comment on what steps need to be taken to work minotaur into the existing infrastructure? You talked about a make check step to check out the code yesterday. Do you have an example of this somewhere? Is there a staging setup somewhere to play with? A document for the structure of these machines so I can mock one up and use that for developing the integration point?
Also, it'd be good to have a clear delineation of what parts of this you expect me to do and what parts Rob (or somebody else on the B&R side) can do.
Thanks!!!!

Here's my hidden-agenda big picture:
Right now, the Mac seems to be the box that's cycling fastest. That said, I hope to get another testing extension following Minotaur in this setup, taking screenshots. And that's going to be slowest on the Mac, thanks to the dialog resizing it does. The difference between Mac and Linux isn't all that big right now, which is why I suggested Linux.
I would really hope that the reference files are cross-platform, and thus the test execution environment too. Is that so?

(In reply to comment #10)
> John, Can you or Rob comment on what steps need to be taken to work minotaur
> into the existing infrastructure? You talked about a make check step to check
> out the code yesterday. Do you have an example of this somewhere? Is there a
> staging setup somewhere to play with? A document for the structure of these
> machines so I can mock one up and use that for developing the integration
> point?
Hey Clint. Take a look at:
http://lxr.mozilla.org/mozilla/source/tools/buildbotcustom/unittest/steps.py#162
and at:
http://lxr.mozilla.org/mozilla/source/tools/buildbot-configs/testing/moz2unit/master.cfg#150
for an example of how to call it in Buildbot.
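Roughly how that might translate for Minotaur, as a hypothetical master.cfg fragment modeled on the unittest config above -- the CVS module path, command line, and step arrangement are all guesses, not the real invocation:

```python
# Hypothetical master.cfg sketch; cvsroot/module and the Minotaur
# command line are placeholder guesses, not the real configuration.
from buildbot.process.factory import BuildFactory
from buildbot.steps.shell import ShellCommand
from buildbot.steps.source import CVS

f = BuildFactory()
# check out the Minotaur tool and reference files
f.addStep(CVS(cvsroot=":pserver:anonymous@cvs-mirror.mozilla.org:/cvsroot",
              cvsmodule="mozilla/testing/release/minotaur",
              mode="clobber"))
# run Minotaur once; real arguments TBD
f.addStep(ShellCommand(command=["bash", "run-minotaur.sh"],
                       description="running minotaur",
                       haltOnFailure=False))
```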
> Also, it'd be good to have a clear delineation of what parts of this you expect
> me to do and what parts Rob (or somebody else on the B&R side) can do.
I'm not really sure. Obviously, we'd love to get a complete solution that we could just drop onto some machines but that's probably not realistic. Maybe next week we can sit down together and figure some of this stuff out when I'm in town.

"Existing infrastructure" is likely not it. The existing infrastructure is tinderbox client, and I guess we're all set that we don't like that.
I guess the idea should be to have a set of requirements on the minotaur side, and a set of steps, and to then make the infrastructure such that it can call that.
I would expect that one step is to call into http://lxr.mozilla.org/mozilla/source/testing/release/minotaur/minotaur.sh to get the current data out of the build, and I would like to see the comparison step separately. That would enable us to use richer python there as time permits.
Clint, do you have a set of requirements? In terms of binaries, where they are, source, other directories to play with? I know there's http://lxr.mozilla.org/mozilla/source/testing/release/minotaur/README.txt, but I've never run Minotaur myself, so I don't know if that's current.
Re all, I'm currently trying to wrap my head around what I did in terms of l10n build stuff and why, but I didn't get to the buildbot part yet.

(In reply to comment #13)
> "Existing infrastructure" is likely not it. The existing infrastructure is
> tinderbox client, and I guess we're all set that we don't like that.
>
Well, I think that we're all talking about buildbot and not tinderbox. I kind of thought that was pretty clear.
> I guess the idea should be to have a set of requirements on the minotaur side,
> and a set of steps, and to then make the infrastructure such that it can call
> that.
>
> I would expect that one step is to call into
> http://lxr.mozilla.org/mozilla/source/testing/release/minotaur/minotaur.sh to
> get the current data out of the build, and I would like to see the comparison
> step separately. That would enable us to use richer python there as time
> permits.
>
> Clint, do you have a set of requirements? In terms of binaries, where they are,
> source, other directories to play with? I know there's
> http://lxr.mozilla.org/mozilla/source/testing/release/minotaur/README.txt, but
> I never run minotaur myself, so I don't know if that's current.
The requirements are pretty clear (to me at least):
1. CVS Checkout minotaur tool & configs from <last nightly run>
2. Run the run-minotaur.py script, pointing it at the Firefox build to download (we can either download and install the build with this tool, or we can use mozInstall.py to install it once we download it with a buildbot step between 1 and 2), along with the other parameters it needs
3. Pick out any failures from results.log and send results up to a reporting page.
4. Upload the new CVS results to a new <nightly> directory for this locale.
5. Delete the n-2 locale set of files.
6. And Repeat with the next locale.
I kind of wonder about how long we need to keep the nightly results around. Indefinitely? If so, we can remove step 5.
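If we do keep a bounded window, step 5's cleanup could be sketched like this; the per-locale, per-date directory layout is my assumption, not anything Minotaur produces today:

```python
import os
import shutil

def prune_nightlies(results_dir, keep=2):
    """Keep only the newest `keep` nightly result directories per locale.

    Assumes a hypothetical results_dir/<locale>/<YYYY-MM-DD>/ layout;
    ISO dates sort lexically, so a plain sort gives chronological order.
    """
    for locale in os.listdir(results_dir):
        locale_dir = os.path.join(results_dir, locale)
        nightlies = sorted(os.listdir(locale_dir))
        for old in nightlies[:-keep]:
            shutil.rmtree(os.path.join(locale_dir, old))
```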
I'm fine with making these things work on Linux; Minotaur already runs on all three platforms. I suspect there isn't much (or any) difference between the results files on different platforms (since L10N doesn't change much, if anything, between platforms), but I've never done a diff between them, so I don't know for sure.
I think it'd be easier to just run a b5 set on linux and use that as the base line to start from rather than using the mac/windows reference set from b4 to begin with.
All the minotaur script files are in: http://mxr.mozilla.org/mozilla/source/testing/release/minotaur/
There are three different entry points you can use to run minotaur:
* Partner.py - has some specific defaults for partner builds to make running those easier (download areas, password-protected sites, etc.); does download/install for you
* run-minotaur.py - runs once per locale. Usually used to make testing locales easier; does download/install for you.
* minotaur.sh - this is the core, lowest-level script that runs Minotaur; everything else is a wrapper for this script. Your build already has to be installed in order to run this one.
Also:
* mozDownload and mozInstall are some cross platform python utilities to download and install builds.

Hmm...I think that keeping a deep enough regression range for these is important, so we want to keep the nightly results files around for a while. The question is how long is a while? The other consideration is that we could zip these up, since they are all text, they compress very nicely.
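The zip idea is straightforward with the Python stdlib; a minimal sketch, where the results tree layout is whatever we end up uploading:

```python
import os
import zipfile

def archive_results(results_dir, zip_path):
    """Compress a nightly results tree into one zip.

    Minotaur output is all text, so ZIP_DEFLATED should shrink it
    substantially; paths inside the archive are stored relative to
    results_dir so the tree unpacks cleanly anywhere.
    """
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for dirpath, _, files in os.walk(results_dir):
            for name in files:
                full = os.path.join(dirpath, name)
                zf.write(full, os.path.relpath(full, results_dir))
```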
I'm not sure if we want to keep checking them into the CVS repo in an expanded tree. That might get unruly and huge pretty quickly.
Axel -- what do you think?