Basic sanity testing of core phone features needs to happen per commit on a physical device, to catch regressions that would otherwise be missed during b2g desktop and emulator testing.
Discussions have taken place already and some notes can be found at:
https://etherpad.mozilla.org/10minutesuite

A basic set of tests has been collected and added to a job in Jenkins:
flame.b2g-inbound.ui.functional.sanity.[restart|no_restart]
We decided to create jobs for both restart and no_restart to determine if it's necessary to restart the phone in between test runs. Restarting the phone adds about 2 minutes to the total run time with just the basic set of tests.

Good point, I didn't think about that. Still learning all the nuances. I wonder if there is a way that we could determine if there was a crash and only reboot if necessary.
That also reminds me of a conversation I was having with Stephen on how we could recover from crashes without cleanup_data() blowing away the Crash Reports when setting up the next test. Seems like chasing crash reports is a hit or miss type thing.

I can't be sure but I think it will still recover insomuch as it will wait for the marionette server to be ready again. The only thing I am not sure about is whether the existing setUp will 'kill' the crash recovery prompt or not. It would depend on the nature of the crash.
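For illustration, "waiting for the marionette server to be ready again" could be sketched as a simple TCP poll. This is not the gaiatest implementation; the function name is hypothetical, and the usage note assumes Marionette's default port 2828 is forwarded to localhost. A successful connect only shows that something accepted the connection (with adb port forwarding the local end may accept before the device side is ready), so a real readiness check would likely need to go further than this.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Poll until a TCP connection to (host, port) succeeds.

    Returns True as soon as a connection is accepted, or False if
    `timeout` seconds pass without a successful connection.
    Hypothetical helper, not part of gaiatest or Marionette.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # Open and immediately close a connection; success means
            # something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused or timed out: wait a little and try again.
            time.sleep(interval)
    return False
```

Usage would be something like wait_for_port("localhost", 2828, timeout=120) after a suspected crash, before starting the next test's setUp.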

Clint, James, can you clarify what the 10 minutes covers?
Does the 10 minutes include flashing/configuring the environment? It takes around 4.5 minutes but can easily be twice that if checking out the git repo is slow.
I think the 10 minutes should exclude environment setup. At the moment the 5 tests it runs are not representative enough of a sanity test, IMO. With 10 minutes of just test runtime we could run 2x as many tests.
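To make the trade-off concrete, here is a small sketch of the arithmetic using the numbers above (10 minute target, about 4.5 minutes of setup, up to twice that when the git checkout is slow). The function name is made up for this example.

```python
def runtime_budget(total_minutes, setup_minutes, include_setup):
    """Minutes left for actually running tests.

    If the target includes environment setup, setup time is taken
    out of the budget; otherwise the whole target is test runtime.
    """
    if include_setup:
        return total_minutes - setup_minutes
    return total_minutes


# Target includes setup, typical case: 5.5 minutes of tests.
typical = runtime_budget(10, 4.5, include_setup=True)
# Target includes setup, slow git checkout (2 x 4.5): 1 minute of tests.
slow_checkout = runtime_budget(10, 9.0, include_setup=True)
# Target excludes setup: the full 10 minutes of tests.
excluded = runtime_budget(10, 4.5, include_setup=False)
```

Going from 5.5 to 10 minutes of runtime is a bit under double, which is where the rough "2x as many tests" comes from.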

(In reply to Zac C (:zac) from comment #6)
> Clint, James, can you clarify the definition of the 10 minute of time?
>
> Does the 10 minutes include flashing/configuring the environment? It takes
> around 4.5 minutes but can easily be twice that if checking out the git repo
> is slow.
>
> I think the 10 minutes should exclude environment setup. At the moment the 5
> tests it runs are not representative enough of a sanity test, IMO. With 10
> minutes of just test runtime we could run 2x as many tests.
The Flame builds currently take 39 minutes. So, if we have near-constant landings, this test needs to run at least once every 39 minutes in order to keep up with the backlog.
And, build times continue to improve, so you'd best not try to keep up by running a test job at 38 minutes. So, while there is some obvious overkill in the number, I think the 10 minutes needs to include everything. That way you have the most flexibility and you can perform a quick job (one that devs might even run locally) just to see if core functionality on the phone is broken. Honestly though, now that all the world is shifted, it's no longer my call. I direct you to James.
Dave, the job you have running and uploading to Treeherder looks great! The only thing I think is missing is that Treeherder is not finding the test log. I'm not sure if that's because it isn't getting uploaded, or if there is a bug on the Treeherder side. But, in the detail pane output there is no "log" icon. So that might be something good to chase down. Follow up with Jeads/Mauro/Camd if you think there's a bug on the Treeherder side.

>I think the 10 minutes should exclude environment setup. At the moment the 5 tests it runs are not representative enough of a sanity test, IMO. With 10 minutes of just test runtime we could run 2x as many tests.
TL;DR: 10 minutes is the goal (in my mind), and higher is acceptable, up to the longest run time of any other test suite (assuming we have enough phones!)
In my mind the 10 minute number should be the entire end-to-end number inclusive of setup... The war will be on setup and build times as we bring more phones online and begin running multiple "chunks" in parallel...
One thing to keep in mind is that 10 minutes would actually be a very fast test right now. Often a single chunk takes between 10 and 30 minutes, so as long as we have enough phones to keep up with the pace of b2g-inbound, slower is okay initially... I would like to see more data once we are up and running and have a good sense of what our failure (or intermittent) rate looks like.
In the near future (about 1 month) I am bringing someone onboard to optimize our phone builds like crazy... we are doing many suboptimal things there, outside of the ongoing optimization of gecko build times, that we can improve upon... Somewhat pessimistically, 20 minute builds are in the near future and lower is certainly possible.
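A rough sketch of what "enough phones to keep up" could mean, as back-of-the-envelope arithmetic only. It assumes one test job per build, builds arriving at a fixed interval, and each chunk occupying one phone for the whole job (setup included). The function name and the 3-chunk example are made up; the 39, 20, 10 and 30 minute figures are the ones mentioned in this bug.

```python
import math


def phones_needed(job_minutes, build_interval_minutes, chunks=1):
    """Phones required so jobs never queue up behind each other.

    Each chunk holds one phone for `job_minutes`; a new build (and
    so a new job) arrives every `build_interval_minutes`.
    """
    per_chunk = math.ceil(job_minutes / build_interval_minutes)
    return per_chunk * chunks


# Today: 39 minute builds, 10 minute job, single chunk.
today = phones_needed(10, 39)
# Slow chunk (30 minutes) against the hoped-for 20 minute builds.
faster_builds = phones_needed(30, 20)
# Same, but split into 3 parallel chunks of 30 minutes each.
chunked = phones_needed(30, 20, chunks=3)
```

The point is just that as build times drop, the same job needs more phones, so slower-than-10-minutes only stays okay while the phone pool grows with it.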

With the current 10 minute suite, the device test results are arriving in Treeherder faster than the linux64 and macos desktop b2g test results!
We have the capacity to add 5 minutes more of tests now.

(In reply to Clint Talbert ( :ctalbert ) from comment #8)
> Dave, the job you have running and uploading to Treeherder looks great! The
> only thing I think that is missing is that Treeherder is not finding the
> test log. I'm not sure if that's because it isn't getting uploaded, or if
> there is a bug treeherder side.
It's not currently being uploaded, see bug 1049723.

Hey Greg, do you want to 'upgrade' this to a pull request?
Also, we should omit the activesync tests entirely because of the ongoing bug 1028192 issue and because they're very long running, hence expensive. We could run two smaller tests in the time it takes to run one email test.

After a suggestion by someone (I think :davehunt), I am just listing the tests explicitly in the job rather than altering the manifest. Of course, the more we add to this, the more unwieldy it becomes, and there wouldn't be an easy way to run the sanity tests locally without duplicating the command.
For instance:
gaiatest [...options] gaiatest/tests/functional/browser/test_browser_lan.py gaiatest/tests/functional/camera/test_camera_capture_photo.py gaiatest/tests/functional/dialer/test_dialer_airplane_mode.py gaiatest/tests/functional/lockscreen/test_lockscreen_unlock_to_homescreen.py gaiatest/tests/functional/messages/test_sms.py
Are we in agreement to just use the "sanity" flag in the manifests for specifying tests and just have it point to the root manifest? I can do that, no problem. We could also review it and see if there are more tests we would like to add.
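As a sketch of how a "sanity" flag in a manifest could replace the long explicit list, the snippet below filters an ini-style manifest using Python's standard configparser. This is only an illustration: gaiatest reads its manifests through its own manifest tooling, whose format has features configparser does not handle, and the manifest text here is invented for the example (test_hypothetical_slow.py is not a real test).

```python
import configparser

# Invented manifest text; only the sanity=true key comes from this bug.
MANIFEST = """\
[test_browser_lan.py]
sanity = true

[test_sms.py]
sanity = true

[test_hypothetical_slow.py]
"""


def sanity_tests(manifest_text):
    """Return the test names whose section sets sanity to a true value."""
    parser = configparser.ConfigParser()
    parser.read_string(manifest_text)
    return [
        name
        for name in parser.sections()
        # Tests without a sanity key are treated as not part of the suite.
        if parser.getboolean(name, "sanity", fallback=False)
    ]


selected = sanity_tests(MANIFEST)
```

With something along these lines the Jenkins job and a local run could both point at the root manifest, and adding a test to the sanity suite becomes a one-line manifest change.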

Comment on attachment 8475148
Github PR: #23037
In camera.ini, the new tag separates the comment from the tag below it, so we just need to move the sanity=true to line 14.
Also, I think we should have just 1 settings test in there; we can afford it considering that we've dropped the email test.
I think test_settings_bluetooth would be perfect because it encompasses a piece of hardware and the settings app in one go.