In the last few weeks I've set up a new server that should replace our very outdated WPT/OSM instance. The new server runs an instance of WebPageTest (newest private instance, on Debian Stretch) as well as two agents, encapsulated in Windows 7 KVMs. On top of that, a Docker instance of OpenSpeedMonitor controls the WPT instance and receives its results.

Basically everything runs fine, and while monitoring the whole system through Nagios I can see that the load is well balanced, meaning no system (neither the Linux host nor the KVMs) gets overloaded or runs out of resources.

However, there is one thing I can't explain. Even though the new KVMs react significantly faster than the very old WPT system (where the agents ran on the same Windows machine as the private instance), an agent takes about 30 minutes longer to complete all the tests (about 40) than on the old system. Watching the agents, I saw that this is caused by a comparably long phase in which wptdriver says "Processing Result" after a browser run finishes. During this time the agent idles, or at least appears to: no noticeable CPU or memory consumption, no heavy network traffic. It stays there for about 40 seconds to a few minutes before starting the next job.

So my question is: what exactly is happening at this stage? Communication with the private instance? On which side does the work happen here? I wasn't able to find anything in the docs, or any option to enable more verbose logging.

Most of that time is likely spent processing Chrome trace files, which should show a lot of activity (it may depend on the I/O performance of the VM).

There should be a test_timing.log that gets uploaded with each test result, which would have more granular detail.

One possible cause: if Python is installed, the agent can run the trace parsing, but if ujson isn't installed it falls back to a slow JSON parser and can take longer than needed. If the slow part is in the trace parsing, try "pip install ujson" and see if it helps.
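To see whether the JSON parser is actually the bottleneck, you could time both parsers on a trace-sized payload. A rough sketch (the synthetic payload and the fallback logic are my own illustration, not wptdriver's actual code):

```python
import json
import time

# Optional fast parser: trace processing can use ujson when it is present.
try:
    import ujson as fast_json
except ImportError:
    fast_json = None

# Synthetic stand-in for a Chrome trace file (real traces are much larger).
payload = json.dumps({"traceEvents": [{"ts": i, "ph": "X", "name": "step"}
                                      for i in range(200000)]})

start = time.time()
events = json.loads(payload)["traceEvents"]
print("stdlib json: %d events in %.3fs" % (len(events), time.time() - start))

if fast_json is not None:
    start = time.time()
    events = fast_json.loads(payload)["traceEvents"]
    print("ujson: %d events in %.3fs" % (len(events), time.time() - start))
else:
    print("ujson not installed - stdlib json fallback in use")
```

If the ujson time is only marginally better, the parser is unlikely to explain a 30-second gap.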

Thank you for your answer. Today I was able to spend some time on your hints - but unfortunately no luck:

* I wasn't able to locate a file named test_timing.log (I also grepped for it everywhere, just to be sure). I can only find test.log in the results folder, which (surprisingly to me) shows that post-processing doesn't waste any time. See the log at the end of this post: according to it, post-processing finishes immediately, and that is the last line of the log.
* Anyway, I tried your suggestion, because the ujson module was not installed (but Python was), so I installed and tested it. The module works without errors, but there is no change in processing time - I still get "Test complete - processing result" between tests.

Now I got the logs, especially test_timing.log. Unfortunately, nothing strikes me - I guess the values are milliseconds (that would make sense looking at 'Run Test' and 'Measure Step'). That said, it looks OK to me: all other values are in a relatively low range, nothing that hints at the >=30s delay seen on most tests.

The Python processing, if it were to happen (which it doesn't look like it is, because there is no trace processing time), would be included in the "Upload Images" time, so the problem isn't there.

The UI doesn't update to another message until it has actually uploaded the result, waited for the browser to exit, and moved on to requesting the next test (and some of those steps happen after the upload, so they can't be included in the timing log).

The only bits that make sense for possibly taking a long time are:

1 - The workdone.php call to the server with the result, if the server is extremely slow to process the result or for some reason hangs the request.

2 - The wait for the browser to exit gracefully, which waits 10 seconds before force-killing the process.
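For cause 1, the server round trip can be timed independently of the agent. A self-contained sketch of such a timing harness - it spins up a local dummy endpoint instead of a real private instance, the /work/workdone.php path is only illustrative, and the real agent call is a POST uploading the result files, not a plain GET:

```python
import http.server
import threading
import time
import urllib.request

# Local stand-in for the private instance's workdone.php endpoint so the
# harness runs anywhere; point `url` at your real server to measure it.
class Handler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"OK")

    def log_message(self, *args):
        pass  # keep the harness quiet

server = http.server.HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

url = "http://127.0.0.1:%d/work/workdone.php" % server.server_port
start = time.time()
resp = urllib.request.urlopen(url)
body = resp.read()
elapsed = time.time() - start
print("status %d, body %r, round trip %.3fs" % (resp.status, body, elapsed))
server.shutdown()
```

If the same request against the real instance takes tens of seconds, the delay is on the server side rather than in the agent.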

I may be able to add more status messages to help track down which of those is causing the issue, but it may be better to set up a debug build, which has a lot more logging (though it requires running DbgView to capture the messages).

The browser exits properly after every test - at least I confirmed it by watching Task Manager: the window closes as soon as wptdriver switches to "test complete - processing result", and no browser processes remain; they disappear immediately (I'm only using Firefox, actually).

I guess the least time-consuming step is to switch to nginx, just for a test. That should be pretty easy after seeing your nginx.conf for inclusion. This way I can see whether this is somehow webserver-related, because I still don't see any heavy load on the server side. Currently I'm using the latest Apache on Debian Stretch.
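For the test, a minimal nginx server block that hands the instance's PHP to php-fpm would look something like the sketch below. This is not the nginx.conf shipped with WebPageTest; the document root and the php7.0-fpm socket path (the Stretch default) are assumptions to adjust for your setup:

```nginx
server {
    listen 80;
    root /var/www/webpagetest/www;   # assumed install path
    index index.php;

    location ~ \.php$ {
        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_pass unix:/run/php/php7.0-fpm.sock;  # Debian Stretch default
    }
}
```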

I'll post an update this week, once I've managed to test whether the webserver switch made a difference.