To evaluate the real value added by the "consensus/MQAP" based methods or human groups, I think it is better to compare them with a simple clustering method instead of the best individual servers. For example, we can build a simple clustering method by using 3D-Jury to rank all the server models. A "consensus/MQAP" based method or human group adds real value to modeling if and only if it performs (statistically) significantly better than such a simple clustering method.
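One way to make "(statistically) significantly better" concrete is a paired test over targets. The sketch below is only an illustration, assuming one quality score per target for the method under test and for the simple clustering baseline. The function name `paired_sign_flip_pvalue` and all score values are made up for the example, and the exact sign-flip permutation test is my choice of test, not something prescribed above.

```python
from itertools import product
from typing import Sequence


def paired_sign_flip_pvalue(method_scores: Sequence[float],
                            baseline_scores: Sequence[float]) -> float:
    """One-sided exact sign-flip permutation test on paired per-target scores.

    Returns the fraction of sign assignments whose summed difference is at
    least as large as the observed one. Enumeration is exponential in the
    number of targets, so this is only practical for a small target set.
    """
    if len(method_scores) != len(baseline_scores):
        raise ValueError("both methods must be scored on the same targets")
    diffs = [m - b for m, b in zip(method_scores, baseline_scores)]
    observed = sum(diffs)
    at_least_as_large = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        # Small tolerance guards against floating-point rounding.
        if sum(s * d for s, d in zip(signs, diffs)) >= observed - 1e-12:
            at_least_as_large += 1
    return at_least_as_large / total


# Hypothetical per-target scores (higher is better); not real CASP data.
consensus_method = [62, 55, 71, 48, 66, 59]
simple_clustering = [60, 50, 70, 45, 61, 58]
p_value = paired_sign_flip_pvalue(consensus_method, simple_clustering)
print(p_value)
```

With many targets, sampling random sign assignments instead of enumerating all of them would keep the same idea tractable.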

Pcons is a naive clustering method (basically identical to 3D-Jury). It is clear that Pcomb (the method used by the Elofsson group) performs better by every measure that we tried. In earlier CASPs, earlier versions of Pcomb did not perform better than the best individual server.

So perhaps we have made some progress. On the other hand, I think there are other MQAPs that perform at least as well as Pcomb.

The performance of a consensus method depends on the performance of the individual servers and also on the correlation among them. If the individual servers are highly correlated, or the best individual server is significantly better than the others, a simple clustering method may not perform better than the best individual server. However, when there are a few reasonably good and independent individual servers, a simple clustering method may perform better than the best individual server. Therefore, I think 3D-Jury, rather than the best individual server, may be a better control for benchmarking the "consensus/MQAP" methods.
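To make the proposed control concrete, here is a minimal sketch of a 3D-Jury-style ranking: each server model is scored by its average structural similarity to all other server models, and the highest-scoring model is selected. This is a simplification and not the actual 3D-Jury implementation. The function name `consensus_rank`, the server names, and the similarity values (assumed to lie in [0, 1]) are all hypothetical.

```python
from typing import List, Tuple


def consensus_rank(names: List[str],
                   similarity: List[List[float]]) -> List[Tuple[str, float]]:
    """Rank models by mean pairwise similarity to every other model.

    similarity[i][j] is an assumed structural similarity between model i and
    model j (symmetric, higher means more similar). Self-similarity is
    ignored so that a model cannot vote for itself.
    """
    n = len(names)
    if n < 2:
        raise ValueError("need at least two models to form a consensus")
    if len(similarity) != n or any(len(row) != n for row in similarity):
        raise ValueError("similarity must be an n x n matrix")
    scored = []
    for i in range(n):
        others = [similarity[i][j] for j in range(n) if j != i]
        scored.append((names[i], sum(others) / len(others)))
    # Highest consensus score first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up example: three servers roughly agree, one is an outlier.
names = ["server_A", "server_B", "server_C", "server_D"]
similarity = [
    [1.0, 0.8, 0.7, 0.2],
    [0.8, 1.0, 0.6, 0.2],
    [0.7, 0.6, 1.0, 0.1],
    [0.2, 0.2, 0.1, 1.0],
]
ranking = consensus_rank(names, similarity)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy input the outlier `server_D` ranks last because no other model supports it. If the agreeing servers were all wrong in the same way, the same procedure would still pick one of them, which is the correlation caveat described above.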