Parallel Algorithms for Select and Partition with Noisy Comparisons

Mark Braverman
Department of Computer Science, Princeton University. Email: mbraverm@cs.princeton.edu. Research supported in part by an NSF CAREER award (CCF-1149888), NSF CCF-1215990, NSF CCF-1525342, a Packard Fellowship in Science and Engineering, and the Simons Collaboration on Algorithms and Geometry.

Jieming Mao
Department of Computer Science, Princeton University. Email: jiemingm@cs.princeton.edu.

S. Matthew Weinberg
Department of Computer Science, Princeton University. Email: sethmw@cs.princeton.edu. Research completed in part while the author was a Microsoft Research Fellow at the Simons Institute for the Theory of Computing.

Abstract

We consider the problem of finding the kth highest element in a totally ordered set of n elements (Select), and partitioning a totally ordered set into the top k and bottom n−k elements (Partition) using pairwise comparisons. Motivated by settings like peer grading or crowdsourcing, where multiple rounds of interaction are costly and queried comparisons may be inconsistent with the ground truth, we evaluate algorithms based both on their total runtime and the number of interactive rounds in three comparison models: noiseless (where the comparisons are correct), erasure (where comparisons are erased with probability 1−γ), and noisy (where comparisons are correct with probability 1/2+γ/2 and incorrect otherwise). We provide numerous matching upper and lower bounds in all three models. Even our results in the noiseless model, which is quite well-studied in the TCS literature on parallel algorithms, are novel.

Rank aggregation is a fundamental problem with numerous important applications, ranging from well-studied settings such as social choice [CN91] and web search [DKNS01] to newer platforms such as crowdsourcing [CBCTH13] and peer grading [PHC+13]. A salient common feature among these applications is that in the end, ordinal rather than cardinal information about the elements is relevant, and a precise fine-grained ordering of the elements is often unnecessary. For example, the goal of social choice is to select the best alternative, regardless of how good it is. In a curved course, the goal of peer grading is to partition assignments into quantiles corresponding to A/B/C/D, etc., regardless of their absolute quality.

Prior work has produced numerous ordinal aggregation procedures (i.e. based on comparisons of elements rather than cardinal evaluations of individual elements) in different settings, and we overview those most relevant to our work in Section 1.1. However, existing models from this literature fail to capture an important aspect of the problem with respect to some of the newer applications: that multiple rounds of interaction are costly. In crowdsourcing, for instance, one round of interaction consists of sending out a batch of tasks to users and waiting for their responses before deciding which tasks to send out next, and this waiting is the main bottleneck. In peer grading, each round of interaction might take a week, and grades are expected to be finalized within a few weeks. In conference decisions, even one round of interaction seems to be pushing the time constraints.

Fortunately, the TCS community already provides a vast literature of algorithms with this constraint in mind, under the name of parallel algorithms. For instance, previous work resolves questions like “how many interactive rounds are necessary for a deterministic or randomized algorithm to select the kth element with O(n) total comparisons?” [Val75, Rei81, AKSS86, AA88a, AA88b, BB90]. This line of research, however, misses a different important aspect of these applications (one that is, in fact, captured by most works in rank aggregation): that the comparisons might be erroneous. Motivated by applications such as crowdsourcing and peer grading, we therefore study the round complexity of Partition, the problem of partitioning a totally ordered set into the top k and bottom n−k elements, when comparisons might be erroneous.

Our first results on this front provide matching upper and lower bounds on what is achievable for Partition in just one round in three different models of error: noiseless (where the comparisons are correct), erasure (where comparisons are erased with probability 1−γ), and noisy (where comparisons are correct with probability 1/2 + γ/2 and incorrect otherwise). We provide one-round algorithms using dn comparisons that make O(n/d), O(n/(dγ)), and O(n/(dγ^2)) mistakes (a mistake is any element placed on the wrong side of the partition) with high probability in the three models, respectively. The algorithms are randomized and different for each model, and the bounds hold both when d is an absolute constant and when it is a function of n and γ. We provide asymptotically matching lower bounds as well: any (potentially randomized) one-round algorithm using dn comparisons necessarily makes Ω(n/d), Ω(n/(dγ)), and Ω(n/(dγ^2)) mistakes in expectation in the three models, respectively. We further show that the same algorithms and lower bound constructions are also optimal (up to absolute constant factors) if mistakes are instead weighted by various different measures of their distance to k, the cutoff.1

After understanding completely the tradeoff between the number of comparisons and mistakes for one-round algorithms in each of the three models, we turn our attention to multi-round algorithms. Here, the results are more complex and can’t be summarized in a few sentences. We briefly overview our multi-round results in each of the three models below. Again, all of the upper and lower bounds discussed below extend when mistakes are weighted by their distance to the cutoff. We overview the techniques used in proving our results in Section 1.2, but just briefly note here that the level of technicality roughly increases as we go from the noiseless to erasure to noisy models. In particular, lower bounds in the noisy model are quite involved.

Multi-Round Results in the Noiseless Model.

We design a 2-round algorithm for Partition using n/ε total comparisons that makes O(n^{1/2+ε} · poly(log n)) mistakes with probability 1 − e^{-Ω(n)}, and prove a nearly matching lower bound of Ω(√n · ε^{5/2}) mistakes, for any ε > 0 (ε may be a constant or a function of n).

We design a 3-round algorithm for Partition using O(n · poly(log n)) total comparisons that makes zero mistakes with probability 1 − e^{-Ω(n)}. It is known that ω(n) total comparisons are necessary for a 3-round algorithm just to solve Select, the problem of finding the kth element, with probability 1 − o(1) [BB90].

We design a 4-round algorithm for Partition using O(n) total comparisons that makes zero mistakes with probability 1 − e^{-Ω(n)}. This matches the guarantee provided by an algorithm of Bollobás and Brightwell for Select, but is significantly simpler (in particular, it avoids any graph theory) [BB90].

Multi-Round Results in the Erasure Model.

We design an O(log* n)-round algorithm for Partition using O(n/γ) total comparisons that makes zero mistakes with probability 1 − e^{-Ω(n)}.

We show that no o(log* n)-round algorithm making O(n/γ) total comparisons can succeed with probability 2/3, even for Select.

Multi-Round Results in the Noisy Model.

We design a 4-round algorithm for Partition making O(n log n / γ^2) comparisons that makes zero mistakes with high probability (a trivial corollary of our noiseless algorithm).

We show that no algorithm making o(n log n / γ^2) comparisons can succeed with probability 2/3, even for Select (in any number of rounds).

We design an algorithm for findMin (the special case of Select with k=n) making O(n/γ^2) comparisons that succeeds with probability 2/3. We also show that no algorithm making o(n log n / γ^2) comparisons can solve findMin with probability 1 − 1/poly(n) (in any number of rounds).

Together, these results tell an interesting story. In one round, one can obtain the same guarantee in the erasure model as in the noiseless model with an additional factor of 1/γ comparisons, and the same guarantee in the noisy model as in the erasure model with a further factor of 1/γ comparisons. In some sense, this should be expected, because it exactly matches the degradation in information provided by a single comparison in each of the three models (a noiseless comparison provides one bit of information, an erasure comparison provides γ bits, and a noisy comparison provides Θ(γ^2) bits). But in multiple rounds, everything changes. In four rounds, one can perfectly partition with high probability and O(n) total comparisons in the noiseless model. In the erasure model, one can indeed partition perfectly with high probability and O(n/γ) comparisons, but this now requires Θ(log* n) rounds instead of just 4. Moreover, in the noisy model, any algorithm even solving Select with probability 2/3 requires an Ω(log n / γ) blow-up in the number of comparisons, in any number of rounds! Note that neither of these additional factors comes from the desire to succeed with high probability (the lower bounds hold even against a 2/3 success probability) nor from the desire to partition every element correctly (the lower bounds hold even for just Select), but simply from the way in which interaction helps in the three different models.
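These per-comparison information rates can be sanity-checked directly via Shannon capacities; the snippet below is our own illustration (function names are ours), not part of the paper. The erasure comparison is a binary erasure channel with erasure probability 1−γ (capacity exactly γ), and the noisy comparison is a binary symmetric channel with crossover probability 1/2−γ/2 (capacity 1 − H(1/2−γ/2) ≈ γ^2/(2 ln 2) for small γ).

```python
import math

def binary_entropy(p):
    # H(p) in bits.
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def erasure_capacity(gamma):
    # A channel that erases its input with probability 1 - gamma
    # carries exactly gamma bits per comparison.
    return gamma

def noisy_capacity(gamma):
    # Binary symmetric channel with crossover probability 1/2 - gamma/2:
    # capacity 1 - H(1/2 - gamma/2), which is Theta(gamma^2) bits.
    return 1.0 - binary_entropy(0.5 - gamma / 2)

gamma = 0.1
print(erasure_capacity(gamma))          # 0.1 bits per comparison
print(noisy_capacity(gamma))            # ~0.0072 bits per comparison
print(gamma ** 2 / (2 * math.log(2)))   # small-gamma approximation
```

For small γ the noisy channel carries roughly a γ factor less information than the erasure channel, matching the 1/γ gap between the one-round bounds.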

While we believe that the story told by our work as a whole is the “main result,” our results in the noisy model are also worth emphasizing independently. Our one-round algorithm, for instance, is more involved than its counterparts in the noiseless and erasure models, and our analysis uses the theory of biased random walks. Our multi-round lower bounds against Select and findMin in the noisy model are the most technical results of the paper, and tell their own interesting story about the difference between findMin and Select in the noisy model. To our knowledge, most tight lower bounds known for Select come directly from lower bounding findMin. It is surprising that findMin requires a factor of Θ(log n) fewer comparisons than Select to solve with probability 2/3 in the noisy model.

We proceed now by discussing some related works below, and briefly overviewing our techniques in Section 1.2. We provide some conclusions and future directions in Section 1.3. Our single-round results are discussed in Section 3 and our multi-round results are discussed in Section 4. However, due to space constraints, all proofs are deferred to the appendix.

1.1 Related Work

Rank aggregation is an enormous field that we can’t possibly summarize in its entirety here. Some of the works most related to ours also study Partition (sometimes called Top-K). Almost all of these works also consider the possibility of erroneous comparisons, although sometimes under different models where the likelihood of an erroneous comparison scales with the distance between the two compared elements [CS15, BSC+13, Eri13]. More importantly, to our knowledge this line of work either considers settings where the comparisons are exogenous (the designer has no control over which comparisons are queried; she can only analyze the results), or only analyzes the query complexity and not the round complexity of the designed algorithms. Our results contribute to this line of work by providing algorithms designed for settings like crowdsourcing or peer grading where the designer does have design freedom, but may be constrained by the number of interactive rounds.

There is a vast literature from the parallel algorithms community studying various sorting and selection problems in the noiseless model. For instance, tight bounds are known on the round complexity of Select for deterministic algorithms using O(n) total comparisons (it is Θ(log log n)) [Val75, AKSS86], and randomized algorithms using O(n) total comparisons (it is 4) [AA88b, AA88a, Rei81, BB90]. Similar results are known for sorting and approximate sorting as well [Col88, AAV86, AKS83, HH81, BT83, BH85, Lei84]. Many of the designed deterministic algorithms provide sorting networks. A sorting network on n elements is a circuit whose gates are binary comparators. The depth of a sorting network is the number of required rounds, and the number of gates is the total number of comparisons. Randomized algorithms are known to require fewer rounds than deterministic ones with the same number of total comparisons for both sorting and selecting [AA88a, BB90].

In the noisy model, one can of course take any noiseless algorithm and repeat every comparison O(log n / γ^2) times in parallel. To our knowledge, positive results that avoid this simple repetition are virtually non-existent. This is likely because a lower bound of Leighton and Ma [LM00] proves that in fact no sorting network can provide an asymptotic improvement (for complete sorting), and our lower bound (Theorem 11) shows that no randomized algorithm can provide an asymptotic improvement for Select. To our knowledge, no prior work studies parallel sorting algorithms in the erasure model. On this front, our work contributes by addressing some open problems in the parallel algorithms literature, but more importantly by providing the first parallel algorithms and lower bounds for Select in the erasure and noisy models.

There is also an active study of sorting in the noisy model [BM08, BM09, MMV13] within the TCS community without concern for parallelization, but with concern for resampling. An algorithm is said to resample if it makes the same comparison multiple times. Clearly, an algorithm that doesn’t resample can’t possibly find the median exactly in the noisy model (what if the comparison between n/2 and n/2+1 is corrupted?). The focus of these works is designing poly-time algorithms to find the maximum-likelihood ordering from a set of (n choose 2) noisy comparisons. Our work is fundamentally different from these, as we have asymptotically fewer than (n choose 2) comparisons to work with, and at no point do we try to find a maximum-likelihood ordering (because we only want to solve Partition).

1.2 Tools and Techniques

Single Round Algorithms and Lower Bounds. Our single round results are guided by the following surprisingly useful observation: in order for an algorithm to possibly know that i exceeds the kth highest element, i must at least be compared to some element between itself and k (as otherwise, the comparison results would be identical if we replaced i with an element just below k). Unsurprisingly, it is difficult to guarantee that many elements within n/d of k are compared to elements between themselves and k using only dn total comparisons in a single round, and this forms the basis for our lower bounds. Our upper bounds make use of this observation as well, and basically are able to guarantee that an element is correctly placed with high probability whenever it is compared to an element between itself and k. It’s interesting that the same intuition is key to both the upper and lower bounds. We provide a description of the algorithms and proofs in Section 3.

In the erasure model, the same intuition extends, except that in order to have a non-erased comparison between i and an element between i and k, we need to make roughly 1/γ such comparisons. This causes our lower bounds to grow by a factor of 1/γ. In the noisy model, the same intuition again extends, although this time the right language is that we need to learn Ω(1) bits of information from comparisons of i to elements between i and k, which requires Ω(1/γ^2) such comparisons, and causes the additional factor of 1/γ^2 in our lower bounds. Our algorithms in these two models are similar to the noiseless algorithm, but the analysis becomes necessarily more involved. For instance, our analysis in the noisy model appeals to facts about biased random walks on the line.
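The constant miss probability underlying this intuition is easy to check numerically. In the snippet below (our illustration, not from the paper), an element at distance n/d from the cutoff receives d−1 uniformly random comparisons, each landing in the crucial window of n/d elements with probability 1/d:

```python
import math

def miss_probability(d):
    # An element at distance n/d from the median has a "window" of n/d
    # elements between itself and the median. Each of its d-1 uniformly
    # random comparisons lands in that window with probability 1/d, so
    # the chance that every one of them misses the window is:
    return (1 - 1 / d) ** (d - 1)

# The miss probability stays above 1/e ~ 0.368 for every d >= 2, so in
# expectation a constant fraction of the ~n/d elements this close to
# the median cannot be classified better than a coin flip.
for d in (2, 10, 100, 1000):
    print(d, round(miss_probability(d), 4))
```

As d grows the miss probability decreases toward 1/e but never below it, which is exactly why Θ(n/d) expected mistakes are unavoidable in one round with dn comparisons.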

Multi-Round Algorithms and Lower Bounds. Our constant-round algorithms in the noiseless model are based on the following intuition: once we reach the point that we are only uncertain about o(n) elements, we are basically looking at a fresh instance of Partition on a significantly smaller input size, except we’re still allowed Θ(n) comparisons per round. Once we’re uncertain about only O(√n) elements, one additional round suffices to finish up (by comparing each of these elements to every other one). The challenge in obtaining a four-round algorithm (as opposed to just an O(1)-round algorithm) is ensuring that we make significant enough gains in the first three rounds.

Interestingly, these ideas for constant-round algorithms in the noiseless model don’t prove useful in the erasure or noisy models. Essentially, the issue is that even after a constant number of rounds, we are unlikely to be confident that many elements are above or below k, so we can’t simply recurse on a smaller instance. Still, turning this intuition into a formal barrier is quite difficult, and our multi-round lower bounds for the erasure and noisy models are accordingly quite involved. We refer the reader to Section 4 for further details.

1.3 Conclusions

We study the problems of Partition and Select in settings where interaction is costly, in the noiseless, erasure, and noisy comparison models. We provide matching (up to absolute constant factors) upper and lower bounds for one-round algorithms in all three models, which show that the number of comparisons required for the same guarantee degrades in proportion to the information provided by a single comparison. We also provide matching upper and lower bounds for multi-round algorithms in all three models, which show that the round and query complexity required for the same guarantee degrades by more than just the loss in information when moving between the three comparison models. Finally, we show a separation between findMin and Select in the noisy model.

We believe our work motivates two important directions for future work. First, our work considers some of the more important constraints imposed on rank aggregation algorithms in applications like crowdsourcing or peer grading, but not all. For instance, some settings might require that every submission receives the same amount of attention (i.e. is a member of the same number of comparisons), or might motivate a different model of error (perhaps where mistakes aren’t independent or identical across comparisons). It would be interesting to design algorithms and prove lower bounds under additional restrictions motivated by applications.

Finally, it is important to consider incentives in these applications. In peer grading, for instance, the students themselves are the ones providing the comparisons. An improperly designed algorithm might provide “mechanism design-type” incentives for the students to actively misreport if they think it will boost their own grade. Additionally, there are also “scoring rule-type” incentives that come into play: grading assignments takes effort! Without proper incentives, students may choose to put little or no effort into their grading and just provide random information. We believe that using ordinal instead of cardinal information will be especially helpful on this front, as it is much easier to design mechanisms when players just make binary decisions, and it is much easier to understand how the noisy information provided by students scales with effort (in our models, simply that γ increases with effort). It is therefore important to design mechanisms for applications like peer grading by building off of our algorithms.

In this work, we study two problems, Select and Partition. Both problems take as input a randomly permuted, totally ordered set and an integer k. For simplicity of notation, we denote the ith smallest element of the set as i. So if the input set is of size n, the input is exactly [n]. In Select, the goal is to output the (location of the) element k. In Partition, the goal is to partition the elements into the top k, which we’ll call A for Accept, and the bottom n−k, which we’ll call R for Reject. Also for ease of notation, we’ll state all of our results for k = n/2, the median, w.l.o.g.2

We say an algorithm solves Select if it outputs the median, and solves Partition if it correctly places all elements above and below the median. For Select, we will say that an algorithm is a t-approximation with probability p if it outputs an element in [n/2−t, n/2+t] with probability at least p. For Partition, we will consider a class of success measures, parameterized by a constant c, and say the c-weighted error associated with a specific partitioning into A ⊔ R is equal to ∑_{i>n/2} I(i ∈ R)·(i − n/2)^c + ∑_{i<n/2} I(i ∈ A)·(n/2 − i)^c.3 Interestingly, in all cases we study, the same algorithm is asymptotically optimal for all c.
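As a concrete reading of this definition (a minimal sketch, assuming the weights are distances to the cutoff raised to the power c; the helper name is ours): each misplaced element contributes its distance to n/2 raised to the c, and c = 0 simply counts mistakes.

```python
def c_weighted_error(accepted, n, c):
    """c-weighted error of a partition of {1, ..., n} at cutoff k = n/2.

    `accepted` is the set A; every other element is in R. An element
    i > n/2 placed in R, or i < n/2 placed in A, contributes its
    distance to the cutoff raised to the power c.
    """
    k = n // 2
    err = 0
    for i in range(1, n + 1):
        if i > k and i not in accepted:
            err += (i - k) ** c
        elif i < k and i in accepted:
            err += (k - i) ** c
    return err

n = 10
perfect = set(range(6, 11))              # exactly the top half
print(c_weighted_error(perfect, n, 2))   # 0
swapped = (perfect - {6}) | {4}          # misplace 4 and 6
print(c_weighted_error(swapped, n, 2))   # 1 + 1 = 2
```

Note that the median itself (i = n/2) is excluded from both sums, so either placement of it is free under this measure.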

Query and Round Complexity. Our algorithms will be comparison-based. We study both the number of queries, and the number of adaptive rounds necessary to achieve a certain guarantee.4 We may not always emphasize the runtime of our algorithms, but they all run in time poly(n).

Notation. We always consider settings where the input elements are a priori indistinguishable, or alternatively, that our algorithms randomly permute the input before making comparisons. When we write x<y, we mean literally that x<y in the ground truth. In the noisy model, the results of comparisons may disagree with the underlying ordering, so we say that x beats y if a noisy comparison of x and y returned x as larger than y (regardless of whether or not x>y).

Models of Noise. We consider three comparison models, which return the following when a>b.

Noiseless: Returns a beats b.

Erasure: Returns a beats b with probability γ, and ⊥ with probability 1−γ.

Noisy: Returns a beats b with probability 1/2+γ/2, and b beats a with probability 1/2−γ/2.
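The three models can be sketched as comparison oracles (our illustration; `gamma` plays the role of γ, and `None` stands for ⊥):

```python
import random

def compare_noiseless(x, y):
    # Always reports correctly whether x beats y.
    return x > y

def compare_erasure(x, y, gamma):
    # Reports the correct answer with probability gamma, and None
    # (erased) with probability 1 - gamma.
    if random.random() < gamma:
        return x > y
    return None

def compare_noisy(x, y, gamma):
    # Reports the correct answer with probability 1/2 + gamma/2, and the
    # flipped answer otherwise.
    if random.random() < 0.5 + gamma / 2:
        return x > y
    return not (x > y)
```

Setting gamma = 1 recovers the noiseless model in both noisy variants, and gamma = 0 makes both completely uninformative.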

Partition versus Select. We design all of our algorithms for Partition, and prove all of our lower bounds against Select. We do this because Select is in some sense a strictly easier problem than Partition. We discuss how one can get algorithms for Select via algorithms for Partition and vice versa formally in Appendix A.

Resampling. Finally, note that in the erasure and noisy models, it may be desirable to query the same comparison multiple times. This is called resampling. It is easy to see that without resampling, it is impossible to guarantee that the exact median is found with high probability, even when all (n choose 2) comparisons are made (what if the comparison between n/2 and n/2+1 is corrupted?). Resampling is not necessarily undesirable in the applications that motivate this work, so we consider our main results to be in the model where resampling is allowed. Still, it turns out that all of our algorithms can be easily modified to avoid resampling at the (necessary) cost of a small additional error, and it is easy to see the required modifications.5 All of our lower bounds hold even against algorithms that resample.

In this section, we provide our results on non-adaptive (round complexity = 1) algorithms. We begin with the upper bounds below, followed by our matching (up to constant factors) lower bounds.

3.1 Upper Bounds

We provide asymptotically optimal algorithms in each of the three comparison models. Our three algorithms actually choose the same comparisons to make, but differ in how they decide whether to accept or reject an element based on the resulting comparisons. The algorithms pick a skeleton set S of size √n and compare every element in S to every other element of S. Each element not in S is compared to d−1 random elements of S. Pseudocode for this procedure is given in Appendix B.

From here, the remaining task in all three models is similar: the algorithm must first estimate the rank of each element in the skeleton set. Then, for each i, it must use this information combined with the results of d−1 comparisons to guess whether i should be accepted or rejected. The correct approach differs in the three models, which we discuss next.

Noiseless Model. Pseudocode for our algorithm in the noiseless model is provided as Algorithm 2 in Appendix B. First, we compute the median of the skeleton set, x, which is close to the actual median with high probability. Then, we hope that each i ∉ S is compared to some element in S between itself and x. If this happens, we can pretty confidently accept or reject i. If it doesn’t, then all we learn is that i is beaten by some elements above x and beats some elements below x, which provides no helpful information about whether i is above or below the median, so we just make a random decision.
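A minimal sketch of this one-round procedure, with several simplifications (the function name is ours, ties and parameter checks are ignored, and true values stand in for comparison outcomes since the model is noiseless; the actual pseudocode is Algorithm 2 in Appendix B):

```python
import math
import random

def one_round_partition_noiseless(elements, d):
    """One-round sketch: a skeleton set S of size ~sqrt(n) is compared
    all-pairs, which pins down its median x exactly; every other
    element gets d - 1 random comparisons against S."""
    n = len(elements)
    S = random.sample(elements, max(1, math.isqrt(n)))
    x = sorted(S)[len(S) // 2]          # median of the skeleton set
    accept = {i for i in S if i > x}
    for i in elements:
        if i in S:
            continue
        targets = random.sample(S, min(d - 1, len(S)))
        if any(i > s >= x for s in targets):
            accept.add(i)               # witnessed above x: accept
        elif any(i < s <= x for s in targets):
            pass                        # witnessed below x: reject
        elif random.random() < 0.5:
            accept.add(i)               # no witness: random guess
    return accept

random.seed(1)
accept = one_round_partition_noiseless(list(range(1, 10001)), 20)
print(len(accept))   # roughly n/2
```

As a sanity check on the structure: setting d − 1 = |S| makes every non-skeleton element face the entire skeleton, in which case the output is exactly the threshold partition {i : i > x}.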

Theorem 1.

Algorithm 2 has query complexity dn, round complexity 1, does not resample, and outputs a partition that, for all c, has:

expected c-weighted error O((n/d)^{c+1}), for any d = o(n^{1/4})

c-weighted error O((n/d)^{c+1}) with probability 1 − e^{-Ω(n^3/d^{2c+2})}, for any d = o(n^{1/4}).

We provide a complete proof of Theorem 1 in Appendix B. The main ideas are the following. There are two sources of potential error in Algorithm 2. First, maybe the skeleton set is poorly chosen and not representative of the ground set. But this is extremely unlikely with such a large skeleton set. Second, note that if i is compared to any element in S between itself and x, and x is very close to n/2, then i will be correctly placed. If |i − n/2| > n/d, then we’re unlikely to miss this window on d−1 independent tries, and i will be correctly placed.

Erasure Model. In the erasure model, pseudocode for the complete algorithm we use is Algorithm 3 in Appendix B. At a high level, the algorithm is similar to Algorithm 2 for the noiseless model, so we refer the reader to Appendix B to see the necessary changes.

Theorem 2.

Algorithm 3 has query complexity dn, round complexity 1, does not resample, and outputs a partition that, for all c, has:

expected c-weighted error O((n/(dγ))^{c+1}), for any d, γ such that d/γ = o(n^{1/4})

We again postpone a complete proof of Theorem 2 to Appendix B. The additional ingredient beyond the noiseless case is a proof that with high probability, not too many of the comparisons within S are erased and therefore while we can’t learn the median of S exactly, we can learn a set of almost |S|/2 elements that are certainly above the median, and almost |S|/2 elements that are certainly below. If i∉S beats an element that is certainly above the median of S, we can confidently accept it, just like in the noiseless case.

Noisy Model. Pseudocode for our algorithm in the noisy model is provided as Algorithm 4 in Appendix B. Algorithm 4 is necessarily more involved than the previous two. We can still recover a good ranking of the elements in the skeleton set using the Braverman-Mossel algorithm [BM08], so this isn’t the issue. The big difference between the noisy model and the previous two is that no single comparison can guarantee that i∉S should be accepted or rejected. Instead, every time we have a set of elements all above the median of S, x, of which i beats at least half, this provides some evidence that i should be accepted. Every time we have a set of elements all below x of which i is beaten by at least half, this provides some evidence that i should be rejected. The trick is now just deciding which evidence is stronger. Due to space constraints, we refer the reader to Algorithm 4 to see our algorithm, which we analyze using theory from biased random walks on the line.
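The random-walk fact underlying this analysis can be illustrated with a quick simulation (ours, not Algorithm 4 itself): the running score of m noisy comparisons is a walk on the line with per-step bias γ/2, so m = Θ(1/γ^2) steps make the drift dominate the fluctuation and the majority points the right way with probability close to 1.

```python
import random

def noisy_beat(gamma):
    # One noisy comparison of a true "a > b" pair: correct w.p. 1/2 + gamma/2.
    return random.random() < 0.5 + gamma / 2

def majority_correct(gamma, m, trials=2000):
    # Fraction of trials in which the majority of m noisy comparisons
    # points the right way.
    wins = 0
    for _ in range(trials):
        score = sum(1 if noisy_beat(gamma) else -1 for _ in range(m))
        wins += score > 0
    return wins / trials

random.seed(0)
gamma = 0.2
print(majority_correct(gamma, m=1))                  # about 1/2 + gamma/2 = 0.6
print(majority_correct(gamma, m=int(4 / gamma**2)))  # close to 1
```

A single comparison is barely better than a coin flip, while Θ(1/γ^2) of them are almost decisive, matching the Θ(γ^2)-bits-per-comparison intuition.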

Theorem 3.

Algorithm 4 has query complexity dn, round complexity 1, does not resample, and outputs a partition that, for all c, has:

c-weighted error O((n/(dγ^2))^{c+1}) with probability 1 − e^{-Ω(n^3/d^{2c+2})}, for any d = o(n^{1/4}) and γ = ω(n^{-1/8}).

3.2 Lower Bounds

In this section, we show that the algorithms designed in the previous section are optimal up to constant factors. All of the algorithms in the previous section are “tight,” in the sense that we expect element i to be correctly placed whenever it is compared to enough elements between itself and the median. In the noiseless model, one element is enough. In the erasure model, we instead need Ω(1/γ) (to make sure at least one isn’t erased). In the noisy model, we need Ω(1/γ^2) (to make sure we get Ω(1) bits of information about the difference between i and the median). If we don’t have enough comparisons between i and elements between itself and the median, we shouldn’t hope to be able to classify i correctly, as the comparisons involving i would look nearly identical if we replaced i with an element just on the other side of the median. Our lower bounds capture this intuition formally, and are all proved in Appendix B.

Theorem 4.

For all c, d > 0, any non-adaptive algorithm with query complexity dn necessarily has expected c-weighted error Ω((n/d)^{c+1}) in the noiseless model, Ω((n/(dγ))^{c+1}) in the erasure model, and Ω((n/(dγ^2))^{c+1}) in the noisy model.

4.1 Noiseless Model

We first present our algorithm and nearly matching lower bound for 2-round algorithms. The first round of our algorithm tries to get as good an approximation to the median as possible, and then compares it to every element in round two. Getting the best possible approximation is actually a bit tricky. For instance, simply finding the median of a skeleton set of size √n only guarantees an element within Θ(n^{3/4}) of the median.6 We instead take several “iterations” of nested skeleton sets to get a better and better approximation to the median. In reality, all iterations happen simultaneously in the first round, but it is helpful to think of them as sequential refinements.

For any r ≥ 1, our algorithm starts with a huge skeleton set S1 of n^{2r/(2r+1)} random samples from [n]. This is too large to compare all pairs of elements within S1, so we choose a set T1 ⊆ S1 of n^{1/(2r+1)} random pivots. Then we compare every element of S1 to every element of T1, and we will certainly learn two pivots a1 and b1 such that the median of S1 lies in [a1, b1], and a p1 such that the median of S1 is exactly the (p1|A1|)th element of A1 = S1 ∩ [a1, b1]. Now, we recurse within A1 and try to find the (p1|A1|)th element. Of course, because all of these comparisons happen in one round, we don’t know ahead of time in which subinterval of S1 we’ll want to recurse, so we have to waste a bunch of comparisons. These continual refinements still make progress, allowing us to find a smaller and smaller window containing the median of S1, which is a very good approximation to the true median because S1 was so large. Pseudocode for our algorithm is Algorithm 5 in Appendix C, which “recursively” tries to find the (pi|Ai|)th element of Ai.
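One refinement step can be sketched as follows (a simplified illustration; the function name and rank conventions are ours, comparisons are noiseless, and Algorithm 5's bookkeeping is omitted):

```python
import random

def bracket_rank(S, T, target_rank):
    """Compare every element of S to every pivot in T (one round of
    noiseless comparisons), then bracket the element of rank
    `target_rank` in S between two pivots a < b, returning the
    sub-interval A = S ∩ (a, b] and the target's rank inside A."""
    rank = {t: sum(1 for s in S if s < t) + 1 for t in T}  # rank of each pivot in S
    a = None  # largest pivot with rank < target_rank
    b = None  # smallest pivot with rank >= target_rank
    for t in sorted(T):
        if rank[t] < target_rank:
            a = t
        elif b is None:
            b = t
    A = [s for s in S if (a is None or s > a) and (b is None or s <= b)]
    inner_rank = target_rank - (0 if a is None else rank[a])
    return a, b, A, inner_rank

random.seed(3)
S = list(range(1, 201))
a, b, A, r = bracket_rank(S, random.sample(S, 14), 100)
print(sorted(A)[r - 1])   # recovers the rank-100 element of S: 100
```

The element of rank target_rank in S is exactly the element of rank inner_rank in the much smaller set A, so the same step can be repeated inside A with fresh pivots, all prepared within the same round (at the cost of wasting the comparisons prepared for the sub-intervals that go unused).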

Theorem 5.

For all c,r and ε>0, Algorithm 5 has round complexity 2, query complexity (r+1)n, and outputs a partition that:

has expected c-weighted error at most (8r · n^{(r+1)/(2r+1)+ε})^{c+1}

has c-weighted error at most (8r · n^{(r+1)/(2r+1)+ε})^{c+1} with probability at least 1 − r · e^{-n^{Ω(ε)}}.

Note that setting r = log n, and ε such that n^ε = 8 log^3 n, we get an algorithm with round complexity 2, query complexity n log n + n that outputs a partition with c-weighted error O((√n · log^4 n)^{c+1}) with probability 1 − O(log n / n^2).

We also prove a nearly matching lower bound on two-round algorithms in the noiseless model. At a very high level, our lower bound repeats the argument of our one round lower bound twice. Specifically, we show that after one round, there are many elements within a window of size Θ(n/d) of the median such that a constant fraction of these elements have not been compared to any other elements in this window. We then show that after the second round, conditioned on this, there is necessarily a window of size ≈√n such that a constant fraction of these elements have not been compared to any other elements in this window. Finally we show that this implies that we must err on a constant fraction of these elements. The actual proof is technical, but follows this high level outline. Proofs of Theorems 5 and 6 can be found in Appendix C.

Theorem 6.

For all c, and any d = o(n^{1/5}), any algorithm with query complexity dn and round complexity 2 necessarily has expected c-weighted error Ω((√n/d^{5/2})^{c+1}).

From here we show how to make use of our two-round algorithm to design a three-round algorithm that makes zero mistakes with high probability. After our two-round algorithm with appropriate parameters, we can be pretty sure that the median lies somewhere in a range of O(√n·log⁴n), so we can just compare all of these elements to each other in one additional round. Pseudocode for Algorithm 6 is in Appendix C.

Theorem 7.

For all c, Algorithm 6 has query complexity O(n·log⁸n), round complexity 3, and outputs a partition with zero c-weighted error with probability 1−O(log n/n²).

Again, recall that ω(n) queries are necessary for any three-round algorithm just to solve Select with probability 1−o(1) [BB90]. Finally, we further make use of ideas from our two-round algorithm to design a simple four-round algorithm that has query complexity O(n) and makes zero mistakes with high probability. More specifically, we appropriately tune the parameters of our two-round algorithm (i.e. set r=1) to find a window of size ≈n^{2/3} that contains the median (and already correctly partition all other elements). We then use similar ideas in round three to further find a window of size ≈√n that contains the median (and again correctly partition all other elements). We use the final round to compare all remaining uncertain elements to each other and correctly partition them.

Theorem 8.

For all c, and any ε∈(0,1/18), Algorithm 7 has query complexity O(n), round complexity 4, and outputs a partition with zero c-weighted error with probability at least 1−e^{−Ω(n^ε)}.

4.2 Erasure and Noisy Models

Here we briefly overview our results on multi-round algorithms in the erasure and noisy models. We begin with an easy reduction from these models to the noiseless model, at the cost of a blow-up in the round or query complexity. Essentially, we are just observing that one can adaptively resample any comparison in the erasure model until it isn’t erased (which will take 1/γ resamples in expectation), and also that one can resample in parallel any comparison in either the erasure or noisy model the appropriate number of times and have it effectively be a noiseless comparison.

Proposition 1.

If there is an algorithm solving Partition, Select or findMin in the noiseless model with probability p that has query complexity Q and round complexity r, then there are also resampling-based algorithms that:

solve Partition, Select or findMin in the erasure model with probability p that have expected query complexity Q/γ, but perhaps with expected round complexity Q/γ as well.

solve Partition, Select or findMin in the erasure model with probability p−1/poly(n) that have query complexity O(Q(logQ+logn)/γ), and round complexity r.

solve Partition, Select or findMin in the noisy model with probability p−1/poly(n) that have query complexity O(Q(log Q+log n)/γ²), and round complexity r.
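The third reduction can be illustrated with a short sketch. The helper names are hypothetical and the constant 48 is an arbitrary stand-in for the Chernoff-bound constant; the point is only that a majority over Θ(log n/γ²) parallel resamples behaves like a noiseless comparison up to 1/poly(n) failure probability:

```python
import math
import random

def noisy_compare(x, y, gamma, rng):
    """One noisy comparison: reports whether x < y, correct w.p. 1/2 + gamma/2."""
    truth = x < y
    return truth if rng.random() < 0.5 + gamma / 2 else (not truth)

def denoised_compare(x, y, gamma, n, rng):
    """Resample Theta(log n / gamma^2) times in parallel and take the
    majority vote; by a Chernoff bound the vote is wrong w.p. 1/poly(n)."""
    reps = math.ceil(48 * math.log(max(n, 2)) / gamma ** 2)
    votes = sum(noisy_compare(x, y, gamma, rng) for _ in range(reps))
    return votes * 2 > reps
```

Since all the resamples for one comparison are issued in the same round, the round complexity of the simulated noiseless algorithm is preserved.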

Corollary 1.

There are resampling-based algorithms that:

solve Partition or Select in the erasure model with probability 1 with expected query complexity O(n/γ) (based on the QuickSelect or Median-of-Medians algorithm [Hoa61, BFP+73]).

solve Partition or Select in the erasure model with probability 1−1/poly(n) with query complexity O(nlogn/γ) and round complexity 4.

solve Partition or Select in the noisy model with probability 1−1/poly(n) with query complexity O(n·log n/γ²) and round complexity 4.

In the erasure model, the algorithms provided by this reduction do not have the optimal round/query complexity. We show that Θ(n/γ) queries are necessary and sufficient, as well as Θ(log∗(n)) rounds. For the algorithm, we begin by finding the median of a random set of size n/log n elements. This can be done in 4 rounds and O(n/γ) total comparisons by Corollary 1. Doing this twice in parallel, we find two elements that are guaranteed to be above/below the median, but very close to it. Then, we spend log∗(n) rounds comparing every element to both of these. It’s not obvious that this can be done in log∗(n) rounds. Essentially what happens is that after each round, a fraction of elements are successfully compared, and we don’t need to use any future comparisons on them. This lets us devote even more comparisons to the remaining elements in future rounds, so the fraction of successes actually increases with successive rounds. Analysis shows that the number of required rounds is therefore log∗(n) (instead of log(n) if the fraction were constant throughout all rounds). After this, we learn for sure that the median lies within a sublinear window, and we can again invoke the 4-round algorithm of Corollary 1 to finish up. Our lower bound essentially shows that it takes log∗(n) rounds just to have a non-erased comparison involving all n elements even with O(n/γ) comparisons per round, and that this implies a lower bound. Pseudocode for the algorithm and proofs of both theorems are in Appendix D.
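To see why the fraction of successes snowballs, consider the expected-case recurrence: with a per-round budget of c·n/γ comparisons split evenly among the m elements still lacking a non-erased comparison, each such element gets about c·n/(γm) attempts, so it survives (all attempts erased) with probability roughly (1−γ)^{cn/(γm)} ≤ e^{−cn/m}. A tiny sketch of this recurrence (illustrative only, not the paper's analysis):

```python
import math

def rounds_until_covered(n, c=1.0):
    """Iterate m -> m * exp(-c*n/m), the expected number of elements still
    lacking a non-erased comparison after each round; the iterate falls
    like a tower function, so the round count grows like log*(n)."""
    m = float(n)
    rounds = 0
    while m >= 1:
        m *= math.exp(-c * n / m)
        rounds += 1
    return rounds
```

Running this shows the expected survivor count collapsing tower-fast: the round count barely moves even as n grows by many orders of magnitude, which is the log∗(n) behavior described above.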

Theorem 9.
In the erasure model, there is an algorithm solving Partition (and hence Select) with probability 1−1/poly(n) that has query complexity O(n/γ) and round complexity O(log∗(n)).

Theorem 10.

Assume γ≤1/2. In the erasure model, any algorithm solving Select with probability 2/3 even with O(n/γ) comparisons per round necessarily has round complexity Ω(log∗(n)).

We now introduce a related problem that is strictly easier than Partition or Select, which we call Rank, and prove lower bounds on the round/query complexity of Rank in the noisy model, which will imply lower bounds on Partition and Select. In Rank, we are given as input a set S of n elements and a special element b, and asked to determine b’s rank in S (i.e. how many elements in S are less than b). We say that a solution is a t-approximation if the guess is within t of the element’s actual rank. We show formally that Rank is strictly easier than Select in Appendix A. From here, we prove lower bounds against Rank in the noisy model.

At a high level, we show (in the proof of Theorem 11) that with only O(n·log n/γ²) queries, it’s very likely that there is a constant fraction of ai’s such that the algorithm can’t be very sure about the relation between ai and b. This might happen, for instance, if not many comparisons were done between ai and b and they were split close to 50-50. From here, we use an anti-concentration inequality (the Berry-Esseen inequality) to show that the rank of b does not concentrate within any range of size Θ(n^{3/8}) conditioned on the available information. In other words, the information available simply cannot narrow down the rank of b to within a small window with decent probability, no matter how that information is used. We then conclude that no algorithm with o(n·log n/γ²) comparisons can approximate the rank well with probability 2/3.

Theorem 11.

In the noisy model, any algorithm obtaining an (n^{3/8}/40)-approximation for Rank with probability 2/3 necessarily has query complexity Ω(n·log n/γ²).

Finally, we conclude with an algorithm for findMin in the noisy model showing that findMin is strictly easier than Select. This is surprising, as most existing lower bounds against Select are obtained by bounding findMin. Our algorithm again begins by finding the minimum, x, of a random set of size n/log n using O(n/γ²) total comparisons by Corollary 1. Then, we iteratively compare each element to x a fixed number of times, throwing out elements that beat it too many times. Again, as we throw out elements, we get to compare the remaining elements to x more and more. We’re able to show that after only an appropriate number of iterations (so that only O(n/γ²) total comparisons have been made), it’s very likely that only n/log n elements remain, and that with constant probability the true minimum was not eliminated. From here, we can again invoke the algorithm of Corollary 1 to find the true minimum (assuming it wasn’t eliminated).
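A stylized sketch of the elimination phase (not the paper's Algorithm 10; `sieve_min`, the doubling schedule and the majority threshold are illustrative choices, with "e beats x" meaning the comparison reports e larger than x):

```python
import random

def sieve_min(elements, x, gamma, rng, reps=10, max_reps=10000):
    """Repeatedly compare each survivor to the champion x, discarding any
    element that beats x in a majority of its comparisons.  As survivors
    are discarded, the remaining ones can be compared more times."""
    survivors = list(elements)
    while len(survivors) > 1 and reps < max_reps:
        kept = []
        for e in survivors:
            # each single comparison reports the truth w.p. 1/2 + gamma/2
            beats_x = sum(
                (e > x) == (rng.random() < 0.5 + gamma / 2)
                for _ in range(reps)
            )
            if beats_x <= reps / 2:  # not clearly larger than x: keep
                kept.append(e)
        survivors = kept
        reps *= 2  # fewer survivors, so each can afford more comparisons
    return survivors
```

With gamma < 1 the survivor set is, with good probability, a small set of candidates still plausibly below x, to which a Corollary 1-style algorithm can then be applied.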

Theorem 12.

Assume n is large enough and 10≤c≤log n. Algorithm 10 has query complexity 3cn/γ² and solves findMin in the noisy model with probability at least 1−e^{−Ω(c)}.

Theorem 13.

Assume c≥1, n is large enough and γ≤1/4. Any algorithm in the noisy model with query complexity cn/γ² solves findMin with probability at most 1−e^{−O(c)}.

Theorem 12 shows that findMin is strictly easier than Select (as it can be solved with constant probability with asymptotically fewer comparisons). Theorem 13 is included for completeness, and shows that it is not possible to get a better success probability without a blow-up in the query complexity. The proof of Theorem 13 is similar to that of Theorem 11.

Appendix A Technical Lemmas

Before beginning our proofs, we provide a few technical lemmas that will be used throughout, related to geometric sums, biased random walks, etc.

We first show the following reductions to prove that k=n/2 is the most difficult choice of k in Select and Partition. We also show reductions between Select and Partition.

Lemma 1.

The following relations hold:

Suppose A can solve Select / Partition in the case of n elements for any n, but only for k=n/2. Then A can be used to solve Select / Partition for any k,n with the same success probability.

Suppose A can solve Select on n elements. We can construct an algorithm B based on A that solves Partition on n elements with one more round and extra comparisons: n in the noiseless model, O(n·log n/γ) in the erasure model, and O(n·log n/γ²) in the noisy model. The success probability decreases by 1/poly(n), except in the noiseless model (where it doesn’t decrease).

Suppose A can solve Partition on n+1 elements. We can construct an algorithm B based on A that solves Select on n elements with the same number of rounds and twice the number of comparisons. If the original success probability was p, the new success probability is at least p².

Proof.

Let’s show the reductions one by one:

Wlog, let k<n/2. The algorithm to solve Select / Partition is the following:

Generate n−2k dummy elements which are larger than all the n elements.

Run A on n elements together with n−2k dummy elements.

Output A’s output.

It’s easy to check the above algorithm works.
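The dummy-element padding can be sketched directly. Here `median_select` is a stand-in for the assumed algorithm A, and (as in the abstract) Select returns the k-th highest element, so for k<n/2 the dummies must compare above every real element:

```python
def select_via_median(median_select, elements, k):
    """Reduce Select for k < n/2 (k-th highest) to median-Select: pad with
    n - 2k dummies that compare above everything, so the k-th highest real
    element sits exactly at the median of the padded 2(n-k)-element set."""
    n = len(elements)
    padded = list(elements) + [float("inf")] * (n - 2 * k)
    return median_select(padded)
```

The dummies need no extra comparisons in any model, since their relation to every real element is known by construction.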

B is the following:

Run A, let x be the median output.

In the next round, compare every element to x (once in the noiseless model, O(log n/γ) times in the erasure model, and O(log n/γ²) times in the noisy model). For each element, if it beats x, accept it (in the noisy model, if it beats x in at least half of the comparisons). Otherwise reject it.

It’s easy to check B works.

B is the following:

Generate two dummy elements x1 and x2. x1 is smaller than all of the n elements and x2 is larger than all of the n elements.

Do the following in parallel:

Run A on n elements and x1.

Run A on n elements and x2.

Output the element that is accepted in the first run and rejected in the second run.

It’s easy to see that B works.

∎

Lemma 2.

For a∈(0,1], ∑_{i=0}^∞ e^{−ai}·i^k ≤ 2k!/a^{k+1}. For a∈[1,∞), ∑_{i=0}^∞ e^{−ai}·i^k ≤ 2k!/a^k.

Proof.

First, note that the function e^{−ax}x^k has derivative zero exactly once in (0,∞), at x=k/a. So the function is increasing on (0,k/a) and decreasing on (k/a,∞). This means that ∑_{i=0}^{⌊k/a⌋−1} e^{−ai}·i^k ≤ ∫_0^{⌊k/a⌋} e^{−ax}x^k dx, and that ∑_{i=⌊k/a⌋+1}^∞ e^{−ai}·i^k ≤ ∫_{⌊k/a⌋}^∞ e^{−ax}x^k dx. Accounting also for the single term i=⌊k/a⌋, whose value is at most the maximum e^{−k}(k/a)^k, we get:

∑_{i=0}^∞ e^{−ai}·i^k ≤ ∫_0^∞ e^{−ax}x^k dx + e^{−k}(k/a)^k

Note that (k/e)^k ≤ k!, so e^{−k}(k/a)^k ≤ k!/a^k, and ∫_0^∞ e^{−ax}x^k dx = k!/a^{k+1}. Summing these two terms gives the claimed bound in both regimes of a, completing the proof.
∎
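A quick numerical sanity check of Lemma 2 (illustrative only; the truncation at a finite number of terms only underestimates the sum):

```python
import math

def moment_sum(a, k, terms=2000):
    """Numerically evaluate sum over i >= 0 of e^(-a*i) * i^k."""
    return sum(math.exp(-a * i) * i ** k for i in range(terms))
```

Comparing against the two bounds for representative values of a and k confirms the two regimes of the lemma.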

Lemma 3.

The distribution that is heads with probability 1/2+δ and tails with probability 1/2−δ has 1−Θ(δ²) bits of entropy.

Proof.
The entropy in bits is H(1/2+δ) = −(1/2+δ)log₂(1/2+δ) − (1/2−δ)log₂(1/2−δ). The first derivative of H(p) vanishes at p=1/2, and the second derivative there is −4/ln 2, so a Taylor expansion around δ=0 gives H(1/2+δ) = 1 − (2/ln 2)δ² + O(δ⁴), which is 1−Θ(δ²).
∎

Proposition 2.

Consider a biased random walk that moves right with probability p≤1/2 and left with probability 1−p at every step. Then the probability that this random walk reaches k units right of the origin at any point in time is exactly (p/(1−p))^k.

Proof.

First, note that the probability that this random walk reaches k units right of the origin at any point in time is exactly the probability that a random walk with the same bias reaches 1 unit right of the origin k times independently, because once the random walk reaches 1 unit right of the origin, the remaining random walk acts like a fresh random walk that now only needs to move k−1 units to the right at some point in time. So we just need to show that the probability that the random walk moves 1 unit to the right at some point in time is p/(1−p).

Note that whatever this probability, q, is, it satisfies the equality q=p+(1−p)q². This is because the probability that the random walk moves right is equal to the probability that the random walk moves right on its first step, plus the probability that the random walk moves left on its first step, and then moves two units right at some point in time. This equation has two solutions, q=p/(1−p) and q=1. So now we just need to show that q≠1 when p<1/2.

Assume for contradiction that q=1. Then this means that the random walk not only reaches one unit right of the origin once during the course of the random walk, but that it reaches one unit right of its current position infinitely many times, as every time this happens the remainder is a fresh random walk that moves one unit right with probability one. So let Aj denote the random variable that is 1 if the walk moves right at time j, and −1 if the walk moves left at time j. We have just argued that if q=1, there are infinitely many t such that ∑_{j≤t} Aj > 0. Therefore, lim sup_{t→∞} ∑_{j≤t} Aj/t ≥ 0. However, we also know that E[Aj]=2p−1<0, and the Aj are independent. So the law of large numbers states that lim_{t→∞} ∑_{j≤t} Aj/t = 2p−1 < 0, a contradiction.

∎
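Proposition 2 can be checked numerically with a truncated dynamic program (illustrative only; truncating the time horizon only underestimates the hitting probability, and with downward drift the truncation error is negligible):

```python
from collections import defaultdict

def hit_prob(p, k, steps=1000):
    """Probability that a +1/-1 walk (right w.p. p, left w.p. 1-p) ever
    reaches +k, computed by evolving the exact position distribution for
    `steps` steps with absorption at +k."""
    dist = {0: 1.0}
    absorbed = 0.0
    for _ in range(steps):
        nxt = defaultdict(float)
        for pos, mass in dist.items():
            for step, pr in ((1, p), (-1, 1.0 - p)):
                q = pos + step
                if q == k:
                    absorbed += mass * pr  # walk reached +k: absorb the mass
                else:
                    nxt[q] += mass * pr
        dist = nxt
    return absorbed
```

The computed value matches the closed form (p/(1−p))^k to within the truncation error.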

We also include a technical lemma confirming that it is okay to do all of our sampling without replacement, if desired.

Lemma 4.

Let S be any set of n elements, all in [0,1]. Let X1,…,Xk be k samples from S without replacement. Then Pr[|∑_i Xi − (k/n)∑_{s∈S} s| ≥ δk] ≤ 2e^{−δ²k/2}.

Proof.

Define Yj=E[∑_i Xi | X1,…,Xj]. It’s clear that the Yj, j=0,…,k, form a martingale (specifically, the Doob martingale for ∑_i Xi), and that Y0=E[∑_i Xi]=(k/n)∑_{s∈S} s. So we just need to bound how much the conditional expectation can possibly change upon learning a single Xi; then we can apply Azuma’s inequality.

Conditioned on X1,…,Xj, each of the remaining n−j elements of S is equally likely to be chosen, and each is chosen with probability exactly (k−j)/(n−j). So Yj = ∑_{i=1}^j Xi + ∑_{unsampled s∈S} ((k−j)/(n−j))·s. How much can |Yj−Yj−1| possibly be? We have (below, “unsampled” means elements that are still unsampled even after step j):

Yj−Yj−1 = ((n−k)/(n−j))·Xj + ∑_{unsampled s∈S} ((k−j)/(n−j) − (k−j+1)/(n−j+1))·s

= ((n−k)/(n−j))·Xj − ((n−k)/((n−j)(n−j+1)))·∑_{unsampled s∈S} s

It is clear that the above quantity is at most 1, because all s∈[0,1], and (n−k)/(n−j)∈[0,1] as well. It is also clear that the above quantity is at least −1, as ∑_{unsampled s∈S} s ≤ n−j, and (n−k)/(n−j+1) ≤ 1. So the Doob martingale has differences at most 1 at each step, and a direct application of Azuma’s inequality yields the desired bound.
∎
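As a small sanity check of the centering term (k/n)·∑_{s∈S} s, the expectation of a without-replacement sample sum can be computed exhaustively (illustrative only):

```python
from itertools import combinations

def mean_subset_sum(S, k):
    """Average of sum(X) over all size-k subsets X of S, i.e. the exact
    expectation of the sum of k samples drawn without replacement."""
    subs = list(combinations(S, k))
    return sum(map(sum, subs)) / len(subs)
```

By symmetry each element appears in a k/n fraction of the subsets, which is exactly the Y0 term in the proof above.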

Appendix B Proofs for Non-Adaptive Algorithms

b.1 Upper Bounds

1: Select a skeleton set, S, of size √n (without replacement).

2: Compare each element of S to each other element of S.

3: Compare each element not in S to d−1 random elements of S.

Algorithm 1 Non-adaptive procedure for querying dn comparisons

Noiseless Model

1: Run Algorithm 1. Let S denote the skeleton set selected, and x denote the median of S.

2: Denote by AS the subset of S that beat x. Denote by RS=S−AS.

3: For all i∈S, accept i iff i∈AS. Otherwise, reject.

4: For all i∉S, if i beats an element in AS, accept. If an element in RS beats i, reject. Otherwise, make a random decision for i.

Algorithm 2 Non-adaptive algorithm for the noiseless model
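Algorithms 1 and 2 combined can be implemented directly; this is a minimal sketch assuming distinct comparable elements, with the `<` operator standing in for a noiseless comparison:

```python
import random

def nonadaptive_partition(elements, d, rng):
    """Noiseless non-adaptive Partition: compare a sqrt(n)-size skeleton S
    pairwise, let x be its median, then classify every other element using
    d-1 comparisons against random skeleton members.  Returns the accepted
    (top-half) set."""
    elements = list(elements)
    n = len(elements)
    S = rng.sample(elements, max(1, round(n ** 0.5)))
    x = sorted(S)[len(S) // 2]
    A = {s for s in S if s > x}      # skeleton elements that beat x
    R = set(S) - A
    accepted = set(A)                # skeleton is classified exactly
    for e in elements:
        if e in A or e in R:
            continue
        picks = rng.sample(S, min(d - 1, len(S)))
        if any(a in A and e > a for a in picks):
            accepted.add(e)          # e beats something known to be top-half
        elif any(r in R and r > e for r in picks):
            pass                     # something known bottom-half beats e: reject
        elif rng.random() < 0.5:
            accepted.add(e)          # no information: random decision
    return accepted
```

All comparisons queried here are fixed before any outcome is seen, so the procedure is genuinely non-adaptive (one round).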

Proof of Theorem 1:
We consider the error contributed by element n/2+i (which is the same as for n/2−i). There are two events that might cause n/2+i to be misplaced. First, maybe n/2+i loses to some y∈S with y<x. This is unlikely because it can only happen in the event that x is a poor approximation to the median. Second, maybe n/2+i is never compared to an element y∈[x,n/2+i]. This is also unlikely because the fraction of such elements should be about i/n. We bound the probability of the former event first.

Lemma 5.

Let Xi denote the fraction of elements in S greater than n/2+i. Then Xi ≤ 1/2−i/n+ε with probability at least 1−2e^{−ε²√n/2}.

Proof.

Xi is the average of √n random variables (sampled without replacement), each indicating whether n/2+i < y for the corresponding y∈S. E[Xi]=1/2−i/n. Applying Lemma 4 (which covers sampling without replacement) yields the lemma.
∎

We call a skeleton set S good if for all i=1,…,n/2, Xi ≤ 1/2−i/n+ε. Now let’s fix the skeleton set S and assume S is good. From the above lemma and a union bound, we know S is good with probability at least 1−n·e^{−ε²√n/2}.

Lemma 6.

If S is good, then the probability that n/2+i is rejected is at most (1−i/n+ε)^{d−1} (and at most 1 if i/n≤ε).

Proof.

The elements that n/2+i is compared to are chosen uniformly at random from S. At least a 1/2+i/n−ε fraction of them are less than n/2+i, so at least an i/n−ε fraction of the elements of S lies in [x,n/2+i]. So each time we choose a random element of S to compare to n/2+i, we have at least a max{i/n−ε,0} chance of comparing to an element in [x,n/2+i]. In the event that this happens, we are guaranteed to accept n/2+i. The probability that we miss on all of the d−1 independent trials is at most (1−i/n+ε)^{d−1}.
∎

Therefore, conditioned on S being good, the expected c-weighted error contributed by elements n/2+1,…,n is at most:

∑_{i=1}^{n/2} min{1,(1−i/n+ε)^{d−1}}·i^c ≤ ∑_{i=1}^{εn} i^c + ∑_{i=εn}^{n/2} (1−i/n+ε)^{d−1}·i^c

∑_{i=εn}^{n/2} (1−i/n+ε)^{d−1}·i^c ≤ ∑_{i=εn}^{n/2} e^{(d−1)(−i/n+ε)}·i^c

≤ e^{dε}·∑_{i=1}^{n/2} e^{−(d−1)i/n}·i^c

≤ 2e^{dε}·c!·(n/(d−1))^{c+1}

The last inequality is a corollary of Lemma 2 proved in Appendix A. Taking ε=1/d shows that the expected c-weighted error contributed by elements n/2+1,…,n is at most (εn)^{c+1}/(c+1) + 2e^{dε}·c!·n^{c+1}/(d−1)^{c+1} = O((n/d)^{c+1}), conditioned on S being good. Notice that when S is fixed, the c-weighted error contributed by each element is independent and bounded by n^c. By the Hoeffding bound, the probability that the c-weighted error exceeds its expectation by more than (n/d)^{c+1} is at most e^{−2n/d^{2c+2}}, conditioned on S being good. To sum up, taking elements that are smaller than the median into account, we know that with probability at least 1−2n·e^{−√n/(2d²)}−2e^{−2n/d^{2c+2}}, the c-weighted error is O((n/d)^{c+1}).
□

Here we also show that our algorithm is better than a simpler solution. The simpler solution would be to just compare every element to a random d elements, and accept if it is larger than at least half, and reject otherwise. We show that, unlike the algorithm above, this doesn’t obtain asymptotically optimal error as d grows.

Theorem 14.

The simple solution has expected error Ω(n^{c+1}/d^{(c+1)/2}).

Proof.

Let Xij be an indicator variable for the event that n/2+i is smaller than the jth element it is compared to. Then n/2+i is rejected iff ∑_j Xij > d/2. As each Xij is a Bernoulli random variable that is 1 with probability 1/2−i/n, the probability that n/2+i is mistakenly rejected is exactly the probability that a B(d,1/2−i/n) random variable exceeds its expectation by at least di/n. This happens with probability on the order of e^{−(i/n)²d}. So for i=O(n/√d), this is at least 1/e, meaning that the error contribution from each n/2+i with i ≤ kn/√d, for some absolute constant k, is at least i^c/e. Summing over all such i means that the total error is at least ∑_{i=1}^{kn/√d} i^c/e = Ω(n^{c+1}/d^{(c+1)/2}).
∎

Erasure Model

2: Say that element a∈S is known to beat element b∈S if there exist a=s0>s1>…>sℓ=b with all sj∈S, and sj beats sj+1 for all j∈{0,…,ℓ−1} (i.e. none of these comparisons was erased).

3: Denote by AS the elements of S that are known to beat at least |S|/2 elements of S. Denote by RS the elements of S that are known to be beaten by at least |S|/2 elements of S (note that S may not equal AS∪RS).

4: For all i∈S, accept i if i∈AS. Reject i if i∈RS. Otherwise, make a random decision for i.

5: For all i∉S, if i beats an element in AS, accept. If an element in RS beats i, reject. Otherwise, make a random decision for i.

Algorithm 3 Non-adaptive algorithm for the erasure model
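The "known to beat" relation in step 2 is just reachability in the graph of surviving comparisons. A small sketch (simulated erasures; with gamma=1 nothing is erased, so the relation is the full order on S):

```python
import random

def known_to_beat(S, gamma, rng):
    """Simulate the erasure model inside the skeleton: each pairwise
    comparison survives independently w.p. gamma, and a is known to beat b
    if a chain of surviving (correct) comparisons leads from a down to b.
    Returns the set of ordered pairs (a, b) with a known to beat b."""
    edges = {(a, b) for a in S for b in S if a > b and rng.random() < gamma}
    known = set(edges)
    changed = True
    while changed:  # transitive closure by repeated relaxation passes
        changed = False
        for (a, b) in list(known):
            for (c, d) in edges:
                if c == b and (a, d) not in known:
                    known.add((a, d))
                    changed = True
    return known
```

In the erasure model surviving comparisons are always correct, which is why chaining them is sound; this is exactly what makes Lemma 7 below work.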

Proof of Theorem 2:
We again consider the error contributed by element n/2+i (which is the same as for n/2−i). There are again two events that might cause n/2+i to be misplaced. First, maybe it is beaten by an element in RS. This is unlikely because it can only happen in the event that some element above n/2+i makes it into RS. Second, maybe n/2+i never beats an element of AS that it is compared to. This is also unlikely because the fraction of such elements should be about i/n. We bound the probability of the former event first, making use of Lemma 5.

Again let Xi denote the fraction of elements in S that are greater than n/2+i, and define S to be good if Xi ≤ 1/2−i/n+ε for all i∈[1,n/2]. Then Lemma 5 guarantees that S is good with probability at least 1−n·e^{−ε²√n/2}. This time, in addition to S being good, we also need to make sure that AS is large (which will happen as long as not too many comparisons are erased).

Lemma 7.

Let x,y∈S such that x>y and |S∩(y,x)|=k. Then with probability at least 1−e^{−kγ²}, x is known to beat y.

Proof.

x is known to beat y if there is some z∈S∩(y,x) such that the comparisons between x and z and between z and y are both not erased. There are k such possible z, and all comparisons are erased independently. So the probability that for every z, at least one of its comparisons to {x,y} was erased is (1−γ²)^k ≤ e^{−kγ²}.
∎

Corollary 2.

For all ε∈(0,1/2), with probability at least 1−n²e^{−εγ²√n}, both AS and RS have at least (1/2−ε)|S| elements.

Proof.

By Lemma 7 and a union bound, with probability at least 1−n²e^{−εγ²√n}, it is the case that for all x,y∈S that have at least ε|S| elements of S between them, it is known whether x beats y or vice versa. In the event that this happens, any element that is at least ε|S| positions away from the median of S will be in AS or RS.
∎

We’ll call a skeleton set S really good if it is good, and |AS|≥(1/2−ε)|S|. Now let’s fix the skeleton set S and assume S is really good. From the above arguments, we know S is really good with probability at least 1−n·e^{−ε²√n/2}−n²e^{−εγ²√n}.

Next, observe that if Xi ≤ 1/2−i/n+ε and |AS| ≥ (1/2−ε)|S|, then there are at least (i/n−2ε)|S| elements in AS less than n/2+i. Each comparison of n/2+i to a random element of S both lands on such an element and survives erasure with probability at least (i/n−2ε)γ. Therefore the probability that n/2+i never beats an element in AS is at most (1−(i/n−2ε)γ)^{d−1} (and at most 1 if i/n≤2ε). So the total expected c-weighted error that comes from these cases is at most ∑_{i=1}^{n/2} min{1,(1−(i/n−2ε)γ)^{d−1}}·i^c.

Taking ε=1/(γd) shows that the expected c-weighted error contributed by elements n/2+1,…,n is at most n^{c+1}·((2ε)^{c+1}/(c+1) + 2e^{2dεγ}·c!/(γ(d−1))^{c+1}) = O((n/(γd))^{c+1}), conditioned on S being really good. Notice that when S is fixed, the c-weighted error contributed by each element is independent and bounded by n^c. By the Hoeffding bound, the probability that the c-weighted error exceeds its expectation by more than (n/(γd))^{c+1} is at most e^{−2n/(γd)^{2c+2}}, conditioned on S being really good. To sum up, taking elements that are smaller than the median into account, we know that with probability at least 1−2n·e^{−√n/(2γ²d²)}−2n²e^{−γ√n/d}−2e^{−2n/(γd)^{2c+2}}, the c-weighted error is O((n/(γd))^{c+1}).

□

Noisy Model

2: Run the Braverman-Mossel algorithm to recover the maximum-likelihood ordering of the elements in S in time poly(n,1/γ) [BM08]. Let x denote the median under this ordering, and write y <_S z if y comes before z in this ordering.

3: For all i∈S, accept i iff i>Sx, otherwise reject.

4: For all i∉S, let bi1<S…<Sbic denote the elements >Sx that i is compared to (note that maybe