I am no expert, so I cannot cite references to the absolute fastest method, if such exists. You could use parallel processing, especially shared-memory such as in GPGPU computing. However, I assume you seek a serial algorithm. I also assume you are unable to amortize the initial cost of constructing an inverse lookup function over the need for multiple queries (as in thermodynamic property inverse lookups). I also assume your function is C0 continuous (at the nodes or cell-interfaces) and that its values at the nodes are available, so that it is cheap to evaluate it as a function of the node index i.

The hard part (if you have many cells in your domain) is to first locate the cell that contains the root. Given that the function is piecewise linear, the second phase of finding the precise root within that cell is trivial: there, you use f as a function of the spatial coordinate x within the cell.

For the first phase, you should work directly with the function f as a discrete grid function of the node index i. This way, you avoid the cost of finding the interval that contains a given x (that subproblem is almost as expensive as the original problem, and similar to it). In the first phase you have a nonlinear function of the node index: even if f were globally linear in the spatial coordinate x, with nonuniform mesh spacing f would be a nonlinear function of i. I suggest a bracketing nonlinear rootfinding method that does not use derivative information, such as Bisection, False Position, or Ridder's Method (see the Numerical Recipes book by Press et al.). The latter two methods will be faster than bisection on most well-behaved functions. The bracketing property is convenient since you are trying to home in on the cell that contains the root.

You retain the real-number arithmetic of the algorithm, even though the independent variable i takes only integer values. You will have to modify the root finder slightly to round the estimated real-valued root up or down to the nearest integer node number such that the root remains bracketed. You must also modify the convergence test. Assuming the root is initially bracketed by the endpoints of your domain, the first phase converges when the latest two (rounded) roots differ by unity, i.e., they are neighboring node indices which are the endpoints of the cell containing the exact root. Then you execute the second phase to find the exact root.
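To make the two phases concrete, here is a minimal sketch in Python (the array names x and y are hypothetical; y holds the node values, assumed monotonically increasing, with the root bracketed by the domain endpoints). Phase one is plain bisection on the integer node index; phase two interpolates within the final cell:

```python
def find_root(x, y):
    """Two-phase root finding for a piecewise linear function given by
    nodes x[i] with values y[i] (hypothetical arrays; y is assumed
    monotonically increasing, with the root bracketed by the endpoints).
    """
    iL, iR = 0, len(y) - 1
    if not (y[iL] <= 0.0 <= y[iR]):
        raise ValueError("root not bracketed by the domain endpoints")
    # Phase 1: bisection on the integer node index, never touching x.
    while iR - iL > 1:
        iM = iL + (iR - iL) // 2      # midpoint index; bracket preserved
        if y[iM] <= 0.0:
            iL = iM
        else:
            iR = iM
    # Phase 2: the function is linear on [x[iL], x[iR]], so interpolate.
    return x[iL] + (x[iR] - x[iL]) * (0.0 - y[iL]) / (y[iR] - y[iL])
```

Swapping the midpoint update for a rounded False Position or Ridder estimate, as suggested above, changes only the line that computes iM.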

An alternative method for the first phase would be to work with the square of the given function and minimize that using a derivative-free downhill-type optimizer. Since your function is monotonic, there will be no minima other than the desired root. I am unsure whether this would be faster or slower than the approach of my previous post.

This is way more complicated than necessary. You have a sorted (due to monotonicity) array of function values <code>f</code> and a corresponding sorted array of node coordinates <code>x</code>. A single function evaluation <code>f(s)</code> involves searching <code>x</code> for <code>s</code> to find the index (binary search is optimal in the general case), then doing piecewise linear interpolation. This search is just as expensive as finding the region where the root occurs using binary search on <code>f</code>. Assuming nothing lucky is in cache and no page faults are triggered, this will take about 2 microseconds on modern hardware for an array of 1 billion double-precision values (8 GB).
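For illustration, the same point in code (a sketch with small made-up arrays): with Python's bisect module, locating the root-containing cell is a single binary search on the sorted values, after which one interpolation finishes the job.

```python
import bisect

# Made-up sorted node values of a monotonically increasing f, plus the
# corresponding node coordinates x (both names are hypothetical).
f = [-4.0, -1.5, 0.5, 2.0, 6.0]
x = [ 0.0,  1.0, 2.0, 3.0, 4.0]

# One binary search on f locates the root-containing cell: i is the
# first index with f[i] >= 0, so the root lies in [x[i-1], x[i]].
i = bisect.bisect_left(f, 0.0)

# Piecewise linear interpolation inside that cell finishes the job.
root = x[i - 1] + (x[i] - x[i - 1]) * (0.0 - f[i - 1]) / (f[i] - f[i - 1])
# root == 1.75 for this data
```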

Yes, although I would definitely not suggest False Position, since it has `O(n)` worst-case runtime even for functions/samplings that are not especially pathological. Ridder's method has an easily achievable worst case that is double that of bsearch. The worst case for Brent's method is basically the same as bsearch. Also note that once the relevant part of the array is in cache (guaranteed for the last thousand values or so), the overhead of casting and division is significant compared to function evaluation (an array index). Basically, since function evaluation becomes so cheap once you enter the neighborhood where higher-order convergence applies, I would advise sticking with bsearch.

If the function is sufficiently well-behaved at the global scale and really huge, you might be able to beat bsearch by using Brent's method early (when function evaluation is expensive due to memory latency) and switching to plain bsearch once the interval size is narrowed down. I think you'd be very hard-pressed to find a case where this was worth the bother.

Yes, I agree that binary search (bisection) is well-behaved, and there is likely not much to be gained from Ridder's or Brent's, since here we are not concerned with asymptotic convergence rate (which would matter if you were looking for high accuracy in a real root). The original poster did not specify how large his dataset is, and he asked for a really fast method.

Valuable comments, thanks Ananda and also Jed (I missed tracking this thread).

I forgot to mention that my domain is one-dimensional, so the root of an x-y function is desired.

> "I suggest a bracketing nonlinear rootfinding method that does not use derivative information, such as Bisection, False Position or Ridder's Method "

Thanks. I currently use the bisection method; it works robustly, but when we do not have a uniform distribution of break-points its convergence is slow.

A simple solution could be median-based bisection, i.e., bisection of the break-point set instead of interval bisection. However, finding set medians is expensive and depends on the number of break-points (note that the break-points are known to me). Am I correct?

Is there a method to find the median very fast, or a way to do bisection based on the break-point distribution?

It seems that I should try the "False Position" and "Ridder" methods you mentioned as well.

I do not understand your use of the terms bisection, set, interval, and median. Therefore, I am unsure exactly what you are currently doing. Perhaps you can explain exactly what you do in terms of x(i), y(i), i = 1,...,n, where i is the node index. I particularly fail to follow what you mean by "interval bisection" versus "break-points set bisection".

As Jed and I both pointed out, during the first phase (location of the root-containing cell), you should work with y as a function of i, and not involve x at all. There should be no "interval bisection" involved, if "interval" refers to an interval of the x domain. You should be doing bisection on the i bracket, which (since i is an integer) amounts to binary search, as Jed referred to it. In bisection (binary search), what you do coincides precisely with finding the median observation (because your y data are monotonic, i.e., ordered). If the current bracket has iL=J as the left end and iR=J+2K as the right end, then the bisection point is exactly the median point iM=J+K. This is trivial to calculate as iM=iL+(iR-iL)/2. Then y(iM) is the median observation, which will replace either the left end or the right end of the current bracket. Note that if instead iR=J+2K-1, then, as I previously said, you will have to round iM to either J+K-1 or J+K, rather than taking the average of the observations y(J+K-1) and y(J+K). It so happens that x(iM) is also the median x-observation (since x is ordered with respect to i), but as I said this should play no part in your algorithm.

The bisection method Jed and I described may indeed not be the fastest method. Since you say your function y is monotonic, I suspect that False Position or Ridder's Method or Brent's Method may be significantly faster in terms of number of iterations required. However, as Jed pointed out, they involve quite a bit more arithmetic per iteration than binary search does. You may have to try the various methods in practice, to see which is fastest for your class of functions.
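As an illustration of the modified False Position idea (a sketch only; the array name y is hypothetical, and the rounding rule here is one of several reasonable choices): the real-valued index estimate is rounded to an interior integer so that the integer bracket always shrinks and the root stays bracketed.

```python
def locate_cell_false_position(y):
    """Locate the cell containing the root of an ascending, piecewise
    linear grid function y(i), using false position on the real-valued
    index, rounded so the integer bracket shrinks and stays valid.
    Returns (iL, iR) with iR == iL + 1 and y[iL] <= 0 <= y[iR]."""
    iL, iR = 0, len(y) - 1
    while iR - iL > 1:
        # False-position estimate of the root in index space.
        t = iL + (iR - iL) * (0.0 - y[iL]) / (y[iR] - y[iL])
        # Round, then clamp to the interior so the bracket always shrinks.
        iM = min(max(int(t), iL + 1), iR - 1)
        if y[iM] <= 0.0:
            iL = iM
        else:
            iR = iM
    return iL, iR
```

As noted earlier in the thread, False Position can degenerate to O(n) iterations on unfavorable data, so plain index bisection remains the safer default.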

Currently I use Brent's method from the Numerical Recipes codes; it works better than bisection and is satisfactory, so there is no need for further refinement. Thanks.

PS: regarding set bisection: for a piecewise linear graph with "n" segments we have 2n break-points, and the root is located between two consecutive break-points. Set bisection means using the median of the break-point set as the boundary of the new interval.

So its worst-case convergence is explicitly known as a function of n. With set bisection we finally reach the segment in which the zero occurs, and then just do an interpolation. The main drawback is finding the set median, which is expensive.

I still do not see what is expensive about "finding the median". If you have an interval/bracket of 2K+1 breakpoints, the binary search method simply picks the (2K+2)/2-th breakpoint as the location of the median y value, which is VERY cheap to compute (one subtraction, one division, one addition, and one memory access to get y(iM)).

If instead you are brute-force scanning x(i) to bisect the interval [x(iL),x(iR)], then you are doing it WRONG. That bisection is as hard as the original problem of finding the root, and you may be solving it at each iteration. If you have 2n breakpoints, and if B is the computational cost of brute-force left-to-right scanning the {x(i)} set to find the midpoint of the x-interval, then on average B is of the order of n arithmetic operations. If it takes T iterations to find the y root, then your total cost by the method of bisecting the x-interval will be of the order of B*T=n*T. Just brute-force scanning the {y(i)} set should be cheaper, costing only of order B=n, since your y function is monotonic in x and therefore in i.

Bisection correctly implemented on y as a function of i, as described in both my previous posts and in Jed's post, should be much cheaper, costing only on the order of log2(2n). For a billion breakpoints, bisection should in the worst case require only about 30 iterations, each iteration costing only an integer subtraction, an integer division, an integer addition, a floating-point comparison, and a couple of memory/cache accesses. As Jed points out, this entire bisection procedure should take on the order of microseconds on a modern CPU to find the root.

Sorry to get so pedantic, but if you complain about the cost of finding the set median, it sounds like your algorithm is not right. It sounds like you are treating y as a function of x during the first phase, instead of y as a function of i. Good luck to you.

First I should say that in my problem the cost of one function evaluation is high. Actually, my problem is the dual of a large-scale minimization with a large number of parameters. In the dual setting we have one objective functional to be minimized with one unknown parameter x (actually the Lagrange multiplier of a sub-problem), and my actual parameters become known within the solution of this sub-problem, which is the subject of this thread.

I have 2n intervals in which the function is piecewise linear, so 2n-1 break-points. The break-points are known a priori; *however*, they are not sorted in ascending order.

In the interval bisection method I used, I bisect the x-coordinate to shrink the interval, and it works well (I cannot understand what you say about my doing it wrong).

However, if we have a non-uniform distribution of break-points, with a concentration of break-points near a specific point, then interval bisection may take a lot of function evaluations to explore the concentration region.

In contrast, the set bisection method always removes half of the break-points, so its upper cost is known a priori as a function of n. But to find the bisection point of the set we need to find the median (at each cycle) and remove the points that are either smaller or larger than the median. This needs a sort procedure, O(2n log 2n), not simply the "one multiplication, one addition, one division" you said (because the *break-point positions* are not available in ascending order).

Exactly the reverse scenario is present, in contrast to your understanding: to find the interval bisection point no scanning is needed, it is just averaging the current interval endpoints (you may have forgotten that x is continuous).

Well, in that case, my apologies to you for insisting that your algorithm is wrong. It appears the assumption I stated in my first post is not satisfied, viz. : "I also assume your function is C0 continuous (at the nodes or cell-interfaces) and that its values at the nodes are available, so that it is cheap to evaluate it as a function of the node index i".

From your latest post, it appears that you do not know y (the objective functional) directly as a function of i, because the y values are not precomputed to be stored in an array. Instead, it appears that you know how to obtain y directly as a function of x (the Lagrange multiplier), but this evaluation is expensive. It also appears that you know the breakpoints in x, but that these are not sorted, but remain unsorted in an array with index, say j. Finally, you know a priori that y is globally monotonic with respect to x, in addition to being piecewise linear over pieces defined by the breakpoints in x. Am I right in assuming this to be the situation?

Because you can evaluate y directly as a function of x, you can apply rootfinding algorithms in real arithmetic such as interval bisection, without the need to examine the breakpoint index j. And the locations of the breakpoints do not change during these iterations. However, the clustering of the breakpoints somehow causes the interval bisection method to converge slowly. You believe that set bisection of the breakpoint set would lead to fewer iterations, but the drawback is that "at each cycle" we need to find the median of the ordered breakpoint set, which requires sorting at each cycle, costing O(2n log(2n)) at each cycle. By "each cycle", you mean "each iteration of the rootfinder". Do I understand you correctly?

Please correct me if the preceding paragraphs show anything wrong about my current understanding of the situation.

If the root is in the region of dense clustering of breakpoints, you are correct in saying that breakpoint set bisection will need fewer iterations than x-interval bisection. If the root is in the region of sparse clustering, on the other hand, the interval bisection method will take fewer iterations than set bisection (assuming that you can magically switch to linear interpolation once the interval is included in the correct cell; this would be true if you were using false position instead of bisection).

Assuming that the root is in the region of dense clustering of breakpoints, set bisection would be faster than interval bisection, with the additional advantage that once you get the bracket down to a single segment, you can then use linear interpolation to find the root exactly. Bisection (set or interval) works only if the dependent function y is at least C0 continuous with respect to the ordered independent variable (i or x). This is true of any bracketing rootfinding algorithm. Hence the need to find the median of the breakpoint set during set bisection. However, the price of sorting so that the median can be found cheaply needs to be paid only once, not during each iteration.

You say the 2n+1 breakpoints xb(j) are already available (2n+1 includes the endpoints of the domain). Simply sort the xb(j) once to obtain xb(i), i = 1,...,2n+1. This will cost O(2n log(2n)), as you say. The set {xb(i)} will then be in ascending order with respect to i. After that, finding the median xb(iM) of a bracket [xb(iL), xb(iR)] is trivial to calculate via iM=iL+(iR-iL)/2, as I pointed out in an earlier post. Then you can calculate the objective functional y(xb(iM)) for use in any of the modified rootfinding techniques (bisection, False Position, Ridder's, Brent's, ...), with the modifications I described in an earlier post. The only drawback would be the one-time cost of sorting the set {xb(j)}. Since these xb(j) are presumably available as an array in memory, sorting even a billion values should take on the order of a minute, which may still be cheap when you consider how many evaluations of y it may save. I would be amazed if you were dealing with a billion values; more likely a few hundred breakpoints, I would guess, and sorting those would be a lot cheaper.
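A sketch of this sort-once scheme in Python (the names xb and y_of_x are hypothetical; y_of_x stands for the expensive objective, assumed increasing, with its root bracketed by the break-point range):

```python
def root_by_set_bisection(xb, y_of_x):
    """Set bisection: sort the breakpoints once, then bisect on the
    sorted breakpoint index, evaluating the (expensive) objective
    y_of_x only at median breakpoints. Assumes y_of_x is increasing,
    piecewise linear between breakpoints, and has its root bracketed
    by the breakpoint range."""
    xs = sorted(xb)                 # one-time O(2n log 2n) cost
    iL, iR = 0, len(xs) - 1
    yL, yR = y_of_x(xs[iL]), y_of_x(xs[iR])
    while iR - iL > 1:
        iM = iL + (iR - iL) // 2    # median of the current bracket
        yM = y_of_x(xs[iM])
        if yM <= 0.0:
            iL, yL = iM, yM
        else:
            iR, yR = iM, yM
    # A single segment remains; the function is linear on it, so
    # interpolate to get the exact root.
    return xs[iL] + (xs[iR] - xs[iL]) * (0.0 - yL) / (yR - yL)
```

The expensive objective is evaluated only O(log n) times; everything else is index arithmetic on the sorted break-point array.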