For any x, consider the function b(y) = B_f(x, y). It is known that this is not always convex (choose f(x) = x³ for S ⊂ ℝ), and I can show that for S ⊂ ℝ it is always quasi-convex (i.e., b(λy + (1−λ)y') ≤ max{ b(y), b(y') } for λ ∈ [0,1], y, y' ∈ S), but I can neither prove it nor find a counter-example in the general case.
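To spell the x³ example out (taking, say, S = [0, ∞) so that f is convex there): B_f(x, y) = x³ − y³ − 3y²(x − y) = x³ + 2y³ − 3xy², whose second derivative in y is 6(2y − x), which is negative for y < x/2, so b is not convex; but b'(y) = 6y(y − x) is non-positive on [0, x] and non-negative beyond x, so b is quasi-convex.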

I've done a quick hunt around the literature on Bregman divergences but cannot find an answer either way.

2 Answers

The answer is: yes, it is always quasi-convex! I'll show this by first proving a stronger characterization, from which the other facts follow. Please bear with me as I first make a few definitions.

Let a convex set $S \subseteq \mathbb{R}$ and a function $f:S\to \mathbb{R}$ be given. To avoid assuming the existence of derivatives, let $f'(v)$ refer to any subgradient of $f$ at $v$, and say $f$ is convex if for any $x,v \in S$, $f(x) \geq f(v) + f'(v)(x-v)$. (This is an equivalent formulation of convexity, and when $f$ is differentiable, it gives the 'first-order' definition of convexity.) Note critically that for $u,v\in S$ with $u\leq v$, it follows that
$f'(u) \leq f'(v)$. (This is sort of like the mean value theorem, though not exactly, since those subgradients are technically sets; I think everything I've said so far may appear in the thesis
of Shai Shalev-Shwartz.) Define
$$b_x(v) = f(x) - f(v) - f'(v)(x-v)$$
to be the Bregman divergence of $f$ at the point $x$, taking the linear approximation at $v$. By the definition of convexity, it follows that $b_x(v) \geq 0$ for all $x,v\in S$.
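For a quick illustration (an example of my own, not from the question): take $f(v) = v^2$, so $f'(v) = 2v$, and
$$b_x(v) = x^2 - v^2 - 2v(x-v) = (x-v)^2,$$
which is nonnegative, zero exactly at $v = x$, and decreasing before $x$ and increasing after, exactly the behavior described in the Fact below.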

Fact: $b_x(\cdot)$ is decreasing up to $x$, exactly zero at $x$, and increasing after $x$.
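One way to verify the Fact (a sketch, using only the subgradient inequality and the monotonicity of $f'$ noted above): for $u \leq v \leq x$,
$$b_x(u) - b_x(v) = \big[f(v) - f(u) - f'(u)(v-u)\big] + \big(f'(v) - f'(u)\big)(x-v) \geq 0,$$
since both terms are nonnegative; the case $x \leq v \leq u$ is symmetric, and $b_x(x) = 0$ is immediate from the definition.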

To see that this means $b_x(\cdot)$ is quasi-convex, take any $y\leq z$ and any $\lambda \in [0,1]$. Then the point $w:=\lambda y + (1-\lambda)z$ lies on the segment between $y$ and $z$, and $b_x(\cdot)$ must be increasing in the direction of at least one of these endpoints, so $b_x(w) \leq \max\{b_x(y), b_x(z)\}$.

This also gives a strong idea of how convexity breaks down for $b_x(\cdot)$. In particular, let $f(x) = \max\{0, |x|-1\}$ (a 1-insensitive loss for regression). Then the function $b_0(\cdot)$ is 0 on $(-1,1)$ and 1 everywhere else except at $\{-1,+1\}$ (those points are different since, by using subgradients, these functions have sets as output; but if you took a differentiable analog of this loss, something like a Huber loss, you'd get basically the same effect, and $b_0(\cdot)$ would be a vanilla continuous (non-convex) function).
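To spell out the computation behind this example (using $f(0) = 0$):
$$b_0(v) = f(0) - f(v) - f'(v)(0-v) = -f(v) + v f'(v) = \begin{cases} 0 & |v| < 1 \quad (f(v)=0,\ f'(v)=0),\\ -(|v|-1) + |v| = 1 & |v| > 1 \quad (f(v)=|v|-1,\ f'(v)=\mathrm{sign}(v)),\end{cases}$$
while at $v = \pm 1$ the subgradient is a whole interval, so $b_0$ there takes every value in $[0,1]$.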

Thanks for the proof—it's very well written—however, as I stated in my question, I already have a proof (indeed, it is quite similar to your own) of quasi-convexity for the case when S ⊂ ℝ. My real problem is showing it for multi-dimensional functions.
– Mark Reid, Mar 17 '10 at 23:09

(and set $x = (0,0)$.) In the bad example, since $f$ is linear along the segments from $x$ to $y$ and from $x$ to $z$, we get $b_x(y) = b_x(z) = 0$. On the other hand, since it is quadratic along the segment from $x$ to $w$, the Bregman divergence is nonzero; in fact, it is $1/2$. I have an argument that $f$ is convex, but it is vague. I have to run, but tomorrow hopefully I can come back with something better.
– Matus Telgarsky, Mar 18 '10 at 9:42

Consider $\bar b(y) = \phi^*(y^*) - \langle x, y^*\rangle$ (here $y^*$ is the dual point $y^* = \nabla f(y)$). $\bar b(y)$ is merely a translation away from $b(y)$, and it seems a more direct way of dealing with the general case, especially since we know $\phi^*$ is convex as well.
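To make the translation explicit (a sketch, assuming $\phi^*$ denotes the convex conjugate $f^*$, and using the Fenchel equality $f(y) + f^*(y^*) = \langle y, y^*\rangle$ when $y^* = \nabla f(y)$):
$$\bar b(y) = f^*(y^*) - \langle x, y^*\rangle = \langle y, y^*\rangle - f(y) - \langle x, y^*\rangle = -f(y) - \langle \nabla f(y),\, x - y\rangle = b(y) - f(x),$$
so $\bar b$ and $b$ differ by the constant $f(x)$, and in particular one is quasi-convex if and only if the other is.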