@chien_eli (but in the clipped version, there are already counter-examples to the weaker statement that <x-y, clip(grad f(x)) - clip(grad f(y))> is positive, without which there is no hope of 1-Lipschitzness)
@chien_eli (L-smoothness buys you something stronger, that for all \eta <= 2/L,
<x-y, grad(f(x)) - grad(f(y))> >= \eta/2 * ||grad(f(x)) - grad(f(y))||^2
which proves the 1-Lipschitz-ness of \phi)
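The co-coercivity bound above is easy to check numerically. A minimal sketch, using a convex quadratic f(x) = ½ xᵀAx of my own choosing (so grad f(x) = Ax and L = λ_max(A)):

```python
# Numerical check of the co-coercivity bound quoted above, for the
# quadratic f(x) = 0.5 * x^T A x, so grad f(x) = A x and L = lambda_max(A).
# The choice of f and A is illustrative, not from the thread.
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = B @ B.T                       # PSD, so f is convex and L-smooth
L = np.linalg.eigvalsh(A).max()   # smoothness constant
eta = 2.0 / L                     # the largest eta the claim allows

for _ in range(1000):
    x, y = rng.standard_normal(5), rng.standard_normal(5)
    g = A @ (x - y)               # grad f(x) - grad f(y)
    lhs = (x - y) @ g
    rhs = eta / 2 * (g @ g)
    assert lhs >= rhs - 1e-9, (lhs, rhs)
print("co-coercivity holds at eta = 2/L")
```

For the quadratic this is just the eigenvalue bound ||A d||^2 <= L * <d, A d>, which is where the 2/L threshold comes from.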
@chien_eli set x=(1,0) and y=(0,0); the crucial thing is that while <x-y, grad f(x) - grad f(y)> is positive, this inner product becomes negative once you replace grad with clip(grad)
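The tweet doesn't spell out f or the clipping rule, so here is one concrete instance (both the convex quadratic f and the clip-to-unit-norm-ball rule are my own illustrative choices) where exactly this sign flip happens at x=(1,0), y=(0,0):

```python
# One concrete instance of the sign flip described above. The convex
# quadratic f and the clip-to-unit-ball rule are illustrative choices;
# the tweet does not specify them.
import numpy as np

def grad_f(p):
    # f(x1, x2) = x1 + x1^2/2 + 10*x1*x2 + 50*x2^2
    # Hessian [[1, 10], [10, 100]] is PSD, so f is convex.
    x1, x2 = p
    return np.array([1 + x1 + 10 * x2, 10 * x1 + 100 * x2])

def clip(g, c=1.0):
    # rescale g so its norm is at most c
    return g / max(1.0, np.linalg.norm(g) / c)

x, y = np.array([1.0, 0.0]), np.array([0.0, 0.0])
raw = (x - y) @ (grad_f(x) - grad_f(y))            # = 1 > 0 (monotone)
clipped = (x - y) @ (clip(grad_f(x)) - clip(grad_f(y)))
print(raw, clipped)   # raw is positive, clipped is negative
```

The mechanism: grad f(x) = (2, 10) has a huge second coordinate, so norm-clipping shrinks its first coordinate from 2 down to 2/sqrt(104) ≈ 0.20, below the unclipped value 1 at y.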
@KamerynJW looking at law #10 in Crowe's article, the contrast with empirical science is that (hopefully!) we don't tear down our forebears' work by showing it's wrong. but while full U-turns may be rare, the story definitely can and does take right turns from generation to generation
@jeffreycider If you’d prefer a characterization just with the numbers 0 and 1:
the polynomial
f(x) = (3x - x^3)/2
is the odd polynomial of the lowest degree which satisfies
f(1) = 1 and f’(1) = 0.
these conditions uniquely determine the numbers 3 and 1/2! @jxbz
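A quick sanity check (a sketch using sympy) that the two conditions really do pin down the odd cubic uniquely:

```python
# Check that f(1) = 1 and f'(1) = 0 uniquely determine the odd cubic
# f(x) = a*x + b*x^3, giving a = 3/2, b = -1/2, i.e. f(x) = (3x - x^3)/2.
import sympy as sp

x, a, b = sp.symbols("x a b")
f = a * x + b * x**3   # general odd polynomial of degree <= 3
sol = sp.solve([f.subs(x, 1) - 1, sp.diff(f, x).subs(x, 1)], [a, b])
print(sol)             # {a: 3/2, b: -1/2}
```

(A degree-1 odd polynomial bx can't satisfy both conditions, so the cubic really is the lowest degree.)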
@henrismitch Maybe the abstraction isn't useful, but here's a slightly more general result: if F : V -> \R is a convex function on a vector space V, and W \subset V is a subspace, then
G(v) := min_{w \in W} F(v+w)
defines a convex function
G: V/W -> \R
on the quotient vector space V/W
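A short proof sketch (my wording, not from the thread), written with an epsilon since the min need not be attained:

```latex
% Fix v_1, v_2 \in V, t \in [0,1], \varepsilon > 0, and pick w_i \in W with
% F(v_i + w_i) \le G(v_i) + \varepsilon. Since t w_1 + (1-t) w_2 \in W,
\begin{align*}
G\bigl(t v_1 + (1-t) v_2\bigr)
  &\le F\bigl(t (v_1 + w_1) + (1-t)(v_2 + w_2)\bigr) \\
  &\le t\,F(v_1 + w_1) + (1-t)\,F(v_2 + w_2) \\
  &\le t\,G(v_1) + (1-t)\,G(v_2) + \varepsilon.
\end{align*}
% Let \varepsilon \to 0. G is constant on W-cosets by construction,
% so it descends to the quotient V/W.
```

The key point is that the minimization couples the two near-minimizers w_1, w_2 through the single shift t w_1 + (1-t) w_2, which stays in W.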
I needed the following result, which is quite easy to prove algebraically. Define d(x) = \min_y{h(x-y)+g(y)} where h(.) and g(.) are convex functions. Then, d(.) is convex.
However, this is surprising to me, as the pointwise minimum of convex functions is not typically convex. Intuition?
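The result is also easy to see numerically. A grid-based sketch with h(x) = x^2 and g(x) = |x| (my own illustrative choices; this particular d is a Huber-like function), checking discrete convexity of d via second differences:

```python
# Grid-based check that the infimal convolution d(x) = min_y [h(x-y) + g(y)]
# of two convex functions is convex. h(x) = x^2 and g(x) = |x| are
# illustrative choices; this d is a Huber-like function.
import numpy as np

h = lambda t: t**2
g = lambda t: np.abs(t)

xs = np.linspace(-3, 3, 201)
ys = np.linspace(-6, 6, 2001)
# d[i] = min over the y-grid of h(x_i - y) + g(y)
d = np.array([np.min(h(x - ys) + g(ys)) for x in xs])

# second differences of a convex function on a uniform grid are >= 0
# (up to small grid-discretization error)
second_diff = d[:-2] - 2 * d[1:-1] + d[2:]
print(bool(second_diff.min() >= -1e-4))
```

One way to see why the answer is yes despite the pointwise-min intuition: (x, y) -> h(x-y) + g(y) is jointly convex, and minimizing a jointly convex function over one variable preserves convexity in the other (it's partial minimization, not a pointwise min over an arbitrary family).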