Proof of Seperating Hyperplane Theorem (4)

If there exists two disjoint convex sets C,DC,D then there exists a hyperplane ax=b\vec{a}^\top \vec{x} = b that seperates the two sets.

Define the two points cC\vec{c} \in C and dD\vec{d} \in D that are closest to each other, and let it be the distance between the two sets:

dist(C,D)=inf{cd2cC,dD}dist(C,D) = \inf \{||\vec{c} - \vec{d}||_2 | \vec{c}\in C, \vec{d} \in D \}

The inf\inf function denotes the infimum ⇒ the largest lower bound of cd2||\vec{c} - \vec{d}||_2

Then there exists a seperating function:

f(x)=(dc)(xd+c2)=0f(x)=(\vec{d} - \vec{c})^\top (\vec{x} - \frac{\vec{d}+\vec{c}}{2}) = 0

And we will see that

cC,f(c)0dD,f(d)0\forall \vec{c} \in C, f(\vec{c}) \le 0 \\ \forall \vec{d} \in D, f(\vec{d}) \le 0

We will prove those constraint by contradiction:

Assue uD\exists \vec{u} \in D such that f(u)<0f(\vec{u}) < 0

f(u)=(dc)(ud+c2)=(dc)((ud)+12(dc))=<dc,ud>+12dc22undefinedwe know that this is 0\begin{split} f(\vec{u}) &= (\vec{d}-\vec{c})^\top (u-\frac{\vec{d}+\vec{c}}{2}) \\ &= (\vec{d}-\vec{c})^\top ((\vec{u} - \vec{d}) + \frac{1}{2} (\vec{d} - \vec{c})) \\ &=<\vec{d}-\vec{c}, \vec{u}-\vec{d}> + \underbrace{\frac{1}{2} ||\vec{d} - \vec{c}||_2^2}_{\text{we know that this is $\ge 0$}} \\ \end{split}

So implies that for f(u)<0f(\vec{u}) < 0, <dc,ud><\vec{d} - \vec{c}, \vec{u} - \vec{d}> has to be lower than 0

Because this inner product is negative, so if I move along the vector ud\vec{u} - \vec{d} for some distance tt, then I may get closer to c\vec{c}

Let

p=d+t(ud)=tu+(1t)d\vec{p} = \vec{d} + t(\vec{u} - \vec{d}) = t\vec{u}+(1-t)\vec{d}

If t(0,1)t \in (0,1) then pD\vec{p} \in D (Since DD is convex)

Consider

cp22=cdt(ud)2=((dd)t(ud))((cd)t(ud))=cd2+t2ud22t<cd,ud>\begin{split} ||\vec{c} - \vec{p}||_2^2 &= ||\vec{c} - \vec{d} - t(\vec{u}-\vec{d})||^2 \\ &= ((\vec{d} - \vec{d}) - t(\vec{u}-\vec{d}))^\top((\vec{c} - \vec{d})-t(\vec{u}-\vec{d})) \\ &= ||\vec{c} - \vec{d}||^2 + t^2 ||\vec{u}-\vec{d}||^2 - 2t<\vec{c}-\vec{d},\vec{u}-\vec{d}> \end{split}

We want to show that p\vec{p} is closer than d\vec{d} to c\vec{c}

So let’s write the next two terms

t2ud22t<cd,ud>=t2ud2+2t<dc,ud>undefinedstrictly <0\begin{split} & t^2 ||\vec{u} - \vec{d}||^2 - 2t<\vec{c}-\vec{d}, \vec{u}-\vec{d}> \\ =& t^2 ||\vec{u} - \vec{d}||^2 + 2t\underbrace{<\vec{d}-\vec{c}, \vec{u} - \vec{d}>}_{\text{strictly }<0} \\ \end{split}

But we can always choose a value of tt such that the first term is always smaller in magnitude

t<2<dc,ud>ud2t < -\frac{2<\vec{d}-\vec{c},\vec{u}-\vec{d}>}{||\vec{u}-\vec{d}||^2}

So we have proved a contradiction (because now the distance between c\vec{c} and d\vec{d} is no longer the minimal)