Analysis of Multiplicative Weights Algorithm

Claim

Assume that all losses are in $[0,1]$. The regret of the Multiplicative Weights algorithm run for $T$ steps with a parameter $0 < \epsilon \le 1/2$ is:

$$R_T \le \epsilon T + \frac{\ln n}{\epsilon}$$

In particular, if $T > 4 \ln n$ and we choose $\epsilon = \sqrt{\frac{\ln n}{T}}$, we have

$$R_T \le 2\sqrt{T \ln n}$$
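As a quick sanity check of the claim (not part of the proof), here is a minimal Python sketch of the algorithm run on a made-up random loss matrix, with $n$ and $T$ chosen so that $T > 4 \ln n$ holds:

```python
import math
import random

def multiplicative_weights(losses, eps):
    """Run the Multiplicative Weights algorithm.

    losses: list of T rounds, each a list of n losses in [0, 1].
    eps:    parameter in (0, 1/2].
    Returns the total (expected) loss incurred by the algorithm.
    """
    n = len(losses[0])
    w = [1.0] * n                          # initial weights w_i^{(1)} = 1
    total_loss = 0.0
    for l in losses:
        W = sum(w)
        x = [wi / W for wi in w]           # play the normalized weights
        total_loss += sum(xi * li for xi, li in zip(x, l))
        # multiplicative update: w_i <- w_i * (1 - eps)^{l_i}
        w = [wi * (1 - eps) ** li for wi, li in zip(w, l)]
    return total_loss

# Hypothetical instance: uniform random losses in [0, 1].
random.seed(0)
n, T = 10, 1000                            # T > 4 ln n holds here
losses = [[random.random() for _ in range(n)] for _ in range(T)]
eps = math.sqrt(math.log(n) / T)
alg_loss = multiplicative_weights(losses, eps)
best = min(sum(l[i] for l in losses) for i in range(n))  # offline optimum L*
regret = alg_loss - best
assert regret <= 2 * math.sqrt(T * math.log(n))          # the claimed bound
```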

Outline

Define

$$W_t = \sum_{i=1}^n w_i^{(t)}$$

to be the sum of the weights at time $t$; in particular, $W_{T+1}$ is the total weight at the end of the execution of the algorithm.

Call the offline optimum $L^*$:

$$L^* = \min_{x \in \Delta} \sum_{t=1}^T \sum_{i=1}^n x_i \cdot l_i^{(t)} = \min_{i=1, \dots, n} \sum_{t=1}^T l_i^{(t)}$$

and we can use $L_t$ to denote the loss incurred by the algorithm at time $t$:

$$L_t = \sum_{i=1}^n x_i^{(t)} l_i^{(t)}$$
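For concreteness, both quantities are directly computable; the $3 \times 2$ loss matrix below is a made-up example:

```python
# Toy loss matrix (hypothetical): rows are times t, columns are strategies i.
losses = [[0.0, 1.0],
          [1.0, 0.0],
          [0.0, 1.0]]

# Offline optimum L*: the best single strategy in hindsight.
L_star = min(sum(l[i] for l in losses) for i in range(len(losses[0])))
assert L_star == 1.0  # strategy 0 suffers 0 + 1 + 0 = 1

# Per-round loss L_t of the algorithm for a given distribution x^{(t)}.
def round_loss(x, l):
    return sum(xi * li for xi, li in zip(x, l))

assert round_loss([0.5, 0.5], losses[0]) == 0.5
```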

Lemmas

Lemma 2

If $W_{T+1}$ is small, then the offline optimum is large:
$$W_{T+1} \ge (1-\epsilon)^{L^*}$$

Proof

Let $j$ be any strategy; since the initial weight is $w_j^{(1)} = 1$, then

$$W_{T+1} = \sum_{i=1}^n w_{i}^{(T+1)} \ge w_j^{(T+1)} = \prod_{t=1}^T (1 - \epsilon)^{l_j^{(t)}} = (1-\epsilon)^{\sum_{t=1}^T l_j^{(t)}}$$

Now let $i^*$ be an offline optimal strategy,

$$L^* = \sum_{t=1}^T l_{i^*}^{(t)}$$

And observe that for $i^*$,

$$W_{T+1} \ge (1-\epsilon)^{\sum_{t=1}^T l_{i^*}^{(t)}} = (1-\epsilon)^{L^*}$$
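A small numeric check of Lemma 2 (the loss values below are arbitrary, made up for illustration):

```python
# Run the weight updates on a hypothetical 2-strategy instance and verify
# the lower bound W_{T+1} >= (1 - eps)^{L*} from Lemma 2.
eps = 0.1
losses = [[0.2, 0.9], [0.7, 0.1], [0.5, 0.5]]
n = 2
w = [1.0] * n
for l in losses:
    w = [wi * (1 - eps) ** li for wi, li in zip(w, l)]
W_final = sum(w)                                            # W_{T+1}
L_star = min(sum(l[i] for l in losses) for i in range(n))   # here 1.4
assert W_final >= (1 - eps) ** L_star
```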

Lemma 3

If the loss suffered by the algorithm is large, then $W_{T+1}$ is small:
$$W_{T+1} \le n \cdot \prod_{t=1}^T (1-\epsilon L_t)$$

Proof

We know $W_1 = n$, so it suffices to prove that for every $t$,

$$W_{t+1} \le W_t \cdot (1-\epsilon L_t)$$

and the lemma follows by induction.

Observe that

$$\begin{split} W_{t+1} &= \sum_{i=1}^n w_i^{(t+1)} \\ &= \sum_{i=1}^n w_i^{(t)} \cdot (1-\epsilon)^{l_i^{(t)}} \\ &\le \sum_{i=1}^n w_{i}^{(t)} \cdot (1-\epsilon \cdot l_i^{(t)}) \qquad \because \forall z, \epsilon \in [0,1],\ (1-\epsilon)^z \le 1- \epsilon z \\ &= W_t \sum_{i=1}^n x_i^{(t)} \cdot (1-\epsilon \cdot l_i^{(t)}) \qquad \because x_i^{(t)} = w_i^{(t)} / W_t \\ &= W_t \cdot \Bigl(\sum_{i=1}^n x_i^{(t)} - \sum_{i=1}^n \epsilon \cdot x_i^{(t)} \cdot l_i^{(t)}\Bigr) \\ &= W_t \cdot (1-\epsilon L_t) \end{split}$$
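The per-step inequality $W_{t+1} \le W_t (1 - \epsilon L_t)$ can be checked numerically round by round; the instance below is a made-up example:

```python
# Hypothetical 3-strategy instance; verify the key step of Lemma 3
# at every round: W_{t+1} <= W_t * (1 - eps * L_t).
eps = 0.3
losses = [[0.0, 1.0, 0.4], [0.8, 0.2, 0.6], [1.0, 1.0, 0.0]]
w = [1.0, 1.0, 1.0]
for l in losses:
    W = sum(w)
    x = [wi / W for wi in w]
    L_t = sum(xi * li for xi, li in zip(x, l))   # algorithm's loss this round
    w = [wi * (1 - eps) ** li for wi, li in zip(w, l)]
    assert sum(w) <= W * (1 - eps * L_t) + 1e-12  # tolerance for float error
```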

Proof of the Claim

LEMMA 2: If $W_{T+1}$ is small, then the offline optimum is large:
$$W_{T+1} \ge (1-\epsilon)^{L^*}$$
LEMMA 3: If the loss suffered by the algorithm is large, then $W_{T+1}$ is small:
$$W_{T+1} \le n \cdot \prod_{t=1}^T (1-\epsilon L_t)$$

Putting them together

$$\begin{split} (1-\epsilon)^{L^*} &\le n \cdot \prod_{t=1}^T (1-\epsilon L_t) \\ L^*\ln(1-\epsilon) &\le \ln n + \sum_{t=1}^T \ln(1-\epsilon L_t) \end{split}$$

Note that because of the Taylor series $\ln (1-z) = -\sum_{n=1}^\infty \frac{z^n}{n}$, we have the inequality

$$\forall\, 0 \le z \le 1/2, \quad -z-z^2 \le \ln (1-z) \le -z$$
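This inequality is easy to verify numerically over a grid of $z$ values:

```python
import math

# Check  -z - z^2 <= ln(1 - z) <= -z  on a grid of z in (0, 1/2].
for k in range(1, 501):
    z = k / 1000.0
    assert -z - z * z <= math.log(1 - z) <= -z
```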

Then apply this inequality: the left side with $z = \epsilon$ (valid since $\epsilon \le 1/2$), and the right side with $z = \epsilon L_t$ for each term of the sum,

$$L^* \cdot (-\epsilon - \epsilon^2) \le \ln n - \epsilon \cdot \sum_{t=1}^T L_t$$

Divide by $\epsilon$ and rearrange a bit; the last inequality uses $L^* \le T$, which holds because every loss is in $[0,1]$:

$$\sum_{t=1}^T L_t - L^* \le \epsilon L^* + \frac{\ln n}{\epsilon} \le \epsilon T + \frac{\ln n}{\epsilon}$$