Convergence Analysis of the Block Coordinate Descent Algorithm and Powell's Examples
We focus mainly on the convergence of block coordinate descent with exact minimization, whose blocks are updated in a Gauss-Seidel manner. We then use Powell's examples to see what can happen when some of the conditions are not met.
References:
1. Dimitri P. Bertsekas, Nonlinear Programming, 2nd edition.
2. M. J. D. Powell (1973), "On Search Directions for Minimization Algorithms".
Problem description
Notations
We want to solve the problem
$$\min_{x \in X} f(x),$$
where $X$ is a Cartesian product of closed convex sets $X_1, X_2, \dots, X_m$:
$$X = X_1 \times X_2 \times \cdots \times X_m, \qquad X_i \subseteq \mathbb{R}^{n_i}.$$
We assume that $f : \mathbb{R}^{n_1 + \cdots + n_m} \to \mathbb{R}$ is continuously differentiable over $X$. We denote by $x_i \in X_i$ the $i$-th block component of a vector $x \in X$, so that $x = (x_1, x_2, \dots, x_m)$.
Assumption
We shall assume that for every $x \in X$ and every $i \in \{1, \dots, m\}$, the subproblem
$$\min_{\xi \in X_i} f(x_1, \dots, x_{i-1}, \xi, x_{i+1}, \dots, x_m)$$
has at least one solution.
Algorithm
The Gauss-Seidel method generates the next iterate $x^{k+1} = (x_1^{k+1}, \dots, x_m^{k+1})$ from the current iterate $x^k = (x_1^k, \dots, x_m^k)$ by updating one block at a time:
$$x_i^{k+1} \in \arg\min_{\xi \in X_i} f\left(x_1^{k+1}, \dots, x_{i-1}^{k+1}, \xi, x_{i+1}^k, \dots, x_m^k\right), \qquad i = 1, \dots, m,$$
so each block is minimized exactly while all other blocks are held at their most recent values. A minimal sketch of this scheme is given below.
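The following is a minimal sketch of the scheme in R (the language used for the experiments at the end of this note), assuming every block is one-dimensional and every $X_i$ is a closed interval, so that the exact block minimization can be done with `optimize()`; the function name `bcd_gauss_seidel` and the quadratic test problem are illustrative choices, not taken from the references.

```r
## A minimal sketch of Gauss-Seidel block coordinate descent with exact
## minimization, assuming every block is one-dimensional and every X_i is a
## closed interval [lower_i, upper_i], so each block update can be done with
## optimize().
bcd_gauss_seidel <- function(f, x0, lower, upper, n_sweeps = 50) {
  x <- x0
  for (sweep in seq_len(n_sweeps)) {
    for (i in seq_along(x)) {
      # exact minimization over the i-th block, all other blocks held fixed
      g <- function(xi) { x[i] <- xi; f(x) }
      x[i] <- optimize(g, lower = lower[i], upper = upper[i])$minimum
    }
  }
  x
}

## Strictly convex quadratic: every block minimizer is unique, so the theorem
## below applies; the iterates approach the unique minimizer, roughly (1.6, -2.4).
fquad <- function(x) (x[1] - 1)^2 + (x[2] + 2)^2 + 0.5 * x[1] * x[2]
bcd_gauss_seidel(fquad, x0 = c(0, 0), lower = c(-5, -5), upper = c(5, 5))
```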
Convergence Analysis
Theorem
Suppose that $f$ is continuously differentiable over the set $X$. Furthermore, suppose that for each $x \in X$ and each $i$, the minimum in
$$\min_{\xi \in X_i} f(x_1, \dots, x_{i-1}, \xi, x_{i+1}, \dots, x_m),$$
viewed as a function of $\xi$, is uniquely attained. Then every limit point $\bar{x}$ of the sequence $\{x^k\}$ generated by the Gauss-Seidel method is a stationary point, i.e. $\nabla f(\bar{x})^{\top}(x - \bar{x}) \ge 0$ for all $x \in X$.
Proof
Let
$$z_i^k = \left(x_1^{k+1}, \dots, x_i^{k+1}, x_{i+1}^k, \dots, x_m^k\right), \qquad i = 0, 1, \dots, m,$$
so that $z_0^k = x^k$, $z_m^k = x^{k+1}$, and $z_i^k$ agrees with $x^{k+1}$ in its first $i$ blocks and with $x^k$ in the remaining blocks. By the nature of this algorithm, for all $k$,
$$f(x^{k+1}) = f(z_m^k) \le f(z_{m-1}^k) \le \cdots \le f(z_1^k) \le f(z_0^k) = f(x^k). \qquad (*)$$
Let $\bar{x} = (\bar{x}_1, \dots, \bar{x}_m)$ be a limit point of $\{x^k\}$, and let $\{x^{k_j}\}$ be a subsequence converging to $\bar{x}$. Since $f$ is continuous and, by (*), the sequence $\{f(x^k)\}$ is monotonically nonincreasing, the entire sequence $\{f(x^k)\}$ converges to $f(\bar{x})$.

Now we want to prove that $z_1^{k_j} - x^{k_j} \to 0$ as $j \to \infty$. From (*), we know that $f(x^{k_j+1}) \le f(z_1^{k_j}) \le f(x^{k_j})$, so $f(z_1^{k_j}) \to f(\bar{x})$ as well. Assume the contrary, that $\{z_1^{k_j} - x^{k_j}\}$ does not converge to zero. Let $\gamma^{k_j} = \|z_1^{k_j} - x^{k_j}\|$; by passing to a further subsequence we may assume that $\gamma^{k_j} \ge \bar{\gamma}$ for some $\bar{\gamma} > 0$ and all $j$. Let
$$s^{k_j} = \frac{z_1^{k_j} - x^{k_j}}{\gamma^{k_j}},$$
which implies that $z_1^{k_j} = x^{k_j} + \gamma^{k_j} s^{k_j}$, $\|s^{k_j}\| = 1$, and $s^{k_j}$ differs from zero only in its first block. (Note: although $\{\gamma^{k_j}\}$ need not converge, $\{s^{k_j}\}$ lies in a compact set, so we may assume, again passing to a subsequence, that $s^{k_j} \to \bar{s}$ with $\|\bar{s}\| = 1$.) Fix $\epsilon \in [0,1]$. Since $0 \le \epsilon\bar{\gamma} \le \gamma^{k_j}$, the point $x^{k_j} + \epsilon\bar{\gamma} s^{k_j}$ lies on the segment joining $x^{k_j}$ and $z_1^{k_j}$, and therefore belongs to $X$ because $X$ is convex. Since $z_1^{k_j}$ minimizes $f$ over all points that differ from $x^{k_j}$ only in the first block,
$$f(z_1^{k_j}) \le f\left(x^{k_j} + \epsilon\bar{\gamma} s^{k_j}\right) \le f(x^{k_j}).$$
Letting $j \to \infty$ and using $f(z_1^{k_j}) \to f(\bar{x})$ and $f(x^{k_j}) \to f(\bar{x})$, we obtain
$$f(\bar{x}) \le f(\bar{x} + \epsilon\bar{\gamma}\bar{s}) \le f(\bar{x}), \qquad \forall\, \epsilon \in [0,1],$$
which implies that $\bar{x}_1 + \epsilon\bar{\gamma}\bar{s}_1$ minimizes $f(\xi, \bar{x}_2, \dots, \bar{x}_m)$ over $\xi \in X_1$ for every $\epsilon \in [0,1]$. Since $\bar{\gamma}\bar{s}_1 \neq 0$, this minimum is attained at more than one point, which contradicts the uniqueness assumption. Hence $z_1^{k_j} - x^{k_j} \to 0$, and thus $z_1^{k_j} \to \bar{x}$.

At this stage, since $z_1^{k_j}$ minimizes $f$ over the first block, we have $f(z_1^{k_j}) \le f(x_1, x_2^{k_j}, \dots, x_m^{k_j})$ for every $x_1 \in X_1$. Taking the limit as $j \to \infty$ gives $f(\bar{x}) \le f(x_1, \bar{x}_2, \dots, \bar{x}_m)$ for all $x_1 \in X_1$, so $\bar{x}_1$ minimizes $f(\cdot, \bar{x}_2, \dots, \bar{x}_m)$ over the convex set $X_1$, and the first-order optimality condition yields
$$\nabla_1 f(\bar{x})^{\top}(x_1 - \bar{x}_1) \ge 0, \qquad \forall\, x_1 \in X_1,$$
where $\nabla_i f$ denotes the gradient of $f$ with respect to the block $x_i$.

It remains to repeat the argument for the other blocks. Again, using (*) together with $z_1^{k_j} \to \bar{x}$, we conclude in the same way that $z_2^{k_j} - z_1^{k_j} \to 0$, hence $z_2^{k_j} \to \bar{x}$, and similarly $z_i^{k_j} \to \bar{x}$ for every $i$. Repeating the limiting argument block by block gives
$$\nabla_i f(\bar{x})^{\top}(x_i - \bar{x}_i) \ge 0, \qquad \forall\, x_i \in X_i,\ i = 1, \dots, m.$$
Summing these inequalities over $i$ yields $\nabla f(\bar{x})^{\top}(x - \bar{x}) \ge 0$ for all $x \in X$, i.e. $\bar{x}$ is a stationary point. $\blacksquare$
Powell's example
In "On Search Directions for Minimization Algorithms", Powell gives three examples in which the sequences generated by the algorithm discussed above do not converge to stationary points once some of the hypotheses are not met.
The first example is straightforward; however, its remarkable properties can be destroyed by a small perturbation of the starting vector. The second example is sensitive neither to small changes in the initial data nor to small errors introduced during the iterative process, such as computer rounding errors. The third example exhibits an infinitely differentiable function that also causes an endless loop in the iterative minimization method.
Here we present only the first example. Consider the function
$$f(x, y, z) = -(xy + yz + xz) + (x-1)_+^2 + (-x-1)_+^2 + (y-1)_+^2 + (-y-1)_+^2 + (z-1)_+^2 + (-z-1)_+^2,$$
where $(t)_+ = \max(t, 0)$ denotes the positive part. Given the starting point
$$(x^0, y^0, z^0) = \left(-1-\epsilon,\; 1+\tfrac{1}{2}\epsilon,\; -1-\tfrac{1}{4}\epsilon\right), \qquad \epsilon > 0,$$
we present the first two cycles (six iterations) of the method, plus the first iteration of the third cycle, which shows the pattern repeating:
cycle / total iteration | $x$ | $y$ | $z$ |
---|---|---|---|
1 / 1 | $1+\tfrac{1}{8}\epsilon$ | $1+\tfrac{1}{2}\epsilon$ | $-1-\tfrac{1}{4}\epsilon$ |
1 / 2 | $1+\tfrac{1}{8}\epsilon$ | $-1-\tfrac{1}{16}\epsilon$ | $-1-\tfrac{1}{4}\epsilon$ |
1 / 3 | $1+\tfrac{1}{8}\epsilon$ | $-1-\tfrac{1}{16}\epsilon$ | $1+\tfrac{1}{32}\epsilon$ |
2 / 4 | $-1-\tfrac{1}{64}\epsilon$ | $-1-\tfrac{1}{16}\epsilon$ | $1+\tfrac{1}{32}\epsilon$ |
2 / 5 | $-1-\tfrac{1}{64}\epsilon$ | $1+\tfrac{1}{128}\epsilon$ | $1+\tfrac{1}{32}\epsilon$ |
2 / 6 | $-1-\tfrac{1}{64}\epsilon$ | $1+\tfrac{1}{128}\epsilon$ | $-1-\tfrac{1}{256}\epsilon$ |
3 / 7 | $1+\tfrac{1}{512}\epsilon$ | $1+\tfrac{1}{128}\epsilon$ | $-1-\tfrac{1}{256}\epsilon$ |
... | ... | ... | ... |
This result implies that the sequence obtained by this algorithm cannot converge to a single point, since the iterates keep cycling, up to terms of order $\epsilon$, around the six vertices $(1,1,-1)$, $(1,-1,-1)$, $(1,-1,1)$, $(-1,-1,1)$, $(-1,1,1)$, $(-1,1,-1)$ of the cube $[-1,1]^3$. But none of these limit points is a stationary point of $f$ (see the Remark below), so block coordinate descent fails here.
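The first cycle in the table can be verified numerically with the closed-form block update derived in the Remark below; the helper `step` and the value $\epsilon = 0.25$ are arbitrary illustrative choices.

```r
## Verify the first cycle of the table with the closed-form block update
## x <- sign(y+z) * (1 + |y+z|/2); eps = 0.25 is an arbitrary test value.
step <- function(a, b) sign(a + b) * (1 + 0.5 * abs(a + b))
eps <- 0.25
x <- -1 - eps; y <- 1 + eps / 2; z <- -1 - eps / 4   # Powell's starting point
x <- step(y, z)   # expected: 1 + eps/8
y <- step(x, z)   # expected: -1 - eps/16
z <- step(x, y)   # expected: 1 + eps/32
c(x - (1 + eps / 8), y - (-1 - eps / 16), z - (1 + eps / 32))   # all zeros
```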

Remark
A hint to derive the update formula: for $|x| \le 1$ the penalty terms vanish, while the partial derivative of $f$ with respect to $x$ is
$$\frac{\partial f}{\partial x} = -(y+z) + 2(x-1)_+ - 2(-x-1)_+,$$
and similarly for $y$ and $z$. So, for the univariate optimization problem in $x$ (with $y$ and $z$ fixed and $y+z \neq 0$), setting this derivative to zero, we conclude that the minimizer is
$$x \leftarrow \operatorname{sign}(y+z)\left(1 + \tfrac{1}{2}\,|y+z|\right),$$
and the analogous formulas hold for the $y$- and $z$-updates.
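As a quick sanity check, the closed-form update can be compared with a direct one-dimensional numerical minimization of $f$; the sketch below re-defines the objective (the same function as `fpowell` in the R code at the end of this note), and the search interval $[-3, 3]$ and the test values of $y$, $z$ are arbitrary choices.

```r
## Compare the closed-form x-update with a 1-D numerical minimization of f.
PositivePart <- function(t) ifelse(t >= 0, t, 0)
fpowell <- function(x) {
  -(x[1] * x[2] + x[2] * x[3] + x[1] * x[3]) +
    PositivePart(x[1] - 1)^2 + PositivePart(-x[1] - 1)^2 +
    PositivePart(x[2] - 1)^2 + PositivePart(-x[2] - 1)^2 +
    PositivePart(x[3] - 1)^2 + PositivePart(-x[3] - 1)^2
}
y <- 1.3; z <- -1.1                        # arbitrary values with y + z != 0
closed_form <- sign(y + z) * (1 + 0.5 * abs(y + z))
numerical <- optimize(function(x) fpowell(c(x, y, z)),
                      lower = -3, upper = 3)$minimum
c(closed_form, numerical)                  # agree up to optimize()'s tolerance
```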
The gradient of $f$ along this cyclic path does not tend to zero: at each of the six limit vertices one component of $\nabla f$ equals $\pm 2$ (for example, $\nabla f(1,1,-1) = (0,0,-2)$), so $\|\nabla f\|$ stays close to $2$ along the whole path.

This example is unstable with respect to small perturbations. Small changes in the starting point, or small errors in the numbers computed during the calculation, will destroy the cyclic behavior. It is clear that the choice of the perturbations plays a key role. Say we start from $(-1-\epsilon_1,\; 1+\epsilon_2,\; -1-\epsilon_3)$ with $\epsilon_1, \epsilon_2, \epsilon_3 > 0$; we then obtain the same cyclic sign pattern as in the table above. To preserve the cyclic behavior, we have to make sure that every pairwise sum driving an update keeps the sign it has on Powell's path; in particular, none of these sums may ever be exactly zero.
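A small sketch of this sensitivity check (the helper `follows_cycle`, the iteration count, and the test perturbations are illustrative choices, not taken from Powell's paper):

```r
## Check whether a starting perturbation keeps Powell's cyclic sign pattern.
## An update that hits sign(0) = 0 collapses the corresponding coordinate to 0
## and destroys the cycle.
step <- function(a, b) sign(a + b) * (1 + 0.5 * abs(a + b))
follows_cycle <- function(eps1, eps2, eps3, n_iter = 30) {
  x <- c(-1 - eps1, 1 + eps2, -1 - eps3)
  ok <- TRUE
  for (k in seq_len(n_iter)) {
    i <- (k - 1) %% 3 + 1          # coordinates are updated cyclically
    others <- x[-i]
    x[i] <- step(others[1], others[2])
    ok <- ok && abs(x[i]) > 1      # on the cycle, |x_i| is always > 1
  }
  ok
}
follows_cycle(1e-3, 5e-4, 2.5e-4)  # Powell's ratios (eps, eps/2, eps/4): TRUE
follows_cycle(1e-3, 1e-3, 1e-3)    # equal perturbations: first update hits sign(0), FALSE
```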
In practice, numerical tests show that this theoretically endless loop actually breaks down because of rounding errors: the perturbation is halved at every update, so after enough iterations it is rounded away completely. A brief illustration is given by the numerical experiments below; in that experiment the loop ends after 52 steps.
As the update formula suggests, when the sum of the other two coordinates is zero (say $y+z = 0$ in the $x$-update), every point of $[-1,1]$ minimizes the corresponding univariate problem; the choice of the new coordinate is then arbitrary, and we set $\operatorname{sign}(0) = 0$ (i.e. the coordinate is set to $0$) in the case above. So the uniqueness requirement is violated. It turns out that the six vertices are not even stationary points. For example, at the point $(1,1,-1)$ we have $\nabla f(1,1,-1) = (0,0,-2) \neq 0$, and for any point in the unit cube $[-1,1]^3$ the value of $f$ is at least $-3$; say, at $(1,1,1)$, $f(1,1,1) = -3$, far below the value $f = 1$ taken at the six vertices of the cycle. Actually, as in the proof of the Theorem above, under the uniqueness assumption we prove that $z_1^{k_j}$ converges to $\bar{x}$, where $\bar{x}$ is the limit point of the subsequence $\{x^{k_j}\}$. But in this example the limit point of $\{z_1^{k_j}\}$ is $(1,1,-1)$, while the limit point of $\{x^{k_j}\}$ (the iterates after full cycles) is either $(-1,1,-1)$ or $(1,-1,1)$. So the requirement of uniqueness is not met.
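A quick numerical confirmation that these vertices are not stationary; the analytic gradient below is worked out directly from the definition of $f$, and the helper name `grad_fpowell` is an illustrative choice.

```r
## Analytic gradient of Powell's function f (valid everywhere, since the
## positive-part penalties are continuously differentiable).
grad_fpowell <- function(v) {
  pp <- function(t) ifelse(t >= 0, t, 0)   # positive part
  x <- v[1]; y <- v[2]; z <- v[3]
  c(-(y + z) + 2 * pp(x - 1) - 2 * pp(-x - 1),
    -(x + z) + 2 * pp(y - 1) - 2 * pp(-y - 1),
    -(x + y) + 2 * pp(z - 1) - 2 * pp(-z - 1))
}
grad_fpowell(c( 1, 1, -1))   # (0, 0, -2): not a stationary point
grad_fpowell(c(-1, 1, -1))   # (0, 2, 0): not a stationary point
```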
R code for the numerical experiments
```r
####################
### Function for test ###
####################
PowellE1 <- function(xstart, cycles, fig = TRUE) {
  ####### function part ##############
  # One full Gauss-Seidel cycle, using the closed-form block updates
  UpdateCycle <- function(x) {
    Sign <- function(x) {
      if (x > 0) {
        return(1)
      } else {
        if (x < 0) {
          return(-1)
        } else {
          return(0)
        }
      }
    }
    x.new <- c()
    x.new[1] <- Sign(x[2] + x[3]) * (1 + 0.5 * abs(x[2] + x[3]))
    x.new[2] <- Sign(x.new[1] + x[3]) * (1 + 0.5 * abs(x.new[1] + x[3]))
    x.new[3] <- Sign(x.new[1] + x.new[2]) * (1 + 0.5 * abs(x.new[1] + x.new[2]))
    # return the three intermediate iterates of this cycle, one per row
    cycle <- matrix(c(x.new[1], x[2], x[3],
                      x.new[1], x.new[2], x[3],
                      x.new[1], x.new[2], x.new[3]),
                    ncol = 3, byrow = TRUE)
    return(cycle)
  }
  # Powell's objective function
  fpowell <- function(x) {
    PositivePart <- function(x) {
      ifelse(x >= 0, x, 0)
    }
    fval <- (-(x[1] * x[2] + x[2] * x[3] + x[1] * x[3])) +
      PositivePart(x[1] - 1)^2 + PositivePart(-x[1] - 1)^2 +
      PositivePart(x[2] - 1)^2 + PositivePart(-x[2] - 1)^2 +
      PositivePart(x[3] - 1)^2 + PositivePart(-x[3] - 1)^2
    return(fval)
  }
  ############ operation part ################
  x.store <- matrix(ncol = 3, nrow = cycles * 3 + 1)
  x.store[1, ] <- xstart
  for (i in seq_len(cycles)) {
    x.store[(3 * i - 1):(3 * i + 1), ] <- UpdateCycle(x.store[3 * i - 2, ])
  }
  x.store <- x.store[-1, ]
  fval <- rep(0, cycles * 3)
  for (i in seq_len(cycles * 3)) {
    fval[i] <- fpowell(x.store[i, ])
  }
  fval <- as.matrix(fval)
  if (fig == TRUE) {
    plot(fval, ylim = c(min(fval) - 1, max(fval) + 1), type = "l",
         xlab = "Iterations", ylab = "F value")
  }
  r <- list()
  r$x.iterate <- x.store
  r$fval <- fval
  return(r)
}
##################
#### Test 1 ########
##################
perturb <- 0.5
xstart <- c(-1 - perturb, 1 + 0.5 * perturb, -1 - 0.25 * perturb)
cycles <- 20
r <- PowellE1(xstart, cycles, fig = TRUE)
##################
#### Test 2 ########
##################
perturb <- 0.5
xstart <- c(-1 - perturb, 1 + 0.5 * perturb, -1 - 0.25 * perturb)
cycles <- 20
r <- PowellE1(xstart, cycles, fig = TRUE)
##################
#### Test 3 ########
##################
xstart <- c(3, 2, 1)
cycles <- 100
r <- PowellE1(xstart, cycles, fig = TRUE)
```