Paper summaryopenreviewEDML is a recent algorithm for learning the parameters of a Bayesian Network (BN). EDML works by 'removing' some edges from a Bayesian or Markov Network in order to produce a simplified network in which exact inference is feasible. EDML removes edges in a manner tailored to parameter learning, by recognizing repeated structures in the so-called "meta" network. While originally proposed for learning BN parameters, in this paper the authors show that EDML can be interpreted more generally as a coordinate ascent algorithm. After recasting EDML in this light, they then show how it can be applied to learning in Markov Networks (MNs). Finally, they provide some preliminary experimental results on learning the parameters of small grids. In doing so, they compare EDML to an off-the-shelf conjugate gradient method.
**Assessment: **
There does not appear to be much novelty here. Parameter learning in BNs and MNs inherently involves optimization of a real valued function and coordinate ascent approaches (e.g. iterative proportional fitting) have been studied previously. While casting EDML in this manner is interesting, the benefit of doing so is not made clear - why is this perspective useful? If this perspective provided insight on how to optimize a particular loss function (e.g. 0/1 loss) then it would be very helpful. Finally, in the case of incomplete data, the authors claim that EDML is both different and more efficient than EM. I don't immediately see why this is true - variational forms of EM exist and this appears to be a specific form.

EDML is a recent algorithm for learning the parameters of a Bayesian Network (BN). EDML works by 'removing' some edges from a Bayesian or Markov Network in order to produce a simplified network in which exact inference is feasible. EDML removes edges in a manner tailored to parameter learning, by recognizing repeated structures in the so-called "meta" network. While originally proposed for learning BN parameters, in this paper the authors show that EDML can be interpreted more generally as a coordinate ascent algorithm. After recasting EDML in this light, they then show how it can be applied to learning in Markov Networks (MNs). Finally, they provide some preliminary experimental results on learning the parameters of small grids. In doing so, they compare EDML to an off-the-shelf conjugate gradient method.
**Assessment: **
There does not appear to be much novelty here. Parameter learning in BNs and MNs inherently involves optimization of a real valued function and coordinate ascent approaches (e.g. iterative proportional fitting) have been studied previously. While casting EDML in this manner is interesting, the benefit of doing so is not made clear - why is this perspective useful? If this perspective provided insight on how to optimize a particular loss function (e.g. 0/1 loss) then it would be very helpful. Finally, in the case of incomplete data, the authors claim that EDML is both different and more efficient than EM. I don't immediately see why this is true - variational forms of EM exist and this appears to be a specific form.