The core idea of Empirical Likelihood (EL) is to use a maximum entropy discrete distribution supported on the data points and constrained by estimating equations related to the parameters of interest. It is a non-parametric approach in the sense that the distribution of the data does not need to be specified, only some of its characteristics, usually via moments. In short, it’s a non-parametric likelihood, and a likelihood is the fundamental ingredient of likelihood-based statistical methodology.
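To make this concrete, here is the standard textbook form of the empirical likelihood, written as a sketch. The symbols are my notation, not taken from any specific paper mentioned in this post: $x_1, \dots, x_n$ are the data, $p_i$ is the probability mass placed on $x_i$, and $g(x, \theta)$ is the estimating function.

$$
L_{el}(\theta) = \max_{p_1, \dots, p_n} \left\{ \prod_{i=1}^{n} p_i \;:\; p_i \ge 0,\ \sum_{i=1}^{n} p_i = 1,\ \sum_{i=1}^{n} p_i \, g(x_i, \theta) = 0 \right\}
$$

For example, if the parameter of interest is the mean, one can take $g(x, \theta) = x - \theta$. A Lagrange multiplier argument then gives $p_i = \frac{1}{n \, (1 + \lambda (x_i - \theta))}$, where $\lambda$ solves $\sum_{i=1}^{n} \frac{x_i - \theta}{1 + \lambda (x_i - \theta)} = 0$.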

Bayesian analysis is a very popular and useful method in applications. As we discussed in the last post, it’s essentially a belief-updating procedure driven by data, which is very natural in modeling. Last time, I said I did not get why there is such a severe debate between Frequentists and Bayesians. Yesterday, I had a nice talk with Professor Xuming He from the University of Michigan. When we talked about Bayesian analysis, he made a nice point: in Frequentist analysis, model misspecification can be addressed in a very rigorous way to conduct valid statistical inference, while Bayesian analysis is very sensitive to the likelihood as well as the prior, and how to make the adjustment is a big problem (here is a paper discussing model misspecification problems under the Bayesian framework).

Before the discussion with Dr. Xuming He, I intuitively thought it would be very natural and potentially very useful to combine empirical likelihood with Bayesian analysis by using the empirical likelihood as the likelihood in the Bayesian framework. But now I understand why it mattered that Professor Nicole Lazar from the University of Georgia wrote a paper on “Bayesian Empirical Likelihood” to discuss the validity of posterior inference: “…can likelihoods other than the density from which the data are assumed to be generated be used as the likelihood portion in a Bayesian analysis?” The paper concluded that “…while they indicate that it is feasible to consider a Bayesian inferential procedure based on replacing the data likelihood with empirical likelihood, the validity of the posterior inference needs to be established for each case individually.”

But Professor Xuming He made a nice comment: the Bayesian framework can be used to avoid computing the maximum empirical likelihood estimator, by proving that the posterior distribution is asymptotically normal with mean around the maximum empirical likelihood estimator. The original idea of their AOS paper was indeed to use the computational advantage of the Bayesian side to get around the optimization difficulty in obtaining the maximum empirical likelihood estimator. This reminded me of another paper, “Approximate Bayesian Computation (ABC) via Empirical Likelihood”, which used empirical likelihood to improve the approximation at an overall computing cost that is negligible compared with ABC.

We know that for general Bayesian analysis, the goal is to be able to simulate from the posterior distribution, for example by using MCMC. But in order to use MCMC to simulate from the posterior distribution, we need to be able to evaluate the likelihood, and sometimes that is hard because of the complexity of the model. For example, the laundry socks problem was recently a hit online. Since it’s not trivial to figure out the likelihood of the process, although we have a simple generative model from which we can easily simulate samples, Professor Rasmus Bååth presented a Bayesian analysis using ABC. Later, Professor Christian Robert presented exact probability calculations, pointing out that Feller had posed a similar problem. And here is another post from Professor Saunak Sen. The basic idea of the ABC approximation is to accept values provided the simulated sample is sufficiently close to the observed data point:

Simulate $\theta \sim \pi(\theta)$, where $\pi$ is the prior;

Simulate $x$ from the generative model $f(x \mid \theta)$;

If the distance $\rho(x, y)$ is small, keep $\theta$, where $y$ is the observed data point. Otherwise reject.
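The three steps above can be sketched as a small rejection sampler. This is only a toy illustration under my own assumptions, not the socks analysis: the data values, the uniform prior on (-5, 5), the normal generative model with unit variance, the use of the sample mean as the summary, and the tolerance 0.1 are all made up for the example, and the function names are hypothetical.

```python
import random
import statistics


def abc_rejection(observed, prior_sampler, simulator, distance, tolerance, n_draws, rng):
    """Keep the prior draws whose simulated data land within `tolerance` of the observed data."""
    accepted = []
    for _ in range(n_draws):
        theta = prior_sampler(rng)                      # step 1: draw a parameter from the prior
        simulated = simulator(theta, rng)               # step 2: simulate data given that parameter
        if distance(simulated, observed) <= tolerance:  # step 3: accept if close to the observed data
            accepted.append(theta)
    return accepted


# Toy setup (every choice below is an assumption made for illustration).
OBSERVED = [1.2, 2.9, 2.1, 0.4, 3.3, 1.8, 2.5, 1.1, 2.7, 2.0]


def uniform_prior(rng):
    """Assumed prior: uniform on (-5, 5)."""
    return rng.uniform(-5.0, 5.0)


def normal_simulator(theta, rng):
    """Assumed generative model: independent N(theta, 1) draws, same sample size as OBSERVED."""
    return [rng.gauss(theta, 1.0) for _ in range(len(OBSERVED))]


def mean_distance(simulated, observed):
    """Assumed distance: absolute difference between the two sample means."""
    return abs(statistics.fmean(simulated) - statistics.fmean(observed))


rng = random.Random(0)
posterior_draws = abc_rejection(
    OBSERVED, uniform_prior, normal_simulator, mean_distance,
    tolerance=0.1, n_draws=20000, rng=rng,
)
print(len(posterior_draws), statistics.fmean(posterior_draws))
```

Note that the likelihood is never evaluated here; only the ability to simulate from the model is used. The price is that most draws are rejected, and a smaller tolerance makes the approximation better but the acceptance rate lower.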

Now, how can empirical likelihood help ABC? Although the original motivation is the same, namely to approximate the likelihood (ABC approximates the likelihood via simulation, while the empirical likelihood version of ABC uses the empirical likelihood to approximate the true likelihood), it’s more natural to start from the original Bayesian computation (this is also why Professor Christian Robert changed the title of their paper). From the importance sampling perspective, a posterior sample can be generated as follows:

Simulate $\theta_i \sim \pi(\theta)$, where $\pi$ is the prior;

Get the corresponding importance weight as $w_i = L(\theta_i \mid y)$, where $L$ is the likelihood.

Now if we do not know the likelihood, we can do the following:

Simulate $\theta_i \sim \pi(\theta)$, where $\pi$ is the prior;

Get the corresponding importance weight as $w_i = L_{el}(\theta_i \mid y)$, where $L_{el}$ is the empirical likelihood.

This is the way of doing Bayesian computation via empirical likelihood.
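Here is a minimal sketch of this weighting scheme for the simplest case I can think of, where the parameter is a mean and the only constraint is that the weighted data average equals the parameter. Everything specific is my assumption for illustration: the data values, the uniform prior on (-5, 5), the bisection solver for the Lagrange multiplier, and the hypothetical function names. The weight is the empirical likelihood ratio (the empirical likelihood divided by its unconstrained maximum), which differs from the empirical likelihood only by a constant and so gives the same normalized weights.

```python
import math
import random


def log_el_mean(data, theta, iterations=60):
    """Log empirical likelihood ratio for the constraint sum_i p_i * (x_i - theta) = 0."""
    z = [x - theta for x in data]
    z_min, z_max = min(z), max(z)
    if z_min >= 0.0 or z_max <= 0.0:
        return -math.inf  # theta is not strictly inside the data range: no valid weights exist
    # The multiplier lam must keep every 1 + lam * z_i positive, i.e. lie in (-1/z_max, -1/z_min).
    lo, hi = -1.0 / z_max, -1.0 / z_min
    margin = 1e-10 * (hi - lo)
    lo, hi = lo + margin, hi - margin
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        score = sum(zi / (1.0 + mid * zi) for zi in z)  # strictly decreasing in the multiplier
        if score > 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return -sum(math.log(1.0 + lam * zi) for zi in z)


def bc_el_weights(data, prior_sampler, n_draws, rng):
    """Draw parameters from the prior and weight each one by its empirical likelihood ratio."""
    thetas = [prior_sampler(rng) for _ in range(n_draws)]
    weights = [math.exp(log_el_mean(data, t)) for t in thetas]
    return thetas, weights


def weighted_mean(values, weights):
    """Self-normalized importance sampling estimate of the posterior mean."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Toy setup (assumptions made for illustration).
DATA = [1.2, 2.9, 2.1, 0.4, 3.3, 1.8, 2.5, 1.1, 2.7, 2.0]


def uniform_prior(rng):
    """Assumed prior: uniform on (-5, 5)."""
    return rng.uniform(-5.0, 5.0)


thetas, weights = bc_el_weights(DATA, uniform_prior, 5000, random.Random(0))
print(weighted_mean(thetas, weights))
```

Compared with ABC rejection, no data are simulated from a generative model at all: each prior draw only requires solving a one-dimensional equation, and draws outside the range of the data simply get weight zero.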

The main difference between Bayesian computation via empirical likelihood and empirical likelihood Bayesian is that the first uses empirical likelihood to approximate the likelihood in the Bayesian computation and then proceeds with Bayesian inference, while the second uses Bayesian computation to overcome the optimization difficulty and then studies the frequentist properties.

Updated [4/28/2015]: Here is a nice post discussing this issue for Bayesian analysis, especially model misspecification.

Abstract. We introduce vector diffusion maps (VDM), a new mathematical framework for organizing and analyzing massive high dimensional data sets, images and shapes. VDM is a mathematical and algorithmic generalization of diffusion maps and other non-linear dimensionality reduction methods, such as LLE, ISOMAP and Laplacian eigenmaps. While existing methods are either directly or indirectly related to the heat kernel for functions over the data, VDM is based on the heat kernel for vector fields. VDM provides tools for organizing complex data sets, embedding them in a low dimensional space, and interpolating and regressing vector fields over the data. In particular, it equips the data with a metric, which we refer to as the vector diffusion distance. In the manifold learning setup, where the data set is distributed on (or near) a low dimensional manifold $\mathcal{M}^d$ embedded in $\mathbb{R}^p$, we prove the relation between VDM and the connection-Laplacian operator for vector fields over the manifold.

D. K. Biss (Topology and its Applications 124 (2002) 355-371) introduced the topological fundamental group and presented some interesting basic properties of the notion. In this article we intend to extend the above notion to homotopy groups and try to prove some similar basic properties of the topological homotopy groups. We also study more on the topology of the topological homotopy groups in order to find necessary and sufficient conditions for which the topology is discrete. Moreover, we show that studying topological homotopy groups may be more useful than topological fundamental groups.

This paper describes the structure of the moduli space of holomorphic curves and constructs Gromov Witten invariants in the category of exploded manifolds. This includes defining Gromov Witten invariants relative to normal crossing divisors and proving the associated gluing theorem which involves summing relative invariants over a count of tropical curves.

These are lecture notes that arose from a representation theory course given by the first author to the remaining six authors in March 2004 within the framework of the Clay Mathematics Institute Research Academy for high school students, and its extended version given by the first author to MIT undergraduate math students in the Fall of 2008. The notes cover a number of standard topics in representation theory of groups, Lie algebras, and quivers, and contain many problems and exercises. They should be accessible to students with a strong background in linear algebra and a basic knowledge of abstract algebra, and may be used for an undergraduate or introductory graduate course in representation theory.

P.S. In the latest version, misprints and errors were corrected and new exercises were added, in particular ones suggested by Darij Grinberg.

It is argued that zero should be considered as a cardinal number but not an ordinal number. One should make a clear distinction between order types that are labels for well-ordered sets and ordinal numbers that are labels for the elements in these sets.