In the past decades, exactly recovering the intrinsic data structure from corrupted observations, which is known as Robust Principal Component Analysis (RPCA), has attracted tremendous interest and found many applications in computer vision and pattern recognition. Recently, this problem has been formulated as recovering a low-rank component and a sparse component from the observed data matrix. It has been proved that, under suitable conditions, this problem can be solved exactly by Principal Component Pursuit (PCP), i.e., minimizing a combination of the nuclear norm and the ℓ1 norm. Most existing methods for solving PCP require Singular Value Decompositions (SVDs) of the data matrix, resulting in high computational complexity and hence preventing the application of RPCA to very large scale computer vision problems. In this paper, we propose a novel algorithm, called ℓ1 filtering, for exactly solving PCP with $O(r^2(m+n))$ complexity, where $m \times n$ is the size of the data matrix and $r$ is the rank of the matrix to recover, which is assumed to be much smaller than $m$ and $n$. Moreover, ℓ1 filtering is highly parallelizable. It is the first algorithm that can exactly solve a nuclear norm minimization problem in linear time (with respect to the data size). As a preliminary investigation, we also discuss potential extensions of PCP to more complex vision tasks that are encouraged by ℓ1 filtering. Experiments on both synthetic data and real tasks demonstrate the great speed advantage of ℓ1 filtering over state-of-the-art algorithms and its wide applicability in the computer vision and pattern recognition communities.
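
As an illustration only, PCP is commonly written as $\min_{L,S} \|L\|_* + \lambda \|S\|_1$ subject to $D = L + S$. The sketch below shows the two proximal operators that typical SVD-based PCP solvers apply repeatedly, which is where the SVD cost mentioned above comes from. It is not the ℓ1 filtering algorithm itself, and the function names are hypothetical.

```python
import numpy as np

def soft_threshold(x, tau):
    """Proximal operator of tau * ||x||_1: shrink every entry toward zero."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def singular_value_threshold(m, tau):
    """Proximal operator of tau * ||M||_*: soft-threshold the singular values.

    This step needs an SVD, which dominates the cost of standard PCP solvers.
    """
    u, s, vt = np.linalg.svd(m, full_matrices=False)
    return (u * soft_threshold(s, tau)) @ vt

def pcp_objective(low_rank, sparse, lam):
    """Nuclear norm of the low-rank part plus lam times the l1 norm of the sparse part."""
    return np.linalg.norm(low_rank, ord="nuc") + lam * np.abs(sparse).sum()
```

A solver would alternate these two operators on the residuals $D - S$ and $D - L$; the weight $\lambda$ and the thresholds are left to the caller here.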

A new technique is proposed that can compensate for the lack of channel bandwidth in an optical wireless orthogonal frequency division multiplexing (OFDM) link based on a light emitting diode (LED). It uses adaptive sampling and an inverse discrete cosine transform to convert an OFDM signal into a sparse waveform, so that the important data are obtained efficiently while the redundant data are removed. In compressive sensing (CS), a sparse signal that is sampled below the Nyquist/Shannon limit can be reconstructed successfully given enough measurements. This means that the CS technique can increase the data rate of visible light communication (VLC) systems based on LEDs. It is observed that the data rate of the proposed CS-based VLC-OFDM link can be made 1.7 times greater than that of a conventional VLC-OFDM link (from 30.72 Mb/s to 51.2 Mb/s). The error vector magnitude (EVM) of the quadrature phase shift keying (QPSK) symbol is 31% (FEC limit: EVM of 32%) at a compression ratio of 40%.

Respiratory motion correction remains a challenge in coronary magnetic resonance imaging (MRI) and current techniques, such as navigator gating, suffer from sub-optimal scan efficiency and ease-of-use. To overcome these limitations, an image-based self-navigation technique is proposed that uses “sub-images” and compressed sensing (CS) to obtain translational motion correction in 2D. The method was preliminarily implemented as a 2D technique and tested for feasibility for targeted coronary imaging.

Methods

During a 2D segmented radial k-space data acquisition, heavily undersampled sub-images were reconstructed from the readouts collected during each cardiac cycle. These sub-images may then be used for respiratory self-navigation. Alternatively, a CS reconstruction may be used to create these sub-images, so as to partially compensate for the heavy undersampling. Both approaches were quantitatively assessed using simulations and in vivo studies, and the resulting self-navigation strategies were then compared to conventional navigator gating.

Results

Sub-images reconstructed using CS showed a lower artifact level than sub-images reconstructed without CS. As a result, the final image quality was significantly better when using CS-assisted self-navigation as opposed to the non-CS approach. Moreover, while both self-navigation techniques led to a 69% scan time reduction (as compared to navigator gating), there was no significant difference in image quality between the CS-assisted self-navigation technique and conventional navigator gating, despite the significant decrease in scan time.

Conclusions

CS-assisted self-navigation using 2D translational motion correction demonstrated feasibility of producing coronary MRA data with image quality comparable to that obtained with conventional navigator gating, and does so without the use of additional acquisitions or motion modeling, while still allowing for 100% scan efficiency and an improved ease-of-use. In conclusion, compressed sensing may become a critical adjunct for 2D translational motion correction in free-breathing cardiac imaging with high spatial resolution. An expansion to modern 3D approaches is now warranted.

Energy and direction are two basic properties of a vector, and a discrete signal is by nature a vector. The restricted isometry property (RIP) of compressive sensing captures the energy information of a signal but not its direction information. Hence, RIP is not complete. Orthogonal matrices preserve both angles and lengths. Preservation of length captures the energies of signals, as RIP does, and preservation of angle captures the directions of signals. Therefore, a Restricted Conformal Property (RCP) is proposed based on preservation of angle. RCP captures the direction of a signal just as RIP captures its energy. RCP is an important supplement to and development of RIP. Two different proofs of RCP are given, namely RCP_JL and RCP_IP.
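
The statement that orthogonal matrices preserve both lengths and angles can be checked numerically. The sketch below is only an illustration of that fact, not of RCP itself; the dimension 8 and the helper name `angle` are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
# Random orthogonal matrix from the QR factorization of a Gaussian matrix.
q, _ = np.linalg.qr(rng.standard_normal((8, 8)))

def angle(u, v):
    """Angle between two vectors, in radians."""
    cos = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(cos, -1.0, 1.0))

x = rng.standard_normal(8)
y = rng.standard_normal(8)

# Length (energy) is preserved: ||Qx|| = ||x||.
length_preserved = np.isclose(np.linalg.norm(q @ x), np.linalg.norm(x))
# Angle (direction relation) is preserved: angle(Qx, Qy) = angle(x, y).
angle_preserved = np.isclose(angle(q @ x, q @ y), angle(x, y))
```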

Bayesian methods for low-rank matrix completion with noise have been shown to be very efficient computationally. While the behaviour of penalized minimization methods in this problem is well understood from both the theoretical and computational points of view, the theoretical optimality of Bayesian estimators has not yet been explored. In this paper, we propose a Bayesian estimator for matrix completion under a general sampling distribution. We also provide an oracle inequality for this estimator. This inequality proves that, whatever the rank of the matrix to be estimated, our estimator reaches the minimax-optimal rate of convergence (up to a logarithmic factor). We end the paper with a short simulation study.

The task of estimating a matrix given a sample of observed entries is known as the \emph{matrix completion problem}. Most works on matrix completion have focused on recovering an unknown real-valued low-rank matrix from a random sample of its entries. Here, we investigate the case of highly quantized observations, in which the measurements can take only a small number of values. These quantized outputs are generated according to a probability distribution parametrized by the unknown matrix of interest. This model corresponds, for example, to ratings in recommender systems or labels in multi-class classification. We consider a general, non-uniform sampling scheme and give theoretical guarantees on the performance of a constrained, nuclear norm penalized maximum likelihood estimator. One important advantage of this estimator is that it does not require knowledge of the rank or an upper bound on the nuclear norm of the unknown matrix and is thus adaptive. We provide lower bounds showing that our estimator is minimax optimal. An efficient algorithm based on lifted coordinate gradient descent is proposed to compute the estimator. A limited Monte-Carlo experiment, using both simulated and real data, is provided to support our claims.

We present a smoothing technique which allows for the use of gradient based methods (such as steepest descent and conjugate gradients) for non-smooth regularization of inverse problems. As an application of this technique, we consider the problem of finding regularized solutions of linear systems $Ax = b$ with sparsity constraints. Such problems involve the minimization of a functional with an absolute value term, which is not smooth. We replace the non-smooth term by a smooth approximation, computed via a convolution. We are then able to compute gradients and Hessians, and utilize standard gradient based methods which yield good numerical performance in few iterations.
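
To make the idea concrete, the sketch below assumes a Gaussian convolution kernel and a functional of the form $\tfrac12\|Ax-b\|^2 + \lambda\sum_i |x_i|$; both are assumptions of this example, since the abstract does not fix them. Convolving $|x|$ with a zero-mean Gaussian of standard deviation $\sigma$ gives the closed form $x\,\mathrm{erf}(x/(\sigma\sqrt2)) + \sigma\sqrt{2/\pi}\,e^{-x^2/(2\sigma^2)}$, whose derivative is $\mathrm{erf}(x/(\sigma\sqrt2))$. All function names are hypothetical.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def smooth_abs(x, sigma):
    """|x| convolved with a zero-mean Gaussian of standard deviation sigma."""
    x = np.asarray(x, dtype=float)
    tail = sigma * math.sqrt(2.0 / math.pi) * np.exp(-x**2 / (2.0 * sigma**2))
    return x * _erf(x / (sigma * math.sqrt(2.0))) + tail

def smooth_abs_grad(x, sigma):
    """Derivative of smooth_abs: a smooth surrogate for sign(x)."""
    x = np.asarray(x, dtype=float)
    return _erf(x / (sigma * math.sqrt(2.0)))

def smoothed_objective(a, b, x, lam, sigma):
    """0.5 * ||Ax - b||^2 + lam * sum_i smooth_abs(x_i)."""
    r = a @ x - b
    return 0.5 * (r @ r) + lam * smooth_abs(x, sigma).sum()

def smoothed_gradient(a, b, x, lam, sigma):
    """Gradient of smoothed_objective with respect to x."""
    return a.T @ (a @ x - b) + lam * smooth_abs_grad(x, sigma)
```

With a gradient available everywhere, any standard descent routine can be applied to `smoothed_objective`; letting `sigma` shrink recovers the original non-smooth term.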

We obtain nonasymptotic bounds on the spectral norm of random matrices with independent entries that improve significantly on earlier results. If $X$ is the $n\times n$ symmetric matrix with $X_{ij}\sim N(0,b_{ij}^2)$, we show that $\mathbf{E}\|X\|\lesssim \max_i\sqrt{\sum_{j}b_{ij}^2} +\max_{ij}|b_{ij}|\sqrt{\log n}. $ This bound is optimal in the sense that a matching lower bound holds under mild assumptions, and the constants are sufficiently sharp that we can often capture the precise edge of the spectrum. Analogous results are obtained for rectangular matrices and for more general subgaussian or heavy-tailed distributions of the entries, and we derive tail bounds in addition to bounds on the expected norm. The proofs are based on a combination of the moment method and geometric functional analysis techniques. As an application, we show that our bounds immediately yield the correct phase transition behavior of the spectral edge of random band matrices and of sparse Wigner matrices. We also recover a result of Seginer on the norm of Rademacher matrices.
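
The two terms of the bound are easy to evaluate for a given variance profile. The sketch below computes them and draws one sample of $X$ for comparison; the band-matrix profile is just an example input, the function names are hypothetical, and the bound holds only up to a universal constant, so the numbers are indicative rather than a verification.

```python
import numpy as np

def norm_bound_terms(b):
    """Return (max_i sqrt(sum_j b_ij^2), max_ij |b_ij| * sqrt(log n))."""
    row_term = np.sqrt((b**2).sum(axis=1)).max()
    entry_term = np.abs(b).max() * np.sqrt(np.log(b.shape[0]))
    return row_term, entry_term

def sample_symmetric_gaussian(b, rng):
    """Draw symmetric X with X_ij ~ N(0, b_ij^2) for i <= j (b assumed symmetric)."""
    g = rng.standard_normal(b.shape) * b
    # Keep the upper triangle (with diagonal) and mirror its strict part below.
    return np.triu(g) + np.triu(g, 1).T

n, width = 200, 5
i, j = np.indices((n, n))
b = (np.abs(i - j) <= width).astype(float)   # band profile, unit variances in the band

rng = np.random.default_rng(0)
x = sample_symmetric_gaussian(b, rng)
row_term, entry_term = norm_bound_terms(b)
spectral_norm = np.linalg.norm(x, ord=2)
```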

Various algorithms have been proposed for dictionary learning. Among those for image processing, many use image patches to form dictionaries. This paper focuses on whole-image recovery from corrupted linear measurements. We address the open issue of representing an image by overlapping patches: the overlapping leads to an excessive number of dictionary coefficients to determine. With very few exceptions, this issue has limited the applications of image-patch methods to the local kind of tasks such as denoising, inpainting, cartoon-texture decomposition, super-resolution, and image deblurring, for which one can process a few patches at a time. Our focus is global imaging tasks such as compressive sensing and medical image recovery, where the whole image is encoded together, making it either impossible or very ineffective to update a few patches at a time.
Our strategy is to divide the sparse recovery into multiple subproblems, each of which handles a subset of non-overlapping patches, and then the results of the subproblems are averaged to yield the final recovery. This simple strategy is surprisingly effective in terms of both quality and speed. In addition, we accelerate computation of the learned dictionary by applying a recent block proximal-gradient method, which not only has a lower per-iteration complexity but also takes fewer iterations to converge, compared to the current state-of-the-art. We also establish that our algorithm globally converges to a stationary point. Numerical results on synthetic data demonstrate that our algorithm can recover a more faithful dictionary than two state-of-the-art methods.
Combining our whole-image recovery and dictionary-learning methods, we numerically simulate image inpainting, compressive sensing recovery, and deblurring. Our recovery is more faithful than those of a total variation method and a method based on overlapping patches.

For civil structures, structural damage due to severe loading events such as earthquakes, or due to long-term environmental degradation, usually occurs in localized areas of a structure. A new sparse Bayesian probabilistic framework for computing the probability of localized stiffness reductions induced by damage is presented that uses noisy incomplete modal data from before and after possible damage. This new approach employs system modal parameters of the structure as extra variables for Bayesian model updating with incomplete modal data. A specific hierarchical Bayesian model is constructed that promotes spatial sparseness in the inferred stiffness reductions in a way that is consistent with the Bayesian Ockham razor. To obtain the most plausible model of sparse stiffness reductions together with its uncertainty within a specified class of models, the method employs an optimization scheme that iterates among all uncertain parameters, including the hierarchical hyper-parameters. The approach has four important benefits: (1) it infers spatially-sparse stiffness changes based on the identified modal parameters; (2) the uncertainty in the inferred stiffness reductions is quantified; (3) no matching of model and experimental modes is needed, and (4) solving the nonlinear eigenvalue problem of a structural model is not required. The proposed method is applied to two previously-studied examples using simulated data: a ten-story shear-building and the three-dimensional braced-frame model from the Phase II Simulated Benchmark problem sponsored by the IASC-ASCE Task Group on Structural Health Monitoring. The results show that the occurrence of false-positive and false-negative damage detection is clearly reduced in the presence of modeling error. Furthermore, the identified most probable stiffness loss ratios are close to their actual values.

One of the challenges in Big Data is efficient handling of high-dimensional data or signals. This paper proposes a novel AMP algorithm for solving high-dimensional linear systems $\underline Y = {\mathbf{H}}\underline X + \underline W \in \mathbb{R}^M$ which has a piecewise-constant solution $\underline X \in \mathbb{R}^N$, under a compressed sensing framework $(M\leq N)$. We refer to the proposed AMP as \emph{ssAMP}. This ssAMP algorithm is derived from the classical message-passing rule over a bipartite graph which includes spike-and-slab potential functions to encourage the piecewise-constant nature of $\underline X$. The ssAMP iteration includes a novel scalarwise denoiser satisfying the Lipschitz continuity, generating an approximate MMSE estimate of the signal. The Lipschitz continuity of our denoiser enables the ssAMP to use the state evolution framework, given by the works [16],[20], for MSE prediction. In addition, we empirically show that ssAMP has better phase transition characteristic than TV-AMP [23] and GrAMPA [27] which are the existing AMPs for piecewise-constant recovery. We also discuss computational efficiency, empirically showing that ssAMP has computational advantage over the other recent algorithms under a high-dimensional setting.

This technical note considers the reconstruction of discrete-time nonlinear systems with additive noise. In particular, we propose a method and its associated algorithm to identify the system nonlinear functional forms and their associated parameters from a limited number of noisy time-series data. For this, we cast this reconstruction problem as a sparse linear regression problem and take a Bayesian viewpoint to solve it. As such, this approach typically leads to nonconvex optimisations. We propose a convexification procedure relying on an efficient iterative reweighted $\ell_1$-minimisation algorithm that uses general sparsity inducing priors on the parameters of the system and marginal likelihood maximisation. Using this approach, we also show how convex constraints on the parameters can be easily added to our proposed iterative reweighted $\ell_1$-minimisation algorithm. In the supplementary material [1], we illustrate the effectiveness of the proposed reconstruction method on two classical systems in biology and physics, namely, a genetic repressilator network and a large scale network of interconnected Kuramoto oscillators.

This paper establishes new concentration inequalities for random matrices constructed from independent random variables. These results are analogous to the generalized Efron-Stein inequalities developed by Boucheron et al. The proofs rely on the method of exchangeable pairs.

Deeply rooted in classical social choice and voting theory, statistical ranking with paired comparison data experienced its renaissance with the wide spread of crowdsourcing technique. As the data quality might be significantly damaged in an uncontrolled crowdsourcing environment, outlier detection and robust ranking have become a hot topic in such data analysis. In this paper, we propose a robust ranking framework based on the principle of Huber's robust statistics, which formulates outlier detection as a LASSO problem to find sparse approximations of the cyclic ranking projection in Hodge decomposition. Moreover, simple yet scalable algorithms are developed based on Linearized Bregman Iteration to achieve an even less biased estimator than LASSO. Statistical consistency of outlier detection is established in both cases which states that when the outliers are strong enough and in Erdos-Renyi random graph sampling settings, outliers can be faithfully detected. Our studies are supported by experiments with both simulated examples and real-world data. The proposed framework provides us a promising tool for robust ranking with large scale crowdsourcing data arising from computer vision, multimedia, machine learning, sociology, etc.

In the present paper we consider the application of flexible, overcomplete dictionaries to the solution of general ill-posed linear inverse problems. Construction of an adaptive optimal solution for problems of this sort usually relies either on a singular value decomposition or on representation of the solution via some orthonormal basis. The shortcoming of both approaches lies in the fact that, in many situations, neither the eigenbasis of the linear operator nor a standard orthonormal basis constitutes an appropriate collection of functions for sparse representation of f. In the context of regression problems, there has been an enormous amount of effort to recover an unknown function using a flexible, overcomplete dictionary. One of the most popular methods, Lasso and its versions, is based on minimizing the empirical likelihood and, unfortunately, requires stringent assumptions on the dictionary, the so-called compatibility conditions. While these conditions may be satisfied for the functions in the original dictionary, they usually do not hold for their images due to the contraction imposed by the linear operator. In this paper, we bypass this difficulty with a novel approach based on inverting each of the dictionary functions and matching the resulting expansion to the true function rather than minimizing the empirical likelihood, thus avoiding unrealistic assumptions on the dictionary. We show how the suggested methodology can be extended to the problem of estimating a mixing density in a continuous mixture. We also suggest a solution that utilizes structured and unstructured random dictionaries, a technique that has not so far been applied to the solution of ill-posed linear inverse problems. We put a solid theoretical foundation under the suggested methodology and study its performance via simulations that confirm the good computational properties of the method.

We formulate an affine invariant implementation of the algorithm in Nesterov (1983). We show that the complexity bound is then proportional to an affine invariant regularity constant defined with respect to the Minkowski gauge of the feasible set. We also detail matching lower bounds when the feasible set is an ℓp ball. In this setting, our bounds on iteration complexity for the algorithm in Nesterov (1983) are thus optimal in terms of target precision, smoothness and problem dimension.

This paper considers the recovery of sparse control inputs to an overactuated linear system, i.e., one in which the number of inputs greatly exceeds the number of states. Such a system is relevant for certain models of biological systems such as neuronal networks. In general, the overactuated formulation leads to an underdetermined recovery problem, such that it is impossible to exactly infer the system inputs based on state observation alone. We show, however, that under assumptions of input sparsity, it is possible to perform exact and stable recovery over a finite time horizon, and we develop an error bound for the reconstructed control input. The solution methodology involves l1-based regularization, commonplace in sparse recovery problems, but here extended to the case of a linear dynamical system evolving over a time horizon. Simulation results are presented to verify the solution and performance bounds.

In the family of unit balls with constant volume we look at the ones whose algebraic representation has some extremal property. We consider the family of nonnegative homogeneous polynomials of even degree $d$ whose sublevel set $\G=\{\x: g(\x)\leq 1\}$ (a unit ball) has the same fixed volume and want to find in this family the one that minimizes either the $\ell_1$-norm or the $\ell_2$-norm of its vector of coefficients. Equivalently, among all degree-$d$ polynomials of constant $\ell_1-$ or $\ell_2$-norm, which one minimizes the volume of its level set $\G$? We first show that in both cases this is a convex optimization problem with a unique optimal solution $g^*_1$ and $g^*_2$ respectively. We also show that $g^*_1$ is the $L_p$-norm polynomial $\x\mapsto\sum_{i=1}^n x_i^{p}$, thus recovering a parsimony property of the $L_p$-norm via $\ell_1$-norm minimization. (Indeed $n=\Vert g^*_1\Vert_0$ is the minimum number of non-zero coefficients for $\G$ to have finite volume.) This once again illustrates the power and versatility of the $\ell_1$-norm relaxation strategy in optimization when one searches for an optimal solution with parsimony properties. Next we show that $g^*_2$ is not sparse at all (and so differs from $g^*_1$) but is still a sum of $p$-powers of linear forms. We also characterize the unique optimal solution of the same problem where one searches for an SOS homogeneous polynomial that minimizes the trace of its associated (psd) Gram matrix, hence aiming at finding a solution which is a sum of a few squares only. Finally, we also extend these results to generalized homogeneous polynomials, which include $L_p$-norms when $0

In this paper, we present a novel affine-invariant feature based on SIFT that leverages the regular appearance of man-made objects. The feature achieves full affine invariance without needing to simulate over the affine parameter space. Low-rank SIFT, as we name the feature, is based on our observation that local tilts, which are caused by changes in camera axis orientation, can be normalized by converting local patches to standard low-rank forms. Rotation, translation, and scaling invariance can be achieved in ways similar to SIFT. As an extension of SIFT, our method adds a prior to solve the ill-posed affine parameter estimation problem and normalizes the parameters directly, and it is applicable to objects with regular structures. Furthermore, owing to recent breakthroughs in convex optimization, such parameters can be computed efficiently. We demonstrate its effectiveness in place recognition, our major application. As extra contributions, we also describe our pipeline for constructing a geotagged building database from the ground up, as well as an efficient scheme for automatic feature selection.

Compressed sensing (CS) makes it possible to acquire compressed measurements directly and to recover sparse or compressible signals faithfully even when the sampling rate is much lower than the Nyquist rate. However, purely random sensing matrices usually require a large amount of memory for storage and incur a high computational cost for signal reconstruction. Many structured sensing matrices have been proposed recently to simplify the sensing scheme and the hardware implementation in practice. Based on the restricted isometry property and coherence, several existing structured sensing matrices are reviewed in this paper; they have special structures, high recovery performance, and advantages such as simple construction, fast calculation, and easy hardware implementation. The number of measurements and the universality of the different structured matrices are compared.

We study the use of very sparse random projections for compressed sensing (sparse signal recovery) when the signal entries can be either positive or negative. In our setting, the entries of a Gaussian design matrix are randomly sparsified so that only a very small fraction of the entries are nonzero. Our proposed decoding algorithm is simple and efficient in that the major cost is one linear scan of the coordinates. We have developed two estimators: (i) the {\em tie estimator}, and (ii) the {\em absolute minimum estimator}. Using only the tie estimator, we are able to recover a $K$-sparse signal of length $N$ using $1.551 eK \log K/\delta$ measurements (where $\delta\leq 0.05$ is the confidence). Using only the absolute minimum estimator, we can detect the support of the signal using $eK\log N/\delta$ measurements. For a particular coordinate, the absolute minimum estimator requires fewer measurements (i.e., with a constant $e$ instead of $1.551e$). Thus, the two estimators can be combined to form an even more practical decoding framework.

Prior studies have shown that existing one-scan (or roughly one-scan) recovery algorithms using sparse matrices would require substantially more (e.g., one order of magnitude) measurements than L1 decoding by linear programming, when the nonzero entries of signals can be either negative or positive. In this paper, following a known experimental setup, we show that, at the same number of measurements, the recovery accuracies of our proposed method are (at least) similar to the standard L1 decoding.

Super-resolution is a natural mathematical abstraction for the problem of extracting fine-grained structure from coarse-grained measurements, and has received considerable attention following the pioneering works of Donoho and Candes and Fernandez-Granda. Here we introduce new techniques based on extremal functions for studying this and related problems and we exactly resolve the threshold at which noisy super-resolution is possible. In particular, we establish a sharp phase transition for the relationship between the cutoff frequency ($m$) and the separation ($\Delta$). If $m > 1/\Delta + 1$, our estimator converges to the true values at an inverse polynomial rate in terms of the magnitude of the noise. And when $m < (1-\epsilon)/ \Delta$ no estimator can distinguish between a particular pair of $\Delta$-separated signals even if the magnitude of the noise is exponentially small. Our results involve making novel connections between extremal functions and spectral properties of the Vandermonde matrix, such as bounding its condition number as well as constructing explicit preconditioners for it.
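
The role of the Vandermonde condition number can be illustrated numerically. In the sketch below, the row indexing $0,\dots,m-1$, the specific frequencies, and the function names are assumptions made for this example; it only shows the qualitative effect that the matrix is well conditioned when $m > 1/\Delta + 1$ and poorly conditioned when $m$ is well below $1/\Delta$.

```python
import numpy as np

def vandermonde(freqs, m):
    """m x k matrix with entries exp(2*pi*i * row * f_j), rows 0..m-1."""
    rows = np.arange(m)[:, None]
    return np.exp(2j * np.pi * rows * np.asarray(freqs)[None, :])

def condition_number(freqs, m):
    """Ratio of the largest to the smallest singular value."""
    s = np.linalg.svd(vandermonde(freqs, m), compute_uv=False)
    return s[0] / s[-1]

m = 12
# Separation 0.1: m = 12 exceeds 1/0.1 + 1 = 11.
cond_wide = condition_number([0.20, 0.30, 0.40], m)
# Separation 0.01: m = 12 is far below 1/0.01 = 100.
cond_tight = condition_number([0.20, 0.21, 0.22], m)
```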

The problem of finding the missing values of a matrix given a few of its entries, called matrix completion, has attracted a lot of attention in recent years. Although the problem is NP-hard, Cand\`es and Recht showed that it can be exactly relaxed if the matrix is low-rank and the number of observed entries is sufficiently large. In this work, we introduce a novel matrix completion model that makes use of proximity information about rows and columns by assuming they form communities. This assumption makes sense in several real-world problems, such as recommender systems, where there are communities of people sharing preferences, while products form clusters that receive similar ratings. Our main goal is thus to find a low-rank solution that is structured by the proximities of rows and columns encoded by graphs. We borrow ideas from manifold learning to constrain our solution to be smooth on these graphs, in order to implicitly enforce row and column proximities. Our matrix recovery model is formulated as a convex non-smooth optimization problem, for which a well-posed iterative scheme is provided. We study and evaluate the proposed matrix completion on synthetic and real data, showing that the proposed structured low-rank recovery model outperforms the standard matrix completion model in many situations.

In parallel magnetic resonance imaging (pMRI), finding a joint solution for the image and coil sensitivity functions is a nonlinear and nonconvex problem. One class of algorithms, e.g. GRAPPA, first reconstructs the sensitivity-encoded images of the coils and then reconstructs the magnitude-only image. It is shown in this paper that, if only the magnitude image is reconstructed, there exists a convex solution space for the magnitude image and sensitivity-encoded images. This solution space enables the formulation of a regularized convex optimization problem and leads to a globally optimal and unique solution for the magnitude image reconstruction. Its application to in-vivo MRI data sets results in superior reconstruction performance compared with other algorithms.

In recent studies on sparse modeling, $l_q$ ($0<q<1$) regularization has received considerable attention due to its superiority over $l_1$ regularization in inducing sparsity and reducing bias. In this paper, we propose a cyclic coordinate descent (CCD) algorithm for $l_q$ regularization. Our main result states that the CCD algorithm converges globally to a stationary point as long as the stepsize is less than a positive constant. Furthermore, we demonstrate that the CCD algorithm converges to a local minimizer under certain additional conditions. Our numerical experiments demonstrate the efficiency of the CCD algorithm.

The Nystrom method is an efficient technique used to speed up large-scale learning applications by generating low-rank approximations. Crucial to the performance of this technique is the assumption that a matrix can be well approximated by working exclusively with a subset of its columns. In this work we relate this assumption to the concept of matrix coherence, connecting coherence to the performance of the Nystrom method. Making use of related work in the compressed sensing and the matrix completion literature, we derive novel coherence-based bounds for the Nystrom method in the low-rank setting. We then present empirical results that corroborate these theoretical bounds. Finally, we present more general empirical results for the full-rank setting that convincingly demonstrate the ability of matrix coherence to measure the degree to which information can be extracted from a subset of columns.
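
A minimal sketch of the standard Nystrom construction $K \approx C W^{+} C^{\top}$ is given below, where $C$ holds the sampled columns and $W$ is the block of $K$ at the sampled rows and columns. The uniform column sampling, the sizes, and the `rcond` cutoff are assumptions of this example; in the exactly low-rank case shown here the approximation reproduces $K$ up to rounding error.

```python
import numpy as np

def nystrom(kernel, idx, rcond=1e-10):
    """Nystrom approximation C W^+ C^T built from the columns listed in idx."""
    c = kernel[:, idx]                      # sampled columns
    w = kernel[np.ix_(idx, idx)]            # intersection block
    return c @ np.linalg.pinv(w, rcond=rcond) @ c.T

rng = np.random.default_rng(0)
x = rng.standard_normal((30, 3))
kernel = x @ x.T                            # rank-3 positive semidefinite matrix
idx = rng.choice(30, size=5, replace=False) # uniform sampling without replacement
approx = nystrom(kernel, idx)
rel_err = np.linalg.norm(kernel - approx) / np.linalg.norm(kernel)
```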

In many applications that require matrix solutions of minimal rank, the underlying cost function is non-convex, leading to an intractable, NP-hard optimization problem. Consequently, the convex nuclear norm is frequently used as a surrogate penalty term for matrix rank. The problem is that in many practical scenarios there is no longer any guarantee that we can correctly estimate generative low-rank matrices of interest, theoretical special cases notwithstanding. Consequently, this paper proposes an alternative empirical Bayesian procedure built upon a variational approximation that, unlike the nuclear norm, retains the same globally minimizing point estimate as the rank function under many useful constraints. However, locally minimizing solutions are largely smoothed away via marginalization, allowing the algorithm to succeed when standard convex relaxations completely fail. While the proposed methodology is generally applicable to a wide range of low-rank applications, we focus our attention on the robust principal component analysis (RPCA) problem, which involves estimating an unknown low-rank matrix with unknown sparse corruptions. Theoretical and empirical evidence is presented to show that our method is potentially superior to related MAP-based approaches, for which the convex principal component pursuit (PCP) algorithm (Candes et al., 2011) can be viewed as a special case.

We develop a general framework for proving rigorous guarantees on the performance of the EM algorithm and a variant known as gradient EM. Our analysis is divided into two parts: a treatment of these algorithms at the population level (in the limit of infinite data), followed by results that apply to updates based on a finite set of samples. First, we characterize the domain of attraction of any global maximizer of the population likelihood. This characterization is based on a novel view of the EM updates as a perturbed form of likelihood ascent, or in parallel, of the gradient EM updates as a perturbed form of standard gradient ascent. Leveraging this characterization, we then provide non-asymptotic guarantees on the EM and gradient EM algorithms when applied to a finite set of samples. We develop consequences of our general theory for three canonical examples of incomplete-data problems: mixture of Gaussians, mixture of regressions, and linear regression with covariates missing completely at random. In each case, our theory guarantees that with a suitable initialization, a relatively small number of EM (or gradient EM) steps will yield (with high probability) an estimate that is within statistical error of the MLE. We provide simulations to confirm this theoretically predicted behavior.
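
To make the EM update concrete, here is a minimal sketch for one simple instance of the mixture-of-Gaussians example: a one-dimensional, balanced, symmetric mixture 0.5 N(theta, sigma^2) + 0.5 N(-theta, sigma^2) with known sigma. This specific model, the function name `em_symmetric_mixture`, the initialization, and the sample size are assumptions chosen for illustration; the abstract does not specify them.

```python
import numpy as np

def em_symmetric_mixture(y, theta0, sigma=1.0, num_steps=50):
    """EM for the 1-D mixture 0.5*N(theta, sigma^2) + 0.5*N(-theta, sigma^2)."""
    theta = theta0
    for _ in range(num_steps):
        # E-step: tanh(theta*y/sigma^2) equals P(label=+1 | y) - P(label=-1 | y)
        # under the current theta. M-step: weighted average of the samples.
        theta = np.mean(np.tanh(theta * y / sigma**2) * y)
    return theta

# Hypothetical data: true parameter theta_star = 2 with unit noise variance.
rng = np.random.default_rng(0)
theta_star, n = 2.0, 5000
labels = rng.choice([-1.0, 1.0], size=n)
y = labels * theta_star + rng.standard_normal(n)
theta_hat = em_symmetric_mixture(y, theta0=0.5)
```

With a positive initialization and this signal-to-noise ratio, the iterates should settle close to `theta_star`, up to an error on the order of the statistical precision for n samples.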

Although the number of samples required to reconstruct a signal can be greatly reduced if the signal is sparse in a known basis, many real-world signals are sparse in an unknown and continuous dictionary. One example is the spectrally-sparse signal, which is composed of a small number of spectral atoms with arbitrary frequencies on the unit interval. In this paper we study the problem of denoising and estimating an ensemble of spectrally-sparse signals from their partial and noisy observations, and simultaneously recovering the set of signals if necessary.

Two approaches are developed based on atomic norm minimization and structured matrix completion, both of which can be solved efficiently via semidefinite programming. The first approach aims to estimate and denoise the set of signals from their partial and noisy observations via atomic norm minimization, and to recover the frequencies by examining the dual polynomial of the convex program. We characterize the optimality condition of the proposed algorithm and derive the expected convergence rate for denoising, demonstrating the benefit of including multiple measurement vectors. The second approach aims to recover the population covariance matrix from the partially observed sample covariance matrix by motivating its low-rank Toeplitz structure without recovering the signal ensemble. The frequencies can be recovered from the estimated covariance matrix via conventional spectrum estimation methods such as MUSIC. A performance guarantee is derived with a finite number of measurement vectors. Finally, numerical examples are provided to validate the performance of the proposed algorithms, with comparisons against several existing approaches.

We propose imposing box constraints on the individual elements of the unknown matrix in the matrix completion problem and present a number of natural applications, ranging from collaborative filtering under interval uncertainty to computer vision. Moreover, we design an alternating direction parallel coordinate descent method (MACO) for a smooth unconstrained optimization reformulation of the problem. In large-scale numerical experiments in collaborative filtering under uncertainty, our method obtains solutions with considerably smaller errors compared to classical matrix completion with equalities. We show that, surprisingly, seemingly obvious and trivial inequality constraints, when added to the formulation, can have a large impact. This is demonstrated on a number of machine learning problems.

The probabilistic analysis of condition numbers has traditionally been approached from different angles; one is based on Smale's program in complexity theory and features integral geometry, while the other is motivated by geometric functional analysis and makes use of the theory of Gaussian processes. In this note we explore connections between the two approaches in the context of the biconic homogeneous feasibility problem and the condition numbers motivated by conic optimization theory. Key tools in the analysis are Slepian's and Gordon's comparison inequalities for Gaussian processes, interpreted as monotonicity properties of moment functionals, and their interplay with ideas from conic integral geometry.

What can we learn from the collective dynamics of a complex network about its interaction topology? Taking the perspective from nonlinear dynamics, we briefly review recent progress on how to infer structural connectivity (direct interactions) from accessing the dynamics of the units. Potential applications range from interaction networks in physics, to chemical and metabolic reactions, protein and gene regulatory networks as well as neural circuits in biology and electric power grids or wireless sensor networks in engineering. Moreover, we briefly mention some standard ways of inferring effective or functional connectivity.

Hyperspectral images contain mixed pixels due to the low spatial resolution of hyperspectral sensors. The spectral unmixing problem refers to decomposing mixed pixels into a set of endmembers and abundance fractions. Due to the nonnegativity constraint on abundance fractions, nonnegative matrix factorization (NMF) methods have been widely used for solving the spectral unmixing problem. In this letter we propose using multilayer NMF (MLNMF) for hyperspectral unmixing. In this approach, the spectral signature matrix can be modeled as a product of sparse matrices. In fact, MLNMF decomposes the observation matrix iteratively in a number of layers. In each layer, we apply a sparseness constraint on the spectral signature matrix as well as on the abundance fractions matrix. In this way the signature matrix can be sparsely decomposed even though it is not generally a sparse matrix. The proposed algorithm is applied to synthetic and real datasets. Synthetic data is generated based on endmembers from the USGS spectral library. The AVIRIS Cuprite dataset is used as a real dataset for evaluating the proposed method. Results of the experiments are quantified using the SAD and AAD measures. Compared with previously proposed methods, the multilayer approach unmixes data more effectively.

A classical problem in matrix computations is the efficient and reliable approximation of a given matrix by a matrix of lower rank. The truncated singular value decomposition (SVD) is known to provide the best such approximation for any given fixed rank. However, the SVD is also known to be very costly to compute. Among the different approaches in the literature for computing low-rank approximations, randomized algorithms have attracted researchers' recent attention due to their surprising reliability and computational efficiency in different application areas. Typically, such algorithms are shown to compute with very high probability low-rank approximations that are within a constant factor from optimal, and are known to perform even better in many practical situations. In this paper, we present a novel error analysis that considers randomized algorithms within the subspace iteration framework and show with very high probability that highly accurate low-rank approximations as well as singular values can indeed be computed quickly for matrices with rapidly decaying singular values. Such matrices appear frequently in diverse application areas such as data analysis, fast structured matrix computations and fast direct methods for large sparse linear systems of equations and are the driving motivation for randomized methods. Furthermore, we show that the low-rank approximations computed by these randomized algorithms are actually rank-revealing approximations, and the special case of a rank-1 approximation can also be used to correctly estimate matrix 2-norms with very high probability. Our numerical experiments are in full support of our conclusions.
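
The subspace iteration framework mentioned above can be sketched as follows. This is a generic textbook-style randomized subspace iteration, not necessarily the exact variant analyzed in the paper; the function name `randomized_subspace_iteration`, the oversampling amount, the number of iterations, and the synthetic test matrix with singular values 2^{-i} are all assumptions made for this example.

```python
import numpy as np

def randomized_subspace_iteration(A, rank, oversample=5, num_iters=2, seed=0):
    """Rank-`rank` approximation of A via randomized subspace iteration."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    # Start from a Gaussian random test matrix with a few extra columns.
    Omega = rng.standard_normal((n, rank + oversample))
    Q, _ = np.linalg.qr(A @ Omega)
    for _ in range(num_iters):
        # Re-orthonormalize after each multiplication for numerical stability.
        Z, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Z)
    # Project A onto the captured subspace and take a small dense SVD.
    B = Q.T @ A
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small
    return U[:, :rank], s[:rank], Vt[:rank, :]

# Hypothetical test matrix with rapidly decaying singular values 2^{-i}.
rng = np.random.default_rng(1)
U0, _ = np.linalg.qr(rng.standard_normal((100, 80)))
V0, _ = np.linalg.qr(rng.standard_normal((80, 80)))
sigma = 2.0 ** -np.arange(80)
A = (U0 * sigma) @ V0.T
U, s, Vt = randomized_subspace_iteration(A, rank=10)
spectral_err = np.linalg.norm(A - (U * s) @ Vt, 2)
```

For a matrix with this kind of decay, the computed singular values should agree closely with the leading true ones, and the spectral-norm error should be close to the first discarded singular value `sigma[10]`.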

It is often the case that, within an online recommender system, multiple users share a common account. Can such shared accounts be identified solely on the basis of the user-provided ratings? Once a shared account is identified, can the different users sharing it be identified as well? Whenever such user identification is feasible, it opens the way to possible improvements in personalized recommendations, but also raises privacy concerns. We develop a model for composite accounts based on unions of linear subspaces, and use subspace clustering for carrying out the identification task. We show that a significant fraction of such accounts is identifiable in a reliable manner, and illustrate potential uses for personalized recommendation.

Probabilistic matrix factorization (PMF) is a powerful method for modeling data associated with pairwise relationships, finding use in collaborative filtering, computational biology, and document analysis, among other areas. In many domains, there are additional covariates that can assist in prediction. For example, when modeling movie ratings, we might know when the rating occurred, where the user lives, or what actors appear in the movie. It is difficult, however, to incorporate this side information into the PMF model. We propose a framework for incorporating side information by coupling together multiple PMF problems via Gaussian process priors. We replace scalar latent features with functions that vary over the covariate space. The GP priors on these functions require them to vary smoothly and share information. We apply this new method to predict the scores of professional basketball games, where side information about the venue and date of the game is relevant for the outcome.

With advances in data collection technologies, tensor data is assuming increasing prominence in many applications, and the problem of supervised tensor learning has emerged as a topic of critical significance in the data mining and machine learning community. Conventional methods for supervised tensor learning mainly focus on learning kernels by flattening the tensor into vectors or matrices; however, structural information within the tensors is then lost. In this paper, we introduce a new scheme to design structure-preserving kernels for supervised tensor learning. Specifically, we demonstrate how to leverage the naturally available structure within the tensorial representation to encode prior knowledge in the kernel. We propose a tensor kernel that can preserve tensor structures based upon dual-tensorial mapping. The dual-tensorial mapping function can map each tensor instance in the input space to another tensor in the feature space while preserving the tensorial structure. Theoretically, our approach is an extension of the conventional kernels in the vector space to tensor space. We apply our novel kernel in conjunction with SVM to real-world tensor classification problems, including brain fMRI classification for three different diseases (i.e., Alzheimer's disease, ADHD and brain damage by HIV). Extensive empirical studies demonstrate that our proposed approach can effectively boost tensor classification performance, particularly with small sample sizes.

Non-negative blind source separation (non-negative BSS), which is also referred to as non-negative matrix factorization (NMF), is a very active field in domains as different as astrophysics, audio processing and biomedical signal processing. In this context, the efficient retrieval of the sources requires the use of signal priors such as sparsity. While NMF has now been well studied with sparse constraints in the direct domain, only very few algorithms can encompass non-negativity together with sparsity in a transformed domain, since simultaneously dealing with two priors in two different domains is challenging. In this article, we show how a sparse NMF algorithm coined non-negative generalized morphological component analysis (nGMCA) can be extended to impose non-negativity in the direct domain along with sparsity in a transformed domain, with both analysis and synthesis formulations. To our knowledge, this work presents the first comparison of analysis and synthesis priors, as well as their reweighted versions, in the context of blind source separation. Comparisons with state-of-the-art NMF algorithms on realistic data show the efficiency as well as the robustness of the proposed algorithms.

This thesis proposes spatio-spectral techniques for hyperspectral image analysis. Adaptive spatio-spectral support and variable exposure hyperspectral imaging are demonstrated to improve spectral reflectance recovery from hyperspectral images. Novel spectral dimensionality reduction techniques are proposed from the perspective of spectral-only and spatio-spectral information preservation. It was found that the joint sparse and joint group sparse hyperspectral image models achieve lower reconstruction error and higher recognition accuracy using only a small subset of bands. Hyperspectral image databases have been developed and made publicly available for further research in compressed hyperspectral imaging, forensic document analysis and spectral reflectance recovery.

Finding an informative subset of a large number of data points or models is at the center of many problems in machine learning, computer vision, bio/health informatics and image/signal processing. Given pairwise dissimilarities between the elements of a `source set' and a `target set,' we consider the problem of finding a subset of the source set, called representatives or exemplars, that can efficiently describe the target set. We formulate the problem as a row-sparsity regularized trace minimization problem. Since the proposed formulation is, in general, an NP-hard problem, we consider a convex relaxation. The solution of our proposed optimization program finds the representatives and the probability that each element of the target set is associated with the representatives. We analyze the solution of our proposed optimization as a function of the regularization parameter. We show that when the two sets jointly partition into multiple groups, the solution of our proposed optimization program finds representatives from all groups and reveals clustering of the sets. In addition, we show that our proposed formulation can effectively deal with outliers. Our algorithm works with arbitrary dissimilarities, which can be asymmetric or violate the triangle inequality. To efficiently implement our proposed algorithm, we consider an Alternating Direction Method of Multipliers (ADMM) framework, which results in quadratic complexity in the problem size. We show that the ADMM implementation allows the algorithm to be parallelized, hence further reducing the computational cost. Finally, by experiments on real-world datasets, we show that our proposed algorithm improves the state of the art on the two problems of scene categorization using representative images and time-series modeling and segmentation using representative models.

In the context of document classification, where the label tags of the documents in a corpus are readily known, an opportunity lies in utilizing label information to learn document representation spaces with better discriminative properties. To this end, this paper proposes applying a Variational Bayesian Supervised Nonnegative Matrix Factorization (supervised vbNMF) with a label-driven sparsity structure of coefficients to learn discriminative nonsubtractive latent semantic components occurring in TF-IDF document representations. The constraints are such that the components pursued occur frequently in only a small set of labels, making it possible to yield document representations with distinctive label-specific sparse activation patterns. A simple measure of the quality of this kind of sparsity structure, dubbed inter-label sparsity, is introduced and experimentally shown to be tightly connected with classification performance. As a great practical convenience, inter-label sparsity is shown to be easily controlled in supervised vbNMF by a single parameter.

It is well known that good initializations can improve the speed and accuracy of the solutions of many nonnegative matrix factorization (NMF) algorithms. Many NMF algorithms are sensitive with respect to the initialization of W or H or both. This is especially true of algorithms of the alternating least squares (ALS) type, including the two new ALS algorithms that we present in this paper. We compare the results of six initialization procedures (two standard and four new) on our ALS algorithms. Lastly, we discuss the practical issue of choosing an appropriate convergence criterion.

A popular approach within the signal processing and machine learning communities consists in modelling signals as sparse linear combinations of atoms selected from a learned dictionary. While this paradigm has led to numerous empirical successes in various fields ranging from image to audio processing, there have only been a few theoretical arguments supporting this evidence. In particular, sparse coding, or sparse dictionary learning, relies on a non-convex procedure whose local minima have not been fully analyzed yet. In this paper, we consider a probabilistic model of sparse signals, and show that, with high probability, sparse coding admits a local minimum around the reference dictionary generating the signals. Our study takes into account the case of over-complete dictionaries, noisy signals, and possible outliers, thus extending previous work limited to noiseless settings and/or under-complete dictionaries. The analysis we conduct is non-asymptotic and makes it possible to understand how the key quantities of the problem, such as the coherence or the level of noise, can scale with respect to the dimension of the signals, the number of atoms, the sparsity and the number of observations.

Viral marketing is becoming important due to the popularity of online social networks (OSNs). Companies may provide incentives (e.g., via free samples of a product) to a small group of users in an OSN, and these users provide recommendations to their friends, which eventually increases the overall sales of a given product. Nevertheless, this also opens a door for "malicious behaviors": dishonest users may intentionally give misleading recommendations to their friends so as to distort the normal sales distribution. In this paper, we propose a detection framework to identify dishonest users in OSNs. In particular, we present a set of fully distributed and randomized algorithms, and also quantify the performance of the algorithms by deriving probability of false positive, probability of false negative, and the distribution of number of detection rounds. Extensive simulations are also carried out to illustrate the impact of misleading recommendations and the effectiveness of our detection algorithms. The methodology we present here will enhance the security level of viral marketing in OSNs.

In this paper, we explore a volume-based stable embedding of multi-dimensional signals based on the Grassmann manifold, via Gaussian random measurement matrices. The Grassmann manifold is a topological space in which each point is a linear vector subspace, and is widely regarded as an ideal model for multi-dimensional signals. We formulate the linear subspaces spanned by multi-dimensional signal vectors as points on the Grassmann manifold, and use the volume and the product of sines of principal angles (also known as the product of principal sines) as the generalized norm and distance measure for the space of the Grassmann manifold. We prove a volume-preserving embedding property for points on the Grassmann manifold via Gaussian random measurement matrices, i.e., the volumes of all parallelotopes from a finite set in the Grassmann manifold are preserved upon compression. This volume-preserving embedding property is a multi-dimensional generalization of the conventional stable embedding properties, which only concern the approximate preservation of lengths of vectors in certain unions of subspaces. Additionally, we use the volume-preserving embedding property to explore the stable embedding effect on a generalized distance measure of the Grassmann manifold induced from volume. It is proved that the generalized distance measure, i.e., the product of principal sines between different points on the Grassmann manifold, is well preserved in the compressed domain via Gaussian random measurement matrices. Numerical simulations are also provided for validation.

High-dimensional statistical tests often ignore correlations to gain simplicity and stability, leading to null distributions that depend on functionals of correlation matrices such as their Frobenius norm and other $\ell_r$ norms. Motivated by the computation of critical values of such tests, we investigate the difficulty of estimating the functionals of sparse correlation matrices. Specifically, we show that simple plug-in procedures based on thresholded estimators of correlation matrices are sparsity-adaptive and minimax optimal over a large class of correlation matrices. Akin to previous results on functional estimation, the minimax rates exhibit an elbow phenomenon. Our results are further illustrated in simulated data as well as an empirical study of data arising in financial econometrics.
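
A plug-in procedure of the kind described above can be sketched in a few lines: compute the sample correlation matrix, hard-threshold its off-diagonal entries, and evaluate the functional on the result. The sketch below does this for the squared Frobenius norm of the off-diagonal part. The function name `thresholded_offdiag_frobenius_sq`, the threshold constant 3, and the simulated data with a single correlated pair are assumptions for illustration, not the tuning used in the paper.

```python
import numpy as np

def thresholded_offdiag_frobenius_sq(X, tau):
    """Plug-in estimate of the squared Frobenius norm of the off-diagonal
    part of the correlation matrix, using hard thresholding at level tau."""
    R = np.corrcoef(X, rowvar=False)          # p x p sample correlation matrix
    off_diag = R - np.diag(np.diag(R))        # zero out the unit diagonal
    off_diag[np.abs(off_diag) < tau] = 0.0    # discard small, noisy entries
    return float(np.sum(off_diag ** 2))

# Hypothetical sparse setting: only variables 0 and 1 are correlated (rho = 0.8),
# so the target value is 2 * rho**2 = 1.28.
rng = np.random.default_rng(0)
n, p, rho = 500, 20, 0.8
X = rng.standard_normal((n, p))
X[:, 1] = rho * X[:, 0] + np.sqrt(1 - rho**2) * rng.standard_normal(n)
tau = 3.0 * np.sqrt(np.log(p) / n)            # assumed threshold scaling
estimate = thresholded_offdiag_frobenius_sq(X, tau)
```

Without thresholding, the roughly p^2 small spurious correlations would each contribute a term of order 1/n to the sum and bias the estimate upward; thresholding removes most of them while keeping the one strong entry.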

Gaussian comparison theorems are useful tools in probability theory; they are essential ingredients in the classical proofs of many results in empirical processes and extreme value theory. More recently, they have been used extensively in the analysis of underdetermined linear inverse problems. A prominent role in the study of those problems is played by Gordon's Gaussian min-max theorem. It has been observed that the use of the Gaussian min-max theorem produces results that are often tight. Motivated by recent work due to M. Stojnic, we argue explicitly that the theorem is tight under additional convexity assumptions. To illustrate the usefulness of the result we provide an application example from the field of noisy linear inverse problems.

The alternating direction method of multipliers (ADMM) is widely used in solving structured convex optimization problems. Despite its success in practice, the convergence properties of the standard ADMM for minimizing the sum of $N$ $(N\geq 3)$ convex functions with $N$ block variables linked by linear constraints have remained unclear for a very long time. In this paper, we present convergence and convergence rate results for the standard ADMM applied to the $N$-block $(N\geq 3)$ convex minimization problem, under the condition that one of these functions is convex (not necessarily strongly convex) and the other $N-1$ functions are strongly convex. Specifically, in that case the ADMM is proven to converge with rate $O(1/t)$ in a certain ergodic sense, and $o(1/t)$ in a non-ergodic sense, where $t$ denotes the number of iterations.

For systems and devices, such as cognitive radio and networks, that need to be aware of available frequency bands, spectrum sensing has an important role. A major challenge in this area is the requirement of a high sampling rate in the sensing of a wideband signal. In this paper a wideband spectrum sensing method is presented that utilizes a sub-Nyquist sampling scheme to bring substantial savings in terms of the sampling rate. The correlation matrix of a finite number of noisy samples is computed and used by a non-linear least squares (NLLS) estimator to detect the occupied and vacant channels of the spectrum. We provide an expression for the detection threshold as a function of sampling parameters and noise power. Also, a sequential forward selection algorithm is presented to find the occupied channels with low complexity. The method can be applied to both correlated and uncorrelated wideband multichannel signals. A comparison with conventional energy detection using Nyquist-rate sampling shows that the proposed scheme can yield similar performance for SNR above 4 dB with a sampling rate that is smaller by a factor of 3.

Sparse CT (Computed Tomography), inspired by compressed sensing, introduces prior information on image sparsity into CT reconstruction in order to reduce the number of input projections and thereby the potential threat that incremental X-ray dose poses to patients’ health. Recently, many remarkable works have concentrated on sparse CT reconstruction from sparse (limited-angle or few-view) projections. In this paper we incorporate more prior information into sparse CT reconstruction to improve performance. It has been known for decades that the given projection directions can provide information about the directions of edges in the restored CT image. ATV (Anisotropic Total Variation), a TV (Total Variation) norm based regularization, can use the prior information of image sparsity and edge direction simultaneously. But ATV can only represent the edge information in a few directions and loses much prior information about image edges in other directions.

Methods
To sufficiently use the prior information of edge directions, a novel MDATV (Multi-Direction Anisotropic Total Variation) is proposed. In this paper we introduce the 2D-IGS (Two Dimensional Image Gradient Space) and combine the coordinate rotation transform with 2D-IGS to represent edge information in multiple directions. Then, by incorporating this multi-direction representation into the ATV norm, we obtain the MDATV regularization. To solve the optimization problem based on the MDATV regularization, a novel ART (algebraic reconstruction technique)+MDATV scheme is outlined, and NESTA (NESTerov’s Algorithm) is proposed to replace GD (Gradient Descent) for minimizing the TV-based regularization.
Results
The numerical and real data experiments demonstrate that MDATV-based iterative reconstruction improves the quality of the restored image. NESTA is more suitable than GD for minimizing the TV-based regularization.
Conclusions
MDATV regularization can sufficiently use the prior information of image sparsity and edge information simultaneously. By incorporating more prior information, the MDATV-based approach can reconstruct the image more accurately.