The matrix completion problem consists of finding or approximating a low-rank matrix based on a few samples of this matrix. We propose a novel algorithm for matrix completion that minimizes the least-squares distance on the sampling set over the Riemannian manifold of fixed-rank matrices. The algorithm is an adaptation of classical non-linear conjugate gradients, developed within the framework of retraction-based optimization on manifolds. We describe all the objects from differential geometry necessary to perform optimization over this low-rank matrix manifold, seen as a submanifold embedded in the space of matrices. In particular, we describe how metric projection can be used as a retraction and how vector transport lets us obtain the conjugate search directions. Additionally, we derive second-order models that can be used in Newton's method, based on approximating the exponential map on this manifold to second order. Finally, we prove convergence of a regularized version of our algorithm under the assumption that the restricted isometry property holds for incoherent matrices throughout the iterations. The numerical experiments indicate that our approach scales very well for large-scale problems and compares favorably with the state of the art, outperforming most existing solvers.
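
For readers who want to play with the underlying objective (a least-squares fit on the sampled entries over matrices of fixed rank), here is a toy sketch using a plain factored gradient descent rather than the paper's Riemannian conjugate gradient; the sizes, step size, and iteration count are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 30, 20, 2
M = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # rank-r ground truth
mask = rng.random((m, n)) < 0.5                                # sampling set Omega

# Spectral initialization from the zero-filled, rescaled observations
Us, s, Vts = np.linalg.svd(mask * M / 0.5, full_matrices=False)
U = Us[:, :r] * np.sqrt(s[:r])
V = Vts[:r].T * np.sqrt(s[:r])

# Gradient descent on f(U, V) = 0.5 * ||P_Omega(U V^T - M)||_F^2
step = 0.01  # hypothetical step size
for _ in range(2000):
    R = mask * (U @ V.T - M)          # residual on the sampled entries only
    U, V = U - step * R @ V, V - step * R.T @ U

rel_err = np.linalg.norm(U @ V.T - M) / np.linalg.norm(M)
```

With roughly half the entries observed and the rank known, the unobserved entries are recovered to small relative error, which is the basic phenomenon the paper's far more sophisticated manifold machinery exploits at scale.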

Following up on yesterday's rant, let me give some perspective as to what compressive sensing brings to the table through a newly chosen crop of papers from arXiv in the past two weeks.

Clearly the next two studies fall into the new algorithmic tools section, with the second one making an inference that was not done before by specialists in the field. In other words, CS provides new insight into an older problem that was not recognized by a dedicated community.

We consider a novel group testing procedure, termed semi-quantitative group testing, motivated by a class of problems arising in genome sequence processing. Semi-quantitative group testing (SQGT) is a non-binary pooling scheme that may be viewed as a combination of an adder model followed by a quantizer. For the new testing scheme, we define the capacity and evaluate it for some special choices of parameters using information-theoretic methods. We also define a new class of disjunct codes suitable for SQGT, termed SQ-disjunct codes, and provide both explicit and probabilistic code construction methods for SQGT with simple decoding algorithms.
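
The "adder model followed by a quantizer" is easy to picture in code. In the sketch below (pool density, thresholds, and the random design are all made-up illustrations, not the paper's constructions), each test output is the exact sum of the pooled defectives, quantized into a few output levels, followed by a COMP-style elimination step:

```python
import numpy as np

rng = np.random.default_rng(8)
n, k, tests = 60, 2, 16
x = np.zeros(n, dtype=int)
x[rng.choice(n, k, replace=False)] = 1          # k "defective" items

A = (rng.random((tests, n)) < 0.2).astype(int)  # random pooling design
thresholds = np.array([1, 3])                   # quantizer bin edges

y = np.digitize(A @ x, thresholds)  # adder (A @ x) followed by a quantizer

# Elimination: an item appearing in a pool with quantized output 0
# cannot be defective, since that pool's sum was below the first threshold.
zero_pools = A[y == 0].astype(bool)
candidates = np.flatnonzero(~zero_pools.any(axis=0))
```

The elimination step never discards a true defective, which is the kind of guarantee the paper's SQ-disjunct codes formalize and strengthen.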

We propose a novel framework for studying causal inference of gene interactions using a combination of compressive sensing and Granger causality techniques. The gist of the approach is to discover sparse linear dependencies between time series of gene expressions via a Granger-type elimination method. The method is tested on the Gardner dataset for the SOS network in E. coli, for which both known and unknown causal relationships are discovered.
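
As a toy illustration of the idea (not the authors' exact elimination method), one can regress a gene's expression on the lagged expressions of all genes with an l1 penalty; the nonzero coefficients are the candidate Granger causes. The sizes, the true weights, and the regularization level below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
T, g = 200, 5                       # time points, genes
X = rng.standard_normal((T, g))     # white-noise expression levels
w_true = np.array([0.0, 0.8, -0.6, 0.0, 0.0])  # gene 0 driven by genes 1 and 2
X[1:, 0] = X[:-1] @ w_true + 0.1 * rng.standard_normal(T - 1)

A, y = X[:-1], X[1:, 0]             # lagged expressions -> gene-0 series

# ISTA for the lasso: min_w 0.5*||A w - y||^2 + lam*||w||_1
w = np.zeros(g)
L = np.linalg.norm(A, 2) ** 2       # Lipschitz constant of the gradient
lam = 50.0                          # hypothetical regularization weight
for _ in range(500):
    z = w - A.T @ (A @ w - y) / L
    w = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)

support = set(np.flatnonzero(w).tolist())
```

The recovered support picks out exactly the two driver genes, i.e., the sparse linear dependencies between the time series that the Granger-type analysis is after.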

This next paper describes a new imaging system which is clearly at low technology readiness level. It has some potential.

An imaging system based on single photon counting and compressive sensing (ISSPCCS) is developed to reconstruct a sparse image in absolute darkness. A single-photon avalanche detector and a spatial light modulator (SLM) of aluminum micro-mirrors are employed in the imaging system, while convex optimization is used in the reconstruction algorithm. The image of an object in very low light can be reconstructed from an under-sampled data set, yet with very high SNR and robustness. Compared with the traditional single-pixel camera using a photomultiplier tube (PMT) as the detector, the ISSPCCS realizes photon-counting imaging: the photon count not only captures the fluctuations of light intensity but is also more intuitive.

The following paper shows a change in the software used in the data processing chain taking place after data has been acquired. There is no change in hardware here even though it might ultimately lead to one.

Emerging sonography techniques often require increasing the number of transducer elements involved in the imaging process. Consequently, larger amounts of data must be acquired and processed. The significant growth in the amounts of data affects both machinery size and power consumption. Within the classical sampling framework, state-of-the-art systems reduce processing rates by exploiting the bandpass bandwidth of the detected signals. It has been recently shown that a much more significant sample-rate reduction may be obtained by treating ultrasound signals within the Finite Rate of Innovation framework. These ideas follow the spirit of Xampling, which combines classic methods from sampling theory with recent developments in Compressed Sensing. Applying such low-rate sampling schemes to individual transducer elements, which detect energy reflected from biological tissues, is limited by the noisy nature of the signals. This often results in erroneous parameter extraction, bringing forward the need to enhance the SNR of the low-rate samples. In our work, we achieve SNR enhancement by beamforming the sub-Nyquist samples obtained from multiple elements. We refer to this process as "compressed beamforming". Applying it to cardiac ultrasound data, we successfully image macroscopic perturbations, while achieving a nearly eight-fold reduction in sample rate compared to standard techniques.

The next paper falls under the new conceptual study for a new architecture:

Smart Grids measure energy usage in real-time and tailor supply and delivery accordingly, in order to improve power transmission and distribution. For the grids to operate effectively, it is critical to deliver readings from massively-installed smart meters to control centers in an efficient and secure manner. In this paper, we propose a secure compressed reading scheme to address this critical issue. We observe that our collected real-world meter data exhibit strong temporal correlations, indicating that they are sparse in certain domains. We adopt the Compressed Sensing technique to exploit this sparsity and design an efficient meter-data transmission scheme. Our scheme achieves the substantial efficiency offered by compressed sensing without the need to know beforehand in which domain the meter data are sparse. This is in contrast to traditional compressed-sensing-based schemes, where such sparse-domain information is required a priori. We then design a specific dependable scheme to work with our compressed-sensing-based data transmission scheme to make our meter reading reliable and secure. We provide performance guarantees for the correctness, efficiency, and security of our proposed scheme. Through analysis and simulations, we demonstrate the effectiveness of our schemes and compare their performance to prior art.

Finally, the last paper shows an improvement in the reconstruction rather than in the actual hardware.

Purpose: To retrospectively evaluate the fidelity of magnetic resonance (MR) spectroscopic imaging data preservation at a range of accelerations by using compressed sensing. Materials and Methods: The protocols were approved by the institutional review board of the university, and written informed consent to acquire and analyze MR spectroscopic imaging data was obtained from the subjects prior to the acquisitions. This study was HIPAA compliant. Retrospective application of compressed sensing was performed on 10 clinical MR spectroscopic imaging data sets, yielding 600 voxels from six normal brain data sets, 163 voxels from two brain tumor data sets, and 36 voxels from two prostate cancer data sets for analysis. The reconstructions were performed at acceleration factors of two, three, four, five, and 10 and were evaluated by using the root mean square error (RMSE) metric, metabolite maps (choline, creatine, N-acetylaspartate [NAA], and/or citrate), and statistical analysis involving a voxelwise paired t test and one-way analysis of variance for metabolite maps and ratios for comparison of the accelerated reconstruction with the original case. Results: The reconstructions showed high fidelity for accelerations up to 10 as determined by the low RMSE (<0.05). Similar means of the metabolite intensities and hot-spot localization on metabolite maps were observed up to a factor of five, with lack of statistically significant differences compared with the original data. The metabolite ratios of choline to NAA and choline plus creatine to citrate did not show significant differences from the original data for up to an acceleration factor of five in all cases and up to that of 10 for some cases. Conclusion: A reduction of acquisition time by up to 80%, with negligible loss of information as evaluated with clinically relevant metrics, has been successfully demonstrated for hydrogen 1 MR spectroscopic imaging.

In all, what do we see? A compressive sensing approach currently either:

provides a means of changing the current computational chain, yielding gains at the operational level [1, 4, 6] (high TRL), and even permits discovery [2]! At this stage there is no change in hardware, but it is likely the first step before new hardware comes to complement the new data processing pipeline [4].

In short, at this stage, the changes enabled by compressive sensing are either invisible or very few (only one new sensor [3]), and at a very low technology maturity level.

The scale on the right points to the maturity of a certain technology before it can be used in some operational fashion specifically in industrial applications. When it comes to compressive sensing related sensors, we are currently in the dip for most of them.

The only technology (MRI) that made the leap did not require a change in hardware.

It is as simple as that. The other way you will hear about compressed sensing is when it improves not hardware but processes, such as group testing and/or some sort of competitive encoding. This is why you keep hearing about those, but less about the more fundamental new types of sensors. In effect, all the others are so low on the Technology Readiness Level scale that it will take some investment from some niche application to grow them to ubiquity.

Hyperspectral imagery is really one of those technologies we do not have. By that I mean that at a price of $100,000, no tinkerer will ever get their hands on one of these cameras, and as I explained to the Dorkbot crowd two weeks ago, if no tinkerer can get their hands on it, then serendipity is unlikely to happen. The technology currently addresses broad needs, but what we really need for this technology to become mainstream is the accidental discovery. It sure won't come if it costs 100,000 buckarus. David Brady and others are working on using compressive sensing to bring that cost down by one or two orders of magnitude. Some of these advances will not come from hardware alone. Hence, I was ecstatic when I received this email last week from Pierre Vandergheynst:

Hi Igor,

I'd like to draw your attention on the following paper: http://infoscience.epfl.ch/record/174926 that advocates joint trace (or nuclear) - TV norm minimization for hyperspectral images, with really nice performances (see the effect of reconstructing with only m = 0.03 * n on Fig 2). My student Mohammad is wrapping up the matlab code to be distributed on the same page. The data used in the paper is also publicly available.

Best regards

In this paper we propose a novel and efficient model for compressed sensing of hyperspectral images. A large-size hyperspectral image can be subsampled by retaining only 3% of its original size, yet robustly recovered using the new approach we present here. Our reconstruction approach is based on minimizing a convex functional which penalizes both the trace norm and the TV norm of the data matrix. Thus, the solution tends to have a simultaneous low-rank and piecewise smooth structure: the two important priors explaining the underlying correlation structure of such data. Through simulations we show that our approach significantly enhances the conventional compression rate-distortion tradeoffs. In particular, in the strong undersampling regimes our method outperforms the standard TV denoising image recovery scheme by more than 17 dB in the reconstruction MSE.
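
For the trace-norm part of such an objective, the workhorse in proximal algorithms is singular value thresholding. Here is a minimal generic sketch (not the authors' solver, which additionally handles the TV term through its own proximal or primal-dual step):

```python
import numpy as np

def svt(Y, tau):
    """Prox of tau*||.||_* at Y: soft-threshold the singular values by tau."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return U @ (np.maximum(s - tau, 0.0)[:, None] * Vt)

# Shrinking a matrix with singular values (5, 3, 0.2) by tau = 1 keeps
# only the two strong components, now with values (4, 2): the weak
# third direction is zeroed out, which is how the penalty promotes low rank.
Z = svt(np.diag([5.0, 3.0, 0.2]), 1.0)
```

Iterating this prox against a data-fidelity gradient is the standard route to low-rank recovery; adding the TV prox on top gives the simultaneous low-rank and piecewise-smooth structure the abstract describes.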

I wonder how this technique can be used for blind deconvolution. But then, eventually, one can but wonder how this is going to affect actual compression hardware on satellites [1,2].

Saturday, February 25, 2012

Igor,

I hope things are well. We just opened a new position for a member of research staff at MERL with focus on compressive sensing. I would appreciate it if you could post it on Nuit Blanche. The link is here and the posting follows: http://www.merl.com/employment/employment.php#MM2

Thanks!
Petros
Member of Research Staff
Multimedia group

MERL is seeking qualified and high-caliber applicants for a Research Staff position in the Multimedia Group. The successful candidate will be expected to perform original research in the area of signal acquisition and sensing, with emphasis in compressive sensing and array processing. Further details are outlined below.

Responsibilities:
• Conduct innovative research on sensing and signal acquisition emphasizing modern acquisition methods and compressive sensing, with applications in ultrasonic, radio, optical, or other sensing modalities.
• Prepare patent disclosures on newly developed work and assist legal department in finalizing patent applications.
• Transfer developed technology to business units through R&D labs in Japan including software, documentation and experimental results.
• Propose new research directions in sensing and signal acquisition technology, and contribute to overall research agenda of the group.
• Publish results in major international conferences and peer-reviewed journals, and maintain a presence in the academic community.

Qualifications:
• Applicants are expected to hold a Ph.D. degree in Electrical Engineering, Computer Science, or closely related field.
• Applicants with 2+ years of post-Ph.D. research experience are preferred. Exceptional candidates including recent Ph.D. graduates and post-doctoral researchers are strongly encouraged to apply.
• Extensive signal processing background and understanding of sampling, sensing and signal acquisition technologies is required. Direct experience with compressive sensing is also required.
• Experience with array processing and/or related acquisition systems is preferred.
• Familiarity with related technologies such as dictionary learning, convex optimization, or greedy algorithms is also preferred.
• Applicants must have a strong publication record demonstrating novel research achievements.
• Excellent programming skills (C/C++, Matlab, etc.) are required.
• Excellent presentation, written and inter-personal communication skills are expected.

Mitsubishi Electric Research Laboratories, Inc. is an Equal Opportunity Employer.
PRINCIPALS ONLY apply via email to avetro@merl.com. No phone calls please.

Laboratory for Information and Inference Systems (http://lions.epfl.ch/) at Ecole Polytechnique Federale de Lausanne (EPFL) has multiple openings for energetic postdoctoral fellows in the broad fields of Applied Mathematics, Theoretical Computer Science, and Statistics as part of Prof. Volkan Cevher’s 5-year European Research Council project. Salaries start at CHF 84K. Positions are available immediately.

Our lab provides a fun, collaborative research environment with state-of-the-art facilities at EPFL, one of the leading technical universities worldwide. EPFL is located in Lausanne next to Lake Geneva in a scenic setting.

Topics of interest include, but are not limited to

·Analysis and design of advanced convex and non-convex optimization methods with applications to inverse problems.

·Advanced linear algebra methods for optimized information extraction from very large dimensional datasets.

·Analysis and applications of non-iid probabilistic models.

Requirements:

Applicants are expected to have finished, or be about to finish, their Ph.D. degrees. They must have an exceptional background in at least one of the following topics: numerical convex optimization, combinatorial optimization, statistics, coding theory, and algorithms.

A track record of relevant publications at top applied mathematics, theoretical computer science, statistics, or engineering journals or conferences is essential.

Two postdoc positions are open in the Metiss team (Speech and Audio Processing) at Inria, Rennes, France. The postdocs are funded by PLEASE (Projection, Learning and Sparsity for Efficient data processing), a project sponsored by the ERC (European Research Council), which aims at investigating sparse representations and low-dimensional projections. The research conducted by the postdocs will be at the frontiers of Signal Processing and Machine Learning, with applications to Acoustics & Audio.

Subject

The Metiss team develops mathematical and statistical signal models and algorithms for acoustic and audio applications. In the framework of the PLEASE project, these problems are addressed under the auspices of sparsity and low-dimensional projections, with the aim of developing new ways to acquire, analyze and process the information content of complex acoustic data (e.g.: compressive acoustic sensing, blind source separation), as well as large collections of such data for learning (e.g.: multimedia indexing).

A first goal will be to further develop recently achieved mathematical and algorithmic results, to demonstrate their impact on selected acoustic applications, and to disseminate them through the development of software. A second aspect will be to explore machine learning strategies exploiting low-dimensional projections to process large data collections in the context of multimedia indexing.

Environment

The Metiss team gathers around 15 researchers, post-docs, PhD students and engineers with expertise in various fields of mathematical and statistical signal processing and audio. The team is part of the IRISA / INRIA Rennes - Bretagne Atlantique Research Center, located on the campus of the Université de Rennes I in the historic city of Rennes, capital of Brittany. The center is a major player in computer science, and the postdocs will have occasions to interact with several groups at Inria Rennes, including computer scientists and applied mathematicians, and will have access to large-scale computing grids and multimedia databases, as well as multichannel audio recording equipment.

Qualifications required

Candidates should hold a Ph.D., and will either be applied mathematicians with an interest in statistical signal processing and acoustic applications, and good programming skills, or originate from signal processing / computer science with a solid background in applied mathematics and statistics.

Previous experience in sparse signal representations or statistical machine learning is preferred, but experience in related areas is also suitable.

Applicants are requested to send a CV, a list of publications and a brief statement of research interests. This material, together with two letters of reference, shall be sent to Stephanie.lemaile@inria.fr before April 15, 2012.

My talk will be a tutorial about sparse signal recovery but, more importantly, I will provide an overview of what the research problems are at the intersection of biological applications of group testing, streaming algorithms, sparse signal recovery, and coding theory. The talk should help set the stage for the rest of the workshop.

In the past few years, we have experienced a paradigm shift in human genetics. Accumulating lines of evidence have highlighted the pivotal role of rare genetic variations in a wide variety of traits and diseases. Studying rare variations is a needle-in-a-haystack problem, as large cohorts have to be assayed in order to trap the variations and gain statistical power. The performance of DNA sequencing is growing exponentially, providing sufficient capacity to profile an extensive number of specimens. However, sample preparation schemes do not scale with sequencing capacity. A brute-force approach of preparing hundreds to thousands of specimens for sequencing is cumbersome and cost-prohibitive. The next challenge, therefore, is to develop a scalable technique that circumvents the bottleneck in sample preparation.

My tutorial will provide background on rare genetic variations and DNA sequencing. I will present our sample prep strategy, called DNA Sudoku, that utilizes a combinatorial pooling / compressed sensing approach to find rare genetic variations. More importantly, I will discuss several major distinctions from the classical combinatorial setting due to sequencing-specific constraints.

Identification of rare variants by resequencing is important both for detecting novel variations and for screening individuals for known disease alleles. New technologies enable low-cost resequencing of target regions, although it is still prohibitive to test more than a few individuals. We propose a novel pooling design that enables the recovery of novel or known rare alleles and their carriers in groups of individuals. The method is based on combining next-generation sequencing technology with a Compressed Sensing (CS) approach. The approach is general, simple and efficient, allowing for simultaneous identification of multiple variants and their carriers. It reduces experimental costs, i.e., both sample-preparation-related costs and direct sequencing costs, by up to 70-fold, thus allowing the scanning of much larger cohorts. We demonstrate the performance of our approach over several publicly available data sets, including the 1000 Genomes Pilot 3 study. We believe our approach may significantly improve the cost effectiveness of future association studies and of screening large DNA cohorts for specific risk alleles.

We will present initial results of two projects that were initiated following publication. The first project concerns identification of de novo SNPs in genetic disorders common among Ashkenazi Jews, based on sequencing 3000 DNA samples. The second project in plant genetics involves identifying SNPs related to water and silica homeostasis in Sorghum bicolor, based on sequencing 3000 DNA samples using 1-2 Illumina lanes.

Joint work with Amnon Amir from the Weizmann Institute of Science, and Or Zuk from the Broad Institute of MIT and Harvard

New regulatory roles continue to emerge for both natural and engineered noncoding RNAs, many of which have specific secondary and tertiary structures essential to their function. This highlights a growing need to develop technologies that enable rapid and accurate characterization of structural features within complex RNA populations. Yet, available structure characterization techniques that are reliable are also vastly limited by technological constraints, while the accuracy of popular computational methods is generally poor. These limitations thus pose a major barrier to the comprehensive determination of structure from sequence and thereby to the development of mechanistic understanding of transcriptome dynamics. To address this need, we have recently developed a high-throughput structure characterization technique, called SHAPE-Seq, which simultaneously measures quantitative, single nucleotide-resolution, secondary and tertiary structural information for hundreds of RNA molecules of arbitrary sequence. SHAPE-Seq combines selective 2’-hydroxyl acylation analyzed by primer extension (SHAPE) chemical mapping with multiplexed paired-end deep sequencing of primer extension products. This generates millions of sequencing reads, which are then analyzed using a fully automated data analysis pipeline. Previous bioinformatics methods, in contrast, are laborious, heuristic, and expert-based, and thus prohibit high-throughput chemical mapping.

In this talk, I will review recent developments in experimental RNA structure characterization as well as advances in sequencing technologies. I will then describe the SHAPE-Seq technique, focusing on its automated data analysis method, which relies on a novel probabilistic model of a SHAPE-Seq experiment, adjoined by a rigorous maximum likelihood estimation framework. I will demonstrate the accuracy and simplicity of our approach as well as its applicability to a general class of chemical mapping techniques and to more traditional SHAPE experiments that use capillary electrophoresis to identify and quantify primer extension products.

High-dimensional tensors or multi-way data are becoming prevalent in areas such as biomedical imaging, chemometrics, networking and bibliometrics. Traditional approaches to finding lower dimensional representations of tensor data include flattening the data and applying matrix factorizations such as principal components analysis (PCA) or employing tensor decompositions such as the CANDECOMP / PARAFAC (CP) and Tucker decompositions. The former can lose important structure in the data, while the latter Higher-Order PCA (HOPCA) methods can be problematic in high-dimensions with many irrelevant features. We introduce frameworks for sparse tensor factorizations or Sparse HOPCA based on heuristic algorithmic approaches and by solving penalized optimization problems related to the CP decomposition. Extensions of these approaches lead to methods for general regularized tensor factorizations, multi-way Functional HOPCA and generalizations of HOPCA for structured data. We illustrate the utility of our methods for dimension reduction, feature selection, and signal recovery on simulated data and multi-dimensional microarrays and functional MRIs.

We consider the problem of estimating a rank-one matrix in Gaussian noise under a probabilistic model for the left and right factors of the matrix. The probabilistic model can impose constraints on the factors including sparsity and positivity that arise commonly in learning problems. We propose a simple iterative procedure that reduces the problem to a sequence of scalar estimation computations. The method is similar to approximate message passing techniques based on Gaussian approximations of loopy belief propagation that have been used recently in compressed sensing. Leveraging analysis methods by Bayati and Montanari, we show that the asymptotic behavior of the estimates from the proposed iterative procedure is described by a simple scalar equivalent model, where the distribution of the estimates is identical to certain scalar estimates of the variables in Gaussian noise. Moreover, the effective Gaussian noise level is described by a set of state evolution equations. The proposed method thus provides a computationally simple and general method for rank-one estimation problems with a precise analysis in certain high-dimensional settings.
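
The flavor of the reduction to scalar estimation can be conveyed by a crude stand-in: alternate between matrix-vector products and a scalar "denoiser" that enforces the prior on the factors (here, positivity). This omits the Onsager correction that distinguishes true approximate message passing, and the sizes and signal strength below are invented:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400
u = np.maximum(rng.standard_normal(n), 0); u /= np.linalg.norm(u)  # positive factors
v = np.maximum(rng.standard_normal(n), 0); v /= np.linalg.norm(v)
Y = 3.0 * np.outer(u, v) + rng.standard_normal((n, n)) / np.sqrt(n)  # spiked model

vh = np.maximum(rng.standard_normal(n), 0); vh /= np.linalg.norm(vh)
for _ in range(50):
    uh = np.maximum(Y @ vh, 0); uh /= np.linalg.norm(uh)  # scalar denoiser: positive part
    vh = np.maximum(Y.T @ uh, 0); vh /= np.linalg.norm(vh)

overlap = uh @ u   # correlation with the true left factor
```

Each iteration only ever applies a componentwise scalar operation to a matrix-vector product, which is the structural point of the paper; the paper's contribution is the precise state-evolution analysis of such iterations, not the naive scheme above.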

In this paper, we propose a study of the performance of channel estimation using the LS, MMSE, LMMSE and Lr-LMMSE algorithms in an OFDM (Orthogonal Frequency Division Multiplexing) system which, as is well known, suffers from time variation of the channel under high-mobility conditions, using block pilot insertion. The loss of sub-channel orthogonality leads to inter-carrier interference (ICI). Using several algorithms for channel estimation, we will show that, for 16-QAM modulation, the LMMSE algorithm performs well, but when the SNR (Signal-to-Noise Ratio) is high, the four algorithms (LS, MMSE, LMMSE and Lr-LMMSE) perform similarly; this is not always the case for other modulation schemes. We will also examine the mean squared error of these algorithms. It will be illustrated in this paper that the LMMSE algorithm performs well with block-pilot insertion, as does its low-rank version, which behaves very well even when the FFT size is very high.
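
The LS-versus-LMMSE comparison in the abstract is easy to reproduce for a toy block-pilot OFDM symbol; the FFT size, channel length, and SNR below are arbitrary choices for illustration, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(4)
N, taps = 64, 8                         # subcarriers and channel length (hypothetical)
h = (rng.standard_normal(taps) + 1j * rng.standard_normal(taps)) / np.sqrt(2 * taps)
H = np.fft.fft(h, N)                    # true frequency response
snr = 10 ** (15 / 10)                   # 15 dB pilot SNR

x = np.ones(N)                          # known block pilot on every subcarrier
noise = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2 * snr)
y = H * x + noise

H_ls = y / x                            # least-squares estimate, per-carrier MSE ~ 1/snr

# LMMSE: shrink the LS estimate through the channel's frequency correlation
F = np.fft.fft(np.eye(N))[:, :taps]     # first `taps` columns of the DFT matrix
R = F @ F.conj().T / taps               # covariance of H for unit-power uniform taps
H_lmmse = R @ np.linalg.solve(R + np.eye(N) / snr, H_ls)

mse_ls = np.mean(np.abs(H_ls - H) ** 2)
mse_lmmse = np.mean(np.abs(H_lmmse - H) ** 2)
```

Because the channel has only `taps` degrees of freedom spread over N subcarriers, the LMMSE filter averages out most of the noise, while at high SNR the shrinkage vanishes and the two estimates coincide, which matches the abstract's observation.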

We investigate the problem of signal transduction via a descriptive analysis of the spatial organization of the complement of proteins exerting a certain function within a cellular compartment. We propose a scheme to assign a numerical value to individual proteins in a protein interaction network by means of a simple optimization algorithm. We test our procedure against datasets focusing on the proteomes in the neurite and soma compartments.

The ADI iteration is closely related to the rational Krylov projection methods for constructing low rank approximations to the solution of the Sylvester equation. In this paper we show that the ADI and rational Krylov approximations are in fact equivalent when a special choice of shifts is employed in both methods. We will call these shifts pseudo H2-optimal shifts. These shifts are also optimal in the sense that for the Lyapunov equation, they yield a residual which is orthogonal to the rational Krylov projection subspace. Via several examples, we show that the pseudo H2-optimal shifts consistently yield nearly optimal low rank approximations to the solutions of the Lyapunov equations.

This paper studies the models of minimizing $||x||_1+1/(2\alpha)||x||_2^2$ where $x$ is a vector, as well as those of minimizing $||X||_*+1/(2\alpha)||X||_F^2$ where $X$ is a matrix and $||X||_*$ and $||X||_F$ are the nuclear and Frobenius norms of $X$, respectively. We show that they can efficiently recover sparse vectors and low-rank matrices. In particular, they enjoy exact and stable recovery guarantees similar to those known for minimizing $||x||_1$ and $||X||_*$ under conditions on the sensing operator such as its null-space property, restricted isometry property, spherical section property, or RIPless property. To recover a (nearly) sparse vector $x^0$, minimizing $||x||_1+1/(2\alpha)||x||_2^2$ returns (nearly) the same solution as minimizing $||x||_1$ almost whenever $\alpha\ge 10||x^0||_\infty$. The same relation also holds between minimizing $||X||_*+1/(2\alpha)||X||_F^2$ and minimizing $||X||_*$ for recovering a (nearly) low-rank matrix $X^0$, if $\alpha\ge 10||X^0||_2$. Furthermore, we show that the linearized Bregman algorithm for minimizing $||x||_1+1/(2\alpha)||x||_2^2$ subject to $Ax=b$ enjoys global linear convergence as long as a nonzero solution exists, and we give an explicit rate of convergence. The convergence property does not require a sparse solution or any properties of $A$. To our knowledge, this is the best known global convergence result for first-order sparse optimization algorithms.
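
The linearized Bregman iteration mentioned in the abstract is only a few lines. The sketch below solves a random instance; the problem sizes, the ±2 signal, and the step size are illustrative assumptions, while the choice of α follows the abstract's α ≥ 10·||x⁰||_∞ rule of thumb:

```python
import numpy as np

rng = np.random.default_rng(5)
m, n, k = 40, 100, 5
A = rng.standard_normal((m, n)) / np.sqrt(m)
x0 = np.zeros(n)
x0[rng.choice(n, k, replace=False)] = 2.0 * rng.choice([-1.0, 1.0], k)
b = A @ x0

alpha = 10 * np.abs(x0).max()                      # the abstract's alpha rule
tau = 1.0 / (alpha * np.linalg.norm(A, 2) ** 2)    # conservative step size
shrink = lambda z: np.sign(z) * np.maximum(np.abs(z) - 1.0, 0.0)

# Linearized Bregman: dual ascent in v, primal recovered by shrinkage
v = np.zeros(n)
for _ in range(20000):
    x = alpha * shrink(v)
    v += tau * (A.T @ (b - A @ x))
x = alpha * shrink(v)

rel = np.linalg.norm(x - x0) / np.linalg.norm(x0)
```

The iterate converges to the minimizer of the augmented model, which (by the abstract's result, since α exceeds 10 times the largest entry) coincides with the plain l1 solution and hence recovers x0.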

In this paper, we study the problem of high-dimensional approximately low-rank covariance matrix estimation with missing observations. We propose a simple procedure computationally tractable in high-dimension and that does not require imputation of the missing data. We establish non-asymptotic sparsity oracle inequalities for the estimation of the covariance matrix with the Frobenius and spectral norms, valid for any setting of the sample size and the dimension of the observations. We further establish minimax lower bounds showing that our rates are minimax optimal up to a logarithmic factor.
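
The "no imputation" point has a simple concrete form: when entries are observed independently with probability δ, the zero-filled empirical covariance can be corrected in closed form, since its off-diagonal entries are biased by δ² and its diagonal entries by δ. The sketch below demonstrates that correction alone, not the paper's penalized estimator; the dimensions and δ are made up:

```python
import numpy as np

rng = np.random.default_rng(6)
n, p, delta = 2000, 10, 0.7
F = rng.standard_normal((p, 2))
Sigma = F @ F.T + np.eye(p)                 # approximately low-rank covariance
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
Y = X * (rng.random((n, p)) < delta)        # zero-filled missing observations

S = Y.T @ Y / n                             # naive covariance of zero-filled data
Sigma_hat = S / delta**2                    # unbias the off-diagonal entries
np.fill_diagonal(Sigma_hat, np.diag(S) / delta)  # diagonals need only one factor

err_corrected = np.linalg.norm(Sigma_hat - Sigma) / np.linalg.norm(Sigma)
err_naive = np.linalg.norm(S - Sigma) / np.linalg.norm(Sigma)
```

The corrected estimator is far closer to the truth than the naive one, with no imputation of the missing values; the paper's contribution is to regularize such an estimator toward low rank and to prove non-asymptotic, minimax-optimal error bounds.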

This paper considers the problem of completing a matrix with many missing entries under the assumption that the columns of the matrix belong to a union of multiple low-rank subspaces. This generalizes the standard low-rank matrix completion problem to situations in which the matrix rank can be quite high or even full rank. Since the columns belong to a union of subspaces, this problem may also be viewed as a missing-data version of the subspace clustering problem. Let X be an n x N matrix whose (complete) columns lie in a union of at most k subspaces, each of rank <= r < n, and assume N >> kn. The main result of the paper shows that under mild assumptions each column of X can be perfectly recovered with high probability from an incomplete version so long as at least CrNlog^2(n) entries of X are observed uniformly at random, with C>1 a constant depending on the usual incoherence conditions, the geometrical arrangement of subspaces, and the distribution of columns over the subspaces. The result is illustrated with numerical experiments and an application to Internet distance matrix completion and topology identification.

The knowledge of end-to-end network distances is essential to many Internet applications. As active probing of all pairwise distances is infeasible in large-scale networks, a natural idea is to measure a few pairs and to predict the other ones without actually measuring them. This paper formulates the distance prediction problem as matrix completion where unknown entries of an incomplete matrix of pairwise distances are to be predicted. The problem is solvable because strong correlations among network distances exist and cause the constructed distance matrix to be low rank. The new formulation circumvents the well-known drawbacks of existing approaches based on Euclidean embedding.

A new algorithm, so-called Decentralized Matrix Factorization by Stochastic Gradient Descent (DMFSGD), is proposed to solve the network distance prediction problem. By letting network nodes exchange messages with each other, the algorithm is fully decentralized and only requires each node to collect and to process local measurements, with neither explicit matrix constructions nor special nodes such as landmarks and central servers. In addition, we comprehensively compared matrix factorization and Euclidean embedding to demonstrate the suitability of the former for network distance prediction. We further studied the incorporation of a robust loss function and of non-negativity constraints. Extensive experiments on various publicly-available datasets of network delays show not only the scalability and the accuracy of our approach but also its usability in real Internet applications.
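
At its core, the SGD update in such a factorization touches only the two coordinate vectors involved in one measured pair, which is what makes decentralization possible: a node can update its own rows from a single local measurement. A centralized toy version, with sizes, rank, and learning rate all hypothetical:

```python
import numpy as np

rng = np.random.default_rng(7)
N, r = 50, 3
P = rng.random((N, r))
D = P @ P.T                              # synthetic low-rank "distance" matrix
obs = [(i, j) for i in range(N) for j in range(N) if rng.random() < 0.3]

U = rng.random((N, r)) * 0.1
V = rng.random((N, r)) * 0.1
lr = 0.05
for epoch in range(200):
    for i, j in obs:
        e = D[i, j] - U[i] @ V[j]        # error on one measured pair
        # SGD step: only rows i of U and j of V change
        U[i], V[j] = U[i] + lr * e * V[j], V[j] + lr * e * U[i]

rmse_obs = np.sqrt(np.mean([(D[i, j] - U[i] @ V[j]) ** 2 for i, j in obs]))
```

Because each update reads and writes only rows i and j, nodes i and j can perform it by exchanging just those two short vectors, with no landmark or central server ever holding the full matrix.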