Saturday, November 22, 2014

At the last Machine Learning meetup in Paris, someone (I believe it was Vincent Guigue) remarked that "not" is a relatively powerful feature for detecting sentiment in a written paragraph. Some of us wondered whether the same was really true in French (it does not look like it is). This clearly opens another line of thought: if a model works well in English because it has learned to put the right weight on the "not" feature, will that model be as effective at sentiment analysis in another language, say French? [2] Complexity can sometimes be hidden in the model used on top of the underlying complexity of the problem at hand. Which brings us to a more general comment on complexity, models and all that.
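To make the "not" point concrete, here is a minimal sketch with made-up toy sentences and a plain perceptron (my own choice of model, not whatever was discussed at the meetup). Trained on a bag of words where "not" reliably flips sentiment, the classifier ends up putting its largest weight, in magnitude, on "not". One plausible reason the feature transfers poorly to French is that negation there is split across tokens ("ne ... pas"), so no single word carries the same signal.

```python
# Toy sketch: a bag-of-words perceptron learns that "not" is the single
# most informative feature on this (hypothetical) training set.
from collections import defaultdict

train = [
    ("good movie", 1),
    ("great plot", 1),
    ("not good movie", 0),
    ("not great plot", 0),
    ("fine acting", 1),
    ("not fine acting", 0),
]

weights = defaultdict(float)

def score(words):
    return sum(weights[w] for w in words)

# Perceptron updates: nudge the weights of the words in each
# misclassified example toward the correct label.
for _ in range(20):
    for text, label in train:
        words = text.split()
        pred = 1 if score(words) > 0 else 0
        if pred != label:
            for w in words:
                weights[w] += 1 if label == 1 else -1

# "not" ends up with the largest-magnitude weight (and a negative one).
```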

Our brain seems to act as a computing device that can handle small NP-complete problems, while it seems to devise greedy strategies for larger ones. Small NP-complete problems probably end up being addictive because we store all the combinations, filling up our natural RAM, so much so that we lose our ability to do anything else: an eerie parallel to the definition of addictiveness. When trying to solve larger problems, we generally resort to greedy-like techniques and find out that they more or less work... or not.
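A toy way to see that contrast, on a hypothetical subset-sum instance of my own making: exact search enumerates every combination — the "filling up our natural RAM" regime — while a greedy heuristic is fast but can simply miss the answer.

```python
# Exact vs greedy on subset sum (NP-complete): find a subset summing to target.
from itertools import combinations

def exact_subset_sum(items, target):
    """Brute force: check all 2^n subsets -- exhaustive, like our 'natural RAM'."""
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            if sum(combo) == target:
                return list(combo)
    return None

def greedy_subset_sum(items, target):
    """Greedy: grab the largest items that still fit; may fail to certify."""
    chosen, total = [], 0
    for x in sorted(items, reverse=True):
        if total + x <= target:
            chosen.append(x)
            total += x
    return chosen if total == target else None

items, target = [9, 7, 6, 5], 13
# Exact search finds 7 + 6 = 13; greedy commits to 9 first and gets stuck.
```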

I was thinking about this over the past week, which also happened to be the week we learned of Grothendieck's passing.

For instance, this week we noted that vision and language could be cast as problems formulated as NP-hard advanced matrix factorizations. We also know that within these problems there are regions of the phase space where greedy solvers can do well, because the problem can be stated in some relaxed version. Here is my open question: could we learn more about ourselves from our ability to solve NP-hard problems and to devise specific greedy solvers?
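For the matrix factorization angle, a sketch. To be fair, unconstrained low-rank approximation is tractable (the SVD solves it exactly); it is the constrained variants — nonnegativity, sparsity — that are NP-hard. But alternating least squares below illustrates the style of greedy local update that tends to "do well" on easy instances; the matrix is a toy example of my own choosing, exactly rank 1.

```python
# Alternating least squares for a rank-1 fit M ~ outer(u, v): a greedy
# local method that alternates two exact least-squares subproblems.

def als_rank1(M, iters=50):
    m, n = len(M), len(M[0])
    u = [1.0] * m
    for _ in range(iters):
        # Fix u, solve for v in closed form; then fix v, solve for u.
        su = sum(x * x for x in u)
        v = [sum(u[i] * M[i][j] for i in range(m)) / su for j in range(n)]
        sv = sum(x * x for x in v)
        u = [sum(M[i][j] * v[j] for j in range(n)) / sv for i in range(m)]
    return u, v

M = [[2.0, 4.0], [3.0, 6.0]]   # exactly rank 1, so ALS can recover it
u, v = als_rank1(M)
error = max(abs(u[i] * v[j] - M[i][j]) for i in range(2) for j in range(2))
# On this easy instance the local method lands on an essentially exact fit.
```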

We present a simple and flexible method to prove consistency of semidefinite optimization problems on random graphs. The method is based on Grothendieck's inequality. Unlike the previous uses of this inequality that lead to constant relative accuracy, we achieve arbitrary relative accuracy by leveraging randomness. We illustrate the method with the problem of community detection in sparse networks. Despite much progress in the recent years, almost no rigorous results have been known for totally sparse networks -- those with bounded average degrees. We demonstrate that even in this regime, various natural semidefinite programs can be used to recover the community structure up to an arbitrarily small fraction of misclassified vertices. The method is general; it can be applied to a variety of stochastic models of networks and semidefinite programs.
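The abstract above is about semidefinite programs; as a contrast, here is a naive greedy agglomeration — emphatically not the paper's method, and with none of its guarantees — recovering the two planted communities of a tiny hypothetical graph: two dense triangles joined by a single cross edge.

```python
# Greedy community detection sketch: repeatedly merge the pair of clusters
# with the highest inter-cluster edge density until two clusters remain.
from itertools import combinations

edges = {(0, 1), (0, 2), (1, 2),   # planted community {0, 1, 2}
         (3, 4), (3, 5), (4, 5),   # planted community {3, 4, 5}
         (2, 3)}                   # single cross-community edge

def cross_density(a, b):
    """Fraction of possible edges present between clusters a and b."""
    links = sum((min(i, j), max(i, j)) in edges for i in a for j in b)
    return links / (len(a) * len(b))

clusters = [{v} for v in range(6)]
while len(clusters) > 2:
    # Pick the densest pair; break ties deterministically by vertex labels.
    a, b = max(combinations(clusters, 2),
               key=lambda p: (cross_density(*p), [-min(p[0]), -min(p[1])]))
    clusters.remove(a)
    clusters.remove(b)
    clusters.append(a | b)
# On this easy instance the greedy merges recover {0,1,2} and {3,4,5}.
```

On a genuinely sparse graph with bounded average degree — the regime the abstract is about — such a heuristic would happily merge across the partition, which is precisely where the semidefinite machinery earns its keep.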

[2] As an observation: at the very least, any country that spends real money on research and whose national tongue is not English ought to invest in language-centric databases freely available to all researchers.