Today witnesses an explosion of data coming from various types of networks such as online social networks and
biological networks. The goal of this thesis is to understand when and how we can efficiently extract useful information from such network data.
In the first part, we are interested in finding tight-knit communities within a network.
Assuming the network is generated according to a planted cluster model, we derive a computationally efficient semidefinite programming relaxation of the maximum likelihood estimation method
and obtain a stronger performance guarantee than previously known.
If the community sizes are linear in the total number of vertices, the guarantee matches up to a constant factor with the information limit which we also identify, and exactly matches without a constant gap when there is a single community or two equal-sized communities. However, if the community sizes are sublinear in the total number of vertices,
the guarantee is far from the information limit. We conjecture that our algorithm achieves the computational limit below which no
polynomial-time algorithm can succeed. To provide evidence, we show that finding a community
in some regime below the conjectured computational limit but above the information limit is computationally intractable,
assuming hardness of the well-known planted clique problem.
The second part studies the problem of inferring the group preference for a set of items
based on the partial rankings over different subsets of the items provided by a group of users. A question of particular interest is how to optimally construct the graph used for assigning items to users for ranking. Assuming the partial rankings are generated independently according to the Plackett-Luce model, we analyze
computationally efficient estimators based on maximum likelihood and rank-breaking schemes that decompose partial rankings into pairwise comparisons. We provide upper and lower bounds on the estimation error. The lower bound depends on the degree sequence of the assignment graph, while the upper bound depends on the spectral gap of the assignment graph. When the graph is an expander, the lower and upper bounds match up to a logarithmic factor.
The unifying theme for the two parts of the thesis is the spectral gap of the graph. In both cases, when the graph has a large spectral gap, accurate and efficient inference is possible via maximum likelihood estimation or its convex relaxation. However, when the spectral gap vanishes, accurate inference may be statistically
impossible, or it is statistically possible but may be computationally intractable.