Most natural language processing tasks can be seen as finding an optimal object from a finite set of objects. Often, the object of interest is structured, involving combinatorial structures such as trees, matchings, and permutations. This combinatorial nature makes many natural language processing problems challenging to solve. On the other hand, submodularity, also known as the discrete analogue of convexity, makes many combinatorial optimization problems tractable or approximable where otherwise neither would be possible. Whether submodularity is applicable to natural language processing problems, however, has never been studied before. In this thesis, we fill this gap by exploring submodularity in natural language processing. We show that submodularity is practically useful for many natural language processing tasks: in addition to yielding high-quality approximate solutions to otherwise intractable problems, it also captures the essence of many practical situations that arise in these tasks. To do so, we demonstrate the applicability of submodular function optimization to three natural language processing tasks: word alignment for machine translation, optimal corpus creation, and document summarization.

In the word alignment task, we show that submodularity arises naturally when modeling word fertility. We moreover cast the word alignment problem as a submodular optimization problem under matroid constraints, which provides a new perspective on the problem and essentially generalizes conventional matching-based approaches.

In the task of optimal corpus creation, we first show that the state-of-the-art method corresponds to running a greedy algorithm for supermodular maximization under a cardinality constraint, which can perform arbitrarily poorly in theory. Alternatively, we express the problem as the minimization of a weighted sum of modular and submodular functions.
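As background for the discussion above, submodularity is characterized by diminishing returns: adding an element to a smaller set helps at least as much as adding it to a larger one. The following minimal sketch illustrates this with a coverage function, a canonical submodular objective; the toy "sentences" and "concepts" are illustrative and not taken from the thesis.

```python
# A set function f is submodular iff it has diminishing returns:
# for A <= B and any element e not in B,
#   f(A + {e}) - f(A)  >=  f(B + {e}) - f(B).
# Coverage functions are a canonical submodular example.

def coverage(sets, chosen):
    """f(S) = number of elements covered by the sets indexed by `chosen`."""
    covered = set()
    for i in chosen:
        covered |= sets[i]
    return len(covered)

# Hypothetical toy universe: each "sentence" covers some "concepts".
sents = [{1, 2, 3}, {3, 4}, {4, 5, 6}]

A = frozenset({0})       # A is a subset of B
B = frozenset({0, 1})
e = 2                    # candidate element, not in B

gain_A = coverage(sents, A | {e}) - coverage(sents, A)  # marginal gain at A
gain_B = coverage(sents, B | {e}) - coverage(sents, B)  # marginal gain at B
assert gain_A >= gain_B  # diminishing returns holds
```

Here the marginal gain of sentence 2 shrinks from 3 (relative to `A`) to 2 (relative to `B`), because `B` already covers concept 4.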
We further study algorithms for general submodular function minimization. We offer the first empirical study, at scales of practical interest, of the complexity of the minimum-norm-point algorithm, which is widely regarded as the most practical algorithm for submodular minimization, and show that, on a particular type of submodular function that arises in practice, its empirical time complexity is as bad as that of the combinatorial algorithms for submodular function minimization. We moreover propose acceleration methods that speed up the minimum-norm-point algorithm dramatically in practice.

For the document summarization task, we reveal that many well-established approaches, as well as the standard evaluation methods, correspond to submodular function optimization, providing strong evidence that submodularity is a natural fit for summarization. The document summarization task can therefore be cast naturally as a budgeted submodular maximization problem. We propose an efficient algorithm for this problem that scales well to the application, and we show theoretically that the algorithm is guaranteed to find near-optimal solutions. We further introduce a class of submodular functions that is not only monotone but also simultaneously models relevance and diversity for document summarization. This class is then generalized to a mixture of submodular components, where each component models either relevance or diversity and differs in functional form or in parameters. We also address the learning problem for submodular mixtures, showing that the risk of approximate learning is bounded by the risk of exact learning, where exact inference is used. When evaluated on the standard benchmark for document summarization, namely the Document Understanding Conference (DUC) tasks, we achieve the best results reported to date on DUC-2004, DUC-2005, DUC-2006, and DUC-2007.
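The budgeted maximization setting above admits a simple cost-scaled greedy heuristic: repeatedly add the element with the best marginal gain per unit cost while the budget permits. Combined with a partial-enumeration step, variants of this greedy carry constant-factor guarantees for monotone submodular objectives; the bare heuristic below is a sketch only, and the coverage objective, sentence costs, and budget are hypothetical.

```python
# Cost-scaled greedy sketch for monotone submodular maximization under a
# knapsack (budget) constraint, as in extractive summarization: pick
# sentences maximizing marginal coverage gain per unit length.

def greedy_budgeted(ground, f, cost, budget):
    S, spent = set(), 0.0
    while True:
        candidates = [e for e in ground - S if spent + cost[e] <= budget]
        if not candidates:
            return S
        # Element with the best marginal gain per unit cost.
        best = max(candidates, key=lambda e: (f(S | {e}) - f(S)) / cost[e])
        if f(S | {best}) - f(S) <= 0:   # no positive gain left
            return S
        S.add(best)
        spent += cost[best]

# Hypothetical toy instance: sentences cover concepts; cost = length.
sents = {0: {1, 2, 3}, 1: {3, 4}, 2: {4, 5, 6}}
cost = {0: 2.0, 1: 1.0, 2: 2.0}
f = lambda S: len(set().union(*(sents[i] for i in S)))

summary = greedy_budgeted(set(sents), f, cost, budget=4.0)
```

On this toy instance the greedy first takes the cheap sentence 1 (gain 2 at cost 1), then one of the longer sentences, stopping when the remaining sentence would exceed the budget.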