Learning Bayesian Belief Networks (BBN) from corpora and combining the inferred knowledge with a Support Vector Machine (SVM) classifier has been applied to the automatic acquisition of verb subcategorization frames for Modern Greek. We have used minimal linguistic resources, namely basic morphological tagging and phrase chunking, to demonstrate that verb subcategorization, which is of great significance for developing robust natural language human-computer interaction systems, can be achieved from large corpora without any general-purpose syntactic parser. Moreover, by exploiting the abundance of unlabeled data found in text corpora alongside a small set of labeled examples, we avoid the expensive task of annotating the entire training set while increasing the performance of the subcategorization frame learner. We argue that a classifier combining BBN and SVM is well suited to learning to identify verb subcategorization frames, and our empirical results support this claim. Performance has been methodically evaluated on two different corpora, one balanced and one domain-specific, in order to assess the unbiased behavior of the trained models. Even limited training data prove sufficient to yield satisfactory results: we have achieved precision exceeding 90% on the identification of subcategorization frames that were not known beforehand. The valid frames obtained have been used to fill in the subcategorization field of verb entries in an HPSG-like lexicon within the LKB grammar development environment.
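The labeled/unlabeled mix described above can be illustrated with a minimal semi-supervised sketch. This is not the paper's BBN+SVM implementation; it substitutes scikit-learn's self-training wrapper around an SVM, and the toy feature vectors (standing in for chunk-based clause features) are assumptions for illustration only.

```python
# Illustrative semi-supervised setup (not the paper's implementation):
# an SVM is self-trained on a few labeled frame examples plus many
# unlabeled ones, mirroring the labeled/unlabeled mix described above.
import numpy as np
from sklearn.svm import SVC
from sklearn.semi_supervised import SelfTrainingClassifier

rng = np.random.default_rng(0)

# Toy feature vectors standing in for chunked-clause features
# (e.g., counts of NP/PP chunks around the verb); two frame classes.
X = np.vstack([rng.normal(0, 1, (100, 4)), rng.normal(3, 1, (100, 4))])
y = np.array([0] * 100 + [1] * 100)

# Keep only a handful of labels; mark the rest as unlabeled (-1),
# the convention scikit-learn uses for semi-supervised targets.
y_semi = np.full(200, -1)
labeled_idx = np.concatenate([np.arange(5), np.arange(100, 105)])
y_semi[labeled_idx] = y[labeled_idx]

# Self-training wraps the SVM: it iteratively labels the most
# confident unlabeled points and retrains on the enlarged set.
clf = SelfTrainingClassifier(SVC(probability=True, random_state=0))
clf.fit(X, y_semi)

# Evaluate against the held-back true labels.
acc = clf.score(X, y)
```

The design choice this mirrors is the one argued in the abstract: annotating only a small seed of examples and letting the classifier exploit the unannotated remainder, rather than hand-labeling the whole training corpus.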