The classification works on the corpus of all parliamentary questions (oral and written) presented during the VIII term.
The corpus is projected into a vector space whose dimensions are keywords selected with a technique based on Markov chains.
In this space, every text is represented by its TF-IDF (term frequency–inverse document frequency) vector.
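
As a rough illustration of this representation, the sketch below builds TF-IDF vectors over a fixed keyword list using only the Python standard library. The keywords, the sample texts, the whitespace tokenization and the smoothed IDF formula are all assumptions made for the example. The keyword list is hard-coded here, and the Markov chain selection step is not reproduced.

```python
import math
from collections import Counter


def tfidf_vectors(documents, keywords):
    """Return one TF-IDF vector per document, with one dimension per keyword."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each keyword appears.
    df = {kw: sum(1 for tokens in tokenized if kw in tokens) for kw in keywords}

    # Smoothed inverse document frequency (one common variant, assumed here).
    idf = {kw: math.log((1 + n_docs) / (1 + df[kw])) + 1.0 for kw in keywords}

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero on empty texts
        # Term frequency (relative count) times inverse document frequency.
        vectors.append([counts[kw] / total * idf[kw] for kw in keywords])
    return vectors


# Hypothetical keywords and texts, for illustration only.
keywords = ["health", "budget", "school"]
documents = [
    "question on the health budget",
    "school funding and school buildings",
    "budget for the regional health service",
]
vectors = tfidf_vectors(documents, keywords)
```

Each text becomes a list with one weight per keyword. A keyword that does not occur in a text gets weight 0, so texts sharing no keywords end up orthogonal in this space.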

On this vector space we've trained two different classifiers: an SVM and a random forest.
By combining the two classifiers, we reach a precision of 81% on our test set.
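
The text does not say how the two classifiers are combined or how precision is averaged across classes. The sketch below shows one common option, soft voting, which averages the class probabilities of the two models, together with a per-class precision computation. The function names, the labels and the probability values are hypothetical and stand in for the outputs of the trained SVM and random forest.

```python
def soft_vote(prob_a, prob_b, weight_a=0.5):
    """Average two class-probability dicts and return the most likely label."""
    labels = set(prob_a) | set(prob_b)
    combined = {
        label: weight_a * prob_a.get(label, 0.0)
        + (1 - weight_a) * prob_b.get(label, 0.0)
        for label in labels
    }
    # Iterate labels in sorted order so that ties are broken deterministically.
    return max(sorted(combined), key=combined.get)


def precision(predicted, actual, label):
    """Fraction of items predicted as `label` that truly belong to `label`."""
    hits = sum(1 for p, a in zip(predicted, actual) if p == label and a == label)
    predicted_count = sum(1 for p in predicted if p == label)
    return hits / predicted_count if predicted_count else 0.0


# Hypothetical per-text probabilities from the two classifiers.
svm_probs = [
    {"health": 0.70, "education": 0.30},
    {"health": 0.40, "education": 0.60},
    {"health": 0.55, "education": 0.45},
]
forest_probs = [
    {"health": 0.60, "education": 0.40},
    {"health": 0.20, "education": 0.80},
    {"health": 0.30, "education": 0.70},
]
actual = ["health", "education", "health"]

predicted = [soft_vote(a, b) for a, b in zip(svm_probs, forest_probs)]
```

In the third text the two models disagree, and the averaged probabilities follow the more confident one. Changing `weight_a` shifts the balance between the two models, which is one place where the care in combining classifiers comes in.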

As you can imagine, classifying parliamentary texts requires knowledge of the domain, care in combining the classifiers, and high-quality training data.
Even when all these elements are in place, this semi-automatic classification can hardly be perfect, but it is worth continuously trying to improve it.