Bulgarian Frequency Dictionaries

BulFreq

ID: 812

The Bulgarian Frequency Dictionaries are lemma frequency dictionaries extracted from the Bulgarian National Corpus (BulNC), which is annotated at several linguistic levels, including sentence segmentation, POS tagging, and lemmatisation. BulNC contains 6 domain-specific subcorpora, so 6 domain-specific frequency dictionaries were developed independently, along with a general dictionary that combines all the domain-specific ones. Each dictionary is available in 2 versions: in alphabetical order and in frequency order. Frequencies are collected automatically; more efficient methods for compiling frequency lists and dictionaries are still under investigation. A frequency dictionary is compiled in stages: dictionaries are first compiled on smaller parts of the corpus and then merged.
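
The staged compilation described above can be illustrated with a minimal Python sketch. It is not the BulNC tooling: the function names, the toy lemma lists, and the use of `collections.Counter` are assumptions made for illustration, and the alphabetical version here sorts by Unicode code point rather than by a Bulgarian collation. Each part of the corpus is counted separately, the partial counts are merged by summing, and the merged dictionary is then produced in both orderings.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def count_lemmas(lemmas: Iterable[str]) -> Counter:
    """Count lemma occurrences in one part of a corpus."""
    return Counter(lemmas)


def merge_counts(partials: Iterable[Counter]) -> Counter:
    """Merge per-part counts by summing the frequency of each lemma."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts, it does not overwrite
    return total


def alphabetical(freqs: Counter) -> List[Tuple[str, int]]:
    """Entries sorted by lemma (Unicode code point order, an assumption)."""
    return sorted(freqs.items())


def by_frequency(freqs: Counter) -> List[Tuple[str, int]]:
    """Entries sorted by descending frequency; ties are broken by lemma."""
    return sorted(freqs.items(), key=lambda item: (-item[1], item[0]))


# Toy lemmatised parts, invented for illustration only.
parts = [
    ["книга", "чета", "книга"],
    ["чета", "дом", "книга"],
]

general = merge_counts(count_lemmas(part) for part in parts)

for lemma, freq in by_frequency(general):
    print(f"{lemma}\t{freq}")
```

Because merging is a plain sum of counts, the same `merge_counts` step could in principle combine domain-specific dictionaries into a general one, provided the parts do not overlap.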