Abstract:
A new method for named entity recognition in Chinese medical records, based on cascaded conditional random fields (CRFs), is proposed. The first layer of the cascade identifies the basic named entities of body parts and diseases. Its output is then fed to the second layer, which recognizes nested named entities for complex diseases and clinical symptoms. A new combination feature, composed of part-of-speech features and named entity features, is defined. This feature, together with the character features, word boundary features, and context features of a sentence, forms the feature set of the second layer. In experiments based on CRF++, the proposed method yields a 3% higher F-score than cascaded CRFs without the combination feature. Moreover, compared with a single-layer CRF, it yields a 7% higher F-score, a significant increase in overall performance.
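As an illustration of how the second-layer input might be assembled, the following is a minimal sketch, not the authors' implementation. It assumes that the combination feature is the part-of-speech tag joined to the first-layer entity label with a `|` separator, and that each character becomes one row in the column layout CRF++ reads (one token per line, whitespace-separated columns, label last). The function names, tag names (`B-BODY`, `B-DIS`, and so on), and the example sentence are all hypothetical.

```python
from typing import List


def combination_feature(pos_tag: str, entity_label: str) -> str:
    """Join a POS tag and a first-layer entity label into one feature value.

    The "|" separator is an assumption made for this sketch.
    """
    return f"{pos_tag}|{entity_label}"


def second_layer_rows(
    chars: List[str],
    boundaries: List[str],
    pos_tags: List[str],
    layer1_labels: List[str],
    gold_labels: List[str],
) -> List[str]:
    """Build one tab-separated row per character for the second layer.

    Columns: character, word boundary, combination feature, gold label.
    """
    columns = [chars, boundaries, pos_tags, layer1_labels, gold_labels]
    if len({len(col) for col in columns}) != 1:
        raise ValueError("every column needs exactly one entry per character")

    rows = []
    for ch, bnd, pos, ent, gold in zip(*columns):
        rows.append("\t".join([ch, bnd, combination_feature(pos, ent), gold]))
    return rows


# Hypothetical sentence fragment: "左肺感染" (left lung infection).
chars = ["左", "肺", "感", "染"]
boundaries = ["B", "E", "B", "E"]                # word-boundary tags (assumed scheme)
pos_tags = ["n", "n", "v", "v"]                  # POS tags (assumed tag set)
layer1_labels = ["B-BODY", "I-BODY", "O", "O"]   # output of the first layer (assumed)
gold_labels = ["B-DIS", "I-DIS", "I-DIS", "I-DIS"]  # nested entity labels (assumed)

rows = second_layer_rows(chars, boundaries, pos_tags, layer1_labels, gold_labels)
print("\n".join(rows))
```

In a setup like this, the context features would typically come from the CRF++ feature template, which can reference neighbouring rows and columns, rather than from additional columns in the data file.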