Seminar of the Institute of Informatics

Lecture

Anomaly Searching in Text Sequences

Abstract

Analyzing a text whose authors are unknown remains an interesting problem. It can be approached with methods of data analysis and data mining, and with structural analysis. The paper presents a system of modified Self-Organizing Maps that works on probabilistic sequences built from a text. The sequences were built from letters and from words as n-grams, 1 < n < 5. The system is trained on the input sequences; after training, it identifies the parts of the text that contain anomalies, using a cumulative error and a complex analysis. On the tested texts the system was successful: it revealed how the texts were composed.
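The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the system from the paper: the abstract does not say how the Self-Organizing Maps were modified or how the probabilistic sequences were defined, so the sketch uses a plain one-dimensional SOM as a stand-in and assumes that each n-gram position is mapped to the relative frequency of that n-gram in the text. The function names, the window size, the number of units, and the learning schedule are all hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import accumulate
import math
import random


def ngram_probability_sequence(tokens, n):
    """Map each n-gram position to the relative frequency of that n-gram.

    Assumption: this is one possible reading of "probabilistic sequence".
    """
    grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    counts = Counter(grams)
    total = len(grams)
    return [counts[g] / total for g in grams]


def windows(seq, size):
    """Cut a sequence into non-overlapping windows of a fixed size."""
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, size)]


def train_som(data, units=4, epochs=50, seed=0):
    """Train a plain 1-D SOM (a stand-in, not the modified variant)."""
    rng = random.Random(seed)
    # Initialise each unit with a randomly chosen training window.
    weights = [list(rng.choice(data)) for _ in range(units)]
    for epoch in range(epochs):
        decay = 1 - epoch / epochs
        rate = 0.5 * decay                       # shrinking learning rate
        radius = max(1.0, units / 2 * decay)     # shrinking neighbourhood
        for x in data:
            # Best-matching unit: the unit closest to the input window.
            best = min(range(units), key=lambda u: math.dist(weights[u], x))
            for u in range(units):
                influence = math.exp(-((u - best) ** 2) / (2 * radius ** 2))
                weights[u] = [w + rate * influence * (xi - w)
                              for w, xi in zip(weights[u], x)]
    return weights


def quantization_errors(data, weights):
    """Distance from each window to its best-matching unit."""
    return [min(math.dist(w, x) for w in weights) for x in data]


if __name__ == "__main__":
    # Toy text: a repetitive passage followed by an unusual fragment.
    text = "the quick brown fox jumps over the lazy dog " * 20
    text += "zzqx jvkw qqzz xxjq "
    probs = ngram_probability_sequence(list(text), n=2)   # letter bigrams
    parts = windows(probs, size=10)
    som = train_som(parts)
    errors = quantization_errors(parts, som)
    cumulative = list(accumulate(errors))                  # running total
    worst = max(range(len(errors)), key=errors.__getitem__)
    print("window with the largest error:", worst)
    print("total cumulative error:", cumulative[-1])
```

For word n-grams, the same functions can be called with `text.split()` in place of `list(text)`. In this sketch, windows with a large quantization error, or steep steps in the cumulative error, are the candidate anomalous parts of the text; the paper's "complex analysis" step is not reproduced here.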