The Data Mining Forum. This forum is about data mining, data science and big data: algorithms, source code, datasets, implementations, optimizations, etc. You are welcome to post calls for papers, data mining job ads, links to source code of data mining algorithms, or anything else related to data mining. The forum is hosted by P. Fournier-Viger. No registration is required to use this forum!

Hello data forums! I am interested in mining sequential patterns from a list of strings, with the additional criterion that a pattern has to repeat at least a given "repSupport" number of times within a sequence in order to count. One way to approach this repetition criterion is given in this paper: http://www.utdallas.edu/~muratk/publications/dawak.pdf
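To make the criterion concrete, here is a rough sketch of how I currently picture it. This is only my own reading, not the definition from the paper: I assume a pattern occurs as a subsequence (gaps allowed), that repetitions are counted greedily from left to right without overlapping, and that a string supports the pattern when the count reaches repSupport. The function names count_repetitions and repetition_support are made up for illustration.

```python
def count_repetitions(pattern, sequence):
    """Count non-overlapping occurrences of `pattern` in `sequence`,
    matched greedily left to right as a subsequence (gaps allowed).
    Assumption: this is one possible notion of "repeat", not the paper's."""
    if not pattern:
        return 0
    count = 0
    i = 0  # index of the next pattern symbol we are waiting for
    for symbol in sequence:
        if symbol == pattern[i]:
            i += 1
            if i == len(pattern):
                # One full occurrence completed; start looking for the next.
                count += 1
                i = 0
    return count


def repetition_support(pattern, sequences, rep_support):
    """Number of sequences in which `pattern` repeats at least
    `rep_support` times (under the counting assumption above)."""
    return sum(
        1 for seq in sequences
        if count_repetitions(pattern, seq) >= rep_support
    )


if __name__ == "__main__":
    data = ["abab", "ab", "axbxab", "ba"]
    print(repetition_support("ab", data, rep_support=2))
```

With repSupport = 2, only the strings where "ab" can be matched twice in a row would count toward the support of "ab". A real miner would of course not test candidate patterns one by one like this; the sketch is only meant to pin down what I mean by the criterion.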

However, that paper reimplements an older, slower algorithm. I would like to learn how a newer algorithm works and reimplement it with the repetition support criterion included. From what I have gathered, CM-SPADE is state of the art for sequential pattern mining (SPM). Would you recommend that I try adapting this algorithm? Or is there another algorithm that might be slower in general but easier to adapt to include repetition support? (It may also matter that I'm not mining itemsets, just strings.)

Thanks for your help. If you have any other advice, let me know. I'm a little new to the field.