The Data Mining Forum. This forum is about data mining, data science and big data: algorithms, source code, datasets, implementations, optimizations, etc. You are welcome to post calls for papers, data mining job ads, links to source code of data mining algorithms or anything else related to data mining. The forum is hosted by P. Fournier-Viger. No registration is required to use this forum!

I'm testing CPT+ but I can't understand how to interpret the scoring.
Is it already normalized?
What are the min and max values for the score?
Can I already interpret it as a probability, or must it be normalized first?

No, the scores are not normalized by default. If you want to normalize them yourself, you could do it in the CountTable::getBestSequence() method:

// Filling a sequence with the best |count| items
Sequence seq = new Sequence(-1);
sd.normalize(); // Implement this method in the ScoreDistribution class
List<Integer> bestItems = sd.getBest(1.002);

However, even after normalization the scores do not represent real proportions, because the individual subscores are multiplied together in the CountTable::push() method.
You would have to rewrite the scoring system if you are interested in real proportional probabilities.
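
For illustration, here is a minimal sketch of what such a normalization could look like. It does not use the actual ScoreDistribution class: the ScoreNormalizer class and the plain item-to-score map are assumptions made so that the example is self-contained. It simply divides each score by the sum of all scores, so the results add up to 1; for the reason given above, they should still be read as relative weights rather than true probabilities.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of CPT+: the real ScoreDistribution class
// may store its scores differently.
public class ScoreNormalizer {

    /**
     * Returns a new map in which every score is divided by the sum of all
     * scores, so that the results add up to 1.
     * Returns an empty map if the sum is not positive.
     */
    public static Map<Integer, Double> normalize(Map<Integer, Double> scores) {
        double total = 0.0;
        for (double s : scores.values()) {
            total += s;
        }
        Map<Integer, Double> normalized = new LinkedHashMap<>();
        if (total <= 0.0) {
            return normalized; // nothing to normalize
        }
        for (Map.Entry<Integer, Double> e : scores.entrySet()) {
            normalized.put(e.getKey(), e.getValue() / total);
        }
        return normalized;
    }

    public static void main(String[] args) {
        // Made-up raw scores for two candidate items (item id -> score).
        Map<Integer, Double> scores = new LinkedHashMap<>();
        scores.put(7, 3.0);
        scores.put(12, 1.0);
        System.out.println(normalize(scores)); // prints {7=0.75, 12=0.25}
    }
}
```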

Disclaimer: I am just a student who worked with this algorithm for half a year, so I cannot guarantee correctness.

Yes, the scores are not normalized in CPT+. The score for a prediction is the sum of its scores over all the sequences that are used to make that prediction. Thus, the sum can be greater than 1. Besides, it cannot be negative.
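
To make this concrete, here is a small sketch of how a score accumulated as a sum can exceed 1. The ScoreAccumulator class and the per-sequence contribution values (0.5 and 0.75) are hypothetical; they are not the actual CPT+ formula, only an illustration of summing non-negative contributions per predicted item.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration, not the CPT+ implementation.
public class ScoreAccumulator {

    // item id -> accumulated score
    private final Map<Integer, Double> scores = new HashMap<>();

    /** Adds the contribution of one matching sequence to an item's score. */
    public void add(int item, double contribution) {
        if (contribution < 0) {
            throw new IllegalArgumentException("contributions must be non-negative");
        }
        scores.merge(item, contribution, Double::sum);
    }

    /** Returns the accumulated score of an item, or 0 if it was never scored. */
    public double get(int item) {
        return scores.getOrDefault(item, 0.0);
    }

    public static void main(String[] args) {
        ScoreAccumulator acc = new ScoreAccumulator();
        acc.add(5, 0.5);  // made-up contribution from a first sequence
        acc.add(5, 0.75); // made-up contribution from a second sequence
        System.out.println(acc.get(5)); // prints 1.25, which is greater than 1
    }
}
```

Since every contribution is non-negative, the total is never negative, but it has no fixed upper bound: it grows with the number of sequences that support the prediction.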

Yes, the scoring system could be replaced by something else. When designing CPT/CPT+, my student Ted actually tried different scoring systems, and the one provided in CPT+ is the one that we found to work best on our datasets. But other scoring systems may be better or have other advantages. We found that it was simpler to have scores that are not normalized.