Parallelizing the well-known C4.5 statistical classifier is doubly hard. First, data miners are simply not keen on yet another brand-new parallel version :-) Past experience has repeatedly shown that small improvements to the sequential algorithm can yield far more performance than a substantial investment in parallelization. This does not mean that parallelization is useless. At least in our understanding, it means that only a low-effort, conservative parallelization is really welcome in the data-mining community. Second, that kind of parallelization, i.e. parallelizing loops and recursive calls, is technically complex. The independent tasks it generates can have several unpleasant properties, including a very wide variability in task size. This variability can in turn cause both severe synchronization overhead and non-trivial load-balancing problems that limit the speedup.

The YaDT-FastFlow application addresses both problems. [[http://ieeexplore.ieee.org/Xplore/login.jsp?url=http%3A%2F%2Fieeexplore.ieee.org%2Fiel5%2F9460%2F30023%2F01374196.pdf|YaDT]] is a third-party, main-memory implementation of a C4.5-like decision-tree algorithm by Salvatore Ruggieri. YaDT-FastFlow is a //low-effort// parallelization of the sequential code. It took less than 10 hours of development (including tuning and testing) while delivering a significant speedup over the sequential version.

This application aims to demonstrate how FastFlow and the FastFlow accelerator support rapid and efficient development. They do so through semi-automatic parallelization of loops and Divide&Conquer recursion in third-party and legacy code.
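
The sketch below illustrates the kind of irregular Divide&Conquer involved. It is a minimal example under stated assumptions, written with plain C++11 threads; it is neither the FastFlow API nor the YaDT code. A hypothetical `Node` stands for a tree node covering records `[begin, end)`. Nodes are split unevenly (1/3 vs 2/3) until they hold at most `leaf_size` records, which mimics the variable subtree sizes of tree induction. The recursion is turned into a farm with feedback: workers take nodes from a shared queue and push the children of every split back into it, so the load is balanced dynamically rather than fixed up front. The names `Node`, `split_point`, `leaves_seq` and `leaves_par` are all made up for the illustration.

```cpp
#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a decision-tree node covering records [begin, end).
struct Node {
    std::size_t begin, end;
};

// Uneven split point (1/3 vs 2/3) to mimic irregular subtree sizes.
// Requires end - begin >= 2: both children are then non-empty and smaller.
std::size_t split_point(const Node& n) {
    std::size_t third = (n.end - n.begin) / 3;
    return n.begin + (third == 0 ? 1 : third);
}

// Sequential reference: count the leaves produced by recursive splitting.
// leaf_size must be >= 1, otherwise the recursion would not terminate.
std::size_t leaves_seq(Node n, std::size_t leaf_size) {
    if (n.end - n.begin <= leaf_size) return 1;
    std::size_t mid = split_point(n);
    return leaves_seq({n.begin, mid}, leaf_size) + leaves_seq({mid, n.end}, leaf_size);
}

// Farm-with-feedback version (nworkers >= 1): workers pop nodes from a shared
// queue and feed the children of every split back into it.
std::size_t leaves_par(Node root, std::size_t leaf_size, unsigned nworkers) {
    std::deque<Node> queue{root};
    std::mutex m;
    std::condition_variable cv;
    std::size_t pending = 1;  // nodes queued or being processed (guarded by m)
    std::atomic<std::size_t> leaves{0};

    auto worker = [&] {
        for (;;) {
            Node n{};
            {
                std::unique_lock<std::mutex> lk(m);
                cv.wait(lk, [&] { return !queue.empty() || pending == 0; });
                if (queue.empty()) return;  // pending == 0: no work left anywhere
                n = queue.front();
                queue.pop_front();
            }
            if (n.end - n.begin <= leaf_size) {
                leaves.fetch_add(1);
                std::lock_guard<std::mutex> lk(m);
                if (--pending == 0) cv.notify_all();  // wake idle workers so they exit
            } else {
                std::size_t mid = split_point(n);
                std::lock_guard<std::mutex> lk(m);
                queue.push_back({n.begin, mid});
                queue.push_back({mid, n.end});
                ++pending;  // one parent replaced by two children
                cv.notify_all();
            }
        }
    };

    std::vector<std::thread> pool;
    for (unsigned i = 0; i < nworkers; ++i) pool.emplace_back(worker);
    for (auto& t : pool) t.join();
    return leaves.load();
}
```

For instance, `leaves_par({0, 1000000}, 64, 4)` should return the same leaf count as `leaves_seq({0, 1000000}, 64)`. A single mutex-protected queue keeps the sketch short. However, it would itself become a synchronization bottleneck with very fine-grained tasks, which is why a real implementation would also control task granularity.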

Stay tuned for a forthcoming Technical Report on this work. The code will be made publicly available together with it. The C4.5-FastFlow application has been developed in cooperation with Salvatore Ruggieri, University of Pisa, Italy.
