This subject is proposed as part of the [[http://rockflows.i3s.unice.fr/|ROCKFlows]] project involving the following researchers: [[http://mireilleblayfornarino.i3s.unice.fr|Mireille Blay-Fornarino]], [[http://www.i3s.unice.fr/~mosser/start|Sébastien Mosser]] and [[http://www.i3s.unice.fr/~precioso/|Frédéric Precioso]].

===== Context =====

For many years, Machine Learning research has focused on designing new algorithms for solving similar kinds of problem instances (Kotthoff, 2016). However, researchers have long recognized that no single algorithm gives the best performance across all problem instances; for example, the No-Free-Lunch theorem (Wolpert, 1996) implies that the best classifier is not the same on every dataset. Consequently, a “winner-take-all” approach should not lead us to neglect algorithms that, while uncompetitive on average, may offer excellent performance on particular problem instances. Rice (1976) characterized this as the “algorithm selection problem”.
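Rice's problem can be read as choosing, for each problem instance, the algorithm that maximizes a performance measure. The minimal Python sketch below contrasts the “winner-take-all” choice with a per-instance oracle (often called the virtual best solver). The function names ''single_best'', ''virtual_best'' and ''mean_score'', and the performance function ''p(a, x)'', assumed to return a score where higher is better, are hypothetical illustrations and not part of ROCKFlows.

```python
from typing import Callable, Dict, Sequence

# Hypothetical performance measure p(a, x): the score (higher is better)
# obtained by algorithm a on problem instance (dataset) x.
Performance = Callable[[str, str], float]


def single_best(algorithms: Sequence[str], instances: Sequence[str],
                p: Performance) -> str:
    """Winner-take-all choice: the one algorithm with the best mean score."""
    return max(algorithms,
               key=lambda a: sum(p(a, x) for x in instances) / len(instances))


def virtual_best(algorithms: Sequence[str], instances: Sequence[str],
                 p: Performance) -> Dict[str, str]:
    """Per-instance oracle: the best algorithm for each instance separately."""
    return {x: max(algorithms, key=lambda a: p(a, x)) for x in instances}


def mean_score(choice: Callable[[str], str], instances: Sequence[str],
               p: Performance) -> float:
    """Mean performance of a rule mapping each instance to an algorithm."""
    return sum(p(choice(x), x) for x in instances) / len(instances)
```

Because the oracle takes the best algorithm on every instance, its mean score is never below that of ''single_best''; the gap between the two is what a per-instance selection could recover. In Rice's framework a selector cannot query ''p'' directly and must instead predict the best algorithm from features computed on the dataset.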

* The structural characteristics (size, quality, and nature) of the collected data

* How the results will be used.

This task is highly complex because of the growing number of available algorithms, the difficulty of choosing appropriate preprocessing techniques together with the right algorithms, and the need to tune their parameters correctly (Serban et al., 2013). To decide which algorithm to use, data scientists often consider the families of algorithms in which they are experts, and may leave aside algorithms that are more “exotic” to them but could perform better for the problem they are trying to solve.

ROCKFlows is a project that aims to help users create their own Machine Learning workflows simply by describing their dataset and objectives.

The thesis must address the following challenges: the relevance and quality of the predictions, and the scalability required to manage the huge mass of ML workflows.

To meet these challenges, attention should be paid to the following aspects: