Paolo Viviani

Short Bio

Paolo Viviani is a PhD student in Computer Science at the University of Turin. He holds a grant funded by Noesis Solutions NV. His main research topics are the development and deployment of machine learning methodologies on high-performance infrastructures, with a focus on distributed deep learning techniques.

Fields of interest:

Development and deployment of machine/deep learning methodologies on HPC.

Cloud, containers and virtualization for HPC.

Noesis’ technical contact for the following European Research projects:

MACH: MAssive Calculations on Hybrid systems, ITEA2 project 12002. The goal of the project is to develop a DSeL and a computation framework that give access to hybrid hardware acceleration without requiring specific expertise.

Fortissimo 2: Horizon 2020-FoF-2015 project. FF2 is a collaborative project that will enable European SMEs to be more competitive globally through the use of simulation services running on a High Performance Computing cloud infrastructure.

CloudFlow: FP7, ICT for Manufacturing SMEs (I4MS) project. CloudFlow will enable the remote use of computational services distributed on the cloud, seamlessly integrating these within established engineering design workflows and standards.

In April 2018, under the auspices of the POR-FESR 2014-2020 program of the Italian Piedmont Region, Turin’s Centre on High-Performance Computing for Artificial Intelligence (HPC4AI) was funded with a capital investment of 4.5 million euros and began its deployment. HPC4AI aims to facilitate scientific research and engineering in the areas of Artificial Intelligence and Big Data Analytics. HPC4AI will specifically focus on methods for the on-demand provisioning of AI and BDA Cloud services to the regional and national industrial community, which includes the large regional ecosystem of Small-Medium Enterprises (SMEs) active in many different sectors such as automotive, aerospace, mechatronics, manufacturing, health and agrifood.

@inproceedings{18:hpc4ai_acm_CF,
abstract = {In April 2018, under the auspices of the POR-FESR 2014-2020 program of the Italian Piedmont Region, Turin's Centre on High-Performance Computing for Artificial Intelligence (HPC4AI) was funded with a capital investment of 4.5 million euros and began its deployment. HPC4AI aims to facilitate scientific research and engineering in the areas of Artificial Intelligence and Big Data Analytics. HPC4AI will specifically focus on methods for the on-demand provisioning of AI and BDA Cloud services to the regional and national industrial community, which includes the large regional ecosystem of Small-Medium Enterprises (SMEs) active in many different sectors such as automotive, aerospace, mechatronics, manufacturing, health and agrifood.
},
address = {Ischia, Italy},
author = {Marco Aldinucci and Sergio Rabellino and Marco Pironti and Filippo Spiga and Paolo Viviani and Maurizio Drocco and Marco Guerzoni and Guido Boella and Marco Mellia and Paolo Margara and Idillio Drago and Roberto Marturano and Guido Marchetto and Elio Piccolo and Stefano Bagnasco and Stefano Lusso and Sara Vallero and Giuseppe Attardi and Alex Barchiesi and Alberto Colla and Fulvio Galeazzi},
booktitle = {ACM Computing Frontiers},
date-added = {2018-04-21 14:18:48 +0000},
date-modified = {2018-04-21 14:26:05 +0000},
doi = {10.1145/3203217.3205340},
keywords = {hpc4ai, c3s},
month = may,
title = {HPC4AI, an AI-on-demand federated platform endeavour},
url = {http://alpha.di.unito.it/storage/papers/2018_hpc4ai_ACM_CF.pdf},
year = {2018},
bdsk-url-1 = {http://alpha.di.unito.it/storage/papers/2018_hpc4ai_ACM_CF.pdf},
bdsk-url-2 = {https://doi.org/10.1145/3203217.3205340}
}

This work presents an innovative approach adopted for the development of a new numerical software framework for accelerating dense linear algebra calculations and its application within an engineering context. In particular, response surface models (RSM) are a key tool to reduce the computational effort involved in engineering design processes like design optimization. However, RSMs may prove to be too expensive to be computed when the dimensionality of the system and/or the size of the dataset to be synthesized is significantly high or when a large number of different response surfaces has to be calculated in order to improve the overall accuracy (e.g. when using ensemble modelling techniques). On the other hand, the potential of modern hybrid hardware (e.g. multicore, GPUs) is not exploited by current engineering tools, although it could lead to a significant performance improvement. To fill this gap, a software framework is being developed that enables the hybrid and scalable acceleration of the linear algebra core for engineering applications, and especially of RSM calculations, with a user-friendly syntax that allows good portability between different hardware architectures, with no need for specific expertise in parallel programming and accelerator technology. The effectiveness of this framework is shown by comparing an accelerated code to a single-core calculation of a radial basis function RSM on some benchmark datasets. This approach is then validated within a real-life engineering application and the achievements are presented and discussed.

@inbook{17:viviani:advstruct,
abstract = {This work presents an innovative approach adopted for the development of a new numerical software framework for accelerating dense linear algebra calculations and its application within an engineering context. In particular, response surface models (RSM) are a key tool to reduce the computational effort involved in engineering design processes like design optimization. However, RSMs may prove to be too expensive to be computed when the dimensionality of the system and/or the size of the dataset to be synthesized is significantly high or when a large number of different response surfaces has to be calculated in order to improve the overall accuracy (e.g. when using ensemble modelling techniques). On the other hand, the potential of modern hybrid hardware (e.g. multicore, GPUs) is not exploited by current engineering tools, although it could lead to a significant performance improvement. To fill this gap, a software framework is being developed that enables the hybrid and scalable acceleration of the linear algebra core for engineering applications, and especially of RSM calculations, with a user-friendly syntax that allows good portability between different hardware architectures, with no need for specific expertise in parallel programming and accelerator technology. The effectiveness of this framework is shown by comparing an accelerated code to a single-core calculation of a radial basis function RSM on some benchmark datasets. This approach is then validated within a real-life engineering application and the achievements are presented and discussed.},
address = {Cham},
author = {Viviani, P. and Aldinucci, M. and d'Ippolito, R. and Lemeire, J. and Vucinic, D.},
booktitle = {Improved Performance of Materials: Design and Experimental Approaches},
date-modified = {2018-03-13 16:40:21 +0000},
doi = {10.1007/978-3-319-59590-0_9},
isbn = {978-3-319-59590-0},
keywords = {repara, rephrase},
opteditor = {{\"O}chsner, Andreas and Altenbach, Holm},
pages = {93--106},
publisher = {Springer International Publishing},
title = {A Flexible Numerical Framework for Engineering---A Response Surface Modelling Application},
url = {https://doi.org/10.1007/978-3-319-59590-0_9},
year = {2018},
bdsk-url-1 = {https://doi.org/10.1007/978-3-319-59590-0_9},
bdsk-url-2 = {http://dx.doi.org/10.1007/978-3-319-59590-0_9}
}
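
The abstract above mentions a radial basis function RSM as its benchmark. As a rough illustration of what such a model computes, and of why its cost is dominated by a dense linear solve, here is a minimal pure-Python sketch. It is not the framework described in the chapter: the Gaussian kernel, the `eps` shape parameter, and the function names `fit_rbf` and `predict_rbf` are all assumptions chosen for illustration.

```python
import math


def gaussian_rbf(r, eps=1.0):
    """Gaussian kernel (an assumed choice; other kernels are common too)."""
    return math.exp(-(eps * r) ** 2)


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    This O(n^3) dense solve is the step an accelerated linear algebra
    back-end would typically take over.
    """
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or badly conditioned")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        s = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - s) / m[row][row]
    return x


def fit_rbf(points, values, eps=1.0):
    """Return the weights w that solve Phi w = values."""
    phi = [[gaussian_rbf(math.dist(p, q), eps) for q in points] for p in points]
    return solve_linear_system(phi, values)


def predict_rbf(points, weights, x, eps=1.0):
    """Evaluate the fitted response surface at a new point x."""
    return sum(w * gaussian_rbf(math.dist(x, p), eps)
               for w, p in zip(weights, points))


if __name__ == "__main__":
    # Toy dataset: samples of f(x, y) = x + y on the unit square corners.
    samples = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    responses = [0.0, 1.0, 1.0, 2.0]
    weights = fit_rbf(samples, responses)
    print(predict_rbf(samples, weights, (0.5, 0.5)))
```

The kernel matrix is dense and grows with the square of the number of samples, so the solve quickly becomes the bottleneck as datasets grow or when many surfaces are fitted for an ensemble.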

The present trend in big-data analytics is to exploit algorithms with (sub-)linear time complexity; in this sense, it is usually worth investigating whether the available techniques can be approximated to reach an affordable complexity. However, there are still problems in data science and engineering that involve algorithms with higher time complexity, like matrix inversion or Singular Value Decomposition (SVD). This work presents the results of a survey that reviews a number of tools meant to perform dense linear algebra at “Big Data” scale: namely, the proposed approach aims first to define a feasibility boundary for the problem size of shared-memory matrix factorizations, then to understand whether it is convenient to employ specific tools meant to scale out such dense linear algebra tasks on distributed platforms. The survey will eventually discuss the presented tools from the point of view of domain experts (data scientists, engineers), hence focusing on the trade-off between usability and performance.

@inproceedings{svd:pdp:18,
abstract = {The present trend in big-data analytics is to exploit algorithms with (sub-)linear time complexity; in this sense, it is usually worth investigating whether the available techniques can be approximated to reach an affordable complexity. However, there are still problems in data science and engineering that involve algorithms with higher time complexity, like matrix inversion or Singular Value Decomposition (SVD). This work presents the results of a survey that reviews a number of tools meant to perform dense linear algebra at ``Big Data'' scale: namely, the proposed approach aims first to define a feasibility boundary for the problem size of shared-memory matrix factorizations, then to understand whether it is convenient to employ specific tools meant to scale out such dense linear algebra tasks on distributed platforms. The survey will eventually discuss the presented tools from the point of view of domain experts (data scientists, engineers), hence focusing on the trade-off between usability and performance.},
address = {Cambridge, United Kingdom},
author = {Paolo Viviani and Maurizio Drocco and Marco Aldinucci},
booktitle = {Proc. of 26th Euromicro Intl. Conference on Parallel Distributed and network-based Processing (PDP)},
date-modified = {2018-01-30 11:07:31 +0000},
keywords = {svd, big data, linear algebra},
publisher = {IEEE},
title = {Scaling Dense Linear Algebra on Multicore and Beyond: a Survey},
url = {https://iris.unito.it/retrieve/handle/2318/1659340/387685/preprint_aperto.pdf},
year = {2018},
bdsk-url-1 = {https://iris.unito.it/retrieve/handle/2318/1659340/387685/preprint_aperto.pdf}
}

The cloud environment is increasingly appealing for the HPC community, which has always dealt with scientific applications. However, there is still some skepticism about moving from traditional physical infrastructures to virtual HPC clusters. This mistrust probably originates from some well-known factors, including the effective economy of using cloud services, data and software availability, and the longstanding matter of data stewardship. In this work we discuss the design of a framework (based on Mesos) aimed at achieving a cost-effective and efficient usage of heterogeneous Processing Elements (PEs) for workflow execution, which supports hybrid cloud bursting over preemptible cloud Virtual Machines.

@inproceedings{18:parco:workflow,
abstract = {The cloud environment is increasingly appealing for the HPC community, which has always dealt with scientific applications. However, there is still some skepticism about moving from traditional physical infrastructures to virtual HPC clusters. This mistrust probably originates from some well-known factors, including the effective economy of using cloud services, data and software availability, and the longstanding matter of data stewardship. In this work we discuss the design of a framework (based on Mesos) aimed at achieving a cost-effective and efficient usage of heterogeneous Processing Elements (PEs) for workflow execution, which supports hybrid cloud bursting over preemptible cloud Virtual Machines.},
author = {Fabio Tordini and Marco Aldinucci and Paolo Viviani and Ivan Merelli and Pietro Li{\`{o}}},
booktitle = {Proc. of the Intl. Conference on Parallel Computing, ParCo 2017, 12-15 September 2017, Bologna, Italy},
date-added = {2018-01-21 15:15:01 +0000},
date-modified = {2018-03-13 16:44:11 +0000},
doi = {10.3233/978-1-61499-843-3-605},
keywords = {rephrase},
publisher = {{IOS} Press},
series = {Advances in Parallel Computing},
title = {Scientific Workflows on Clouds with Heterogeneous and Preemptible Instances},
url = {https://iris.unito.it/retrieve/handle/2318/1658510/385411/main.pdf},
year = {2018},
bdsk-url-1 = {https://iris.unito.it/retrieve/handle/2318/1658510/385411/main.pdf}
}

The Armadillo C++ library provides programmers with a high-level Matlab-like syntax for linear algebra. Its design aims at providing a good balance between speed and ease of use. It can be linked with different back-ends, i.e. different LAPACK-compliant libraries. In this work we present a novel run-time support for Armadillo, which gracefully extends the mainstream implementation to enable back-end switching without recompilation and multiple back-end support. The extension is specifically designed not to affect Armadillo class template prototypes, and thus to remain easily interoperable with future evolutions of the Armadillo library itself. The proposed software stack is then tested for functionality and performance against a kernel code extracted from an industrial application.

@inproceedings{17:sac:armadillo,
abstract = {The Armadillo C++ library provides programmers with a high-level Matlab-like syntax for linear algebra. Its design aims at providing a good balance between speed and ease of use. It can be linked with different back-ends, i.e. different LAPACK-compliant libraries. In this work we present a novel run-time support for Armadillo, which gracefully extends the mainstream implementation to enable back-end switching without recompilation and multiple back-end support. The extension is specifically designed not to affect Armadillo class template prototypes, and thus to remain easily interoperable with future evolutions of the Armadillo library itself. The proposed software stack is then tested for functionality and performance against a kernel code extracted from an industrial application.},
address = {Marrakesh, Morocco},
author = {Paolo Viviani and Massimo Torquati and Marco Aldinucci and Roberto d'Ippolito},
booktitle = {Proc. of the 32nd ACM Symposium on Applied Computing (SAC)},
date-added = {2016-08-19 21:47:45 +0000},
date-modified = {2017-06-13 15:54:43 +0000},
keywords = {nvidia, repara, rephrase, itea2},
month = apr,
pages = {1566--1573},
title = {Multiple back-end support for the Armadillo linear algebra interface},
url = {https://iris.unito.it/retrieve/handle/2318/1626229/299089/armadillo_4aperto.pdf},
year = {2017},
bdsk-url-1 = {https://iris.unito.it/retrieve/handle/2318/1626229/299089/armadillo_4aperto.pdf}
}

The aim of this work is to provide developers and domain experts with a simple (Matlab-like) interface for performing linear algebra tasks while retaining state-of-the-art computational speed. To achieve this goal, the Armadillo C++ library is extended to support multiple LAPACK-compliant back-ends targeting different architectures, including CUDA GPUs; moreover, our approach makes it possible to switch dynamically between such back-ends in order to select the most convenient one for the specific problem and hardware configuration. This approach is eventually validated within an industrial environment.

@inproceedings{16:acaces:armadillo,
abstract = {The aim of this work is to provide developers and domain experts with a simple (Matlab-like) interface for performing linear algebra tasks while retaining state-of-the-art computational speed. To achieve this goal, the Armadillo C++ library is extended to support multiple LAPACK-compliant back-ends targeting different architectures, including CUDA GPUs; moreover, our approach makes it possible to switch dynamically between such back-ends in order to select the most convenient one for the specific problem and hardware configuration. This approach is eventually validated within an industrial environment.},
address = {Fiuggi, Italy},
author = {Paolo Viviani and Marco Aldinucci and Roberto d'Ippolito},
booktitle = {Advanced Computer Architecture and Compilation for High-Performance and Embedded Systems (ACACES) -- Poster Abstracts},
date-added = {2016-08-20 17:22:51 +0000},
date-modified = {2016-08-20 17:29:35 +0000},
keywords = {nvidia, algebra, gpu, itea2, repara},
month = jul,
title = {An hybrid linear algebra framework for engineering},
url = {https://iris.unito.it/retrieve/handle/2318/1622382/300198/armadillo.pdf},
year = {2016},
bdsk-url-1 = {https://iris.unito.it/retrieve/handle/2318/1622382/300198/armadillo.pdf}
}

This work presents the innovative approach adopted for the development of a new numerical software framework for accelerating Dense Linear Algebra calculations and its application within an engineering context. In particular, Response Surface Models (RSM) are a key tool to reduce the computational effort involved in engineering design processes like design optimization. However, RSMs may prove to be too expensive to be computed when the dimensionality of the system and/or the size of the dataset to be synthesized is significantly high or when a large number of different Response Surfaces has to be calculated in order to improve the overall accuracy (e.g. when using Ensemble Modelling techniques). On the other hand, it is a known challenge that the potential of modern hybrid hardware (e.g. multicore, GPUs) is not exploited by current engineering tools, although it could lead to a significant performance improvement. To fill this gap, a software framework is being developed that enables the hybrid and scalable acceleration of the linear algebra core for engineering applications, and especially of RSM calculations, with a user-friendly syntax that allows good portability between different hardware architectures, with no need for specific expertise in parallel programming and accelerator technology. The effectiveness of this framework is shown by comparing an accelerated code to a single-core calculation of a Radial Basis Function RSM on some benchmark datasets. This approach is then validated within a real-life engineering application and the achievements are presented and discussed.

@inproceedings{16:acex:armadillo,
abstract = {This work presents the innovative approach adopted for the development of a new numerical software framework for accelerating Dense Linear Algebra calculations and its application within an engineering context.
In particular, Response Surface Models (RSM) are a key tool to reduce the computational effort involved in engineering design processes like design optimization. However, RSMs may prove to be too expensive to be computed when the dimensionality of the system and/or the size of the dataset to be synthesized is significantly high or when a large number of different Response Surfaces has to be calculated in order to improve the overall accuracy (e.g. when using Ensemble Modelling techniques).
On the other hand, it is a known challenge that the potential of modern hybrid hardware (e.g. multicore, GPUs) is not exploited by current engineering tools, although it could lead to a significant performance improvement. To fill this gap, a software framework is being developed that enables the hybrid and scalable acceleration of the linear algebra core for engineering applications, and especially of RSM calculations, with a user-friendly syntax that allows good portability between different hardware architectures, with no need for specific expertise in parallel programming and accelerator technology.
The effectiveness of this framework is shown by comparing an accelerated code to a single-core calculation of a Radial Basis Function RSM on some benchmark datasets. This approach is then validated within a real-life engineering application and the achievements are presented and discussed.
},
author = {Paolo Viviani and Marco Aldinucci and Roberto d'Ippolito and Jean Lemeire and Dean Vucinic},
booktitle = {10th Intl. Conference on Advanced Computational Engineering and Experimenting (ACE-X)},
date-added = {2016-08-19 21:37:19 +0000},
date-modified = {2017-06-19 15:35:39 +0000},
keywords = {repara, rephrase, nvidia, gpu},
title = {A flexible numerical framework for engineering - a Response Surface Modelling application},
year = {2016}
}

2015

Modern experimental achievements, with LHC results as a prominent but not exclusive representative, have revealed a new range of challenges concerning theoretical computations. Tree-level QED calculations are no longer satisfactory due to the very small experimental uncertainty of precision e+ e- measurements, so Next-to-Leading and Next-to-Next-to-Leading Order calculations are required. At the same time, the many-leg, high-order QCD processes needed to simulate LHC events are raising the bar of computational complexity even further. The drive for the present work has been the interest in calculating high-multiplicity Higgs boson processes with a dedicated software library (RECOLA) currently under development at the University of Torino, as well as the related technological challenges. This thesis undertakes the task of exploring the possibilities offered by present and upcoming computing technologies in order to face these challenges properly. The first two chapters outline the theoretical context and the available technologies. In chapter 3, a case study is examined in full detail in order to explore the suitability of different parallel computing solutions. In chapter 4, some of those solutions are implemented in the context of the RECOLA library, allowing it to handle processes at a previously unexplored scale of complexity. Alongside, the potential of new, cost-effective parallel architectures is tested.

@mastersthesis{tesi:viviani:15,
abstract = {Modern experimental achievements, with LHC results as a prominent but not exclusive representative, have revealed a new range of challenges concerning theoretical computations. Tree-level QED calculations are no longer satisfactory due to the very small experimental uncertainty of precision e+ e- measurements, so Next-to-Leading and Next-to-Next-to-Leading Order calculations are required. At the same time, the many-leg, high-order QCD processes needed to simulate LHC events are raising the bar of computational complexity even further. The drive for the present work has been the interest in calculating high-multiplicity Higgs boson processes with a dedicated software library (RECOLA) currently under development at the University of Torino, as well as the related technological challenges.
This thesis undertakes the task of exploring the possibilities offered by present and upcoming computing technologies in order to face these challenges properly. The first two chapters outline the theoretical context and the available technologies. In chapter 3, a case study is examined in full detail in order to explore the suitability of different parallel computing solutions. In chapter 4, some of those solutions are implemented in the context of the RECOLA library, allowing it to handle processes at a previously unexplored scale of complexity. Alongside, the potential of new, cost-effective parallel architectures is tested.},
author = {Paolo Viviani},
date-added = {2015-09-27 12:36:54 +0000},
date-modified = {2015-09-27 13:28:24 +0000},
keywords = {fastflow,impact},
school = {Physics Department, University of Torino},
title = {Parallel Computing Techniques for High Energy Physics},
year = {2015}
}