This master thesis addresses the task of synthesising new sounds using deep learning in an end-to-end approach: the system is fed raw audio and generates new audio samples without any additional information. Although the use of deep learning is relatively new in the field, sound synthesis has always seized the interest of researchers. Synthesizers first, and later signal processing techniques that model physical systems, have been studied in depth over the last decades. A turning point came in 2016, when Oord et al. presented WaveNet [1]. This network uses a deep learning architecture to generate audio one sample at a time, conditioning each sample on all the previous ones. In this thesis, different architectures have been designed to generate audio samples in an end-to-end approach. WaveNet was selected over the other architectures and explored in depth. After obtaining relevant results with global conditioning, the network was extended to perform local conditioning. The benefits of local conditioning have been studied, resulting in a final tool that is able to automatically distinguish and generate specific piano and pan flute sounds by conditioning them on the mel spectrum and MFCCs.
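
The sample-by-sample generation described above corresponds to the standard autoregressive factorisation used by WaveNet; as a sketch (with $\mathbf{h}$ standing for a conditioning signal such as mel-spectrum or MFCC features, following the paper's notation), the joint probability of a waveform $\mathbf{x} = (x_1, \dots, x_T)$ is modelled as

$$
p(\mathbf{x}) = \prod_{t=1}^{T} p(x_t \mid x_1, \dots, x_{t-1}),
\qquad
p(\mathbf{x} \mid \mathbf{h}) = \prod_{t=1}^{T} p(x_t \mid x_1, \dots, x_{t-1}, \mathbf{h}),
$$

where the conditioned form covers both global conditioning ($\mathbf{h}$ constant over the whole sequence) and local conditioning ($\mathbf{h}$ varying in time), as studied in this thesis.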