With this system, the "singing articulations" needed to reproduce vocals (collections of voice snippets, such as phrases, and snippets of vocal expression variations such as vibrato) are collected from custom-produced recordings of accomplished singers, converted into the frequency domain, and stored in a database.
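The conversion and storage step described above can be illustrated with a minimal sketch. It is not Vocaloid's actual analysis method: the naive Fourier transform, the frame size and hop, the dictionary used as a "database", and the key name `a-vibrato` are all assumptions chosen for illustration, and a pure tone stands in for a real recording.

```python
import cmath
import math

def dft_magnitudes(frame):
    """Magnitudes of bins 0..n/2 from a naive DFT of a real-valued frame."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]

def to_frames(samples, size, hop):
    """Cut a recording into overlapping fixed-size frames."""
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, hop)]

def add_articulation(database, key, samples, size=32, hop=16):
    """Store the per-frame spectra of one recorded snippet under `key`."""
    database[key] = [dft_magnitudes(f) for f in to_frames(samples, size, hop)]

# Hypothetical input: a pure tone (4 cycles per 32 samples) instead of a voice.
database = {}
tone = [math.sin(2 * math.pi * 4 * t / 32) for t in range(64)]
add_articulation(database, "a-vibrato", tone)
```

Each stored entry is a list of magnitude spectra, one per frame, which is the kind of representation the later pitch conversion and splicing steps would operate on.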

To synthesize vocal parts, the system retrieves the required voice snippets from the database, applies pitch conversion, and splices and shapes them to form the words of the song as entered by the user. Because this processing is done in the frequency domain, pitch can easily be changed to follow the specified melody, and the voice snippets can be spliced so that the words flow smoothly.
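As a rough illustration of why the frequency domain makes these two operations convenient, the sketch below changes pitch by rescaling the frequency axis of a magnitude spectrum and joins two snippets by crossfading their frames. Both functions are hypothetical simplifications under stated assumptions (spectra held as plain lists, linear interpolation, a linear crossfade), not the algorithm the system actually uses.

```python
def shift_pitch(mags, ratio):
    """Rescale the frequency axis of a magnitude spectrum by `ratio`."""
    top = len(mags) - 1
    out = []
    for k in range(len(mags)):
        src = k / ratio              # source bin that lands on bin k
        if src > top:
            out.append(0.0)          # nothing recorded above the top bin
            continue
        lo = min(int(src), top - 1)
        frac = src - lo
        out.append((1 - frac) * mags[lo] + frac * mags[lo + 1])
    return out

def splice(a_frames, b_frames, overlap):
    """Join two frame sequences, crossfading over `overlap` frames."""
    start = len(a_frames) - overlap
    blended = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight of the incoming snippet
        blended.append(
            [(1 - w) * x + w * y for x, y in zip(a_frames[start + i], b_frames[i])]
        )
    return a_frames[:start] + blended + b_frames[overlap:]

# A spectrum with a single peak at bin 4, raised by one octave (ratio 2.0).
spectrum = [0.0] * 17
spectrum[4] = 1.0
raised = shift_pitch(spectrum, 2.0)

# Two three-frame snippets joined with a one-frame crossfade.
joined = splice([[1.0]] * 3, [[0.0]] * 3, overlap=1)
```

Note that this naive rescaling moves the whole spectral envelope along with the pitch, so it changes the perceived timbre as well; a practical system would need to handle the two separately.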

Vocaloid itself consists of a score editor, which processes the notes, lyrics, and expressions; the Vocal Sound Generator, the engine that synthesizes the vocals; and a library for each voice, each comprising a pronunciation database and a timbre database. New vocal libraries can be created by recording real voices pronouncing basic vocabulary and reproducing variation effects (such as vibrato) according to templates.
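The relationship between these three components can be sketched as a small data model. Everything here is hypothetical: the class and function names, the note fields, and in particular the assumption that the pronunciation database maps lyric units to snippet identifiers while the timbre database maps those identifiers to stored spectral data. The text above does not specify how the two databases are organized.

```python
from dataclasses import dataclass, field

@dataclass
class Note:
    pitch: int             # assumed representation: a MIDI-style note number
    lyric: str
    vibrato: bool = False  # one example of an expression setting

@dataclass
class VoiceLibrary:
    singer: str
    pronunciation: dict = field(default_factory=dict)  # lyric unit -> snippet id (assumed)
    timbre: dict = field(default_factory=dict)         # snippet id -> spectral data (assumed)

def editor_score(rows):
    """Stand-in for the score editor: build notes from (pitch, lyric, vibrato) rows."""
    return [Note(pitch, lyric, vibrato) for pitch, lyric, vibrato in rows]

def generate(score, library):
    """Stand-in for the synthesis engine: look up the data needed for each note."""
    plan = []
    for note in score:
        snippet_id = library.pronunciation[note.lyric]
        plan.append((snippet_id, note.pitch, note.vibrato, library.timbre[snippet_id]))
    return plan

library = VoiceLibrary("demo", {"la": "s1"}, {"s1": [0.1, 0.9]})
plan = generate(editor_score([(60, "la", True)]), library)
```

The point of the sketch is only the division of labor: the editor produces a score, the library supplies per-voice data, and the engine combines the two.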