There was something like that in Aphex Twin – Windowlicker (1999 single), visible in the spectrogram with a logarithmic frequency axis:

According to the single's Wikipedia page, the image-to-audio conversion was made with MetaSynth. I don't know the specifics of MetaSynth, but a similar effect can be achieved by treating each pixel column as a magnitude spectrum, calculating its inverse discrete Fourier transform (IDFT), and concatenating the resulting time-domain frames. To make it sound smoother, the time-domain frames can be cross-faded, and the phases of the frequency bins can be randomized so that the bins do not all line up in phase at the same instants, which would otherwise produce a click in every frame. The image can also be warped vertically beforehand so that the vertical coordinate corresponds to a logarithmic frequency scale.
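The column-by-column IDFT idea can be sketched roughly as follows. This is only a minimal illustration of the approach described above, not how MetaSynth works. The function name `image_to_audio` and the parameters `frame_len`, `hop` and `seed` are my own, and I am assuming a grayscale image given as a 2D array with the top row as the highest frequency, a linear frequency axis (no logarithmic warp), and a Hann window with 50% overlap as the cross-fade.

```python
import numpy as np

def image_to_audio(image, frame_len=1024, hop=None, seed=0):
    """Turn a 2D array (rows = frequency, columns = time) into audio.

    Assumptions: non-negative pixel values, top row = highest frequency,
    linear frequency axis.
    """
    rng = np.random.default_rng(seed)
    image = np.asarray(image, dtype=float)
    n_bins = frame_len // 2 + 1              # bins of a real-input DFT
    hop = frame_len // 2 if hop is None else hop

    # Flip vertically so that row 0 is the lowest frequency.
    mags = image[::-1, :]

    # Resample each column to n_bins magnitudes by linear interpolation.
    src = np.linspace(0.0, 1.0, mags.shape[0])
    dst = np.linspace(0.0, 1.0, n_bins)
    mags = np.stack([np.interp(dst, src, col) for col in mags.T], axis=1)

    n_cols = mags.shape[1]
    window = np.hanning(frame_len)           # cross-fade between frames
    out = np.zeros(hop * (n_cols - 1) + frame_len)

    for k in range(n_cols):
        # Random phase per bin, so the bins do not align in time.
        phase = rng.uniform(0.0, 2.0 * np.pi, n_bins)
        spectrum = mags[:, k] * np.exp(1j * phase)
        spectrum[0] = 0.0                    # drop the DC component
        frame = np.fft.irfft(spectrum, n=frame_len)
        # Overlap-add the windowed frame into the output.
        out[k * hop : k * hop + frame_len] += window * frame

    peak = np.max(np.abs(out))
    return out / peak if peak > 0 else out
```

The result is a float array normalized to a peak of 1, which could then be written to a file with something like `scipy.io.wavfile.write`. Because the phases are drawn independently for every frame, a steady horizontal line in the image will sound slightly noisy rather than like a clean tone.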

The problem could be stated as: how can I create an audio signal whose spectrogram looks like the input image? To make a really good conversion, methods similar to those used in time-scale and pitch modification could be applied. Some of those try to maintain the phase coherence of frequency components across time-domain frames, particularly of components that fall between frequency bins. Others try to preserve the time coherence of transients, that is, events that are more concentrated in the time domain than in the frequency domain.