In this work we introduce a color-based image-to-audio system that enhances scene perception for visually impaired users. Traditional vision-to-sound substitution systems mainly translate grayscale images into corresponding audio frequencies. However, these algorithms deprive the user of color information, a critical factor in object recognition and in attracting visual attention. We propose an algorithm that translates the scene into sound using classical computer vision techniques. The most salient visual regions are extracted by a hybrid approach that blends the computed saliency map with the segmented image. The selected image region is then simplified against a reference color dictionary. The centroids of the color regions are translated into audio by different musical instruments. We chose to encode the audio output as a polyphonic music composition, reasoning that humans can distinguish several instruments playing at the same time, which also reduces the playback duration. Testing the prototype demonstrated that untrained blindfolded participants could readily interpret sequences of colored patterns and, for example, estimate the amount of a specific color contained in a given image.
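As a minimal sketch of the color-simplification and instrument-mapping steps described above, the snippet below quantizes an image against a reference color dictionary and computes the centroid of each color region. The four-entry palette and the color-to-instrument assignments here are purely illustrative assumptions, not the dictionary or orchestration used in the actual system.

```python
import numpy as np

# Hypothetical reference color dictionary (RGB) and instrument mapping;
# the real system's palette and instrument assignments may differ.
PALETTE = {
    "red":    (255, 0, 0),
    "green":  (0, 255, 0),
    "blue":   (0, 0, 255),
    "yellow": (255, 255, 0),
}
INSTRUMENT = {"red": "trumpet", "green": "flute",
              "blue": "cello", "yellow": "piano"}

def quantize(image):
    """Map every pixel to the nearest palette color (Euclidean distance in RGB)."""
    names = list(PALETTE)
    colors = np.array([PALETTE[n] for n in names], dtype=float)   # (K, 3)
    flat = image.reshape(-1, 3).astype(float)                     # (N, 3)
    dist = np.linalg.norm(flat[:, None, :] - colors[None, :, :], axis=2)
    labels = dist.argmin(axis=1)
    return labels.reshape(image.shape[:2]), names

def color_centroids(labels, names):
    """Return the (row, col) centroid of each palette color's pixel region."""
    centroids = {}
    for k, name in enumerate(names):
        rows, cols = np.nonzero(labels == k)
        if rows.size:
            centroids[name] = (rows.mean(), cols.mean())
    return centroids

# Tiny synthetic image: left half red, right half blue.
img = np.zeros((4, 4, 3), dtype=np.uint8)
img[:, :2] = (255, 0, 0)
img[:, 2:] = (0, 0, 255)

labels, names = quantize(img)
cents = color_centroids(labels, names)
for name, (r, c) in cents.items():
    print(name, INSTRUMENT[name], (round(r, 1), round(c, 1)))
```

In a full pipeline, each centroid would then be rendered as a note played by its assigned instrument, with spatial position plausibly encoded through timing or stereo panning; those sonification details are left to the paper's method section.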