Could the names of some foods make them sound heavier or lighter than others? This seems unlikely; after all, as Shakespeare said in Romeo and Juliet:

What’s in a name? that which we call a rose
By any other name would smell as sweet;

Juliet is expressing the theory we call conventionalism: that a name for something is just an agreed-upon convention. Conventionalism is the norm in modern linguistics: Whether we call the ingredients in our breakfast scramble “egg,” “oeuf,” or “huevo” shouldn’t matter as long as we all agree.

But there’s an alternative view—one that dates back 2,500 years—called naturalism. Naturalism is the idea that a name might naturally fit an object, that some names might naturally “sound sweeter” than others. It was Plato, in the Cratylus, who pointed out that sometimes sounds seem to carry meaning. We call this phenomenon sound symbolism, and modern linguistic research has found support for Plato’s position.

Let’s talk about vowels. Front vowels are those made by holding the tongue high in the front part of the mouth, like the eee vowel in cheese (represented as i in the illustration below) or the vowel in mint or thin (represented by capital I). The figure below shows a schematic cutaway of the head, with the lips and teeth on the left, and the tongue high up toward the front of the mouth. By contrast, back vowels are made with the tongue lower in the back of the mouth, like the vowels in bold, coarse, or large, as shown in the illustration.

Courtesy of Dan Jurafsky

Across most languages of the world, front vowels tend to be used in words for small, thin, light things, and back vowels in words for big, fat, heavy things. It’s not always true, but it’s a tendency that you can see in the stressed vowels of words like little, teeny, or itsy-bitsy (all front vowels) versus humongous or enormous (back vowels). Or Spanish chico (front vowel, meaning “small”) versus gordo (back vowel, meaning “fat”). Or French petit (front vowel) versus grand (back vowel).

In one marketing study, for example, Richard Klink at Loyola University Maryland created pairs of made-up product brand names that were identical except for having front vowels or back vowels and asked participants to answer: Which brand of laptop seems bigger, Detal or Dutal? Which brand of ketchup seems thicker, Nellen or Nullen? Which brand of beer seems darker, Esab or Usab?

In each case, the product named with back vowels (Dutal, Nullen) was chosen as the larger, heavier, thicker product.

Since ice cream is a product whose whole purpose is to be rich, creamy, and heavy, it is not surprising that people seem to prefer ice creams that are named with back vowels. Eric Yorkston and Geeta Menon at New York University found that participants asked about a hypothetical ice cream named either Frish (front vowel) or Frosh (back vowel) rated Frosh as smoother, creamier, and richer than Frish.

Do manufacturers make use of this subconscious association of back vowels with richness and creaminess? I checked to see whether commercial ice creams (like Häagen-Dazs and Ben & Jerry’s) were more likely to use back vowels in their flavor names, and conversely whether thin, light foods like crackers would have more front vowels in their brand names.
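A tally like this can be roughed out in a few lines of code. The sketch below is only an illustration, not the procedure used for the actual comparison: it assumes, very crudely, that the first e or i in a name's spelling signals a front vowel and the first o or u a back vowel, which ignores real pronunciation (and the letter a entirely). The function names and the letter sets are hypothetical; the sample names are the made-up brands from the studies above.

```python
# Crude orthographic stand-in for vowel frontness (an assumption of this sketch):
# spelling is not pronunciation, so a real study would use phonetic transcriptions.
FRONT_LETTERS = set("ie")
BACK_LETTERS = set("ou")


def first_vowel_class(name):
    """Label a name 'front' or 'back' by its first e/i or o/u letter."""
    for ch in name.lower():
        if ch in FRONT_LETTERS:
            return "front"
        if ch in BACK_LETTERS:
            return "back"
    # No e, i, o, or u found (for example, a name whose only vowel letter is a).
    return "unknown"


def tally(names):
    """Count how many names fall into each class."""
    counts = {"front": 0, "back": 0, "unknown": 0}
    for name in names:
        counts[first_vowel_class(name)] += 1
    return counts


# Made-up brand names from the studies described above.
made_up_names = ["Detal", "Dutal", "Nellen", "Nullen",
                 "Esab", "Usab", "Frish", "Frosh"]

print(tally(made_up_names))
```

Run on two real lists, say ice cream flavors and cracker brands, the same tally would give a first rough look at whether one list leans toward back vowels and the other toward front vowels.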

The difference in meanings between front and back vowels seems like clear evidence for naturalism. But why this particular difference? The most widely accepted theory, linguist John Ohala’s frequency code, suggests that we instinctively associate low-pitched (low-frequency) sounds with big things (big animals like lions have a deep roar) and high-pitched (high-frequency) sounds with little things (small animals like birds have high-pitched tweets). Front vowels like the ones in thin and teeny have a particular high pitch (a resonance that linguists call their “second formant”), so we associate them with small things, while back vowels like a or u have a low second formant resonance, so we associate them with big things. In fact, linguist Kate Geenberg has shown that when we talk to babies we instinctively move all our vowels a little bit toward the i sound.

The frequency code isn’t the only kind of sound symbolism. Consonant sounds also seem to encode some kinds of meaning, particularly the contrast between consonants like t or k and consonants like m, l, or b. Consider these two pictures:

Courtesy of Dan Jurafsky

Suppose I told you that in the Martian language one of these two was called bouba and the other was called kiki and you had to guess which was which. Think for a second. Which picture is bouba? Which is kiki? How about the words maluma or takete?

If you’re like most people, you called the jagged picture on the left kiki or takete and the round one on the right bouba or maluma. This test was invented by German psychologist Wolfgang Köhler, one of the founders of Gestalt psychology, in 1929. The experiment has been repeated with all sorts of made-up words with similar sounds, with speakers of languages from Swedish to Swahili and in a remote nomadic population of northern Namibia, and the results are astonishingly consistent across all of them, even among 2½-year-old toddlers.

This difference between consonants like t and k and consonants like m and l even holds for kitchen utensils. Linguist Annette D’Onofrio has found that when people are asked to choose “Martian” names, round bowls, spoons, or ladles are more likely to be labeled bouba and spiky forks and skewers kiki. There’s a closer link to food from the lab of Oxford psychologist Charles Spence, one of the world’s foremost researchers in sensory perception. In one paper, for example, Spence, Mary Kim Ngo, and Reeva Misra asked people to eat a piece of chocolate and say whether the taste better matched maluma or takete. People eating milk chocolate (Lindt extra creamy 30 percent cocoa) said the taste fit maluma (and also matched the curvier figure). People eating dark chocolate (Lindt 70 percent and 90 percent cocoa) instead chose takete (and matched the jagged figure). In another paper they found that carbonated water was perceived as more kiki (and spiky) while still water was perceived as more bouba (and curvy). In other words, words with m and l sounds like maluma were associated with creamier or smoother tastes, and words with t and k sounds like takete were associated with sharper or pricklier tastes.

One proposal for what’s going on in this example of naturalism has to do with continuity and smoothness. Sounds like m, l, and r, as in maluma, are called continuants because they are continuous and smooth acoustically (the sound is pretty consistent across its whole length). Continuants are more closely associated with smoother figures. By contrast, strident sounds that abruptly start and stop, like t and k as in takete, are associated with the spiky figures. The consonant t has the most distinct jagged burst of energy of any consonant in English.

To help you visualize this, look at this picture of sound waves from a recording of the words maluma (left) and takete (right).

Courtesy of Dan Jurafsky

What I call the synesthetic hypothesis suggests that the perception of acoustic smoothness by one of our five senses, hearing, is somehow linked to the perception of smoothness by other senses, including vision (seeing a curvy figure instead of a jagged one) and taste (tasting a creamy instead of sharp taste).

Synesthesia is the general name for the phenomenon of strong associations between the different senses. Some people, for example, strongly associate musical notes with colors or words with tastes (one synesthete reports that the word safety tastes like buttered toast). The bouba/kiki results suggest that, to at least some extent, we are all a little bit synesthetic. Something about our senses is linked at least enough so that what is smooth in one is associated with being smooth in another, so that we feel the similarity between sharpness detected by smell (as in cheddar), sharpness detected by touch or vision (like acute angles), and sharpness detected by hearing (abrupt changes in sound).

We can see this link between the senses even in our daily vocabulary. The words sharp and pungent both originally meant something tactile and visual: something that feels pointy or subtends a small visual angle, but both words can be applied to tastes and smells as well. It’s not clear to what extent these synesthetic links are innate or genetic, and to what extent they are cultural. For example, nomadic tribes in Namibia do associate takete with spiky pictures, but, unlike speakers of many other languages, they don’t associate either the word or the pictures with the bitterness of dark chocolate or with carbonation. This suggests that the fact that we perceive bitter chocolate as sharper than milk chocolate or carbonated water as sharper than flat water is a metaphor that we learn culturally to associate with these foods. But we really don’t know yet, because we are just at the beginning of understanding these aspects of perception.

There are, however, some evolutionary implications of the synesthetic smoothness hypothesis and of the frequency code. For example, animals use deeper sounds when hostile, as if to appear larger, but higher-pitched sounds when friendly, as if to appear smaller and less threatening. Linguist John Ohala suggests that this may explain the origin of the smile, which is similarly associated with appeasing or friendly behavior. The way we make a smile is by retracting the corners of the mouth, which makes our vowels more eee-like (that’s why we say “cheese” when we take pictures; the eee sound is the smiling vowel). Ohala’s theory is thus that smiling was originally an appeasement gesture, making the voice higher pitched to mean something like “I’m tiny and harmless, don’t bother me.”

The two aspects of naturalism we’ve encountered, the frequency code and the synesthetic hypothesis, may even help us solve the mystery of how language originated. How did the very first listener know what the very first speaker was talking about? Naturalism suggests that it might have been possible for the listener to guess these “natural” meanings. Perhaps one of the earliest words created by some cavewoman had high-pitched eee sounds that meant “baby,” low-pitched oooooo sounds that meant “big,” or a word with acoustically abrupt k’s and t’s like kiki that meant “sharp,” or a word with smooth m’s and l’s like maluma that meant “curvy.”

Whatever their early origins, vowels and consonants have become part of a rich and beautiful system for expressing complex meanings by combining sounds into words, just as smiling has evolved into a means of expressing many shades of happiness, love, and much else. And, in the end, there is always ice cream, as a much later bard, Wallace Stevens, told us:

Let be be finale of seem.
The only emperor is the emperor of ice-cream.