Music engages much of the brain and coordinates a wide range of processing mechanisms. This naturally invites consideration of how music cognition might relate to other complex cognitive abilities. Language is an obvious candidate: like music, it relies on interpreting complex acoustic sequences that unfold in time.

Whether music and language cognition share basic ways of making sense of sound has only recently begun to be studied empirically. An exciting picture is emerging. There are more connections between the domains than might be expected on the basis of dominant theories of musical and linguistic cognition — from sensory mechanisms that encode sound structure to abstract processes involved in integrating words or musical tones into syntactic structures. Comparative music–language research offers a way to explore the processing underlying both domains. Such work may lead to a deeper understanding than could be achieved by studying each domain in isolation.

Practically all the work in this area, including my own, has focused on Western languages and musical traditions. This has been a productive starting point, because these are the traditions that have been studied most deeply, both theoretically and empirically. It is now time to broaden the cultural scope of comparative cognitive research.

World view

Fascinating questions about music and language emerge when one looks beyond Western culture. For example, what do musical scales (such as 'do-re-mi-fa-so-la-ti-do') have to do with language? At first glance, the answer seems to be 'very little'. No human language, not even those in which a word's pitch can change its meaning, organizes pitch in terms of musical scales.

In the West there has been a tendency to view the structure of our musical scales as a product of nature, reflecting the laws of acoustics and of auditory physiology. Musical scales and their constituent pitch intervals are implicitly considered a sort of mathematics made audible, a view that can be traced back to Pythagoras' experiments with vibrating strings. Other cultures tell a different story. The Javanese pelog and slendro scales have pitch intervals not found in Western scales, and their tuning varies substantially from one gamelan orchestra to the next. The subtle microtones of Arabic and Indian music, which enrapture native listeners, can sound out of tune to Western ears.

Hence the structure of our Western scales cannot be considered universal. What is universal across cultures is the use of a small and consistent set of pitches and intervals within the octave as a framework for performance and perception. Within any given culture, listeners absorb this system simply through exposure and unconsciously use it to extract discrete pitch categories from signals in which pitch varies continuously (as in song, where there are often smooth glides between notes).

Viewed in this way, there is a conceptual connection to the learning of sound categories in language. Each language has its own set of distinctive speech sounds or phonemes, which native listeners learn implicitly as part of making sense of the sound stream that reaches their ears. Music uses pitch to distinguish the notes and intervals of the scale; language largely uses timbre to distinguish phonemes.

Crucially, both domains rely on the ability of the mind to create and maintain discrete sound categories in the face of complex and time-varying acoustic signals. Speech and music may share some of the basic brain processes for forming sound categories, even though the end products are built from different acoustic 'stuff'.

Beat poetry


Non-Western cultures prompt another question about music–language relations: to what extent are basic aspects of rhythm perception universal? On the basis of research in Western European countries, it has been claimed for more than 100 years that a ubiquitous aspect of rhythm perception is the tendency to hear grouping or phrasing in auditory patterns in a particular way. For example, when events in a sequence vary in duration (such as tones of alternating lengths: ... long–short long–short ...), listeners are said to hear the long events as final. In other words, the perception would be of a repeating short–long group, rather than the logically possible alternative of a repeating long–short group. However, recent research on non-Western rhythm perception shows that even this basic aspect of our interpretation of sound varies between cultures. Many Japanese adults hear sequences of this sort as repeating long–short groups.

This difference is unlikely to be innate, so it should be traceable to characteristic auditory patterns in Japanese culture. One might assume it reflects familiar musical rhythms. Studies by my research team in collaboration with colleagues in Japan suggest instead that the key factor is language. English and many Western European languages put short grammatical words before the longer content word to which they are syntactically bound (for example, 'the book', le livre, het boek), creating an inventory of short–long linguistic patterns. Japanese puts grammatical words after their associated content word (for example, hon-wo, where hon means 'book' and wo is a grammatical particle), creating frequent long–short acoustic chunks in the language.

I think that the syntax of a listener's native language strongly shapes their ambient rhythmic environment, and that this in turn influences how they hear even non-linguistic patterns at a basic level. We are now pursuing this hypothesis in further cross-cultural and developmental studies.

Bangs and whistles

Some phenomena do not fit neatly into either the language or the music category; they seem to have a foot in both camps. Consider the 'talking drums' of west and central Africa. Drummers communicate linguistic messages by mimicking the tones and syllabic rhythms of utterances in African tone languages. In these, the pitch pattern of a word is as much a part of its identity as its vowels and consonants. Changing a word's pitch can entirely change its meaning — from 'wing' to 'bag', say.

The Yoruba people of Nigeria play the hourglass-shaped dundun drum, which the lead drummer uses to 'talk' during musical performances. The Lokele of the upper Congo in central Africa use hollowed-out logs as large slit drums to communicate linguistic messages across wide stretches of jungle, because the drums can be heard far beyond the range of the human voice. Drummed messages embedded in musical performance are understood by listeners familiar with the drum language, but can go completely unnoticed as language by a naive listener. Most important, the messages are not confined to a stock set of utterances. They can convey novel phrases, albeit not as efficiently as ordinary spoken language, because ambiguities arise when many words share the same tonal pattern. This problem is dealt with by placing words in longer, stereotyped poetic phrases.


Another speech surrogate is whistled language. Whistled languages are based on tone languages and use oral whistles to convey the rhythmic and tonal patterns of syllables: rhythm is cued by variations in the loudness of the whistle, and tone by its pitch. Whistled languages occur in Africa, Asia and Central America. For example, the whistled language of the Hmong people of southeast Asia is based on their spoken language and encodes the seven different tones that they use to distinguish word meaning in speech. Native listeners find it easy to understand, despite minimal cues to vowel and consonant identity. Again, whistled speech can convey original linguistic utterances, even though to the uninitiated the sound patterns may seem more like music. Ordinary languages, even click languages, are recognizable as speech when unintelligible; not so whistled languages.

The cognitive processes that enable the production and comprehension of talking drums and whistled speech are not well understood. They are almost completely unstudied by modern neuroscience. Yet they probably hold important clues to how the biological building-blocks of language and music can be fluidly and dynamically reconfigured, rather than being exclusively bound to one domain or the other from birth.

Studies of non-Western music suggest that music is not an island in the brain. Intimations of deep links between music and language extend back to Plato, Charles Darwin and Ludwig Wittgenstein. Modern cognitive science is replacing speculation with research, and finding numerous links that bind these domains together as cognitive systems. These findings bear on a wide range of debates, from the 'modularity' of linguistic mechanisms to the evolutionary origins of music. We are a musical species as much as we are a linguistic one. By looking at cognition through both of these lenses, we may see deeper into the mechanisms that give our species its remarkable power to make sense of sound.

Further reading

Carrington, J. F. Sci. Am. 225, 91–94 (1971).

Patel, A. D. Music, Language, and the Brain (Oxford Univ. Press, New York, 2008).

Patel, A. D., Iversen, J. R. & Ohgushi, K. J. Acoust. Soc. Am. 120, 3167 (2006).

Perlman, M. & Krumhansl, C. L. Music Percept. 14, 95–116 (1996).

Rialland, A. Phonology 22, 237–271 (2005).

Reck, D. Music of the Whole Earth (Da Capo Press, New York, 1997).