Adam's Tongue: How Humans Made Language, How Language Made Humans
Thus, great ape ACSs are more limited in range than monkey ACSs. According to the ladder-to-language approach, it should be the other way around. But from the perspective of this book, what we find is exactly what we’d expect. Each species has the system it needs to do what it has to do. Monkeys need warning signals about predators because they are usually smaller than apes, frequently live in areas where they have to spend time on the ground, and are therefore more vulnerable than apes. The more specific the signal, the greater the fitness that results from it. Accordingly, the main classes of predator—aerial, terrestrial, serpentine—may be differentiated, giving rise to things sometimes mistaken for protowords and misinterpreted as “steps toward language.”
Great apes, on the other hand, are large and strong enough to discourage most predators. They also live mostly in densely forested areas where at any time they can climb beyond any predator’s reach (except for the gorilla, of course, which is too large, and can look too ferocious, to form anyone’s prey). They don’t develop alarm calls because they have so little to be alarmed about.
Considered alongside the uniqueness of language, facts like these lead to the conclusion that whatever caused language to blossom appeared not among any living species, but among some species ancestral to humans but long extinct that emerged after our line had separated from the chimp/bonobo line. Any such species would, of course, have had a lot in common with its ape ancestors. But not everything—by no means everything. Crucial differences in food sources, consequent scavenging techniques, and interactions with other species could have caused profound changes in behavior, including ways of communicating and things to be communicated about.
But here we must proceed with immense care. We know next to nothing about what immediately followed the split. All we have to work on are rare and scattered fragments of bone and a handful of extremely primitive tools. We must steer, like mythical navigators, between the Scylla of viewing these ancestors of ours as carbon copies of modern apes and the Charybdis of treating our ignorance as a license to print money—of drawing a picture of our ancestors based more on its intuitive appeal than on its plausibility.
Among all the theories of how language evolved, perhaps the most appealing of all is the singing ape.
MUSIC MAKETH MAN?
Music and language are both universal in the human species and both are unique to it. Each is distinguished by having structure that is complex as well as rule-governed, and by being (in contrast with the songs of most other species) potentially infinite and open-ended. What would be nicer and neater than to find that they are intimately connected, and share the same origin?
In addition to its intuitive appeal, this notion has a long history. The idea that language and music are intimately connected in their birth goes back at least to Rousseau and other Enlightenment philosophers. To Darwin, it appeared “probable that the progenitors of man, either the males or females or both sexes, before acquiring the power of expressing mutual love in articulate language, endeavored to charm each other with musical notes and rhythm.” According to Otto Jespersen, writing half a century later, “language was born in the courting days of mankind—the first utterances of speech I fancy to myself like something between the nightly love-lyrics of puss on the tiles and the melodious love-song of the nightingale.” A more thorough, but in parts equally poetic, development of the notion has appeared most recently in The Singing Neanderthals, by Steven Mithen of the University of Reading. Note that the connection between sex and what has been called “musilanguage,” found in the work of Darwin and Jespersen, fits well with the idea, favored by Mithen and Geoffrey Miller, that language may have originated, at least in part, as a form of sexual display.
One problem with this view is that, when it comes to our nearest relatives, their vocalizations are as lacking in music as they are in objective reference. To find anything that looks like a musilanguage precursor, you have to go back as far as the gibbon.
Gibbons are middling close relatives of humans; they split from the common ancestor of the great apes and us somewhere between twelve and twenty million years ago. And gibbons sing, no question of that. Their songs can last up to an hour and a half, longer than most human songs. Moreover, gibbons engage in duets, almost always performed by mated pairs (gibbons, in contrast to great apes, are highly monogamous). So it has been suggested that some unspecified human ancestor developed similar singing routines. At some subsequent date (in Mithen’s version, as late as the emergence of modern humans, a mere couple of hundred thousand years ago) musilanguage, or what Mithen also calls “Hmmmmm”—holistic, manipulative, multi-modal, musical, and mimetic utterances—would have split into something that became music and something that became language.
In general (except for the aquatic ape hypothesis, which claims that at some stage of their evolution human ancestors lived at least partly in water) evolutionary science prefers blanket explanations—they have the irresistible appeal of two-for-one offers in your local supermarket. If a single umbrella theory will explain a variety of traits, that’s almost always preferred to a series of different explanations for each trait. But here is one case in which, for a variety of reasons, an umbrella theory won’t work.
Such a theory would require that loud and prolonged singing of some kind was performed throughout the period, well over a million years in length, when human ancestors subsisted in a largely treeless savanna, drier and more extensive than the savannas that are presently found in parts of East Africa. Why would prehumans have practiced this behavior, under these conditions?
Well, why do gibbons sing?
Gibbon authorities suggest several main functions. One is pair-bonding; the long and repeated exchanges of the duets are supposed to enhance solidarity between mates, and they seem to work, since gibbons are among the mere 3 percent of mammals that practice monogamy. Another is territorial boundary marking, the warning off of potential trespassers—mated pairs of gibbons divide the rain forests into sharply defined and firmly defended territories. A third is simply keeping in touch with one another, in places where the denseness of the canopy renders one invisible at ranges of more than a few yards.
But what could possibly have been the functions of song for a prehuman species in largely treeless grasslands?
Song as a pair-bonding mechanism is highly unlikely. Human ancestors probably weren’t monogamous—great apes aren’t, and neither are we, even if we try or pretend to be, so a monogamous interval at any time in the past looks unlikely. But suppose we did go through a monogamous period. If two mates don’t happen to be out of sight of each other up two different trees, there are countless more effective ways of bonding than yodeling at each other.
Human ancestors probably weren’t territorial, either—at least not in the sense of holding small, well-defined chunks of territory. Most likely they had a fission-fusion social structure, like that of contemporary apes; that’s to say groups would be continually splitting up and reforming, merging with other groups. In open terrain, where different groups might utilize the same areas at different times without conflict or even contact, what would be the point of noisily defended frontiers?
Furthermore, the terrains in which gibbons and human ancestors lived were such that for maintaining contact sound was essential in one but useless, even dangerous, in the other. Until relatively recently, humans were not forest dwellers, and our ancestors lost the capacity to live in trees when they became fully bipedal, millions of years ago. Gibbons, on the other hand, spend their whole lives in trees, brachiating around with a speed and dexterity unique and quite remarkable for primates of their size.
The length and loudness of gibbon songs result directly from this environment. Their forests are dense, but their populations aren’t; they move about in their pairs, and one pair seldom meets another. Even members of a pair often can’t see each other, so they would lose contact with mates and blunder into other gibbons’ territories if they didn’t make a lot of noise a lot of the time. Moreover, they can make noise with impunity: predators below may hear them, but are hardly likely to go after them into the canopy.
But on the savanna, where there were beasts with keen hearing far larger and more lethal than our ancestors, to sing out with any frequency would have been to write one’s own death warrant. Moreover, the absence of trees and the level or undulating nature of most savannas mean that, in contrast with the rain forest, animals are visible at considerable distances. To be out of sight is, under those conditions, almost always to be out of earshot—there’s little point in yelling and hoping your friends will hear you.
To assume that, even if our ancestors had sung before, they would go on singing under these conditions is absurd—something you can do only if you think that behavior and environment are completely divorced from each other. But behavior and environment aren’t watertight compartments—they’re intimately linked; they shape each other into a lock-and-key fit. And this fit is precisely what is meant by “adaptation”; we’ll see exactly how this works when we come to chapter 5.
Conditions on the savanna were such that while they lived there, our ancestors very probably produced less sound than our ape relatives, not more. If this was indeed the case, a single source for music and language becomes highly unlikely. Unless, of course, someone succeeds in coming up with some function prehumans had to perform, under those same savanna conditions, that they couldn’t have performed by any means other than by singing. It’s unlikely anyone will, but never say never in science.
Meanwhile, the sinking theory of musilanguage has recently received a life preserver flung from a quite different direction, one that seems to resolve a completely different problem the theory faces.
This problem involves the mechanics, the nuts and bolts of how you might be able to transition from song to language. It’s the problem of how something that has no meaning acquires meaning—not the vague, general emotional sensations that music arouses, but precise, specific references to things. The proposed solution wasn’t designed to support the singing-ape hypothesis, but it has been eagerly grabbed by musilanguage advocates like Mithen because, just like a life preserver, it does seem to keep them afloat in a very choppy sea. Let’s see how and why this solution was proposed.
THE APPEAL OF HOLISTICS
Note first that the problem of how to get from songs without words to some kind of protolanguage is an artificial problem. By an artificial problem I mean a problem you don’t need to have, that you create for yourself—one you could avoid by simply abandoning the singing-ape hypothesis. And it’s a problem you can only solve by proposing a kind of protolanguage quite different from what most people had previously envisioned.
When I first developed the idea of protolanguage, I saw it as being something like a modern pidgin, a pidgin in its earliest stages. It would have consisted of a small number of wordlike things—whether signed or spoken is of no importance; there would quite likely have been both kinds—strung together haphazardly, if at all, without anything you could call grammatical structure, and eked out by pointing, pantomime, and any other device that might come to hand or to mouth.
These wordlike things wouldn’t have been like today’s words in their form, of course. For one thing, every one of today’s words, in whatever language, is composed of from one to a dozen or so highly specialized but in themselves meaningless speech sounds, each drawn from an inventory of from eleven to a hundred-odd sounds, depending on which language you’re speaking. In contrast to this, the words of protolanguage, even if vocal, could not have been divided into component parts, and would likely sound to us like meaningless grunts or squawks. But, like today’s words, each would have a fairly well-defined range of meaning, and that meaning, rather than relating directly to the current situation, would refer to some relatively stable class of objects or events, regardless of whether or not these were present at the scene.
By adopting this notion of protolanguage, you reduce the basic problems of explaining how language evolved to just two—explaining how words evolved and explaining how grammatical structure, syntax, evolved. You don’t have the additional problem of explaining a further intermediate stage between nonlanguage and language—in this case, singing. And the singing-ape hypothesis did nothing toward resolving the other problems, the origins of words and the origins of syntax. Even if you could explain how singing evolved, you still had to explain these two.
The idea of the kind of protolanguage I proposed was widely accepted through the 1990s, even though it didn’t help to explain where words came from in the first place. And then Alison Wray, of the University of Cardiff in Wales, came along with another highly appealing idea.
Wray did not subscribe to the singing-ape hypothesis, but what she proposed looked like it might solve the transition problem for singing-ape supporters. It also seemed to offer assistance to computational modelers of evolution and to strong continuists who believed in a straight line of development from ape ACS to language, so it had plenty of traction from the get-go. Here is what she proposed.
As we’ve seen, ACS units are holistic. They correspond in meaning not to words, but to whole phrases or sentences: “Stay off my territory!,” “Mate with me!,” or “Look out, predator coming!” They are holistic in the sense that there are no parts of whatever corresponds to “Mate with me!” that mean, individually, either “mate” or “me.” Moreover, the number of these units is more or less fixed, for any given species, and they are not learned, but (in some sense) innately programmed.
Suppose that, in some species ancestral to humans, the latter two restrictions were lifted. Suppose that this species could add indefinitely to its ACS repertoire—could invent and learn a whole series of holistic units that would similarly serve to manipulate the behavior of other group members (remember the consensus born in the late 1990s that a growing sophistication of social intelligence was the main driving force behind language). Eventually, the inventory of holistic signals would get so big that it would impose too heavy a load on memory. Fortunately, before the strain became intolerable, another development would have taken place.
Suppose that the holistic signals were phonetically complex—in other words that they consisted of a number of segments that could be distinguished from one another. Two of Wray’s examples are hypothetical holistic calls, “tebima,” meaning “Give that to her,” and “matapi,” meaning “Share this with her.” (Why anyone would develop two holistic calls quite different in structure that would largely overlap in meaning is one of the things about a holistic protolanguage that remain unexplained.) These calls happen by pure chance to share a syllable, ma.
The sharing is, of course, coincidental; there is nothing about ma, or any of the other segments of these calls, that yields any kind of meaning in itself. In Wray’s own words, “The whole thing means the whole thing.” However, the double coincidence—that the syllable occurred in both calls, and that both calls contained reference to an unspecified female recipient, or potential recipient—would be picked up by some smart hominids. They would then begin to use ma as a signal for “female recipient,” joining it with other fragments similarly gleaned, to start building a stock of words. And that is how language took on the compositional structure—isolated words having to be put together to form sentences—that we know and use today. Instead of starting with words and building them into sentences, you started with sentences (or rather the semantic equivalent of sentences) and broke them down into words.
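The extraction step described here can be sketched in a few lines of Python. This is only an illustration, not Wray's own formalization: the syllable splits (te-bi-ma, ma-ta-pi), the meaning-feature labels, and the function name shared_segment_candidates are all assumptions made for the example.

```python
from itertools import combinations

# Hypothetical holistic calls. The syllable boundaries and the meaning
# features are assumed for illustration; neither comes from real data.
CALLS = {
    "tebima": {
        "syllables": ("te", "bi", "ma"),
        "features": {"give", "that", "female_recipient"},
    },
    "matapi": {
        "syllables": ("ma", "ta", "pi"),
        "features": {"share", "this", "female_recipient"},
    },
}


def shared_segment_candidates(calls):
    """Pair every syllable shared by two calls with every meaning feature they share."""
    candidates = []
    # Sort by call name so the result order is deterministic.
    for (name_a, a), (name_b, b) in combinations(sorted(calls.items()), 2):
        common_syllables = set(a["syllables"]) & set(b["syllables"])
        common_features = a["features"] & b["features"]
        for syllable in sorted(common_syllables):
            for feature in sorted(common_features):
                candidates.append((syllable, feature, (name_a, name_b)))
    return candidates


print(shared_segment_candidates(CALLS))
```

With these two calls the only candidate pairing is ma with "female_recipient". Both the syllable boundaries and the meaning features are supplied by hand; the sketch says nothing about how a hominid would arrive at either.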
This is a highly ingenious proposal, particularly in the way it seems to offer a viable bridge between ACSs and language. Unfortunately it raises a host of objections, only a few of which—hopefully the most serious—we’ll have space for here.
WHAT’S WRONG WITH A HOLISTIC PROTOLANGUAGE
Let’s grant the wholly unsupported assumption that these holistic calls would have been structured in such a way that you could actually divide them into segments, so that you could say, “Here’s a subunit within the unit that starts here and stops there.” In fact I doubt there’s any existing animal call that you could do this with, or if you could, that you’d find anyone to agree with you on where the boundaries lay. But, for the sake of argument, let’s suppose this was possible.
First of all, there’d be the problem that while ma might occur in two holistic calls both involving female recipients, it would also occur in other calls that had nothing to do with female recipients. Fine, say the holistic defenders; the other cases don’t matter, once you’ve spotted a sound/meaning coincidence, you stick to that and just ignore all the other calls in which ma occurs—ma now means “female recipient” and that’s that. Well, let’s be generous and grant that too.
The real problem is, as I said to Steve Mithen at a recent meeting, that in order to extract segments from a holistic signal, you’d first have to know English. Steve got very upset at this; he thought I was being facetious. Provocative, maybe, that’s my style, but facetious—no way.
You see, the whole holistic proposal depends on the assumption that for every holistic signal, there’s just one, and only one, equivalence between a holistic signal and some sentence in English (or another human language). If there isn’t, how can you possibly agree on what all the bits of the signal really refer to? Take the vervet alarm call for eagles that we looked at in chapter 2. As we saw, this could equally well be translated in at least three different ways: “Look out, an eagle is coming!”; “Danger from the air!”; “Quick, find the nearest bush and hide in it!” Suppose that the call could indeed be broken down into two or more segments. Unless our holistic hominids somehow already knew the equivalent for the call in English (or some other human language), how could they assign an unambiguous reference to those segments? Would the segments be taken to mean “eagle” and “coming,” or “danger” and “air,” or “bush” and “hide”?
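The ambiguity can be made concrete with a small Python sketch. The two-segment split of the call (seg1, seg2) and the pairing of each segment with a content word are hypothetical, chosen only to mirror the three translations above; nothing here reflects the actual acoustic structure of vervet calls.

```python
# Hypothetical segments of the eagle alarm call (assumed, not observed).
SEGMENTS = ("seg1", "seg2")

# Content words a listener might extract from each translation in the text.
# The order of the words within each pair is itself an arbitrary assumption.
GLOSSES = {
    "Look out, an eagle is coming!": ("eagle", "coming"),
    "Danger from the air!": ("danger", "air"),
    "Quick, find the nearest bush and hide in it!": ("bush", "hide"),
}


def segment_readings(segments, glosses):
    """Collect, for each segment, every meaning it receives across the glosses."""
    readings = {segment: set() for segment in segments}
    for words in glosses.values():
        for segment, word in zip(segments, words):
            readings[segment].add(word)
    return readings


readings = segment_readings(SEGMENTS, GLOSSES)
for segment in SEGMENTS:
    print(segment, sorted(readings[segment]))
```

Each segment ends up with three competing readings, one per translation, and nothing in the call itself favors one over the others.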