Underneath the thick forest canopy on a remote island in the South Pacific, a New Caledonian Crow peers from its perch, dark eyes glittering. The bird carefully removes a branch, strips off unwanted leaves with its bill and fashions a hook from the wood. The crow is a perfectionist: if it makes an error, it will scrap the whole thing and start over. When it’s satisfied, the bird pokes the finished utensil into a crevice in the tree and fishes out a wriggling grub.
The New Caledonian Crow is one of the few birds known to manufacture tools, a skill once thought to be unique to humans. Christian Rutz, a behavioral ecologist at the University of St Andrews in Scotland, has spent much of his career studying the crow’s capabilities. The remarkable ingenuity Rutz observed changed his understanding of what birds can do. He started wondering if there might be other overlooked animal capacities. The crows live in complex social groups and may pass toolmaking techniques on to their offspring. Experiments have also shown that different crow groups around the island have distinct vocalizations. Rutz wanted to know whether these dialects could help explain cultural differences in toolmaking among the groups.
New technology powered by artificial intelligence is poised to provide exactly these kinds of insights. Whether animals communicate with one another in terms we might be able to understand is a question of enduring fascination. Although people in many Indigenous cultures have long believed that animals can intentionally communicate, Western scientists traditionally have shied away from research that blurs the lines between humans and other animals for fear of being accused of anthropomorphism. But with recent breakthroughs in AI, “people realize that we are on the brink of fairly major advances in regard to understanding animals’ communicative behavior,” Rutz says.
Beyond creating chatbots that woo people and producing art that wins fine-arts competitions, machine learning may soon make it possible to decipher things like crow calls, says Aza Raskin, one of the founders of the nonprofit Earth Species Project. Its team of artificial-intelligence scientists, biologists and conservation experts is collecting a wide range of data from a variety of species and building machine-learning models to analyze them. Other groups such as the Project Cetacean Translation Initiative (CETI) are focusing on trying to understand a particular species, in this case the sperm whale.
Decoding animal vocalizations could aid conservation and welfare efforts. It could also have a startling impact on us. Raskin compares the coming revolution to the invention of the telescope. “We looked out at the universe and discovered that Earth was not the center,” he says. The power of AI to reshape our understanding of animals, he thinks, will have a similar effect. “These tools are going to change the way that we see ourselves in relation to everything.”
When Shane Gero got off his research vessel in Dominica after a recent day of fieldwork, he was excited. The sperm whales that he studies have complex social groups, and on this day one familiar young male had returned to his family, providing Gero and his colleagues with an opportunity to record the group’s vocalizations as they reunited.
For nearly 20 years Gero, a scientist in residence at Carleton University in Ottawa, kept detailed records of two clans of sperm whales in the turquoise waters of the Caribbean, capturing their clicking vocalizations and what the animals were doing when they made them. He found that the whales seemed to use specific patterns of sound, called codas, to identify one another. They learn these codas much the way toddlers learn words and names, by repeating sounds the adults around them make.
Having decoded a few of these codas manually, Gero and his colleagues began to wonder whether they could use AI to speed up the translation. As a proof of concept, the team fed some of Gero’s recordings to a neural network, an algorithm that learns skills by analyzing data. It was able to correctly identify a small subset of individual whales from the codas 99 percent of the time. Next the team set an ambitious new goal: listen to large swathes of the ocean in the hopes of training a computer to learn to speak whale. Project CETI, for which Gero serves as lead biologist, plans to deploy an underwater microphone attached to a buoy to record the vocalizations of Dominica’s resident whales around the clock.
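The matching step can be illustrated with a toy sketch: represent each coda by the gaps between its clicks and assign a recording to the known whale whose rhythm it most resembles. The interval patterns and whale names below are invented for illustration; the actual neural network learns far subtler distinctions from real recordings.

```python
import numpy as np

# A coda is a rhythmic pattern of clicks; represent each one by its
# inter-click intervals in seconds. These reference patterns are
# illustrative placeholders, not Gero's measured codas.
known_whales = {
    "whale_A": np.array([0.2, 0.2, 0.2, 0.5]),
    "whale_B": np.array([0.1, 0.3, 0.1, 0.3]),
}

def identify(click_times):
    """Match a recorded coda to the known whale whose interval
    pattern it most closely resembles."""
    intervals = np.diff(click_times)
    return min(known_whales,
               key=lambda w: np.linalg.norm(intervals - known_whales[w]))

recording = np.array([0.0, 0.21, 0.40, 0.61, 1.10])  # five recorded clicks
print(identify(recording))  # closest rhythm: whale_A
```

A real system works from raw audio and many whales, but the core task is the same: map a rhythmic pattern to an identity.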
As sensors have gotten cheaper and technologies such as hydrophones, biologgers and drones have improved, the amount of animal data has exploded. There’s suddenly far too much for biologists to sift through efficiently by hand. AI thrives on vast quantities of information, though. Large language models such as the one behind ChatGPT must ingest massive amounts of text to learn how to respond to prompts: GPT-3 was trained on around 45 terabytes of text data, a good chunk of the entire Library of Congress. Early models required humans to classify much of those data with labels. In other words, people had to teach the machines what was important. But the next generation of models learned how to “self-supervise,” automatically identifying what’s essential and independently learning to predict which word comes next in a sequence.
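The logic of self-supervision can be sketched with a toy next-word predictor: the training "labels" (which word actually follows which) come from the raw text itself, with no human annotation. This is a minimal stand-in for what large language models do at vastly greater scale.

```python
from collections import Counter, defaultdict

def train_bigram_model(text):
    """Learn next-word frequencies from raw text; the 'labels'
    (each observed next word) come from the data itself."""
    words = text.split()
    model = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        model[current][nxt] += 1
    return model

def predict_next(model, word):
    """Return the most frequent follower of `word`, or None."""
    followers = model.get(word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

corpus = "the mother fed the daughter and the mother fed the cat"
model = train_bigram_model(corpus)
print(predict_next(model, "mother"))  # prints "fed"
```

A large language model replaces these frequency counts with a neural network and billions of parameters, but the self-generated prediction target is the same idea.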
In 2017 two research groups discovered a way to translate between human languages without the need for a Rosetta stone. The discovery hinged on turning the semantic relations between words into geometric ones. Machine-learning models are now able to translate between unknown human languages by aligning their shapes—using the frequency with which words such as “mother” and “daughter” appear near each other, for example, to accurately predict what comes next. “There’s this hidden underlying structure that seems to unite us all,” Raskin says. “The door has been opened to using machine learning to decode languages that we don’t already know how to decode.”
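The geometric idea can be demonstrated in miniature: if two embedding spaces share the same shape, an orthogonal Procrustes fit (a standard linear-algebra technique, used here as a stand-in rather than as the 2017 papers' exact method) recovers the rotation that aligns them. The two-dimensional "word vectors" below are synthetic.

```python
import numpy as np

# Toy "word embeddings" for one language: each row is a word vector.
# In practice these come from models trained separately on each language.
src = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

# Pretend the second language's embedding space has the same shape,
# rotated by an unknown angle.
theta = 0.7
R_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
tgt = src @ R_true.T

# Orthogonal Procrustes: the best rotation mapping src onto tgt
# falls out of a singular value decomposition.
U, _, Vt = np.linalg.svd(tgt.T @ src)
R = U @ Vt
aligned = src @ R.T
print(np.allclose(aligned, tgt))  # True: the two shapes line up
```

With real languages the correspondence is only approximate and the spaces have hundreds of dimensions, but the principle of aligning shared structure is the same.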
The field hit another milestone in 2020, when natural-language processing began to be able to “treat everything as a language,” Raskin explains. Take, for example, DALL-E 2, one of the AI systems that can generate realistic images based on verbal descriptions. It maps the shapes that represent text to the shapes that represent images with remarkable accuracy—exactly the kind of “multimodal” analysis the translation of animal communication will probably require.
Many animals use different modes of communication simultaneously, just as humans use body language and gestures while talking. Any actions made immediately before, during, or after uttering sounds could provide important context for understanding what an animal is trying to convey. Traditionally, researchers have cataloged these behaviors in a list known as an ethogram. With the right training, machine-learning models could help parse these behaviors and perhaps discover novel patterns in the data. Scientists writing in the journal Nature Communications last year, for example, reported that a model found previously unrecognized differences in Zebra Finch songs that females pay attention to when choosing mates. Females prefer partners that sing like the birds the females grew up with.
You can already use one kind of AI-powered analysis with Merlin, a free app from the Cornell Lab of Ornithology that identifies bird species. To identify a bird by sound, Merlin takes a user’s recording and converts it into a spectrogram—a visualization of the volume, pitch and length of the bird’s call. The model is trained on Cornell’s audio library, against which it compares the user’s recording to predict the species identification. It then compares this guess to eBird, Cornell’s global database of observations, to make sure it’s a species that one would expect to find in the user’s location. Merlin can identify calls from more than 1,000 bird species with remarkable accuracy.
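The first step of that pipeline, turning audio into a spectrogram, is straightforward to sketch: slice the waveform into overlapping frames and take each frame's frequency content. This sketch assumes a standard short-time Fourier transform rather than Merlin's exact settings, and the one-second synthetic "call" stands in for a user's recording.

```python
import numpy as np

def spectrogram(signal, sample_rate, frame_size=256, hop=128):
    """Break a waveform into overlapping frames and take each frame's
    FFT magnitude: time on one axis, frequency on the other."""
    window = np.hanning(frame_size)
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        frame = signal[start:start + frame_size] * window
        frames.append(np.abs(np.fft.rfft(frame)))
    return np.array(frames)  # shape: (time_frames, frequency_bins)

# A synthetic 1 kHz "bird call," one second long, sampled at 8 kHz.
sr = 8000
t = np.arange(sr) / sr
call = np.sin(2 * np.pi * 1000 * t)

spec = spectrogram(call, sr)
peak_bin = spec.mean(axis=0).argmax()
freq = peak_bin * sr / 256  # convert the FFT bin index back to hertz
print(freq)  # 1000.0: the spectrogram recovers the call's pitch
```

Once the sound is an image like this, the identification problem looks much like any other image-classification task, which is what makes modern vision-style models applicable.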
But the world is loud, and singling out the tune of one bird or whale from the cacophony is difficult. The challenge of isolating and recognizing individual speakers, known as the cocktail party problem, has long plagued efforts to process animal vocalizations. In 2021 the Earth Species Project built a neural network that can separate overlapping animal sounds into individual tracks and filter background noise, such as car honks—and it released the open-source code for free. It works by creating a visual representation of the sound, which the neural network uses to determine which pixel is produced by which speaker. In addition, the Earth Species Project recently developed a so-called foundational model that can automatically detect and classify patterns in datasets.
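The idea of assigning each piece of the sound picture to one caller can be shown with a deliberately crude stand-in: two synthetic tones mixed into a single channel and pulled apart with a fixed frequency mask. The real network learns its masks from data instead of relying on a hand-set cutoff, and real calls overlap in frequency far more than these tones do.

```python
import numpy as np

# Two "callers" at different pitches, recorded on one channel.
sr = 8000
t = np.arange(sr) / sr
caller_a = np.sin(2 * np.pi * 500 * t)
caller_b = np.sin(2 * np.pi * 2000 * t)
mixture = caller_a + caller_b

# A crude stand-in for the neural network's job: decide which
# frequency bins of the mixture belong to which caller (here via
# a fixed 1 kHz cutoff rather than a learned mask).
spectrum = np.fft.rfft(mixture)
freqs = np.fft.rfftfreq(len(mixture), d=1 / sr)
mask_a = freqs < 1000
recovered_a = np.fft.irfft(np.where(mask_a, spectrum, 0), n=len(mixture))
recovered_b = np.fft.irfft(np.where(~mask_a, spectrum, 0), n=len(mixture))

# Each recovered track matches its caller far better than the raw mixture.
print(np.corrcoef(recovered_a, caller_a)[0, 1])  # close to 1.0
```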
Not only are these tools transforming research, but they also have practical value. If scientists can translate animal sounds, they may be able to help imperiled species. The Hawaiian Crow, known locally as the ‘Alalā, went extinct in the wild in the early 2000s. The last birds were brought into captivity to start a conservation breeding program. Expanding on his work with the New Caledonian Crow, Rutz is now collaborating with the Earth Species Project to study the Hawaiian Crow’s vocabulary. “This species has been removed from its natural environment for a very long time,” he says. He is developing an inventory of all the calls the captive birds currently use. He’ll compare that to historical recordings of the last wild Hawaiian Crows to determine whether their repertoire has changed in captivity. He wants to know whether they may have lost important calls, such as those pertaining to predators or courtship, which could help explain why reintroducing the crow to the wild has proved so difficult.
Machine-learning models could someday help us figure out our pets, too. For a long time animal behaviorists didn’t pay much attention to domestic pets, says Con Slobodchikoff, author of Chasing Doctor Dolittle: Learning the Language of Animals. When he began his career studying prairie dogs, he quickly gained an appreciation for their sophisticated calls, which can describe the size and shape of predators. That experience helped to inform his later work as a behavioral consultant for misbehaving dogs. He found that many of his clients completely misunderstood what their dog was trying to convey. When our pets try to communicate with us, they often use multimodal signals, such as a bark combined with a body posture. Yet “we are so fixated on sound being the only valid element of communication, that we miss many of the other cues,” he says.
Now Slobodchikoff is developing an AI model aimed at translating a dog’s facial expressions and barks for its owner. He has no doubt that as researchers expand their studies to domestic animals, machine-learning advances will reveal surprising capabilities in pets. “Animals have thoughts, hopes, maybe dreams of their own,” he says.
Farmed animals could also benefit from such depth of understanding. Elodie F. Briefer, an associate professor in animal behavior at the University of Copenhagen, has shown that it’s possible to assess animals’ emotional states based on their vocalizations. She recently created an algorithm trained on thousands of pig sounds that uses machine learning to predict whether the animals were experiencing a positive or negative emotion. Briefer says a better grasp of how animals experience feelings could spur efforts to improve their welfare.
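The shape of such a classifier can be sketched with a nearest-centroid rule over simple acoustic features. The feature values below are invented for illustration and are not Briefer's measurements; her model learns from thousands of real calls and far richer features.

```python
import numpy as np

# Toy acoustic features per call: [mean pitch (Hz), duration (s)].
# Short, lower calls stand in for positive states and long,
# high-pitched ones for negative states; the numbers are illustrative.
positive_calls = np.array([[220, 0.3], [250, 0.4], [200, 0.2]])
negative_calls = np.array([[900, 1.2], [1100, 1.5], [950, 1.0]])

centroids = {
    "positive": positive_calls.mean(axis=0),
    "negative": negative_calls.mean(axis=0),
}

def classify(call):
    """Assign a call to the class whose centroid is nearest in feature space."""
    return min(centroids,
               key=lambda label: np.linalg.norm(call - centroids[label]))

print(classify(np.array([230, 0.3])))   # prints "positive"
print(classify(np.array([1000, 1.3])))  # prints "negative"
```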
But as good as language models are at finding patterns, they aren’t actually deciphering meaning—and they definitely aren’t always right. Even AI experts often don’t understand how algorithms arrive at their conclusions, making them harder to validate. Benjamin Hoffman, who helped to develop the Merlin app before joining the Earth Species Project, says that one of the biggest challenges scientists now face is figuring out how to learn from what these models discover.
“The choices made on the machine-learning side affect what kinds of scientific questions we can ask,” Hoffman says. Merlin Sound ID, he explains, can help detect which birds are present, which is useful for ecological research. It can’t, however, help answer questions about behavior, such as what types of calls an individual bird makes when it interacts with a potential mate. In trying to interpret different kinds of animal communication, Hoffman says researchers must also “understand what the computer is doing when it’s learning how to do that.”
Daniela Rus, director of the Massachusetts Institute of Technology Computer Science and Artificial Intelligence Laboratory, leans back in an armchair in her office, surrounded by books and stacks of papers. She is eager to explore the new possibilities for studying animal communication that machine learning has opened up. Rus previously designed remote-controlled robots to collect data for whale-behavior research in collaboration with biologist Roger Payne, whose recordings of humpback whale songs in the 1970s helped to popularize the Save the Whales movement. Now Rus is bringing her programming experience to Project CETI. Sensors for underwater monitoring have rapidly advanced, providing the equipment necessary to capture animal sounds and behavior. And AI models capable of analyzing those data have improved dramatically. But until recently, the two disciplines hadn’t been joined.
At Project CETI, Rus’s first task was to isolate sperm whale clicks from the background noise of the ocean realm. Sperm whales’ vocalizations were long compared to binary code in the way that they represent information. But they are more sophisticated than that. After she developed accurate acoustic measurements, Rus used machine learning to analyze how these clicks combine into codas, looking for patterns and sequences. “Once you have this basic ability,” she says, “then we can start studying what are some of the foundational components of the language.” The team will tackle that question directly, Rus says, “analyzing whether the [sperm whale] lexicon has the properties of language or not.”
But grasping the structure of a language is not a prerequisite to speaking it—not anymore, anyway. It’s now possible for AI to take three seconds of human speech and then hold forth at length in the same voice, mimicking its patterns and intonations. In the next year or two, Raskin predicts, “we’ll be able to build this for animal communication.” The Earth Species Project is already developing AI models that emulate a variety of species, with the aim of having “conversations” with animals. He says two-way communication will make it that much easier for researchers to infer the meaning of animal vocalizations.
In collaboration with outside biologists, the Earth Species Project plans to test playback experiments, playing an artificially generated call to Zebra Finches in a laboratory setting and then observing how the birds respond. Soon “we’ll be able to pass the finch, crow or whale Turing test,” Raskin asserts, referring to the point at which the animals won’t be able to tell they are conversing with a machine rather than one of their own. “The plot twist is that we will be able to communicate before we understand.”
The prospect of this achievement raises ethical concerns. Karen Bakker, a digital innovations researcher and author of The Sounds of Life: How Digital Technology Is Bringing Us Closer to the Worlds of Animals and Plants, explains that there may be unintended ramifications. Commercial industries could use AI for precision fishing by listening for schools of target species or their predators; poachers could deploy these techniques to locate endangered animals and impersonate their calls to lure them closer. For animals such as humpback whales, whose mysterious songs can spread across oceans with remarkable speed, the creation of a synthetic song could, Bakker says, “inject a viral meme into the world’s population” with unknown social consequences.
So far the organizations at the leading edge of this animal-communication work are nonprofits like the Earth Species Project that are committed to open-source sharing of data and models and staffed by enthusiastic scientists driven by their passion for the animals they study. But the field might not stay that way—profit-driven players could misuse this technology. In a recent article in Science, Rutz and his co-authors noted that “best-practice guidelines and appropriate legislative frameworks” are urgently needed. “It’s not enough to make the technology,” Raskin warns. “Every time you invent a technology, you also invent a responsibility.”
Designing a “whale chatbot,” as Project CETI aspires to do, isn’t as simple as figuring out how to replicate sperm whales’ clicks and whistles; it also demands that we imagine an animal’s experience. Despite major physical differences, humans actually share many basic forms of communication with other animals. Consider the interactions between parents and offspring. The cries of mammalian infants, for example, can be incredibly similar, to the point that white-tailed deer will respond to whimpers whether they’re made by marmots, humans or seals. Vocal expression in different species can develop similarly, too. Like human babies, harbor seal pups learn to change their pitch to target a parent’s eardrums. And both baby songbirds and human toddlers engage in babbling—a “complex sequence of syllables learned from a tutor,” explains Johnathan Fritz, a research scientist at the University of Maryland’s Brain and Behavior Initiative.
Whether animal utterances are comparable to human language in terms of what they convey remains a matter of profound disagreement, however. “Some would assert that language is essentially defined in terms that make humans the only animal capable of language,” Bakker says, with rules for grammar and syntax. Skeptics worry that treating animal communication as language, or attempting to translate it, may distort its meaning.
Raskin shrugs off these concerns. He doubts animals are saying “pass me the banana,” but he suspects we will discover some basis for communication in common experiences. “It wouldn’t surprise me if we discovered [expressions for] ‘grief’ or ‘mother’ or ‘hungry’ across species,” he says. After all, the fossil record shows that creatures such as whales have been vocalizing for tens of millions of years. “For something to survive a long time, it has to encode something very deep and very true.”
Ultimately real translation may require not just new tools but the ability to see past our own biases and expectations. Last year, as the crusts of snow retreated behind my house, a pair of Sandhill Cranes began to stalk the brambles. A courtship progressed, the male solicitous and preening. Soon every morning one bird flapped off alone to forage while the other stayed behind to tend their eggs. We fell into a routine, the birds and I: as the sun crested the hill, I kept one eye toward the windows, counting the days as I imagined cells dividing, new wings forming in the warm, amniotic dark.
Then one morning it ended. Somewhere behind the house the birds began to wail, twining their voices into a piercing cry until suddenly I saw them both running down the hill into the stutter start of flight. They circled once and then disappeared. I waited for days, but I never saw them again.
Wondering if they were mourning a failed nest or whether I was reading too much into their behavior, I reached out to George Happ and Christy Yuncker, retired scientists who for two decades shared their pond in Alaska with a pair of wild Sandhill Cranes they nicknamed Millie and Roy. They assured me that they, too, had seen the birds react to death. After one of Millie and Roy’s colts died, Roy began picking up blades of grass and dropping them near his offspring’s body. That evening, as the sun slipped toward the horizon, the family began to dance. The surviving colt joined its parents as they wheeled and jumped, throwing their long necks back to the sky.
Happ knows critics might disapprove of their explaining the birds’ behaviors as grief, considering that “we cannot precisely specify the underlying physiological correlates.” But based on the researchers’ close observations of the crane couple over a decade, he writes, interpreting these striking reactions as devoid of emotion “flies in the face of the evidence.”
Everyone can eventually relate to the pain of losing a loved one. It’s a moment ripe for translation.
Perhaps the true value of any language is that it helps us relate to others and in so doing frees us from the confines of our own minds. Every spring, as the light swept back over Yuncker and Happ’s home, they waited for Millie and Roy to return. In 2017 they waited in vain. Other cranes vied for the territory. The two scientists missed watching the colts hatch and grow. But last summer a new crane pair built a nest. Before long, their colts peeped through the tall grass, begging for food and learning to dance. Life began a new cycle. “We’re always looking at nature,” Yuncker says, “when really, we’re part of it.”
This article by Lois Parshley was first published by Scientific American on October 1, 2023. Lead Image: The Project Cetacean Translation Initiative (CETI) is using machine learning to try to understand the vocalizations of sperm whales. Credit: Franco Banfi/Minden Pictures.