The origin and expansion of the world's largest hunter-gatherer language family

How and why did one language family – Pama-Nyungan – come to dominate most of the Australian continent? We combine a newly available Pama-Nyungan vocabulary database with modelling tools initially developed to study virus outbreaks to reconstruct the expansion of Pama-Nyungan languages and trace back to the origin of the family.


Our paper, published in Nature Ecology and Evolution, can be found here.

Anyone who studies Australian linguistic diversity encounters a puzzle. Of the 28 recognised language families in mainland Australia, 27 of them are in the far north of the continent. One language family, Pama-Nyungan, contains more than 75% of Australia’s roughly 400 languages and occupies about 90% of the Australian mainland. How did this come to be? The puzzle is made all the more mystifying because Pama-Nyungan speakers, like all Indigenous Australians, were traditionally hunter-gatherers. This means that standard explanations – that new agricultural technology drives language expansion – cannot apply in the Australian case.  

One attempt to explain this enigma proposes that Pama-Nyungan speakers spread 50,000 years ago, with the initial colonisation of Australia from the north. However, languages generally change too quickly to preserve family resemblances over such a long timescale. Others have proposed that an improving climate 10,000 to 13,000 years ago or 7,000 to 10,000 years ago allowed expansion from a few refuge areas into sparsely populated territory. Another proposal, the ‘rapid replacement hypothesis’, argues for a much more recent expansion into already occupied territory just 4,000 to 6,000 years ago from near the Gulf of Carpentaria.

To test between these scenarios, we adapted a set of modelling tools initially developed by computational biologists for studying virus outbreaks. Biologists are able to make inferences about when and where a virus outbreak originated by sequencing virus DNA from samples collected around the world and using it to reconstruct the process of descent with modification that connects the sampled strains. Knowing the location of the samples and the family tree of descent connecting them, it is possible to trace back in time along the branches of the tree to infer where the outbreak originated and roughly how long ago it occurred.
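To give a feel for what "tracing back along the branches" means, here is a minimal Python sketch. It is not the model used in the paper: it assumes locations drift like a simple random walk (Brownian motion) and estimates each ancestor's location as an average of its descendants, weighted so that descendants on shorter branches count for more. The tree layout, the function name trace_back and the coordinates are all hypothetical, chosen only for illustration.

```python
def trace_back(node):
    """Estimate the location of a node in a rooted binary tree.

    A leaf is an (x, y) tuple giving a sampled location.
    An internal node is a list of two (child, branch_length) pairs.
    Branch lengths must be positive. Returns (location, extra_length),
    where extra_length accounts for uncertainty in the estimate itself.
    """
    if isinstance(node, tuple):
        return node, 0.0  # sampled tips are known exactly

    (left, len_left), (right, len_right) = node
    loc_left, extra_left = trace_back(left)
    loc_right, extra_right = trace_back(right)

    # Effective branch lengths: the real branch plus the uncertainty
    # carried up from the estimate below it.
    v_left = len_left + extra_left
    v_right = len_right + extra_right

    # Closer (shorter-branch) descendants get more weight.
    w_left, w_right = 1.0 / v_left, 1.0 / v_right
    location = tuple(
        (w_left * a + w_right * b) / (w_left + w_right)
        for a, b in zip(loc_left, loc_right)
    )
    extra = v_left * v_right / (v_left + v_right)
    return location, extra


# Two sampled tips: one close to the ancestor in time, one far from it.
pair = [((0.0, 0.0), 1.0), ((4.0, 0.0), 3.0)]
origin, _ = trace_back(pair)
print(origin)
```

In this toy case the inferred origin sits nearer the tip on the shorter branch, because that tip has had less time to wander away from the ancestor.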

Rather than using DNA to build virus family trees, we used ‘cognates’ to build a family tree of 306 Pama-Nyungan languages. Cognates are words shared between modern languages that go back to a common ancestor language. For example, the Karree and Dhurak languages share a cognate word for boomerang (boomarring and bumarangga, respectively) that suggests they inherited it from a common ancestor – a bit like the way we inherit our DNA from our parents. The Duungidjawu and Durubul languages, by comparison, have different forms (baran and barrakadan, respectively) that are themselves cognate, suggesting these languages form a separate group. While any one cognate is of limited use, by considering many cognates and explicitly modelling how cognates change over time, we were able to determine the likely genealogical relationships between the languages in our sample. You can see the vocabulary data we used in the Chirila database.
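As a rough illustration of how cognate judgments can be turned into data, the Python sketch below codes the boomerang example as a presence/absence table, with one column per cognate set. The set labels "A" and "B" and the function names are hypothetical, and the shared-cognate score at the end is only a crude similarity measure, not the evolutionary model used in the paper.

```python
# Cognate judgments for one meaning, using the forms quoted above.
# "A" and "B" are made-up labels for the two cognate sets.
cognate_sets = {
    "Karree": {"boomerang": "A"},       # boomarring
    "Dhurak": {"boomerang": "A"},       # bumarangga
    "Duungidjawu": {"boomerang": "B"},  # baran
    "Durubul": {"boomerang": "B"},      # barrakadan
}


def binary_matrix(data):
    """Return (columns, rows) with one 0/1 column per (meaning, cognate set)."""
    columns = sorted(
        {(meaning, label) for forms in data.values() for meaning, label in forms.items()}
    )
    rows = {
        language: [1 if forms.get(meaning) == label else 0 for meaning, label in columns]
        for language, forms in data.items()
    }
    return columns, rows


def shared_fraction(data, lang_a, lang_b):
    """Fraction of meanings attested in both languages that are cognate."""
    meanings = set(data[lang_a]) & set(data[lang_b])
    if not meanings:
        return None  # nothing to compare
    shared = sum(data[lang_a][m] == data[lang_b][m] for m in meanings)
    return shared / len(meanings)


columns, rows = binary_matrix(cognate_sets)
print(columns)
for language, row in rows.items():
    print(language, row)
print(shared_fraction(cognate_sets, "Karree", "Dhurak"))
print(shared_fraction(cognate_sets, "Karree", "Durubul"))
```

With only one meaning the table is tiny, but the same layout scales to many meanings, and it is the pattern across many such columns that carries the signal about which languages group together.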

We combined our Pama-Nyungan family tree with information on the geographic range of each contemporary language and a simple model of language expansion. The expansion model assumes that when a language separates into two daughter languages, one of those daughters migrates to new territory and one remains. We also added time to the model by using a well-attested language divergence event in the family to calibrate the rates of cognate evolution and geographic movement. This allowed us to trace back in time to infer the most likely age and homeland of the Pama-Nyungan languages.  
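The core assumption of the expansion model, that one daughter language stays put and the other moves, can be sketched as a toy forward simulation in Python. Everything specific here is an assumption made for illustration: the function name simulate_expansion, the uniform random step, the step size in kilometres and the synchronised splitting are not taken from the paper, which fitted its model to the real tree and language ranges.

```python
import random


def simulate_expansion(root_xy, generations, step_km, seed=0):
    """Simulate language splits where one daughter stays and one migrates.

    root_xy:     (x, y) location of the ancestral language, in km.
    generations: number of rounds in which every language splits once.
    step_km:     maximum distance moved along each axis by the migrating daughter.
    Returns the list of (x, y) locations after the final round.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    languages = [root_xy]
    for _ in range(generations):
        next_round = []
        for x, y in languages:
            stayer = (x, y)  # this daughter keeps the parent's territory
            migrant = (
                x + rng.uniform(-step_km, step_km),
                y + rng.uniform(-step_km, step_km),
            )
            next_round.extend([stayer, migrant])
        languages = next_round
    return languages


locations = simulate_expansion((0.0, 0.0), generations=3, step_km=100.0)
print(len(locations))
print(locations[0])
```

Because one daughter always stays behind, a language still occupies the homeland at the end of the simulation, while the others form a spreading cloud around it. Inferring the homeland from real data runs this logic in reverse.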

What we found was clear support for a Pama-Nyungan origin ~5,700 years ago in an area south of the Gulf of Carpentaria, just as predicted under the rapid replacement hypothesis. This finding was robust across a range of different models of migration and cognate evolution, and held even after simulating errors in the cognate data. This suggests the findings were not contingent on specific modelling assumptions or cognate judgments.

The expansion of such a large hunter-gatherer family represents a rich potential source of information about processes of cultural evolution, human migration and expansion prior to the agricultural revolution. Support for the rapid replacement hypothesis reinforces Pama-Nyungan’s status as a clear example of large-scale hunter-gatherer language replacement and raises further questions about what might drive competition between hunter-gatherer cultural groups in the absence of agriculture. A number of fascinating proposals have been put forward, including innovative stone tools, food processing technologies and new ceremonial and marriage practices, although as yet the precise mechanisms at work remain a mystery.

Our findings can also say something about the dynamics of hunter-gatherer migration. One variant of our migration model allowed rates of migration to differ depending on whether groups were near or far from water (i.e. the coast or Murray-Darling river system). Previous genetic work had shown that rates of gene flow were higher in these areas, possibly giving rise to faster rates of language migration. Alternatively, hunter-gatherers are known to range further in arid areas, where obtaining resources can be more difficult. Our models indicate rates of migration were in fact at least 2 times slower near water. This supports the role of environmental constraints as a driver of migration and also entails an apparent mismatch with genetic data.  

How could proximity to water increase rates of gene flow but decrease rates of language migration? One way to reconcile these results is that whilst water facilitates the movement of people between populations (higher gene flow) by increasing mobility, it reduces the need for the populations themselves to migrate (lower migration rates) because resources tend to be more abundant. If true, this has potentially important implications for models of gene-culture co-evolution. Compared to humans living in coastal and riverine environments, those living in arid environments would experience less gene flow but higher rates of population movement and would therefore be more likely to encounter genetically distinct neighbours.  

Finally, by explicitly modelling the history of the Pama-Nyungan family in time and space, we are now better placed to compare linguistic evidence with evidence from genetics and archaeology. Intriguingly, whilst the archaeological evidence also points to a significant period of change from ~6,000 years ago, analysis of new Australian genomic data has found little evidence to suggest a large-scale population expansion at this time. This indicates that whatever process allowed the Pama-Nyungan languages to spread so successfully may not have involved the movement of large numbers of people. Instead, languages may have spread as part of a cultural package of new ideas and technologies. Whilst the complete replacement of existing linguistic diversity without the movement of people may seem implausible, the spread of English with western technologies and institutions represents a striking potential modern analogue.

In addition to the original article in Nature Ecology & Evolution, you can find an accessible explanation of the data and methods on our website here.

The Chirila database, from which we draw our vocabulary data, is available here.

The approach we applied to trace the origins of Pama-Nyungan is similar in principle to our previous work tracing the origins of the Indo-European language family.

Poster image: Count the dots, Australian Aborigine art by Kevin Lau.

Quentin Atkinson

Professor, University of Auckland