Supercomputer-assisted algorithm discovers previously unknown high-resolution species diversification

How many other forms of life share the world with us? Biodiversity is a major element of the ecosystem, and every organism plays its role in keeping the whole system running. You might not have thought about it much, so let the extinctions and biodiversification events of Earth's history tell the story.
Many creatures that once lived on Earth have left fossil records from which ancient biodiversity can be traced. Paleontologists have obtained remarkable results from the fossil data they collected in the field, in the library or online. The classic curve of diversity through geological time was compiled by Prof. J. John Sepkoski, and from it he and his colleagues recognized three evolutionary faunas as well as five mass extinction events. With these results, not only geologists but also the public began to get a sense of what the world was like millions of years ago, rather than a few isolated scenes of trilobites crawling and, sometime later, dinosaurs roaring. More than twenty years after the Sepkoski curve was published, Prof. John Alroy and his colleagues published another generic diversity curve, based on a much larger fossil dataset collected through an online database. Built with different taxon-counting algorithms, this curve showed a different trend from the earlier one, which left paleontologists wondering.

Marine generic biodiversity curves of Sepkoski (1997) and Alroy et al. (2008) 

Cm, Cambrian; O, Ordovician; S, Silurian; D, Devonian; C, Carboniferous; P, Permian; Tr, Triassic; J, Jurassic; K, Cretaceous; T, Tertiary; Pg, Paleogene; Ng, Neogene

Species are the basic unit of evolution, but species data are too voluminous to manage easily. As a result, a species diversity curve spanning a long interval such as an era had never been assembled. A Chinese team led by Prof. Junxuan Fan and Prof. Shuzhong Shen was curious about this, especially about the geological history of life in China. Paleozoic sedimentary rocks are extensive throughout China and show great variety. Numerous fossils and thousands of geological sections have been recorded by paleontologists since the 1880s. Assembling and analyzing these data would be a great advance for species-diversification research.

Junxuan Fan and Shuzhong Shen wanted the investigation, as well as the results, to be as comprehensive and detailed as possible. High-quality fossil data needed to be included and, at the same time, an appropriate algorithm was needed to correlate the data from thousands of sections accurately. Fan had focused on quantitative stratigraphy and graphic correlation when compiling hundreds of sections during his PhD research. An advanced stratigraphic correlation algorithm named constrained optimization (CONOP) had been employed previously by Shen and Prof. Peter Sadler (co-author of the CONOP procedure) to analyze data from the end-Permian extinction event. They soon set up a team to digitize the Chinese sections and fossil data by hand and started to develop CONOP further with Sadler.

During the next ten years, more than 500,000 fossil records were accumulated from the scientific literature in Chinese and other languages, and all were translated into English. This is also when I became involved, as a fusulinid foraminifera expert checking the quality of those data alongside a team of taxonomists. Different CONOP routines were tried in the analysis of this enormous dataset but, disappointingly, even for a subset of 10,000 fossil species, the calculation on a normal PC would take about 17 years. No-one wants to wait 17 years for a result! Supercomputers caught the team’s attention, but using them would require the development of a parallel-processing version of CONOP. None of the team had any previous experience with supercomputers. It took another two years, but they finally succeeded. CONOP.SAGA, the most advanced, high-performance version of the CONOP procedure, was developed in Fan’s group by Xudong Hou, Jiao Yang, and Fan himself. The final analyses were run on the Tianhe II supercomputer, and a single calculation took a much more reasonable 2-3 days.

The species diversity curve we presented in our ground-breaking paper in 2020 (see reference below) was based on 260,000 fossil records of 11,268 marine species, mostly from China, and was calculated by the supercomputer-assisted constrained optimization algorithm. For the first time ever, changes in species diversity over 300 million years have been described. More importantly, an average time resolution of 26,000 years is unprecedented for the Paleozoic Era of Earth’s history. Our results suggest that several significant biodiversity changes occurred through the Paleozoic: the Great Ordovician Biodiversification Event might have happened more abruptly than previously thought; a previously unnoticed and extraordinary phenomenon, the Carboniferous-Permian Biodiversification Event, is now recognized; and there was little trace of the long-disputed end-Guadalupian extinction event.
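To give a feel for how a diversity curve can be read off once every species has a dated first and last appearance, here is a minimal sketch in Python. It is not CONOP or CONOP.SAGA, which solve the much harder problem of ordering appearance events across thousands of sections. It only shows a simple range-through count, and the species names, ages, and function names are all invented for illustration.

```python
# Hypothetical species ranges: (name, first appearance, last appearance) in Ma.
# These values are made up for illustration; they are not from the dataset.
RANGES = [
    ("species_a", 470.0, 455.0),
    ("species_b", 465.0, 440.0),
    ("species_c", 450.0, 445.0),
]


def richness_at(ranges, age_ma):
    """Count species whose range includes the given age (range-through count)."""
    return sum(1 for _, first, last in ranges if first >= age_ma >= last)


def diversity_curve(ranges, start_ma, end_ma, step_ma):
    """Sample richness from start_ma (older) down to end_ma (younger)."""
    n_steps = int(round((start_ma - end_ma) / step_ma))
    curve = []
    for i in range(n_steps + 1):
        age = start_ma - i * step_ma
        curve.append((age, richness_at(ranges, age)))
    return curve


if __name__ == "__main__":
    for age, count in diversity_curve(RANGES, 470.0, 440.0, 10.0):
        print(f"{age:.1f} Ma: {count} species")
```

In this toy example the step is 10 million years; a step of 0.026 million years would correspond to the 26,000-year average resolution mentioned above.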

Marine species diversity curve of China in the Paleozoic Era (modified after Fan et al., 2020)

These biodiversification events could be regional, so a global pattern will be the next exciting target in the pursuit of biotic history. A multidisciplinary, international big-science program, ‘Deep-time Digital Earth’, was launched on the back of this research effort in the past year. A comprehensive online big-data platform for paleontology and stratigraphy will soon be opened to the global scientific community. Fossil data from all over the world will be accommodated, cross-checked by experts, and made ready for multiple uses. With the algorithm and supercomputing resources in hand, the paleontological community will be able to produce a global biodiversity curve with confidence.

Fan, J., Shen, S., Erwin, D.H., Sadler, P.M., MacLeod, N., Cheng, Q., Hou, X., Yang, J., Wang, X., Wang, Y., Zhang, H., Chen, X., Li, G., Zhang, Y., Shi, Y., Yuan, D., Chen, Q., Zhang, L., Li, C. and Zhao, Y., 2020. A high-resolution summary of Cambrian to Early Triassic marine invertebrate biodiversity. Science.


Alroy, J., Aberhan, M., Bottjer, D.J., Foote, M., Fürsich, F.T., Harries, P.J., Hendy, A.J., Holland, S.M., Ivany, L.C., Kiessling, W. and Kosnik, M.A., 2008. Phanerozoic trends in the global diversity of marine invertebrates. Science, 321(5885), p.97-100.

Sepkoski, J.J., 1997. Biodiversity: past, present, and future. Journal of Paleontology, 71(4), p.533-539.
