New dangers? Computers uncover 100,000 novel viruses in old genetic data

Source: Science Magazine

It took just one virus to cripple the world’s economy and kill millions of people; yet virologists estimate that trillions of still-unknown viruses exist, many of which might be lethal or have the potential to spark the next pandemic. Now, they have a new—and very long—list of possible suspects to interrogate. By sifting through unprecedented amounts of existing genomic data, scientists have uncovered more than 100,000 novel viruses, including nine coronaviruses and more than 300 related to the hepatitis Delta virus, which can cause liver failure.

“It’s a foundational piece of work,” says J. Rodney Brister, a bioinformatician at the National Center for Biotechnology Information’s National Library of Medicine who was not involved in the new study. The work expands the number of known viruses that use RNA instead of DNA for their genes by an order of magnitude. It also “demonstrates our outrageous lack of knowledge about this group of organisms,” says disease ecologist Peter Daszak, president of the EcoHealth Alliance, a nonprofit research group in New York City that is raising money to launch a global survey of viruses. The work will also help launch so-called petabyte genomics—the analyses of previously unfathomable quantities of DNA and RNA data. (One petabyte is 10^15 bytes.)

That wasn’t exactly what computational biologist Artem Babaian had in mind when he was in between jobs in early 2020. Instead, he was simply curious about how many coronaviruses—aside from the virus that had just launched the COVID-19 pandemic—could be found in sequences in existing genomic databases.