One of the longest-running questions about this pandemic is a simple one: where did it come from? How did a virus that had seemingly never infected a human before make a sudden appearance in our species, equipped with what it needed to sweep from China through the globe in a matter of months?
Early analyses of the virus’ genome were ambiguous. Some placed its origin within the local bat population. Others highlighted similarities to viruses found in pangolins, which might have been brought to the area by the wildlife trade. Less evidence-based ideas included an escape from a research lab or a misplaced bioweapon. Now, a US-based research team has done a detailed analysis of a large collection of viral genomes, and it finds that evolution pieced together the virus from multiple parts: most from bats, but with a key contribution from pangolins.
How do pieces of virus from different species end up being mashed together? The underlying biology is a uniquely viral twist on a common biological process: recombination.
In cells, recombination is a normal part of genetics. Any time two DNA molecules share extensive similarities, it’s possible for them to exchange pieces. The result is a hybrid molecule: a stretch of DNA from one parental piece of DNA, followed by a stretch from the other. As a result, some of the differences between the two parent molecules get scrambled—some from each parent will end up on the final molecule.
Recombination is a normal part of the reproduction of complicated cells. If you happen to have a child, you’ve given that child a set of chromosomes that is a mix of pieces from the ones you were given by your mother and father. Recombination can also take place in simpler cells, where it’s been the primary tool that we’ve used to engineer new or altered genes into the genome of bacteria. And, since the molecules that perform the recombination aren’t especially picky about which DNA molecules they work with, DNA viruses that infect cells can sometimes recombine if more than one strain of virus infects a single cell.
Those of you who have followed the virus closely, however, may be wondering what’s going on here. All of this recombination takes place between DNA molecules. But the coronavirus genome is composed of RNA. So why would any of it work there?
The answer is that it doesn’t. But other processes essentially perform the same function, mixing up pieces of RNA to form distinct genetic combinations. For example, the influenza virus spreads its genome across eight different molecules, allowing cells infected by more than one strain of flu virus to produce viral particles that have a random assortment of molecules from the two strains.
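For readers who like to see the idea in code, here is a minimal Python sketch of that kind of segment shuffling. It assumes, as a simplification, that each of the eight segments is drawn independently and with equal probability from either parent. The strain labels and the `reassort` function are made up for illustration and don't come from the research.

```python
import random

N_SEGMENTS = 8  # the flu genome is spread across eight RNA molecules


def reassort(strain_a, strain_b, rng):
    """Build one progeny genome by taking each segment from either parent.

    Assumption for this sketch: every segment is chosen independently,
    with a 50/50 chance of coming from each parent strain.
    """
    return [rng.choice(pair) for pair in zip(strain_a, strain_b)]


# Hypothetical labels: "A3" means segment 3 of strain A, and so on.
strain_a = [f"A{i}" for i in range(1, N_SEGMENTS + 1)]
strain_b = [f"B{i}" for i in range(1, N_SEGMENTS + 1)]

rng = random.Random(0)  # seeded so repeated runs give the same progeny
progeny = reassort(strain_a, strain_b, rng)

# Two choices per segment, eight segments: 2**8 possible genomes.
possible_combinations = 2 ** N_SEGMENTS
print(progeny, possible_combinations)
```

Under that simple model, two parent strains allow 2^8 = 256 segment combinations, two of which are just the original strains.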
The coronavirus genome is a single, long RNA molecule, so that sort of segment shuffling doesn’t work there. But it still can recombine. The enzyme that copies the RNA genome moves down it from one end to the other, making a copy as it goes. Sometimes, however, it can stall and fall off the molecule it’s copying, while still hanging on to its partially complete copy. In many cases, the copying will just be aborted. But in others, the enzyme can latch on to a new genome and use the partial copy to pick up where it left off.
Critically, the new molecule with which it restarts the copying doesn’t have to be the one it was copying originally. It just has to be similar to the first one it copied—it doesn’t have to be identical. As a result, this process can allow recombination among viruses that are relatively distantly related from an evolutionary perspective. All they have to do is infect the same host.
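A toy Python sketch of this copy-and-switch process might look like the following. The two short RNA sequences, the function name, and the switch position are all invented for illustration. The sketch also assumes the two genomes are already aligned and the same length, which real viral genomes generally are not.

```python
def copy_with_template_switch(template_a, template_b, switch_at):
    """Copy template_a up to switch_at, then finish the copy on template_b.

    Assumption for this sketch: both templates are pre-aligned and of
    equal length, so the same index means the same genome position.
    """
    if len(template_a) != len(template_b):
        raise ValueError("this sketch assumes aligned genomes of equal length")
    if not 0 <= switch_at <= len(template_a):
        raise ValueError("switch_at must fall within the genome")
    return template_a[:switch_at] + template_b[switch_at:]


# Two made-up, similar-but-not-identical RNA genomes (they differ at 3 sites).
genome_a = "AUGGCUAACGUUAGCCUAGG"
genome_b = "AUGGCAAACGUAAGCCUUGG"

# The copying enzyme falls off genome_a after 10 bases and resumes on genome_b.
recombinant = copy_with_template_switch(genome_a, genome_b, 10)
print(recombinant)
```

The resulting molecule carries genome A's differences before the switch point and genome B's after it, which is the hybrid structure described above.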
Now that we know recombination can take place, how would we go about looking for it? The key here is that we now have a lot of coronavirus sequences from a lot of different hosts available in public databases. Dedicated public health researchers have even gone in and sampled dozens of bat sources to look for strains that might be capable of starting a pandemic. So, for the new analysis, the research team started with a collection of 43 different coronaviruses from a variety of species, including humans, bats, and the pangolin sequences known to be similar to SARS-CoV-2.
The basic genome analysis confirmed that SARS-CoV-2 is most closely related to a number of viruses that had been isolated from bats. But different areas of the virus were more or less related to different bat viruses. In other words, you’d see a long stretch of RNA that’s most similar to one virus from bats, but it would then switch suddenly to look most similar to a different bat virus.
This sort of pattern is exactly what you’d expect from recombination, where the switch between two different molecules would cause a sudden change in the sequence at the point where the exchange took place. (You’d see this rather than differences from both parent molecules being spread evenly throughout the genome.)
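To make that pattern concrete, here is a rough Python sketch of a sliding-window comparison that asks, for each stretch of a genome, which reference virus it most resembles. This is not the method the researchers used, and dedicated recombination-detection software is far more sophisticated. The sequences, virus names, and window size are toy assumptions, and the sketch takes for granted that all sequences are already aligned.

```python
def identity(a, b):
    """Fraction of aligned positions at which two sequences match."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def closest_relative_by_window(query, references, window, step):
    """For each window of the query, name the most similar reference.

    Assumption for this sketch: the query and all references are
    pre-aligned and the same length.
    """
    results = []
    for start in range(0, len(query) - window + 1, step):
        end = start + window
        best = max(
            references,
            key=lambda name: identity(query[start:end], references[name][start:end]),
        )
        results.append((start, best))
    return results


# Hypothetical reference genomes, deliberately different at every position.
references = {
    "bat_virus_1": "ACGU" * 10,
    "bat_virus_2": "UGCA" * 10,
}

# A made-up recombinant: first half from one virus, second half from the other.
query = references["bat_virus_1"][:20] + references["bat_virus_2"][20:]

windows = closest_relative_by_window(query, references, window=10, step=10)
print(windows)
```

In this toy case, the closest relative flips from one reference to the other at position 20, the kind of abrupt switch that points to a recombination breakpoint.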
But there was a notable exception to this mixing of bat viruses: the spike protein that sits on the virus’ surface and latches on to human cells. Here, the researchers found exactly what the earlier studies had suggested: a key stretch of the spike protein, the one that determines which proteins on human cells it interacts with, came from a pangolin version of the virus through recombination.
In other words, both of the ideas from earlier work were right. SARS-CoV-2 is most closely related to bat viruses and most closely related to pangolin viruses. It just depends on where in the genome you look.
The other bit of information to come out of this study is an indication of where changes in the virus’ proteins aren’t tolerated. An inability to tolerate changes in an area of the genome tends to be an indication that the protein encoded by that part of the genome has an essential function. The researchers identified a number of these areas, one of which is the part of the spike protein that came from the pangolin virus. Of all 6,400 of the SARS-CoV-2 genomes isolated during the pandemic, only eight from a single cluster of cases had any changes in this region. So, it’s looking likely that the pangolin sequence is essential for the virus’ ability to target humans.
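The counting behind that kind of claim is simple, and a minimal Python sketch shows the idea. The reference sequence, the sample genomes, and the region boundaries below are invented for illustration; only the figures of eight and 6,400 come from the study as described above. The sketch assumes every genome is already aligned to the reference.

```python
def genomes_changed_in_region(reference, genomes, start, end):
    """Count genomes that differ from the reference anywhere in [start, end).

    Assumption for this sketch: all genomes are aligned to the reference.
    """
    ref_region = reference[start:end]
    return sum(genome[start:end] != ref_region for genome in genomes)


# Made-up reference and samples; the region of interest is positions 4 to 9.
reference = "AUGGCUAACGUUAGCC"
genomes = [
    reference,
    reference,
    "AUGGCUAUCGUUAGCC",  # one change inside the region (position 7)
    "AUGGCUAACGUUAGCA",  # one change outside the region (position 15)
    reference,
]

changed = genomes_changed_in_region(reference, genomes, start=4, end=10)

# The figures reported for the pangolin-derived stretch of the spike protein.
reported_fraction = 8 / 6400
print(changed, f"{reported_fraction:.3%}")
```

With the reported figures, 8 of 6,400 genomes works out to 0.125 percent carrying any change in that region.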
There’s some good news in all of this: rumors about this being an escaped weapons experiment make little sense in terms of what the genome sequences tell us about biology. Less reassuring, however, is what the sequences tell us about the giant natural experiment that may be going on around us. They suggest that a large number of coronaviruses are regularly exchanging genetic information. And, while exchanges are more common among viruses that infect the same species, it’s entirely possible that contributions can come from much more distantly related ones.
The authors find evidence that the viruses from different species may experience distinct selective pressure, which isn’t really surprising. But that also can produce difficult-to-predict results when those viruses hop to a new species—and the difficulty will rise if they then exchange information with other viruses native to that species.
Summing this up, there seem to be myriad coronaviruses out there (including plenty we don’t know about), and some species are serving as labs in which new genetic combinations are created. Right now, we only have a very partial window into the sort of potential out there in species that have frequent contacts with humans. Some research cited by the authors suggests that humans have already been exposed to at least some of these viruses (based on antibodies to them), fortunately without a major outbreak occurring.
All of which suggests that additional pandemics are a question of when, rather than if. But, of course, that had already been suggested in the aftermath of MERS and the original SARS, and the world as a whole did remarkably little to study the risk, work towards treatments, or plan for the pandemic’s arrival. We can only hope that the more obvious example of COVID-19 will change that.