Bengaluru: The emails that Dr Anthony Fauci, the US’s top infectious diseases expert, sent and received during the early days of the Covid pandemic have become a hotly discussed topic. Among them is a group that discussed an early preprint (not peer-reviewed) posted in January 2020 by researchers from the Kusuma School of Biological Sciences at IIT-Delhi.
The scientific paper claimed that the SARS-CoV-2 virus had four “inserts” in its spike protein that were similar to the ones found in HIV, and concluded that these are unlikely to occur naturally in a coronavirus, suggesting an “unconventional evolution” of the virus that warrants further investigation.
The authors withdrew the paper within a week, and no updated version has been posted since. Other virologists have since rebutted the paper’s findings, including in a reanalysis of the IIT-Delhi team’s data against a larger database of viruses, which showed that the spike protein insertions occur naturally in other coronaviruses, including a bat coronavirus.
“We are still getting numerous queries about why we withdrew the paper and when we are going to publish next,” Prof. Bishwajit Kundu, the team lead, told ThePrint. “We have moved on to finding a solution to the pandemic rather than fuelling conspiracy theories and creating media hype. We have left that part for other researchers to establish with their top-level equipment, facilities, money, manpower and expertise.”
Fauci reportedly dismissed the findings as “outlandish”.
What the paper claimed
The authors stated that they discovered four insertions in the spike glycoprotein of the SARS-CoV-2 virus (then called 2019-nCoV), which they say are not present in other coronaviruses.
An insertion is a form of mutation. Mutations occur when a virus replicates and its RNA sequence (the virus’s genetic code) undergoes random changes at some places, which can alter the proteins the virus makes. RNA (and DNA) sequences are “read” in groups of three nucleotides, or letters, called codons. Each codon specifies a particular amino acid (or signals where a protein ends), and chains of amino acids make up a protein.
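To make the idea of reading codons concrete, here is a minimal Python sketch that translates an RNA string into a chain of amino acids using the standard genetic code. The sequence is made up for illustration, and the `translate` helper is our own hypothetical function, not taken from any particular bioinformatics library. It also simplifies biology by starting to read at the very first letter.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), built compactly.
# Codons are enumerated with U, C, A, G in that order at each of the three
# positions; AMINO_ACIDS gives the one-letter amino acid for each codon in
# the same order, with "*" marking the three stop codons.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(rna: str) -> str:
    """Read an RNA string three letters at a time and return the protein.

    Hypothetical helper: reading starts at the first letter and stops at the
    first stop codon (or when fewer than three letters remain).
    """
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "*":  # stop codon: the protein chain ends here
            break
        protein.append(amino_acid)
    return "".join(protein)


# Made-up example: AUG-GAA-UUU-GGC-UAA
print(translate("AUGGAAUUUGGCUAA"))  # MEFG
```

Here AUG gives methionine (M), GAA glutamic acid (E), UUU phenylalanine (F) and GGC glycine (G), while UAA is a stop codon that ends the chain.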
Mutations can change a single letter within a codon, as in E484K, where glutamic acid (E) is replaced by lysine (K) at position 484 of the spike protein. Nucleotides can also be inserted or deleted. If the number of letters inserted or deleted is not a multiple of three, the result is a “frameshift” mutation: every nucleotide after the change shifts position, so the downstream letters are regrouped into new codons that code for different amino acids. If a multiple of three letters is inserted, the reading frame is preserved and the protein simply gains extra amino acids.
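A single-letter substitution like E484K, a frameshift, and an insertion of exactly three letters (which keeps the reading frame intact) can be compared by translating one made-up RNA sequence before and after each change. This Python sketch repeats a hypothetical `translate` helper and a compact codon table so it runs on its own. The sequence is invented, is not part of the SARS-CoV-2 genome, and the G-to-A change only mirrors an E-to-K swap in miniature.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1); "*" marks stop codons.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(rna: str) -> str:
    """Hypothetical helper: read codons from the first letter, stop at a stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def insert_at(rna: str, position: int, extra: str) -> str:
    """Return a copy of rna with extra nucleotides inserted before index position."""
    return rna[:position] + extra + rna[position:]


# Made-up sequence: AUG GAA UUU GGC AAA UAA -> M E F G K (stop)
original = "AUGGAAUUUGGCAAAUAA"

# Substitution: the G at index 3 becomes A, so codon GAA (E) becomes AAA (K).
substitution = original[:3] + "A" + original[4:]

# Frameshift: one extra letter shifts every downstream codon.
frameshift = insert_at(original, 3, "C")

# In-frame insertion: three extra letters add one codon and keep the frame.
in_frame = insert_at(original, 3, "GGC")

for label, seq in [("original", original), ("substitution", substitution),
                   ("frameshift", frameshift), ("in-frame insertion", in_frame)]:
    print(f"{label:>18}: {translate(seq)}")
```

With this sequence, the substitution turns MEFGK into MKFGK, the one-letter insertion scrambles everything after the first codon into MRIWQI and loses the stop codon, and the three-letter insertion just adds one glycine, giving MGEFGK.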
The IIT-Delhi paper identifies four such insertions in the part of the genome that codes for the virus’s spike protein. The paper states that these insertions are similar to ones found in the HIV-1 virus, and the authors claim that this is unlikely to be “fortuitous in nature”.
While the authors note that the inserts are “discontinuous” in the spike protein’s primary sequence, meaning that they sit apart from one another and are not physically connected, their 3D modelling suggested that these could converge and come into play when the virus makes contact with a cell and gains entry.
How the team arrived at their data
The team’s paper was uploaded to the open preprint server bioRxiv on 27 January 2020. At the time, concerns about a new pneumonia-causing virus were only beginning to gain prominence, and cases had been reported in barely a handful of countries.
The first confirmed case of Covid-19 outside of China occurred in Thailand on 13 January 2020, and the disease was named Covid-19 by WHO on 11 February 2020.
The team used data from the viral genome databases of the National Center for Biotechnology Information (NCBI) and GISAID, the platform to which sequences are now routinely uploaded. At the time of analysis, 55 coronavirus sequences were available to the team, including 28 full-length sequences of the SARS-CoV-2 virus.
They found that the closest relative of the new virus was the SARS-1 virus, and that all sequences of the new virus carried four new inserts. Comparing these inserts against all fully sequenced virus genomes, they found that each aligned with HIV-1 proteins: three matched a protein called HIV-1 gp120, and one matched another, HIV-1 Gag.
The authors state that while a single match with an HIV protein could arise naturally, four such matches are improbable.
They also state that none of these four inserts are present in any other coronavirus, and the combination is unique to the SARS-CoV-2 virus.
They also state that although the inserts are scattered along the primary protein sequence, their 3D modelling showed the inserts coming together and converging at the binding site when the virus tries to attach to a host cell. According to the authors, their findings show that the four inserts “may facilitate virus-host interactions” and that the virus evolved in an unusual manner.
Rebuttals and subsequent findings
A day after the authors withdrew their paper, on 2 February 2020, Shi Zhengli, director of the Center for Emerging Infectious Diseases at the Wuhan Institute of Virology, and her team published the sequence of a bat coronavirus called RaTG13, in what was the very first paper to properly describe and characterise the virus that would eventually become known as SARS-CoV-2.
Shi’s team’s paper found that the 2019-nCoV virus belonged to the same family as the SARS virus, and shared 96.2 per cent genome sequence identity with its closest known relative, the RaTG13 bat coronavirus. Shi had previously described RaTG13 in a 2016 paper, and an addendum to the new paper stated that its partial sequence had been uploaded to the genome database GenBank in 2016.
Her team had since fully sequenced the virus, and the full sequence was published in the February paper.
Shi, often referred to as the “bat woman of China”, was responsible for tracing the origin of the SARS-1 virus back to cave-dwelling horseshoe bats, and is considered a global expert on bat coronaviruses.
In March 2020, researchers from the University of Michigan, Ann Arbor, published a paper in which they reanalysed the IIT-Delhi team’s data against a larger database of viruses, including multiple coronaviruses.
They concluded that all four insertions identified by the IIT-Delhi team are present in other viruses, including coronaviruses. They also found that the RaTG13 bat coronavirus sequence by itself contains three of the four inserts.
“The four insertions highlighted by Pradhan et al. in the spike protein are not unique to 2019-nCoV and HIV-1,” wrote the researchers. “In fact, the similarities in the sequence-based alignments built on these very short fragments are statistically insignificant,” they added, refuting the conclusions of the IIT-Delhi paper.
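Why would similarities across “very short fragments” be statistically insignificant? One rough way to see it is to estimate how often a short amino acid motif would turn up purely by chance in a large pool of protein sequence. The Python sketch below does that under deliberately crude assumptions that are ours, not the Michigan team’s: all 20 amino acids are equally likely, positions are independent, only exact matches count, and the 100-million-residue database size is a hypothetical round number. Real alignment tools such as BLAST use more careful statistics (E-values) and also score near-matches, which makes chance hits far more common, so these numbers only show the order of magnitude.

```python
def expected_chance_hits(motif_length: int, database_residues: int,
                         alphabet_size: int = 20) -> float:
    """Expected number of exact, chance occurrences of one fixed motif.

    Simplifying assumption: every residue is drawn independently and
    uniformly from alphabet_size amino acids.
    """
    # Probability that one particular window of the database matches the motif.
    p_window_matches = (1 / alphabet_size) ** motif_length
    # Number of starting positions where a motif of this length fits.
    windows = database_residues - motif_length + 1
    return windows * p_window_matches


# Hypothetical database size: 100 million amino acid residues.
DATABASE_RESIDUES = 100_000_000

for length in (4, 6, 8):
    hits = expected_chance_hits(length, DATABASE_RESIDUES)
    print(f"motif of {length} residues: ~{hits:.3g} chance matches expected")
```

Under these assumptions a four-residue motif would be expected to appear by chance roughly 625 times and a six-residue motif about 1.6 times, while an exact eight-residue match would be rare. This is why alignment scores on very short fragments have to be judged against what chance alone would produce.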
They also noted that the inserts are structurally far from the part of the spike protein that the virus uses to enter cells, called the receptor binding domain, or RBD.
The RBD allows the virus to latch onto receptors on our cells to gain entry. The researchers state that because the inserts sit far from the RBD, the data do not support the hypothesis that they evolved to help the virus infect humans. Of the four, however, they say the insert that matched HIV-1 Gag could be involved in receptor binding.
At the RBD itself, the amino acid residues (the individual amino acid units that make up a protein chain) seemed similar to those found in pangolin coronaviruses, which led some researchers to suggest that the pangolin might have been an intermediate host. However, subsequent findings in March 2020 refuted even this, stating that the residues were actually inherited by pangolin viruses from bat viruses, likely through recombination with another bat virus, a process quite common in nature inside favourable animal hosts.
Why the paper was withdrawn
While subsequent research rebutted the paper’s findings, the authors had withdrawn their work before any of those papers were published. The team cites the overwhelming media and scientific attention as the reason.
“The preprint drew so much attention, and public narrative started to talk about biowarfare and conspiracy theories,” said Kundu. “We were puzzled. We haven’t been exposed to this level of attention before and were overwhelmed with calls and allegations. We had to re-evaluate our findings and withdrew the paper.”
The team then revised the language of the paper and tried to publish it in a peer-reviewed journal (instead of re-uploading it as a public preprint), but did not succeed.
“Now our work is shelved because all the primary findings, like the furin cleavage site, the inserts and the ACE binding regions, have been published by others,” Kundu said.
Kundu nevertheless stands by the significance of the work, arguing that while one or two of the inserts could arise naturally, all four occurring together is statistically unlikely.
“There could be something unusual in the duration of evolution and acquisition of the inserts. A possible laboratory contamination and leak cannot be ruled out,” he added.
(Edited by Manasa Mohan)