Bengaluru: Alphanumeric codes given to about 27 human genes have been tweaked over the past year because Microsoft Excel, a powerful tool to assess and plot complex data, would not stop mistaking them for dates.
The changes were formalised as part of new guidelines issued earlier this month as scientists finally addressed a years-old problem that may seem innocuous at first but has the potential to corrupt research.
The 27 genes with revised names include the Membrane Associated Ring-CH-Type Finger 1, denoted by the symbol MARCH1, and Septin-2, which is better known as SEPT2. Under Excel auto-formatting, each of these would get converted to dates.
The problem has stalked researchers long enough that workarounds have been devised and apps reportedly made to tackle it. But the new nomenclature rules strike the problem at its root, and many researchers have taken to social media to express their joy.
Finally!!! Countless hours were spent on fixing these! https://t.co/9AL8j06MVT
— Mudra Hegde (@HegdeMudra) August 5, 2020
Scientists have been getting frustrated by this problem for years, and the body in charge of standardizing the names of genes, the HUGO Gene Nomenclature Committee, updated their guidelines this week to fix the problem: pic.twitter.com/7eTqToGGHx
— James Vincent (@jjvincent) August 6, 2020
This amazing, as a Bioinformatician, I don't have to educate the biologist anymore about this.
I would often get comments "Why don't you just convert column format from dates to general text?"
Duh! Like it will bring back the correct gene names. https://t.co/a6WAcEoAiR
— Huzaifa Hassan (@huzaifahassan) August 6, 2020
An evolving process
Excel is a commonly used data platform, but formatting errors can corrupt biological data. Then there is the obvious frustration caused by unintended format changes.
Apart from dates, Excel has also been known to convert the names of some genes like ‘2310009E13’ to the floating-point format — in this case, to ‘2.31E+13’.
A 2016 study found that Excel had converted gene names to dates and floating-point numbers in approximately one-fifth of 3,597 published papers.
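The two kinds of conversion described above can be screened for before a gene list is ever opened in a spreadsheet. The following Python sketch is only an illustration: the function name `conversion_risk` and the patterns it uses are assumptions made for this example, and they approximate rather than reproduce Excel's actual parsing rules.

```python
import re
from typing import Optional

# Month names and abbreviations that a spreadsheet may read as the start of a date.
# This list is an assumption for illustration, not Excel's documented behaviour.
_MONTHS = ("JAN|FEB|MAR|MARCH|APR|APRIL|MAY|JUN|JUNE|JUL|JULY|"
           "AUG|SEP|SEPT|OCT|NOV|DEC")

# A month word followed by one or two digits, e.g. MARCH1 or SEPT2.
_DATE_LIKE = re.compile(rf"^(?:{_MONTHS})-?\d{{1,2}}$", re.IGNORECASE)

# Digits, the letter E, then more digits, e.g. 2310009E13.
_SCI_LIKE = re.compile(r"^\d+(?:\.\d+)?E[+-]?\d+$", re.IGNORECASE)


def conversion_risk(symbol: str) -> Optional[str]:
    """Return 'date' or 'number' if the symbol resembles one, else None."""
    s = symbol.strip()
    if _DATE_LIKE.match(s):
        return "date"
    if _SCI_LIKE.match(s):
        return "number"
    return None


if __name__ == "__main__":
    for sym in ["MARCH1", "SEPT2", "2310009E13", "MARCHF1", "SEPTIN1"]:
        print(sym, conversion_risk(sym))
```

A check like this would flag the old symbols MARCH1 and SEPT2 as date-like and leave the revised symbols MARCHF1 and SEPTIN1 alone.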
The naming convention for genes is overseen by the HUGO Gene Nomenclature Committee (HGNC), which currently holds a database of around 33,000 gene symbols and names that belong to over 1,300 gene families.
The new set of guidelines issued by the HGNC requires that gene symbols be chosen so that Excel's auto-formatting does not alter them. To this end, MARCH1 is now MARCHF1 and SEPT1 is SEPTIN1.
In the past, gene names have also been changed for other reasons, HGNC coordinator Elspeth Bruford was quoted as saying in a report on The Verge. Names that can be confused with other words, for example, have been tweaked to avoid false positives in text searches — so, CARS became CARS1, and WARS, WARS1.
Rules have also been altered to tackle certain creative liberties that defined the process of gene-naming in the earlier days, as well as to eliminate any prospect of offence. “Headcase homolog (Drosophila)” was thus changed to hdc homolog, cell cycle regulator, and “ARS” to ARS1.
Genes were historically given unique or funny symbols, such as ‘tinman’, a gene required for the heart that was named after the Wizard of Oz character who craved a heart; ‘NEMO’ for NF-kappa-B essential modulator; ‘Indy’ for ‘I’m Not Dead Yet’; and ‘Pokemon’ (now changed to Zbtb7) for POK erythroid myeloid ontogenic factor.
However, new symbols are strictly regulated by HGNC. They must contain only Latin letters and Arabic numerals with no sub- or superscript. They should not spell out names, especially offensive ones, in any language. Whimsical and funny are out, too.
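The character rule described above, Latin letters and Arabic numerals only, is simple enough to check mechanically. The sketch below is a hypothetical helper, not an HGNC tool: the name `uses_allowed_characters` is invented for this example, and it covers only the character rule as stated here, not the checks for offensive or whimsical names.

```python
import re

# Only unaccented Latin letters and the digits 0-9, at least one character.
# An explicit ASCII range is used because str.isalnum() would also accept
# non-Latin letters and superscript digits such as '²'.
_ALLOWED = re.compile(r"^[A-Za-z0-9]+$")


def uses_allowed_characters(symbol: str) -> bool:
    """True if the symbol contains only Latin letters and Arabic numerals."""
    return bool(_ALLOWED.match(symbol))


if __name__ == "__main__":
    for sym in ["MARCHF1", "SEPTIN1", "H²O", "CARS-1"]:
        print(sym, uses_allowed_characters(sym))
```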
Bruford said in The Verge piece that there has been some dissent among researchers over the change, with some questioning why Excel couldn’t do something to address their concerns. However, she said, the community affected by this problem is too small to prompt a change in software that is used “extremely widely” by a “massive community”.