All download files including the archive files are now in a publicly accessible Google Storage Bucket. Downloads page links have been updated.

Frequently asked questions

General

What is the HGNC?

The HUGO Gene Nomenclature Committee is the only worldwide authority that assigns standardised nomenclature to human genes. Please see the "About the HGNC" page for more information on the committee and our remit and history.

What is HGNC-approved nomenclature and why do we need it?

The HGNC approves both a short-form abbreviation, known as a gene symbol, and a longer, more descriptive name. Each symbol is unique, and the committee ensures that each gene is given only one approved gene symbol. This allows clear and unambiguous reference to genes in scientific communications, and facilitates electronic data retrieval from databases and publications. Where possible, symbols also maintain a parallel construction across members of a gene family (see "What is a root symbol?"), and the same symbols can be used for orthologous genes in other vertebrate species.

Where can I find information about existing human gene symbols?

You can search all approved human gene symbols using the HGNC search facility.

What is a root symbol?

A root (or stem) symbol is used as the basis for a series of approved symbols which are defined as members of either a functional or structural gene group. Root symbols are sometimes devised in consultation with scientists in the relevant field, e.g. (# denotes number in series) CYP#: cytochrome P450; HOX#: homeo box; DUSP#: dual specificity phosphatase; and MS4#: membrane spanning 4-domains.

Where can I find the Nomenclature Guidelines?

The current guidelines can be accessed here.

Do I have to use the approved symbols?

We try to encourage as many researchers as possible to contribute towards the development of nomenclature systems, in the hope that they will then be more likely to use them. We realise that not everyone will consistently use approved symbols, but if the approved symbol is at least mentioned in a publication, it can be used as a search term. This provides a reference point that facilitates data retrieval in a number of databases, including PubMed, GenBank, OMIM, NCBI Gene and MGI. Some journals have editorial policies that require the use of HGNC-approved symbols.

How should I cite HGNC nomenclature resources?

Authors are requested to cite:

Seal RL, Braschi B, Gray K, Jones TEM, Tweedie S, Haim-Vilmovsky L, Bruford EA. Genenames.org: the HGNC resources in 2023. Nucleic Acids Res. PMID: 36243972 DOI: 10.1093/nar/gkac888

To cite data within the database use the following format:

HGNC Database, HUGO Gene Nomenclature Committee (HGNC), Department of Haematology, Long Road, Cambridge CB2 0PT, United Kingdom www.genenames.org.

Please include the month and year you retrieved the data cited.

Are there nomenclature committees for other species?

Yes, we interact with other nomenclature committees and databases on a regular basis, particularly the Mouse Gene Nomenclature Committee (MGNC). Please see the following links:

HGNC is also now funded to assign standardised gene names to genes in other vertebrate species that do not have an existing gene nomenclature authority. Please see the VGNC website for further information.

Does the HGNC collaborate with specialist nomenclature committees and advisors?

Yes, a table listing other nomenclature committees we collaborate with that work on specific groups of genes/proteins can be found here, and a page with our gene group specialist advisors is found here.

How should orthologs be identified?

Where clear orthology can be asserted between genes in different vertebrate species, best efforts are made via interaction with other nomenclature groups to ensure they are assigned the same symbol. Human orthologs of genes first identified in other species should not be designated by a symbol beginning with H (or h) for human.

Do alternative gene transcripts or splice variants have approved symbols?

The HGNC will not usually assign gene symbols to alternative transcripts or splice variants.

When should I use italics?

The HGNC endorses the use of italicised gene symbols when referring to genes, alleles and RNAs. This distinguishes them from proteins which can be referred to using the non-italicised gene symbol, e.g. the BRAF mRNA encodes the BRAF protein.

How should I refer to the protein encoded by a gene?

Ideally, protein names and symbols would be identical to those used for the gene. However, we are a gene nomenclature committee and do not have any guidelines pertaining to proteins or authority over protein nomenclature. There is a recommendation for the use of italics for gene symbols, and non-italicised letters for the encoded protein; but some journals have editorial policies that prevent this convention from being used, so it is not by any means universal.

Where can I read more about nomenclature and related issues?

A list of HGNC nomenclature publications is available.

How do I perform a search with a term that contains spaces or commas?

If a term you want to search for in the database contains a space or a comma, such as "protein kinase", enclose it in double quotes. Anything within double quotes is taken literally as a single search term. Without the quotes, you will be searching for any entries that contain "protein" and any entries that contain "kinase".
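If you build search queries programmatically, the quoting rule above can be applied before a term is submitted. The sketch below is a hypothetical helper (quote_search_term is not part of any HGNC tool); it only adds the surrounding quotes and does not handle double quotes that already appear inside a term.

```python
def quote_search_term(term: str) -> str:
    """Wrap a term in double quotes if it contains a space or a comma,
    so that it is searched as one literal term (hypothetical helper)."""
    stripped = term.strip()
    already_quoted = (
        len(stripped) >= 2 and stripped.startswith('"') and stripped.endswith('"')
    )
    if (" " in stripped or "," in stripped) and not already_quoted:
        return f'"{stripped}"'
    # Single words, and terms that are already quoted, are returned unchanged.
    return stripped


print(quote_search_term("protein kinase"))
print(quote_search_term("BRAF"))
```

Here "protein kinase" becomes "protein kinase" in double quotes, while BRAF is left as it is.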

HGNC Symbol Reports

What is an alias symbol/name?

This is a symbol or name by which a gene has been alternatively referred to in the literature or databases, or which groups it into a known gene family. Aliases are recorded along with the approved symbols and names as part of the gene entry to facilitate database searching. Databases that contain both approved symbols and aliases include:

What does the status "Withdrawn" mean?

"Symbol Withdrawn" refers to a previously approved HGNC symbol for a gene that now has a different approved symbol. "Entry Withdrawn" refers to a previously approved HGNC symbol for a gene that has since been shown not to exist.

What are mapped data?

Mapped data are identified in Gene Symbol Reports by the disclaimer "mapped data supplied by [source]" in the header of the relevant symbol report field. Mapped data are derived from external sources and as such are not subject to our strict manual checking and curation procedures. Therefore, the HGNC are unable to guarantee the same high quality for mapped data as for our curated data.

Requesting a gene symbol

My gene doesn't have an approved symbol. How do I propose one?

Fill in the gene symbol request form and submit it to the HGNC. Remember that you need to propose both a name (description) and a symbol (short-form abbreviation) for your gene, e.g. ADK: adenosine kinase, and include sequence data wherever possible. Please read the Gene symbol request help before sending your request.

What is the difference between a gene symbol and a gene name?

Ideally, gene symbols are short, memorable and pronounceable, and most gene symbols are short-form descriptions or acronyms of the gene name. Names should be brief and specific, and should convey something about the character or function of the gene product(s), without attempting to describe everything known.

Will my gene symbol request remain confidential?

All submissions and resulting discussions with the HGNC are treated confidentially. Unless otherwise agreed, relevant data will be entered into public databases as soon as the symbol is approved.

Why can't punctuation be used in a gene symbol?

Most types of punctuation marks are not permitted in symbols as they can cause difficulty in searches of electronic databases. Use of hyphens is restricted to certain groups of genes, such as components of the major histocompatibility complex (e.g. HLA-DPA1).
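A proposed symbol can be screened for punctuation before a request is submitted. The following is a minimal sketch under a stated assumption: it accepts letters, digits and hyphens and flags every other character. It is not the HGNC rule set; whether a hyphen is acceptable depends on the gene group, and case rules are not checked, so the final decision rests with the HGNC. The function name disallowed_characters is hypothetical.

```python
import re

# Assumption: letters, digits and hyphens pass; anything else is flagged.
# Hyphens are only acceptable for certain gene groups (e.g. HLA-DPA1),
# which this rough screen does not attempt to judge.
_DISALLOWED = re.compile(r"[^A-Za-z0-9-]")


def disallowed_characters(symbol: str) -> list[str]:
    """Return the flagged characters in a proposed symbol, each listed once,
    in order of first appearance."""
    seen: list[str] = []
    for ch in _DISALLOWED.findall(symbol):
        if ch not in seen:
            seen.append(ch)
    return seen


print(disallowed_characters("HLA-DPA1"))
print(disallowed_characters("ABC.1/2."))
```

An empty list means nothing was flagged; for "ABC.1/2." the full stop and the slash are reported once each.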

Download problems

Excel is corrupting some of my gene symbols. How can I stop this from occurring?

Reports in journal articles, such as Mark Ziemann's 2016 paper, and in the British media highlighted that Microsoft Excel (and some other spreadsheet programs), used with default settings, automatically converted some gene symbols from text into dates. Because of this issue, in 2019 we decided to change all of the affected approved gene symbols to prevent the problem from recurring. This is discussed briefly in our most recent guidelines paper and in more depth in this article.
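To check a list of symbols for entries that a spreadsheet may turn into dates, a simple pattern match can help. This is a heuristic sketch, assuming that the risky pattern is a month name or abbreviation followed directly by a one- or two-digit number (as with SEPT2 or MARCH1). Excel's real behaviour depends on version and locale, so a negative result is a hint, not a guarantee. The name looks_like_date is hypothetical.

```python
import re

# Heuristic (assumption): month name or abbreviation + 1-2 digits, e.g. SEPT2,
# MARCH1, DEC1. Excel's actual date parsing is broader than this pattern.
_MONTH = (
    r"JAN(?:UARY)?|FEB(?:RUARY)?|MAR(?:CH)?|APR(?:IL)?|MAY|JUNE?|JULY?"
    r"|AUG(?:UST)?|SEPT?(?:EMBER)?|OCT(?:OBER)?|NOV(?:EMBER)?|DEC(?:EMBER)?"
)
_DATE_LIKE = re.compile(rf"^(?:{_MONTH})\d{{1,2}}$", re.IGNORECASE)


def looks_like_date(symbol: str) -> bool:
    """Return True if the symbol matches the month-plus-number pattern."""
    return bool(_DATE_LIKE.match(symbol.strip()))


symbols = ["SEPT2", "SEPTIN2", "MARCH1", "MARCHF1", "BRAF"]
print([s for s in symbols if looks_like_date(s)])
```

With this list, only SEPT2 and MARCH1 are flagged; the current symbols SEPTIN2 and MARCHF1 do not match the pattern.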

It is important to note that the original symbols still appear in the previous symbol field. If you import this data into a spreadsheet application, the same conversion can occur in the previous symbol field instead of the approved symbol field. The video below shows how to import gene symbol data (CSV or tab-separated) into Excel so that you can avoid introducing these errors. The video uses Excel 365 for Mac, but the same procedure can be followed in Windows versions and earlier versions of Excel.
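Outside a spreadsheet, a CSV parser reads every field as plain text, so symbols such as SEPT1 are never reinterpreted. The sketch below maps previous symbols to their current approved symbols from tab-separated text. The column names (symbol, prev_symbol) and the "|" separator between multiple previous symbols are assumptions; check them against the header of the file you download and pass different values if needed. The sample rows are illustrative only.

```python
import csv
import io


def previous_to_approved(tsv_text: str,
                         approved_col: str = "symbol",
                         previous_col: str = "prev_symbol",
                         sep: str = "|") -> dict[str, str]:
    """Map each previous symbol to its current approved symbol.

    Column names and the separator are assumptions (see lead-in).
    The csv module keeps every value as a string, so nothing is
    converted to a date.
    """
    mapping: dict[str, str] = {}
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        approved = row[approved_col].strip()
        # An empty previous-symbol field yields no entries.
        for prev in (row.get(previous_col) or "").split(sep):
            prev = prev.strip()
            if prev:
                mapping[prev] = approved
    return mapping


# Illustrative sample: two renamed symbols and one gene with no previous symbol.
sample = "symbol\tprev_symbol\nSEPTIN1\tSEPT1\nMARCHF1\tMARCH1\nBRAF\t\n"
print(previous_to_approved(sample))
```

Running the sample prints a mapping from SEPT1 to SEPTIN1 and from MARCH1 to MARCHF1; the BRAF row has no previous symbol and adds nothing.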