All download files, including the archive files, are now in a publicly accessible Google Storage bucket. The links on the Downloads page have been updated.

Statistics & downloads help

Statistics & files

The Statistics & downloads page contains tables that break down the number of approved symbol reports in the database by locus group and locus type. The tables also contain icons, shown below, which let users download the data in text (tsv) or JSON format, or link to our custom download application for the chosen dataset. Genes annotated on alternative loci included in the GRC Reference Assembly are shown separately in the second table.

The icons are as follows:

Above the tables there is a drop-down menu that allows you to select a specific chromosome, which changes the table statistics to show the data for the selected chromosome only.

Beneath the tables we also have text (tsv) and JSON files for our complete HGNC dataset, our gene groups dataset and our locus specific database links set.
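As a rough illustration of how the per-locus statistics relate to the tsv files, the sketch below counts approved entries by locus group and locus type, optionally restricted to one chromosome. The sample rows, the helper names `chromosome_of` and `count_by_locus`, and the rule for reading the chromosome from the start of the "location" value are assumptions made for this example. They are not part of the HGNC site or its files.

```python
import csv
import io
import re
from collections import Counter

# Hypothetical sample in a tab-separated layout; the rows are made up for
# illustration and are not real HGNC records.
SAMPLE_TSV = (
    "hgnc_id\tsymbol\tlocus_group\tlocus_type\tstatus\tlocation\n"
    "HGNC:900001\tEXAMPLE1\tprotein-coding gene\tgene with protein product\tApproved\t2q34\n"
    "HGNC:900002\tEXAMPLE2\tprotein-coding gene\tgene with protein product\tApproved\t19q13.43\n"
    "HGNC:900003\tEXAMPLE3\tnon-coding RNA\tRNA, long non-coding\tApproved\t2p11.2\n"
    "HGNC:900004\tEXAMPLE4\tprotein-coding gene\tgene with protein product\tEntry Withdrawn\t\n"
)


def chromosome_of(location):
    """Return the leading chromosome of a cytogenetic location, or None.

    Assumption: the location starts with 1-22, X or Y (e.g. 2q34 -> "2").
    """
    match = re.match(r"(\d{1,2}|X|Y)", location)
    return match.group(1) if match else None


def count_by_locus(handle, chromosome=None):
    """Count approved entries per (locus_group, locus_type) pair."""
    counts = Counter()
    for row in csv.DictReader(handle, delimiter="\t"):
        if row["status"] != "Approved":
            continue  # skip "Entry Withdrawn" records
        if chromosome is not None and chromosome_of(row["location"]) != chromosome:
            continue
        counts[(row["locus_group"], row["locus_type"])] += 1
    return counts


all_counts = count_by_locus(io.StringIO(SAMPLE_TSV))
chr2_counts = count_by_locus(io.StringIO(SAMPLE_TSV), chromosome="2")
```

To run this against a downloaded file, pass an open file handle instead of the `io.StringIO` wrapper.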

Fields within the tsv and JSON files

hgnc_id
HGNC ID. A unique ID created by the HGNC for every approved symbol.
symbol
The HGNC approved gene symbol. Equates to the "Approved symbol" field within the gene symbol report.
name
HGNC approved name for the gene. Equates to the "Approved name" field within the gene symbol report.
locus_group
A group name for a set of related locus types as defined by the HGNC (e.g. non-coding RNA).
locus_type
The locus type as set by the HGNC.
status
Status of the symbol report, which can be either "Approved" or "Entry Withdrawn".
location
Cytogenetic location of the gene (e.g. 2q34).
location_sortable
Same as "location" but single digit chromosomes are prefixed with a 0 enabling them to be sorted in correct numerical order (e.g. 02q34).
alias_symbol
Other symbols used to refer to this gene as seen in the "Alias symbols" field in the gene symbol report.
alias_name
Other names used to refer to this gene as seen in the "Alias names" field in the gene symbol report.
prev_symbol
Gene symbols previously approved by the HGNC for this gene. Equates to the "Previous symbols" field within the gene symbol report.
prev_name
Gene names previously approved by the HGNC for this gene. Equates to the "Previous names" field within the gene symbol report.
gene_group
The gene group name as set by the HGNC and seen at the top of the gene group reports.
gene_group_id
ID used to designate a gene group the gene has been assigned to.
date_approved_reserved
The date the entry was first approved.
date_symbol_changed
The date the approved symbol was last changed.
date_name_changed
The date the approved name was last changed.
date_modified
Date the entry was last modified.
entrez_id
NCBI gene ID. Found within the "Gene resources" section of the gene symbol report.
ensembl_gene_id
Ensembl gene ID. Found within the "Gene resources" section of the gene symbol report.
vega_id
Vega gene ID. Found within the "Gene resources" section of the gene symbol report.
ucsc_id
UCSC gene ID. Found within the "Gene resources" section of the gene symbol report.
ena
International Nucleotide Sequence Database Collaboration (GenBank, ENA and DDBJ) accession number(s). Found within the "Nucleotide resources" section of the gene symbol report.
refseq_accession
RefSeq nucleotide accession(s). Found within the "Nucleotide resources" section of the gene symbol report.
ccds_id
Consensus CDS ID. Found within the "Nucleotide resources" section of the gene symbol report.
uniprot_ids
UniProt protein accession. Found within the "Protein resource" section of the gene symbol report.
pubmed_id
PubMed and Europe PubMed Central PMID(s).
mgd_id
Mouse genome informatics database ID. Found within the "Homologs" section of the gene symbol report.
rgd_id
Rat genome database gene ID. Found within the "Homologs" section of the gene symbol report.
lsdb
The name of the Locus Specific Mutation Database and URL for the gene separated by a | character, e.g. Mutations of the ATP-binding Cassette Transporter Retina|http://www.retina-international.org/files/sci-news/abcrmut.htm
cosmic
Symbol used within the Catalogue of Somatic Mutations in Cancer (COSMIC) for the gene. This field is no longer updated.
omim_id
Online Mendelian Inheritance in Man (OMIM) ID
http://www.omim.org/entry/<ID>
mirbase
miRBase ID
http://www.mirbase.org/cgi-bin/mirna_entry.pl?acc=<ID>
homeodb
Homeobox Database ID
http://homeodb.cbi.pku.edu.cn/gene_info.get?id=<ID>
snornabase
snoRNABase ID
https://www-snorna.biotoul.fr//plus.php?snoid=<ID>
bioparadigms_slc
Symbol used to link to the SLC tables database at bioparadigms.org for the gene
http://slc.bioparadigms.org/protein?GeneName=<SYMBOL>
orphanet
Orphanet ID
http://www.orpha.net/consor/cgi-bin/OC_Exp.php?Lng=GB&Expert=<ID>
pseudogene.org
Pseudogene.org ID
http://tables.pseudogene.org/<ID>
horde_id
Symbol used within HORDE for the gene
http://genome.weizmann.ac.il/horde/card/index/symbol:<SYMBOL>
merops
ID used to link to the MEROPS peptidase database
https://www.ebi.ac.uk/merops/cgi-bin/pepsum?id=<ID>
imgt
Symbol used within the international ImMunoGeneTics information system (IMGT) for the gene
http://www.imgt.org/IMGT_GENE-DB/GENElect?query=2+<SYMBOL>&species=Homo+sapiens
iuphar
The objectId used to link to the IUPHAR/BPS Guide to PHARMACOLOGY database. When building a link, use only the number from the value in the example URL (e.g. use 1 from the value objectId:1).
http://www.guidetopharmacology.org/GRAC/ObjectDisplayForward?objectId=<ID>
mamit-trnadb
ID to link to the Mamit-tRNA database
http://mamit-trna.u-strasbg.fr/mutations.asp?idAA=<ID>
cd
Symbol used within the Human Cell Differentiation Molecule database for the gene
http://www.hcdm.org/index.php?option=com_molecule&cdnumber=<SYMBOL>
lncrnadb
lncRNA Database ID - Resource is now defunct.
http://www.lncrnadb.org/<ID>
enzyme_id
ENZYME EC accession number
http://enzyme.expasy.org/EC/<EC ACCESSION NUMBER>
intermediate_filament_db
ID used to link to the Human Intermediate Filament Database
http://www.interfil.org/details.php?id=<ID>
agr
The HGNC ID that the Alliance of Genome Resources (AGR) has linked to its record of the gene. Use the HGNC ID to link to an AGR gene report.
https://www.alliancegenome.org/gene/<HGNC ID>
lncipedia
The gene symbol used for a gene report within LNCipedia - A comprehensive compendium of human long non-coding RNAs.
http://lncipedia.org/db/gene/<Gene Symbol>
mane_select
NCBI and Ensembl transcript IDs/accessions, including the version number, for one high-quality representative transcript per protein-coding gene that is well supported by experimental data and represents the biology of the gene. The IDs are delimited by |.
https://www.ncbi.nlm.nih.gov/nuccore/<NCBI transcript ID> or https://www.ensembl.org/homo_sapiens/Transcript/Summary?db=core&t=<Ensembl transcript ID>
gencc
The HGNC ID used within the GenCC database as the unique identifier of its gene reports.
https://search.thegencc.org/genes/<HGNC ID>
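
Several of the fields above need a small amount of processing before use. The sketch below shows one possible way to derive "location_sortable" from "location", split an "lsdb" value, and build links for "iuphar" and "mane_select". All function names are hypothetical. The code assumes each value passed in holds a single entry in the format described above, and that Ensembl transcript IDs can be recognised by their "ENST" prefix. The files themselves may hold several entries per field, which this sketch does not handle.

```python
import re

# URL patterns copied from the field descriptions above.
IUPHAR_URL = "http://www.guidetopharmacology.org/GRAC/ObjectDisplayForward?objectId={}"
NCBI_URL = "https://www.ncbi.nlm.nih.gov/nuccore/{}"
ENSEMBL_URL = "https://www.ensembl.org/homo_sapiens/Transcript/Summary?db=core&t={}"


def sortable_location(location):
    """Zero-pad a single-digit chromosome, e.g. 2q34 -> 02q34."""
    return re.sub(r"^(\d)(?!\d)", r"0\1", location)


def split_lsdb(value):
    """Split one 'name|URL' lsdb entry into a (name, url) tuple."""
    name, url = value.split("|", 1)
    return name, url


def iuphar_url(value):
    """Keep only the number from a value such as 'objectId:1'."""
    number = value.split(":", 1)[1]
    return IUPHAR_URL.format(number)


def mane_select_urls(value):
    """Build one URL per transcript ID in a '|'-delimited mane_select value.

    Assumption: Ensembl transcript IDs start with 'ENST'; anything else is
    treated as an NCBI accession.
    """
    urls = []
    for accession in value.split("|"):
        template = ENSEMBL_URL if accession.startswith("ENST") else NCBI_URL
        urls.append(template.format(accession))
    return urls


# Usage with the examples given in the field descriptions; the transcript
# IDs in the last call are made-up placeholders.
print(sortable_location("2q34"))
print(iuphar_url("objectId:1"))
print(mane_select_urls("ENST00000000001.1|NM_000001.1"))
```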