Statistics & downloads help
Statistics & files
The Statistics & downloads page contains tables that break down the number of approved symbol reports in the database by locus group and locus type. The tables also contain the icons shown below, which let users download the data in text (tsv) or JSON format, or link to our custom download application for the chosen dataset. Genes annotated on alternative loci included in the GRC Reference Assembly are shown separately in the second table.
The icons are as follows:
- Tab delimited text file. Multiple-valued fields are double quoted and delimited by | within the quotes. The file should be easy to view in a spreadsheet application such as Excel.
- JSON text file (no indentation or white space). Intended for loading into a JSON parser within a script or program.
- Link to the Custom downloads page for the locus type/group where users can specify exactly what data they wish to download.
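As a minimal sketch of reading the tab delimited format described above, the following Python splits the double-quoted, |-delimited fields into lists. The sample rows, the IDs and symbols in them, and the choice of which columns to treat as multiple-valued are hypothetical and only illustrate the layout; a real download has many more columns.

```python
import csv
import io

# Hypothetical sample in the described layout: tab delimited, with
# multiple-valued fields double quoted and delimited by | inside the quotes.
SAMPLE_TSV = (
    "hgnc_id\tsymbol\talias_symbol\tprev_symbol\n"
    "HGNC:0000001\tEXAMPLE1\t\t\n"
    'HGNC:0000002\tEXAMPLE2\tALT1\t"OLD1|OLD2"\n'
)

# Assumption for this sketch: only these columns are treated as multiple-valued.
MULTI_VALUED = ("alias_symbol", "prev_symbol")


def read_records(handle, multi_valued=MULTI_VALUED):
    """Read tab delimited rows and split the |-delimited fields into lists."""
    reader = csv.DictReader(handle, delimiter="\t", quotechar='"')
    records = []
    for row in reader:
        for field in multi_valued:
            raw = row.get(field) or ""
            # An empty field becomes an empty list rather than [""].
            row[field] = raw.split("|") if raw else []
        records.append(row)
    return records


records = read_records(io.StringIO(SAMPLE_TSV))
for record in records:
    print(record["symbol"], record["alias_symbol"], record["prev_symbol"])
```

To read a downloaded file instead of the sample string, open it with `open(path, newline="", encoding="utf-8")` and pass the file handle to `read_records`.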
Above the tables there is a drop-down menu that allows you to select a specific chromosome, which changes the table statistics to show the data for the selected chromosome.
Beneath the tables we also have text (tsv) and JSON files for our complete HGNC dataset, our gene groups dataset and our locus specific database links set.
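The JSON files can be loaded with any standard JSON parser. The sketch below is an illustration only: it assumes the records sit in a list under `response` then `docs`, falls back to a top-level list, and uses a hypothetical helper name (`load_docs`) and made-up sample records. Check the layout of the file you download and adjust accordingly.

```python
import json

# Hypothetical sample; the nesting under "response" -> "docs" is an assumption
# about the file layout, and the IDs and symbols are placeholders.
SAMPLE_JSON = (
    '{"response":{"docs":['
    '{"hgnc_id":"HGNC:0000001","symbol":"EXAMPLE1","status":"Approved"},'
    '{"hgnc_id":"HGNC:0000002","symbol":"EXAMPLE2","status":"Entry Withdrawn"}'
    "]}}"
)


def load_docs(text):
    """Return the list of records from a JSON download (layout assumed)."""
    data = json.loads(text)
    if isinstance(data, dict):
        # Assumed layout: records nested under response -> docs.
        return data.get("response", {}).get("docs", [])
    # Fallback: the file is already a plain list of records.
    return data


docs = load_docs(SAMPLE_JSON)
# Keep only records whose status field is "Approved".
approved = [doc["symbol"] for doc in docs if doc.get("status") == "Approved"]
print(approved)
```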
Fields within the tsv and JSON files
- hgnc_id
- HGNC ID. A unique ID created by the HGNC for every approved symbol.
- symbol
- The HGNC approved gene symbol. Equates to the "Approved symbol" field within the gene symbol report.
- name
- HGNC approved name for the gene. Equates to the "Approved name" field within the gene symbol report.
- locus_group
- A group name for a set of related locus types as defined by the HGNC (e.g. non-coding RNA).
- locus_type
- The locus type as set by the HGNC.
- status
- Status of the symbol report, which can be either "Approved" or "Entry Withdrawn".
- location
- Cytogenetic location of the gene (e.g. 2q34).
- location_sortable
- Same as "location" but single digit chromosomes are prefixed with a 0 enabling them to be sorted in correct numerical order (e.g. 02q34).
- alias_symbol
- Other symbols used to refer to this gene as seen in the "Alias symbols" field in the gene symbol report.
- alias_name
- Other names used to refer to this gene as seen in the "Alias names" field in the gene symbol report.
- prev_symbol
- Gene symbols previously approved by the HGNC for this gene. Equates to the "Previous symbols" field within the gene symbol report.
- prev_name
- Gene names previously approved by the HGNC for this gene. Equates to the "Previous names" field within the gene symbol report.
- gene_group
- The gene group name as set by the HGNC and seen at the top of the gene group reports.
- gene_group_id
- ID used to designate a gene group the gene has been assigned to.
- date_approved_reserved
- The date the entry was first approved.
- date_symbol_changed
- The date the approved symbol was last changed.
- date_name_changed
- The date the approved name was last changed.
- date_modified
- Date the entry was last modified.
- entrez_id
- NCBI gene ID. Found within the "Gene resources" section of the gene symbol report.
- ensembl_gene_id
- Ensembl gene ID. Found within the "Gene resources" section of the gene symbol report.
- vega_id
- Vega gene ID. Found within the "Gene resources" section of the gene symbol report.
- ucsc_id
- UCSC gene ID. Found within the "Gene resources" section of the gene symbol report.
- ena
- International Nucleotide Sequence Database Collaboration (GenBank, ENA and DDBJ) accession number(s). Found within the "Nucleotide resources" section of the gene symbol report.
- refseq_accession
- RefSeq nucleotide accession(s). Found within the "Nucleotide resources" section of the gene symbol report.
- ccds_id
- Consensus CDS ID. Found within the "Nucleotide resources" section of the gene symbol report.
- uniprot_ids
- UniProt protein accession. Found within the "Protein resource" section of the gene symbol report.
- pubmed_id
- PubMed and Europe PubMed Central PMID(s).
- mgd_id
- Mouse genome informatics database ID. Found within the "Homologs" section of the gene symbol report.
- rgd_id
- Rat genome database gene ID. Found within the "Homologs" section of the gene symbol report.
- lsdb
- The name of the Locus Specific Mutation Database and URL for the gene separated by a | character, e.g. Mutations of the ATP-binding Cassette Transporter Retina|http://www.retina-international.org/files/sci-news/abcrmut.htm
- cosmic
- Symbol used within the Catalogue of somatic mutations in cancer for the gene. (No longer updated!).
- omim_id
- Online Mendelian Inheritance in Man (OMIM) ID
- http://www.omim.org/entry/<ID>
- mirbase
- miRBase ID
- http://www.mirbase.org/cgi-bin/mirna_entry.pl?acc=<ID>
- homeodb
- Homeobox Database ID
- http://homeodb.cbi.pku.edu.cn/gene_info.get?id=<ID>
- snornabase
- snoRNABase ID
- https://www-snorna.biotoul.fr/plus.php?snoid=<ID>
- bioparadigms_slc
- Symbol used to link to the SLC tables database at bioparadigms.org for the gene
- http://slc.bioparadigms.org/protein?GeneName=<SYMBOL>
- orphanet
- Orphanet ID
- http://www.orpha.net/consor/cgi-bin/OC_Exp.php?Lng=GB&Expert=<ID>
- pseudogene.org
- Pseudogene.org ID
- http://tables.pseudogene.org/<ID>
- horde_id
- Symbol used within HORDE for the gene
- http://genome.weizmann.ac.il/horde/card/index/symbol:<SYMBOL>
- merops
- ID used to link to the MEROPS peptidase database
- https://www.ebi.ac.uk/merops/cgi-bin/pepsum?id=<ID>
- imgt
- Symbol used within the international ImMunoGeneTics information system
- http://www.imgt.org/IMGT_GENE-DB/GENElect?query=2+<SYMBOL>&species=Homo+sapiens
- iuphar
- The objectId used to link to the IUPHAR/BPS Guide to PHARMACOLOGY database. The value is given in the form objectId:1; use only the number (1 in this case) in the example URL
- http://www.guidetopharmacology.org/GRAC/ObjectDisplayForward?objectId=<ID>
- mamit-trnadb
- ID to link to the Mamit-tRNA database
- http://mamit-trna.u-strasbg.fr/mutations.asp?idAA=<ID>
- cd
- Symbol used within the Human Cell Differentiation Molecule database for the gene
- http://www.hcdm.org/index.php?option=com_molecule&cdnumber=<SYMBOL>
- lncrnadb
- lncRNA Database ID - Resource is now defunct.
- http://www.lncrnadb.org/<ID>
- enzyme_id
- ENZYME EC accession number
- intermediate_filament_db
- ID used to link to the Human Intermediate Filament Database
- http://www.interfil.org/details.php?id=<ID>
- agr
- The HGNC ID that the Alliance of Genome Resources (AGR) have linked to their record of the gene. Use the HGNC ID to link to an AGR gene report.
- https://www.alliancegenome.org/gene/<HGNC ID>
- lncipedia
- The gene symbol used for a gene report within LNCipedia - A comprehensive compendium of human long non-coding RNAs.
- http://lncipedia.org/db/gene/<Gene Symbol>
- mane_select
- NCBI and Ensembl transcript IDs/accessions including the version number for one high-quality representative transcript per protein-coding gene that is well-supported by experimental data and represents the biology of the gene. The IDs are delimited by |.
- https://www.ncbi.nlm.nih.gov/nuccore/<NCBI transcript ID> or https://www.ensembl.org/homo_sapiens/Transcript/Summary?db=core&t=<Ensembl transcript ID>
- gencc
- The HGNC ID used within the GenCC database as the unique identifier of their gene reports.
- https://search.thegencc.org/genes/<HGNC ID>
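The URL patterns listed with the fields above can be filled in from a record's values. The following is a minimal sketch for a few of the fields; the function names and the example IDs are hypothetical, and the rule that Ensembl transcript IDs start with "ENST" is an assumption used to tell the two mane_select values apart.

```python
# URL patterns copied from the field list, with <ID> replaced by {value}.
URL_TEMPLATES = {
    "omim_id": "http://www.omim.org/entry/{value}",
    "orphanet": "http://www.orpha.net/consor/cgi-bin/OC_Exp.php?Lng=GB&Expert={value}",
    "iuphar": "http://www.guidetopharmacology.org/GRAC/ObjectDisplayForward?objectId={value}",
    "gencc": "https://search.thegencc.org/genes/{value}",
}
NCBI_TRANSCRIPT = "https://www.ncbi.nlm.nih.gov/nuccore/{value}"
ENSEMBL_TRANSCRIPT = (
    "https://www.ensembl.org/homo_sapiens/Transcript/Summary?db=core&t={value}"
)


def build_link(field, value):
    """Fill the URL pattern for a single-valued field."""
    if field == "iuphar":
        # The value looks like objectId:1; only the number goes into the URL.
        value = value.split(":", 1)[-1]
    return URL_TEMPLATES[field].format(value=value)


def mane_select_links(value):
    """Build one link per transcript ID in a |-delimited mane_select value."""
    links = []
    for transcript in value.split("|"):
        # Assumption: Ensembl transcript IDs start with "ENST"; anything else
        # is treated as an NCBI accession.
        if transcript.startswith("ENST"):
            template = ENSEMBL_TRANSCRIPT
        else:
            template = NCBI_TRANSCRIPT
        links.append(template.format(value=transcript))
    return links


# Placeholder values, not real records.
print(build_link("iuphar", "objectId:1"))
print(build_link("omim_id", "000000"))
print(mane_select_links("ENST00000000000.1|NM_000000.1"))
```

Fields whose pattern takes a symbol rather than an ID (for example horde_id or lncipedia) could be handled the same way by adding their patterns to the dictionary.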